On Wed, Sep 18, 2013 at 2:24 AM, Guest, Simon <Simon.Guest@agresearch.co.nz> wrote:
Hi Bjoern,
I can see man-years of effort being spent on solving this problem within Galaxy. I was going to title this email "Danger, Will Robinson", but I didn't want to be disrespectful. I think the path being embarked upon (tool dependency packaging, tool versioning, reproducibility, and long-term archiving of source tarballs) is going to lead inevitably to the creation of a new Linux distribution, which I guess will be called Galaxy Linux.
It is potentially broader than that - some people are trying to cover Mac OS X as well, and there are already Galaxy installations which send jobs to Windows machines, although that isn't something the Tool Shed currently tackles or aims to tackle (as far as I know).
The packaging and archival you are talking about is exactly the service provided by a Linux distribution. There's well established infrastructure to handle this, and years of experience have gone into solving the problems well. Surely the number of Linux distributions in the world now exceeds 100, but I don't see that the world will become a better place if we increase that number by one more.
Of course, but that isn't really what the Galaxy team want either.
We at AgResearch can't be alone in having to pick a Linux distribution to run from the short list supported by our hardware vendor. I can't see Galaxy Linux being on that list anytime soon. So we have to make Galaxy run on the particular distribution we have here. For us that's CentOS 6.
We are also using CentOS, which for a while was dictated by our IT department, but I think things are more flexible now. Given that most (non-cloud) Galaxy installations will be connected to pre-existing clusters, the Galaxy administrators will rarely be in a position to dictate which flavour of Linux the cluster or grid should run. In other words, Galaxy can't pick one Linux distribution as the only supported platform.
Now, I see scary mention of platform independence as a goal for Galaxy packaging, which I interpret as "will run on any Linux distribution". I think that's essentially infeasible. All you can do is write install scripts which you hope are portable (by following as many best practices as you know about), and then work patiently with users on strange platforms, to adapt each install script to work on that platform also. I think this is not a good use of anyone's time.
In general I agree it is an open-ended problem, and I have spent more of my time than I expected on this. However, in many cases it is quite feasible - where the authors of the tool being wrapped for Galaxy already provide neutral Linux binaries which should work on any recent distribution, or use a standard configure/make system for compiling with only 'core' header files needed.
How many Linux distributions do the Galaxy community actually care about today? The RHEL family is surely important, as is Ubuntu LTS. Anything else? I'd be quite interested to understand this, as it provides a context for the discussion, and ensures we're not just solving a hypothetical problem.
If you broaden that to the RHEL family (which includes CentOS) and the Debian family (which includes Ubuntu and Bio-Linux) then I suspect that is a majority.
I'm just starting work on a native packaging infrastructure for Galaxy, that will enable tool dependencies to use defined versions of natively installed packages. That frees me up to make my packages work nicely on the RHEL family. It looks like the RPMs themselves (including SRPMs obviously) will be hosted by the CentOS project before too long. Once they're there, they can easily be archived forever. Anyone else on that platform is welcome to use the same infrastructure. Then, all we really need is someone to handle the packaging effort for the other major Linux distributions (a small number, I hope), and the problem is essentially solved. Getting the Bio-Linux team interested in multi-version packaging would be a great next step.
If any major Linux distributions could handle multiple versions of tools installed in parallel via their packaging infrastructure it would be great - at least for open source tools. Non-open source tools would still be problematic and need either manual install or scripting of some kind, as now.
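As a rough illustration of what "multiple versions installed in parallel" could look like, here is a minimal Python sketch. It assumes a hypothetical directory layout of <root>/<name>/<version>/bin. The function names, the /deps root, and the tool versions in the usage note are illustrative only and are not taken from Galaxy, the Tool Shed, or any distribution's packaging system.

```python
import os


def install_prefix(root, name, version):
    """Return the per-version install directory: <root>/<name>/<version>.

    Hypothetical layout: giving each version its own directory is what
    lets several versions of one tool coexist on the same machine.
    """
    return os.path.join(root, name, version)


def path_for(root, requirements, base_path=""):
    """Build a PATH string with the bin directory of each requested
    (name, version) pair placed ahead of the existing base_path."""
    bin_dirs = [
        os.path.join(install_prefix(root, name, version), "bin")
        for name, version in requirements
    ]
    parts = bin_dirs + ([base_path] if base_path else [])
    return os.pathsep.join(parts)
```

For example, path_for("/deps", [("samtools", "0.1.19")], "/usr/bin") selects one specific version for a job, while a second job could request a different version of the same tool without either install disturbing the other.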
I'll be posting here when I have progress to report on my native packaging effort.
cheers, Simon
That sounds promising. Thank you, Peter