Setting the environment for each tool just prior to running the tool is clearly better.
What I'm doing currently is loading an environment module just before running a tool in Galaxy.
On our HPC cluster, I have configured Galaxy to source the following from our modules system:

1) A global file, defined in universe_wsgi.ini:

   environment_setup_file=setup-modules-for-galaxy.sh

   ^^ This sets up the module environment so that commands like "module load" work. It does something similar to:

   source /data/apps/modules/Modules/default/init/bash 2>&1 &> /dev/null

2) tool_dependency_dir, configured as described here: http://wiki.g2.bx.psu.edu/Admin/Config/Tool%20Dependencies

   tool_dependency_dir=tool_dependency

   For each requirement specified in the tool XML, Galaxy automatically looks in tool_dependency/<name>/default/env.sh. Each dependency's env.sh then uses our favorite command:

   module load blah/version

Cumbersome? Yup! But I believe it's a nice, elegant way of getting things to work without mucking around in the XML file.
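A minimal sketch of the per-requirement env.sh files described above, assuming the tool_dependency/<name>/default/env.sh layout and module names of the form name/version. The helper make_module_dep and the example module EMBOSS/5.0.0 are hypothetical; the script only writes the file. The actual "module load" happens later, when Galaxy sources env.sh in a shell already prepared by environment_setup_file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a Galaxy tool dependency env.sh that
# delegates version selection to the environment modules system.
set -euo pipefail

# make_module_dep DEP_DIR NAME MODULE
#   DEP_DIR - the directory configured as tool_dependency_dir
#   NAME    - requirement name as written in the tool XML
#   MODULE  - module to load, e.g. "EMBOSS/5.0.0"
make_module_dep() {
    local dep_dir=$1 name=$2 module=$3
    local target="$dep_dir/$name/default"
    mkdir -p "$target"
    # Galaxy sources this file before running the tool command.
    printf 'module load %s\n' "$module" > "$target/env.sh"
}

make_module_dep tool_dependency EMBOSS EMBOSS/5.0.0
cat tool_dependency/EMBOSS/default/env.sh
```

Each new tool version then costs one small env.sh file, and the tool XML itself never mentions modules.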
This will enable my native EMBOSS wrapper to run the defined version of the (system-installed) EMBOSS suite. However, it would only be worth me writing this extension if there's a likelihood of getting it into Galaxy. Is there?
I truly like the idea and think it would be worth it. However, stepping into the Galaxy devs' shoes, a solution like this might only benefit those using modules (a small subset of users). Hopefully others can give their thoughts. The con I see is that Galaxy already has the env.sh solution, which, in the setup I described above, does support modules....
In the meantime, there's work going on in Galaxy land to package every application wanted in Galaxy, by writing install scripts, encapsulating environment variables in tool XML files, etc. (There is difficulty here in writing install scripts which work on every platform, which I have commented on previously.)
I saw that post last week, and while I like the idea, folks like me who really do care about the version of GCC we use (and compile our own versions of GCC) will not use it. However, that is going off topic now :-) Will wait and see what others say...

--
Adam Brenner
Computer Science, Undergraduate Student
Donald Bren School of Information and Computer Sciences

Research Computing Support
Office of Information Technology
http://www.oit.uci.edu/rcs/

University of California, Irvine
www.ics.uci.edu/~aebrenne/
aebrenne@uci.edu

On Thu, Sep 12, 2013 at 4:03 PM, Guest, Simon <Simon.Guest@agresearch.co.nz> wrote:
Another vote for the excellent modules system from us at AgResearch.
Galaxy's dependency injection works in a similar way, without requiring all the external dependencies. We don't need unload, since we compose a unique environment for every tool execution. This is much cleaner than loading everything into the environment before running Galaxy; we want tools to be executed with as clean an environment as possible.
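To illustrate the "unique environment for every tool execution" idea, here is a minimal bash sketch. It is not Galaxy's actual job-script code. It starts from an empty environment with env -i, sources each requirement's env.sh, then runs the tool command. The function run_tool, the dependency layout, and the variables FAKE_TOOL_HOME and LEAKED are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: compose a fresh environment for a single tool run.
set -euo pipefail

# run_tool DEP_DIR "REQ1 REQ2 ..." COMMAND   (hypothetical helper)
run_tool() {
    local dep_dir=$1 reqs=$2 cmd=$3
    local prefix="" req env_file
    for req in $reqs; do
        env_file="$dep_dir/$req/default/env.sh"
        # Source each requirement's env.sh, if one exists.
        if [ -f "$env_file" ]; then
            prefix+=". $(printf '%q' "$env_file"); "
        fi
    done
    # env -i discards the caller's environment (only a fixed minimal PATH
    # is set), so nothing leaks from one tool run into the next and no
    # "module unload" step is needed afterwards.
    env -i PATH=/usr/bin:/bin "$BASH" -c "${prefix}${cmd}"
}

# Demo with a throwaway dependency directory.
deps=$(mktemp -d)
trap 'rm -rf "$deps"' EXIT
mkdir -p "$deps/mytool/default"
echo 'export FAKE_TOOL_HOME=/opt/mytool/1.0' > "$deps/mytool/default/env.sh"

export LEAKED=from-galaxy-process
run_tool "$deps" "mytool" 'echo "${FAKE_TOOL_HOME:-unset} ${LEAKED:-unset}"'
# prints: /opt/mytool/1.0 unset
```

The variable set by the requirement's env.sh is visible to the tool, while the one exported by the parent process is not. That is why an explicit unload step becomes unnecessary under this design.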
Setting the environment for each tool just prior to running the tool is clearly better.
What I'm doing currently is loading an environment module just before running a tool in Galaxy. So far, I have done this in the tool XML file by prefixing the command with a module load command, e.g.
<command>. /etc/profile; module load EMBOSS/5.0.0; infoseq -sequence [snipped] </command>
The ugly ". /etc/profile" is needed for technical reasons. I don't like this, as it's a bit hacky, but what it achieves is useful and flexible.
I envisage an extension to the tool XML syntax which would let me write it like this, say:
<requirements><requirement type="module" version="5.0.0">EMBOSS</requirement></requirements> <command>infoseq -sequence [snipped] </command>
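One way the proposed type="module" requirement could be handled is sketched below in bash. It assumes Galaxy would simply generate the same prefix currently written by hand in the <command> element, and that every module requirement carries a version. The function module_prefix is hypothetical, not an existing Galaxy API. It takes (name, version) pairs and emits ". /etc/profile; module load NAME/VERSION; ..." ahead of the tool's own command.

```shell
#!/usr/bin/env bash
set -euo pipefail

# module_prefix NAME VERSION [NAME VERSION ...]   (hypothetical helper)
# Build the shell prefix for a list of module requirements.
module_prefix() {
    # Same first step as the hand-written <command>: source /etc/profile.
    local out=". /etc/profile; "
    while [ "$#" -ge 2 ]; do
        out+="module load $1/$2; "
        shift 2
    done
    printf '%s' "$out"
}

tool_cmd='infoseq -sequence [snipped]'
echo "$(module_prefix EMBOSS 5.0.0)${tool_cmd}"
# prints: . /etc/profile; module load EMBOSS/5.0.0; infoseq -sequence [snipped]
```

Keeping the prefix generation in one place means the ". /etc/profile" workaround would live in Galaxy rather than being repeated in every wrapper.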
This will enable my native EMBOSS wrapper to run the defined version of the (system-installed) EMBOSS suite. However, it would only be worth me writing this extension if there's a likelihood of getting it into Galaxy. Is there?
Here's where I'm coming from. I already have most of the applications I want installed on my system. I am working on updating the RPMs to support side-by-side installs of multiple versions, with version selection by the environment modules facility. This is a small change to (and improvement on) what I have today. It's a work in progress, but it's a manageable amount of work. It's looking likely that this multi-version packaging approach will be adopted by an official CentOS Scientific Repo, and this could change the landscape in terms of packaging of scientific applications for major Linux distributions. Who knows, that's all in the future.
In the meantime, there's work going on in Galaxy land to package every application wanted in Galaxy, by writing install scripts, encapsulating environment variables in tool XML files, etc. (There is difficulty here in writing install scripts which work on every platform, which I have commented on previously.)
The big question is, will the Galaxy team accept contributions to Galaxy to support different ways of solving this packaging problem? In particular, allowing for environment modules to be loaded as part of running a tool, and having this functionality in the toolshed?
cheers,
Simon