I have implemented a cool idea Brad Chapman had at the recent BOSC Codefest. It would save me a great deal of effort maintaining separate proteomics module definitions for the Tool Shed and CloudBioLinux for Galaxy-P. I thought I would throw this out there and see if anyone has any comments:
From the pull request:
CloudBioLinux contains custom Fabric install procedures for dozens of bioinformatics packages, and new ones are added all the time. It also contains code that automatically sets up Galaxy env.sh files to support multiple versions of a given piece of software. While it started as a way to configure a particular Linux distribution, it now supports many distributions, and these custom install procedures in particular are not tied to any specific variant of Linux, or even to Linux itself.

This changeset lets Tool Shed tools quickly and easily leverage this wealth of Galaxy-module-ready software by adding a new action type, 'cloudbiolinux_install'. I have created the repository 'ngscbltest' in the 'Next Gen Mappers' section of the Test Tool Shed to demonstrate and test this functionality. Here is an example from 'tool_dependencies.xml' in that repository:

    <tool_dependency>
        <package name="tophat2" version="2.0.8b">
            <install version="1.0">
                <actions>
                    <action type="cloudbiolinux_install" cbl_revision="4bd005355991a5c8b855ea790ae75e8178ede8c1" tool_name="tophat2" tool_version="2.0.8b" />
                </actions>
            </install>
            <readme>Tophat 2.</readme>
        </package>
        ...
    </tool_dependency>

When installed into a Galaxy instance, this example passes the version '2.0.8b' to the 'install_tophat2' function in CloudBioLinux and configures it to install into the directory expected by the Tool Shed client code. CloudBioLinux additionally sets up an env.sh file for this installation. All of this is installed as a 'tophat2' package.

In the above example, a particular revision of CloudBioLinux was specified for the sake of reproducibility. That attribute is optional, however, and defaults to 'master'. Likewise, reasonable defaults for the 'tool_name' and 'tool_version' attributes can be inferred from context (the enclosing package's name and version). As a demonstration, the following package definition would result in Tophat 1.3.3 being installed as the package 'tophat':
    <package name="tophat" version="1.3.3">
        <install version="1.0">
            <actions>
                <action type="cloudbiolinux_install"/>
            </actions>
        </install>
        <readme>Tophat 1.3.3.</readme>
    </package>

The name of the install method doesn't have to match the package name. For instance, the install_cufflinks function in CloudBioLinux can be used to install Cufflinks 1 or 2, as shown below:

    <package name="cufflinks2" version="2.1.1">
        <install version="1.0">
            <actions>
                <action type="cloudbiolinux_install" cbl_url="https://github.com/jmchilton/cloudbiolinux.git" tool_name="cufflinks" />
            </actions>
        </install>
        <readme>Cufflinks2.</readme>
    </package>

This final example also demonstrates how to target a customized fork of CloudBioLinux.

Implementation:

On the Galaxy side of this equation, the implementation is fairly straightforward. fabric_util.py has been updated with an '__install_cloudbiolinux_tool' function that implements this action entirely. This function clones CloudBioLinux to a temporary directory (optionally checking out a particular revision) and uses the CloudBioLinux deployer functionality to configure a local, no-SSH install of the specified software.

Requirements:

The core framework for obtaining and running CloudBioLinux requires only bash, wget, git, and python-dev/python-devel. These are likely a subset of the packages the IUC has already agreed will need to be present. Individual install procedures may have additional OS-level requirements, but the same could be true of any Makefile compiled with the existing Tool Shed infrastructure. If particular modules are identified as useful, I am happy to work with the tool developers to ensure their CloudBioLinux install procedures work with minimal prerequisites.

Thanks,
-John
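A minimal sketch, assuming Python (the language of fabric_util.py), of how the attribute defaults described above could be resolved from a package definition: 'cbl_revision' falls back to 'master', 'tool_name' and 'tool_version' fall back to the enclosing package's name and version, and the CloudBioLinux function called is install_<tool_name>. The names resolve_cbl_action and CblInstallRequest are hypothetical illustrations, not the actual code in fabric_util.py; leaving 'cbl_url' as None here just stands for "whatever default repository the real implementation uses".

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

# 'master' as the default revision comes from the description above.
DEFAULT_CBL_REVISION = "master"
# Hypothetical: None stands for "the implementation's default repository".
DEFAULT_CBL_URL = None


@dataclass
class CblInstallRequest:
    """Hypothetical container for a resolved cloudbiolinux_install action."""
    cbl_url: Optional[str]
    cbl_revision: str
    tool_name: str
    tool_version: str

    @property
    def install_function(self) -> str:
        # CloudBioLinux install procedures follow the install_<tool_name> pattern,
        # e.g. install_tophat2 or install_cufflinks.
        return "install_" + self.tool_name


def resolve_cbl_action(package_xml: str) -> CblInstallRequest:
    """Resolve a cloudbiolinux_install action, filling in defaults from the package."""
    package = ET.fromstring(package_xml)
    action = package.find("./install/actions/action[@type='cloudbiolinux_install']")
    if action is None:
        raise ValueError("package has no cloudbiolinux_install action")
    return CblInstallRequest(
        cbl_url=action.get("cbl_url", DEFAULT_CBL_URL),
        cbl_revision=action.get("cbl_revision", DEFAULT_CBL_REVISION),
        # Missing tool_name/tool_version fall back to the package's own attributes.
        tool_name=action.get("tool_name", package.get("name")),
        tool_version=action.get("tool_version", package.get("version")),
    )


if __name__ == "__main__":
    tophat = (
        '<package name="tophat" version="1.3.3"><install version="1.0"><actions>'
        '<action type="cloudbiolinux_install"/></actions></install></package>'
    )
    req = resolve_cbl_action(tophat)
    print(req.install_function, req.tool_version, req.cbl_revision)  # install_tophat 1.3.3 master
```

Run against the cufflinks2 definition, the same function would yield install_cufflinks with version 2.1.1 and the jmchilton fork URL, which is why the install method name is derived from 'tool_name' rather than from the package name.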