Hi Dannon,

If I may elaborate further on this issue: this kind of functionality is also supported by the Sun Grid Engine (SGE) in the form of 'array jobs'. An array job lets you execute the same job multiple times, independently, with each run differing only in, for instance, its parameter settings. From your description below, this seems similar to the Galaxy parallelism tag.

Is there, or do you foresee, any implementation of this SGE functionality through the DRMAA interface in Galaxy? If not, has anybody achieved this through custom coding? We would be highly interested in this.

Thanks,
Bram

On 15/02/2012 18:08, Dannon Baker wrote:
It's definitely an experimental feature at this point, and there's no wiki, but basic support for breaking jobs into tasks does exist. It needs a lot more work and can go in a few different directions to make it better, but check out the wrappers with <parallelism> defined, enable use_tasked_jobs in your universe_wsgi.ini, and restart. That's all it should take from a fresh Galaxy install to get, iirc, at least BWA and a few other tools working. If you want a super trivial example to play with, change the tool .xml for a text tool like "change case" to have <parallelism method="basic"></parallelism> and give that a shot.
If you decide to try this out, do keep in mind that this feature is not at all complete. There's a long list of things we still want to experiment with along these lines, so suggestions (and especially contributions) are absolutely welcome.
-Dannon
On Feb 15, 2012, at 11:36 AM, Peter Cock wrote:
Hi all,
The comments on this issue suggest that the Galaxy team is/were working on splitting large jobs over multiple nodes/CPUs:
https://bitbucket.org/galaxy/galaxy-central/issue/79/split-large-jobs
Is there any relevant page on the wiki I should be aware of?
Specifically I am hoping for a general framework where one of the tool inputs can be marked as "embarrassingly parallel" meaning it can be subdivided easily (e.g. multiple sequences in FASTA or FASTQ format, multiple annotations in BED format, multiple lines in tabular format) and the outputs can all be easily combined (e.g. by concatenation in the same order as the input was split).
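As a rough illustration of the split/merge pattern described above, here is a minimal Python sketch. It is not Galaxy's task-splitting code: the function names (split_fasta, run_split, sequence_lengths) and the toy per-record tool are hypothetical, and the chunks are processed sequentially where a real framework would dispatch each one to a separate node or CPU.

```python
from typing import Callable, Iterator, List


def split_fasta(text: str, records_per_chunk: int) -> Iterator[str]:
    """Yield FASTA chunks holding at most records_per_chunk records each."""
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be at least 1")
    chunk: List[str] = []
    count = 0
    for line in text.splitlines(keepends=True):
        if line.startswith(">"):
            # A new record starts here; emit the chunk first if it is full.
            if count == records_per_chunk:
                yield "".join(chunk)
                chunk, count = [], 0
            count += 1
        chunk.append(line)
    if chunk:
        yield "".join(chunk)


def run_split(text: str, records_per_chunk: int,
              tool: Callable[[str], str]) -> str:
    """Apply tool to each chunk and concatenate outputs in input order."""
    # Each call is independent, so it could run as a separate cluster task.
    outputs = [tool(chunk) for chunk in split_fasta(text, records_per_chunk)]
    return "".join(outputs)


def sequence_lengths(fasta_chunk: str) -> str:
    """Hypothetical per-record tool: one 'name<TAB>length' line per sequence."""
    rows: List[str] = []
    name, length = None, 0
    for line in fasta_chunk.splitlines():
        if line.startswith(">"):
            if name is not None:
                rows.append(f"{name}\t{length}\n")
            name, length = line[1:].strip(), 0
        else:
            length += len(line.strip())
    if name is not None:
        rows.append(f"{name}\t{length}\n")
    return "".join(rows)
```

The approach only works when the tool treats each record independently, so that running it on the chunks and concatenating the results gives the same output as running it once on the whole input.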
Thanks,
Peter
___________________________________________________________
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at:
--
==========================================================
Bram Slabbinck, PhD
Bioinformatics & Systems Biology Division
VIB Department of Plant Systems Biology, UGent
Technologiepark 927, 9052 Gent, BELGIUM
Email: Bram.Slabbinck@psb.ugent.be
WWW: http://bioinformatics.psb.ugent.be
==========================================================
Please consider the environment before printing this email