On Wed, Feb 15, 2012 at 5:08 PM, Dannon Baker <dannonbaker@me.com> wrote:
It's definitely an experimental feature at this point, and there's no wiki, but basic support for breaking jobs into tasks does exist. It needs a lot more work and can go in a few different directions to make it better,
Not what I was hoping to hear, but a promising start :)
but check out the wrappers with <parallelism> defined, and enable use_tasked_jobs in your universe_wsgi.ini and restart. That's all it should take from a fresh Galaxy install to get, iirc, at least BWA and a few other tools working. If you want a super trivial example to play with, change the tool .xml for a text tool like "change case" to have <parallelism method="basic"></parallelism> and give that a shot.
Excellent - that saved me searching blindly.

$ cd tools
$ grep parallelism */*.xml
samtools/sam_bitwise_flag_filter.xml: <parallelism method="basic"></parallelism>
sr_mapping/bowtie_wrapper.xml: <parallelism method="basic"></parallelism>
sr_mapping/bwa_color_wrapper.xml: <parallelism method="basic"></parallelism>
sr_mapping/bwa_wrapper.xml: <parallelism method="basic"></parallelism>

Are those four tools being used on Galaxy Main already with this basic parallelism in place?

Looking at the code in lib/galaxy/jobs/splitters/basic.py, its comments suggest it only works on tools with one input and one output file (although that seems a bit fuzzy, as you could be using BWA with a FASTA history item as the reference - would that fail?).

I also see interesting things in lib/galaxy/jobs/splitters/multi.py. Is that even more experimental? It looks like it could be used to say that BWA's read file should be split, but the reference file shared.

Regarding the merging of the output, I see there is a default merge method in lib/galaxy/datatypes/data.py which just concatenates the files. I am surprised at that - it seems like a very bad idea in general - consider many binary files, or XML. Why not put this as the default for text and subclasses thereof?

There is also one example where the merge method gets overridden, lib/galaxy/datatypes/tabular.py, which avoids the repetition of any headers when merging SAM files. That should be enough clues to implement other customized merge code for other datatypes.
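
To make the difference between the two merge strategies concrete, here is a minimal standalone Python sketch. It is not Galaxy's code: the function names concat_merge, sam_merge and demo are hypothetical, and the real merge methods in data.py and tabular.py may well have different signatures. It only illustrates why plain concatenation repeats headers and how a datatype-specific merge could avoid that for SAM, assuming each split output carries its own copy of the '@' header lines.

```python
import os
import shutil
import tempfile


def concat_merge(split_files, output_file):
    """Naive merge (hypothetical helper): append each part byte for byte."""
    with open(output_file, "wb") as out:
        for path in split_files:
            with open(path, "rb") as part:
                shutil.copyfileobj(part, out)


def sam_merge(split_files, output_file):
    """Header-aware merge (hypothetical helper).

    Keeps the '@' header lines from the first part only. This relies on
    SAM read names never starting with '@'.
    """
    with open(output_file, "w") as out:
        for index, path in enumerate(split_files):
            with open(path) as part:
                for line in part:
                    if index > 0 and line.startswith("@"):
                        continue  # skip repeated header lines
                    out.write(line)


def demo():
    """Write two tiny SAM-like parts, merge them both ways, return the text."""
    header = "@HD\tVN:1.0\n"
    parts = [header + "read1\t0\tchr1\n", header + "read2\t16\tchr1\n"]
    workdir = tempfile.mkdtemp()
    paths = []
    for i, text in enumerate(parts):
        path = os.path.join(workdir, "part%d.sam" % i)
        with open(path, "w") as handle:
            handle.write(text)
        paths.append(path)
    naive = os.path.join(workdir, "naive.sam")
    aware = os.path.join(workdir, "aware.sam")
    concat_merge(paths, naive)
    sam_merge(paths, aware)
    with open(naive) as a, open(aware) as b:
        return a.read(), b.read()
```

Calling demo() returns the naive result, with the @HD line appearing twice, alongside the header-aware result, where it appears once. The same pattern (override the merge per datatype) would presumably apply to XML or binary formats, each with its own rules.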
If you decide to try this out, do keep in mind that this feature is not at all complete. While there's a long list of things we still want to experiment with along these lines, suggestions (and especially contributions) are absolutely welcome.
OK then, I hope to have a play with this shortly.

Thanks,

Peter