This seems to fall under the recent discussion of map/reduce operators for workflows, e.g. splitting large jobs into embarrassingly parallel tasks and then merging the results back at some later step. Dannon mentioned that some basic functionality for this already exists within Galaxy, but it is at an early stage of development. Peter Cock did get some things working with it. See this thread:
http://thread.gmane.org/gmane.science.biology.galaxy.devel/4502/focus=4502

chris

On Mar 6, 2012, at 9:56 AM, Ann Black wrote:
Good Morning,
We are also interested in this capability. To give a concrete example, we sometimes receive multiple sequencing runs (1…*) for the same sample. We would like to process each run of the sample through BWA, merge the results, post-process them a bit, and then send the merged BAM file through the rest of our standard pipeline.
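For illustration only, here is a minimal Python sketch of the per-run "map, then merge" pattern described above. It is not Galaxy's mechanism; `align` and `merge` are hypothetical callables (in a real pipeline `align` might launch BWA via `subprocess.run` and `merge` might invoke `samtools merge`), and the stubs at the bottom just show the data flow.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def scatter_gather(runs: List[str],
                   align: Callable[[str], str],
                   merge: Callable[[List[str]], str],
                   max_workers: int = 4) -> str:
    """Align each sequencing run independently (the 'map' step), then
    combine the per-run outputs into one result (the 'reduce' step)."""
    # Threads suffice when align() mostly waits on an external process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so outputs line up with runs.
        per_run = list(pool.map(align, runs))
    return merge(per_run)


# Hypothetical stand-ins that only rename files, to show the data flow.
def fake_align(run: str) -> str:
    return run.replace(".fastq", ".bam")


def fake_merge(bams: List[str]) -> str:
    return "merged(" + ",".join(bams) + ")"


print(scatter_gather(["run1.fastq", "run2.fastq"], fake_align, fake_merge))
```

Because the runs share nothing until the merge, the alignment step parallelizes trivially; only the final merge needs all per-run outputs.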
Ideally this would be automated. Aurélien, I am interested in your workaround; it might get us part of the way there, since we could concatenate the FASTQ files and run BWA once. Would you be willing to share some of your custom tools so we can build on them? Ideally, though, for performance we would run the BWA steps in parallel.
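A minimal sketch of the concatenation workaround, assuming uncompressed FASTQ input (the function name `concat_fastq` is hypothetical). FASTQ records are self-contained four-line blocks, so appending files byte for byte yields a valid FASTQ, provided each input ends with a newline; the sketch adds one if it is missing.

```python
import os
import shutil
from typing import Iterable


def concat_fastq(inputs: Iterable[str], output: str) -> None:
    """Append uncompressed FASTQ files, in order, into a single file."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
                # After copying, the position is at EOF. If the file is
                # non-empty and lacks a final newline, add one so its last
                # line does not fuse with the next file's first header.
                if src.tell() > 0:
                    src.seek(-1, os.SEEK_END)
                    if src.read(1) != b"\n":
                        out.write(b"\n")
```

For paired-end data, the R1 and R2 files would each be concatenated separately, in the same run order, so mates stay in step. One trade-off: a single BWA run over the combined file cannot tag each original run with its own read group.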
Is the Galaxy team looking into these kinds of features, or do other people have custom solutions they are using?
Thanks so much,
Ann