Running workflows on multiple paired-end data sets
Hi,

Is there an intelligent way to run a workflow that starts with more than one pair of paired-end data sets? I can run multiple workflows when there is a SINGLE input file, but many paired-end workflows need two paired samples, and just adding another input workflow step does not work, since you can only select a single file in the second input workflow set.

I realize that there is an issue with cycling through the datasets, since they need to be paired up and it would be very easy to mess this up if you allow two (or more) input data sets. But since paired-end data is going to be very common, I would assume that a special paired-end input workflow step, one that is intelligent and can pair up the input datasets before handing them off to the workflow, would be very useful.

I can think of workarounds of course, such as creating an interlaced file containing both pairs and then just de-interlacing them in the workflow, but that is kludgy and will result in a lot of data duplication for no reason...

I have to run 88 samples, which will soon grow to over 200 samples, so running each step manually is not really an option, and I would hate to have to program the workflow steps myself...

Any ideas?

Thanks

Thon
Thon,

Not currently, though the question comes up often enough that I'll try to bump it up my list. Instead of interlacing and deinterlacing, a better workaround for now might be to use the API to execute the workflow over all these samples.

-Dannon

On Feb 13, 2012, at 8:46 PM, Anthonius deBoer wrote:
Hi,
Is there an intelligent way to run a workflow that starts with more than one pair of paired-end data sets in a workflow? I can run multiple workflows when there is a SINGLE input file, but in the case of many paired end workflows, you need to provide two paired samples and just adding another input workflow step does not work, since you can only select a single file in the second input workflow set.
I realize that there is an issue with the cycling through the datasets since they need to be paired-up and it would be very easy to mess this up if you allow two (or more) input data sets, but since paired end data is going to be very common, I would assume a special paired end input workflow step that is intelligent and can pair up the input datasets before handing them off to the workflow would be very useful.
I can think of workarounds of course, such as creating an interlaced file containing both pairs and then just de-interlacing them in the workflow, but that is kludgy and will result in a lot of data duplication for no reason...
I have to run 88 samples which will soon grow to over 200 samples, so running each step manually is not really an option and I would hate to have to program the workflow steps myself...
Any ideas?
Thanks
Thon
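For anyone trying the API workaround suggested in the reply, the pairing step that the original question worries about could be sketched roughly as follows. This is only an illustration under assumptions that are not from the thread: the file naming convention (`<sample>_1.fastq` / `<sample>_2.fastq`, optionally with an `R` before the read number and a `.gz` suffix), the helper names `pair_reads` and `run_all`, and the `submit` callable, which stands in for whatever API request actually launches the workflow for one pair.

```python
import re
from collections import defaultdict

# Assumed naming convention (not from the thread):
#   <sample>_1.fastq / <sample>_2.fastq, optionally "_R1"/"_R2" and a ".gz" suffix.
READ_PATTERN = re.compile(
    r"^(?P<sample>.+)_R?(?P<read>[12])\.(?:fastq|fq)(?:\.gz)?$"
)


def pair_reads(filenames):
    """Group file names into {sample: (forward, reverse)}.

    Raises ValueError rather than guessing when a name does not match the
    assumed pattern, a read is duplicated, or a mate is missing.
    """
    by_sample = defaultdict(dict)
    for name in filenames:
        match = READ_PATTERN.match(name)
        if match is None:
            raise ValueError(f"unrecognised file name: {name}")
        sample, read = match.group("sample"), match.group("read")
        if read in by_sample[sample]:
            raise ValueError(f"duplicate read {read} for sample {sample}")
        by_sample[sample][read] = name

    pairs = {}
    for sample, reads in sorted(by_sample.items()):
        if set(reads) != {"1", "2"}:
            raise ValueError(f"sample {sample} is missing a mate")
        pairs[sample] = (reads["1"], reads["2"])
    return pairs


def run_all(filenames, submit):
    """Call submit(sample, forward, reverse) once per complete pair.

    `submit` is a hypothetical callable that would wrap the real API
    request; it is passed in so the pairing logic can be checked without
    a running server.
    """
    return [
        submit(sample, forward, reverse)
        for sample, (forward, reverse) in pair_reads(filenames).items()
    ]
```

A dry run that only prints what would be submitted might look like `run_all(names, lambda s, f, r: print(s, f, r))`. Failing loudly on an unmatched or unpaired file is a deliberate choice here, since silently mispairing mates across 88 or 200 samples is exactly the mistake the question warns about.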
participants (2)
- Anthonius deBoer
- Dannon Baker