Looking for recommendations: How to run galaxy workflows in batch
Hi All, I'm looking to batch process 40 large data sets with the same Galaxy workflow. This can obviously be done in a brute-force manual manner. However, is there a better way to schedule/invoke these jobs in batch: 1) from the UI with a plugin, 2) from the command line, or 3) through a web service? Thanks in advance for any pointers. Dave
Hi Dave, Yes, Galaxy's standard run-workflow dialog has a feature that lets you select multiple datasets as input for a single "Input Dataset" step. To do this, click the icon referenced by the tooltip in the attached screenshot (PastedGraphic-2.png) to select multiple files. Note that all parameters remain static between executions except for the single input dataset, which changes for each run, and that only one input dataset can be set to multiple files in this fashion. -Dannon (in reply to Dave Lin's message of Feb 6, 2012, 4:18 PM)
___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at:
Thank you Dannon. That is helpful. What if I need to specify multiple inputs per run (e.g. a .csfasta file plus a .qual file)? -Dave (in reply to Dannon Baker's message of Mon, Feb 6, 2012, 1:27 PM)
This method only works for single inputs at the moment, though eventually it'd be nice to allow pairing. Another option would be to use the workflows API, which does let you specify multiple inputs per run. See workflow_execute.py in the scripts/api folder of your Galaxy installation for one way of doing this. -Dannon (in reply to Dave Lin's message of Feb 6, 2012, 4:53 PM)
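As a rough illustration of the API route mentioned above, the following Python sketch builds one request body per sample for a workflow with two inputs (for instance a .csfasta and a .qual file). It only constructs the data and does not contact a server. The key names (workflow_id, history, ds_map, src, id) and the "ld" source value are assumptions modeled on what workflow_execute.py is understood to send, so check them against the copy in your own scripts/api folder. The step ids, dataset ids, workflow id, and the build_payload helper are all hypothetical.

```python
import json


def build_payload(workflow_id, history_name, inputs, src="ld"):
    """Return a dict describing one workflow run.

    `inputs` maps a workflow input step id to a dataset id.
    The key names used here are assumptions; verify them against the
    workflow_execute.py shipped with your Galaxy installation.
    """
    ds_map = {
        str(step): {"src": src, "id": ds_id}
        for step, ds_id in inputs.items()
    }
    return {
        "workflow_id": workflow_id,
        "history": history_name,
        "ds_map": ds_map,
    }


# Hypothetical ids: replace them with the ones from your own server.
samples = {
    "sample01": {"10": "csfasta_id_01", "11": "qual_id_01"},
    "sample02": {"10": "csfasta_id_02", "11": "qual_id_02"},
}

# One payload per sample, each run going to its own named history.
payloads = [
    build_payload("my_workflow_id", "batch run " + name, inputs)
    for name, inputs in sorted(samples.items())
]

if __name__ == "__main__":
    for payload in payloads:
        print(json.dumps(payload, sort_keys=True))
```

Each resulting dict could then be submitted in whatever way workflow_execute.py submits its own request, one call per sample.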
Hello Dannon, Would it be possible to have the input dataset's display name appended to the new history's name, instead of plain numbers, when the "Send results in a new history" option is checked? This new feature is indeed veeeeery useful (thanks a million for it), but the numbered suffixes make it hard to track which new history belongs to which dataset. Thanks, L-A (in reply to Dannon Baker's message of Feb 6, 2012, 23:00)
Thanks for the suggestion, I like that! I'll make the change shortly. -Dannon (in reply to Louise-Amélie Schmitt's message of Feb 7, 2012, 8:03 AM)
Dannon Baker <dannonbaker@...> writes:
Hi Dave,
Yes, galaxy's standard run-workflow dialog has a feature where you can select multiple datasets as input for a single "Input Dataset" step. To do this, click the icon referenced by the tooltip in the screenshot below to select multiple files. All parameters remain static between executions except for the single input dataset that gets modified for each run, and that only one input dataset can be set to multiple files in this fashion.
-Dannon
Dannon, what if I don't have this icon??? How can I enable this? Where is this documented? Thanks, Bernd
In your workflows, are you using "Input Dataset" steps? Galaxy uses these steps to know how to map datasets for special handling like this. If you're not currently using them, open the workflow editor, add Input Dataset steps (at the very bottom of the tool list), and connect them to the tool inputs at the highest level of the workflow. The next time you run the workflow, you'll see the multiple-dataset option. -Dannon (in reply to Bernd Jagla's message of Jul 4, 2012, 3:19 AM)
But this only works if you have a single dataset (such as a BAM file) for each workflow run. If you have pairs of files (such as paired-end FASTQ files, not an uncommon case nowadays :) ), you need to resort to the API, since batch processing from the Galaxy UI does not support paired-end sequencing (yet?). You can run the workflow one pair at a time, but you have to choose the FASTQ pairs yourself.

I have written a fairly generic execution engine that I can share. It uses a config file to describe the files you need from the library as simple key:value pairs, and it can run a paired-end workflow on hundreds of FASTQ files. It's a little hacky and requires your FASTQ files to have consistent naming for the forward and reverse reads (_R1.fastq and _R2.fastq), but other than that it seems to do the job.

There is, however, a nasty bug in the API: it removes files from your history if you use them through the API (I will post something on that later). It seems to work fine for data in the libraries, though.

Regards, Thon de Boer, Ph.D., Bioinformatics Guru (in reply to Bernd Jagla's message of Jul 4, 2012, 12:19 AM)
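The naming-based pairing described above can be sketched in a few lines of Python. This is not Thon's execution engine, only a minimal illustration of grouping file names by a shared prefix, assuming the _R1.fastq / _R2.fastq convention from the message. The pair_reads helper and the example file names are hypothetical.

```python
def pair_reads(filenames, fwd="_R1.fastq", rev="_R2.fastq"):
    """Group file names into (forward, reverse) pairs by shared prefix.

    Returns a dict of sample prefix -> (forward, reverse) and a sorted
    list of read files whose mate is missing. Names matching neither
    suffix are ignored.
    """
    forward, reverse = {}, {}
    for name in filenames:
        if name.endswith(fwd):
            forward[name[: -len(fwd)]] = name
        elif name.endswith(rev):
            reverse[name[: -len(rev)]] = name

    pairs = {
        sample: (forward[sample], reverse[sample])
        for sample in sorted(forward)
        if sample in reverse
    }
    unpaired = sorted(
        [forward[s] for s in forward if s not in reverse]
        + [reverse[s] for s in reverse if s not in forward]
    )
    return pairs, unpaired


# Hypothetical file names for illustration.
names = ["a_R1.fastq", "a_R2.fastq", "b_R1.fastq", "notes.txt"]
pairs, unpaired = pair_reads(names)

if __name__ == "__main__":
    print(pairs)
    print(unpaired)
```

Reporting the unpaired files separately makes it easier to spot a missing mate before any workflow runs are submitted.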
participants (5)
- Bernd Jagla
- Dannon Baker
- Dave Lin
- Louise-Amélie Schmitt
- Thon Deboer