Hi All,
I'm looking to batch process 40 large datasets with the same Galaxy workflow.
This can obviously be done in a brute-force, manual manner.
However, is there a better way to schedule/invoke these jobs in batch:
- from the UI with a plugin
- command-line
- web-service?
Thanks in advance for any pointers. Dave
Hi Dave,
Yes, Galaxy's standard run-workflow dialog has a feature that lets you select multiple datasets as input for a single "Input Dataset" step. To do this, click the icon referenced by the tooltip in the screenshot below to select multiple files. Note that all parameters remain static between executions except for that single input dataset, which changes for each run, and that only one input dataset can be set to multiple files in this fashion.
-Dannon
On Feb 6, 2012, at 4:18 PM, Dave Lin wrote:
[...]
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at:
Thank you Dannon. That is helpful.
What if I need to specify multiple inputs per run (e.g., a .csfasta + .qual file pair)?
-Dave
On Mon, Feb 6, 2012 at 1:27 PM, Dannon Baker <dannonbaker@me.com> wrote:
[...]
This method only works for single inputs at the moment, though eventually it'd be nice to allow pairing. Another option would be to use the workflows API, which does let you specify multiple inputs per run. See workflow_execute.py in the scripts/api folder of your Galaxy installation for one way to do this.
-Dannon
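The workflows API route described above can be sketched in Python. This is a minimal sketch, not workflow_execute.py itself. It assumes the legacy payload that script sends: a `workflow_id`, a `history` value, and a `ds_map` from workflow input step ids to `{"src": ..., "id": ...}` dataset references. It also assumes that `"ld"` denotes a library dataset (`"hda"` a history dataset) and that a plain history name, rather than `hist_id=...`, creates a new history. `build_payload` and `submit` are hypothetical helpers, and the workflow, step, and dataset ids are made up. Look up the real ids for your installation and verify the field names against your Galaxy version.

```python
"""Sketch: invoke one Galaxy workflow per sample through the workflows API.

Assumes the legacy payload used by scripts/api/workflow_execute.py
(workflow_id, history, ds_map); verify against your Galaxy version.
"""
import json
import urllib.request


def build_payload(workflow_id, history_name, step_inputs, src="ld"):
    """Build one workflow-invocation payload (hypothetical helper).

    step_inputs maps a workflow input step id to a dataset id, e.g.
    {"1": csfasta_id, "2": qual_id}. src "ld" is assumed to mean a
    library dataset; "hda" would mean a history dataset.
    """
    return {
        "workflow_id": workflow_id,
        # Assumed: a plain name (not "hist_id=...") creates a new history.
        "history": history_name,
        "ds_map": {
            str(step): {"src": src, "id": ds_id}
            for step, ds_id in step_inputs.items()
        },
    }


def submit(api_key, url, payload):
    """POST the payload as JSON to <galaxy_url>/api/workflows?key=<api_key>."""
    req = urllib.request.Request(
        f"{url}?key={api_key}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical ids: two input steps (csfasta, qual) per sample.
samples = {
    "sampleA": {"1": "ld_csfasta_A", "2": "ld_qual_A"},
    "sampleB": {"1": "ld_csfasta_B", "2": "ld_qual_B"},
}
payloads = [
    build_payload("wf123", f"Batch run: {name}", inputs)
    for name, inputs in sorted(samples.items())
]
# To actually run them (needs a live Galaxy server and API key):
# for p in payloads:
#     submit(API_KEY, "http://localhost:8080/api/workflows", p)
```

Building one payload per sample keeps each .csfasta/.qual pair together in a single invocation. Naming each new history after its sample also makes the batch results easy to tell apart.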
On Feb 6, 2012, at 4:53 PM, Dave Lin wrote:
[...]
Hello Dannon
Could it be possible to have the input dataset's display name appended to the new history's name instead of plain numbers when the "Send results in a new history" option is checked?
This new feature is indeed veeeeery useful (thanks a million for it), but the numbered suffixes make it hard to tell which new history belongs to which dataset.
Thanks, L-A
On 06/02/2012 23:00, Dannon Baker wrote:
[...]
Thanks for the suggestion, I like that! I'll make the change shortly.
-Dannon
On Feb 7, 2012, at 8:03 AM, Louise-Amélie Schmitt wrote:
[...]
Dannon Baker <dannonbaker@...> writes:
[...]
Dannon,
what if I don't have this icon??? How can I enable this? Where is this documented?
Thanks,
Bernd
In your workflows, are you using "Input Dataset" steps? Galaxy uses these steps to determine how to map datasets for special features like this. If you're not currently using them, just open the workflow editor and add Input Dataset steps (they're at the very bottom of the tool list) connected to the tool inputs at the top level of the workflow, and you'll see the multiple-dataset option the next time you run it.
-Dannon
On Jul 4, 2012, at 3:19 AM, Bernd Jagla wrote:
[...]
But this only works if you have a single dataset (such as a BAM file) for each workflow run. If you have pairs of files (such as paired-end FASTQ files, not an uncommon workflow nowadays :) ), you need to resort to using the API, since batch processing from the Galaxy UI does not (yet?) support paired-end data. You can run the workflow one pair at a time, but you have to choose the FASTQ pairs yourself.
I have written a fairly generic execution engine that I can share. It uses a config file to describe the files you need from the library in simple key:value pairs, and it can run the paired-end workflow on hundreds of FASTQ files. It's a little hacky and requires your FASTQ files to follow a consistent naming scheme for the forward and reverse reads (_R1.fastq and _R2.fastq), but other than that it seems to do the job.
There is, however, a nasty bug in the API: it removes files from your history if you use them as API inputs (I will post something about that later). Data from libraries seems to work fine, though.
Regards,
Thon
Thon de Boer, Ph.D.
Bioinformatics Guru
+1-650-799-6839
thondeboer@me.com
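Thon's engine is not shown in the thread. The pairing step he describes, which matches forward and reverse reads by their _R1.fastq / _R2.fastq suffixes, can be sketched in Python as below. This is a hypothetical illustration, not his code, and `pair_fastqs` is an invented name. Files that lack a mate or don't follow the naming convention are reported rather than guessed at.

```python
"""Sketch: pair forward/reverse FASTQ files by the _R1.fastq / _R2.fastq convention.

Hypothetical illustration of the pairing step only; not Thon's engine.
"""
from typing import Dict, List, Tuple

FWD, REV = "_R1.fastq", "_R2.fastq"


def pair_fastqs(filenames: List[str]) -> Tuple[Dict[str, Tuple[str, str]], List[str]]:
    """Return ({sample: (r1_file, r2_file)}, unpaired_files)."""
    fwd: Dict[str, str] = {}
    rev: Dict[str, str] = {}
    other: List[str] = []
    for name in filenames:
        if name.endswith(FWD):
            fwd[name[: -len(FWD)]] = name  # sample prefix -> forward file
        elif name.endswith(REV):
            rev[name[: -len(REV)]] = name  # sample prefix -> reverse file
        else:
            other.append(name)  # does not follow the naming convention
    # Only samples with both mates become pairs, in sorted sample order.
    pairs = {s: (fwd[s], rev[s]) for s in sorted(fwd.keys() & rev.keys())}
    # Anything without a mate is reported instead of being run.
    unpaired = sorted(
        other
        + [fwd[s] for s in fwd.keys() - rev.keys()]
        + [rev[s] for s in rev.keys() - fwd.keys()]
    )
    return pairs, unpaired


files = ["s1_R1.fastq", "s1_R2.fastq", "s2_R1.fastq", "s2_R2.fastq",
         "s3_R1.fastq", "notes.txt"]
pairs, unpaired = pair_fastqs(files)
```

Each resulting pair would then become one workflow invocation through the API, with the R1 and R2 datasets mapped to the workflow's two input steps.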
On Jul 4, 2012, at 12:19 AM, Bernd Jagla wrote:
[...]
galaxy-dev@lists.galaxyproject.org