Workflow representation and execution
Hi all!

I have two questions:

1) Where are workflows saved in the filesystem? Are they saved as XML files (or something similar) that describe the structure of the workflow/pipeline (e.g. tools to run, parameters, etc.)?

2) When a cluster is used, does a complete workflow run on one node, or is it possible that multiple tools in the same workflow are executed on different nodes?

Thank you in advance!
Kostas
Hi Kostas,

Workflows are saved in the database. If you're looking for an external representation, you can generate one by going to 'Download or Export' in an individual workflow's menu in the main workflows list. The download option there gives a JSON representation of the workflow that can also be imported into other Galaxy instances.

As far as where the jobs will run, it is definitely possible that multiple jobs in a single workflow will execute on different nodes.

-Dannon
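For illustration, here is a minimal sketch (not part of the original thread) of inspecting such an exported workflow with Python. The export is plain JSON; the filename and the exact field names used below ('name', 'steps', 'tool_id') are assumptions based on a typical export and may differ between Galaxy versions.

    import json

    # Load a workflow exported via 'Download or Export'.
    # The filename here is just an example.
    with open("my_workflow.ga") as fh:
        workflow = json.load(fh)

    print("Workflow name:", workflow.get("name"))

    # 'steps' is keyed by step index; each step records which tool runs and
    # its saved parameters. Field names are assumed from a typical export.
    steps = workflow.get("steps", {})
    for idx, step in sorted(steps.items(), key=lambda kv: int(kv[0])):
        print(idx, step.get("name"), step.get("tool_id"))

Each step entry also carries the tool's saved parameters and its connections to upstream steps, which is what lets another Galaxy instance rebuild the same pipeline on import.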
On Mar 30, 2011, at 6:01 AM, Kostas Karasavvas wrote:
Thank you Dannon!
As far as where the jobs will run, it is definitely possible that multiple jobs in a single workflow will execute on different nodes.
Good to know. I assume that is not currently happening, though, right? If so, is it in your immediate plans?
It should happen now: each step in the workflow is submitted as a separate job to the cluster. At that point it is up to the job scheduler, not Galaxy, to determine where the job runs.

--
Glen L. Beane
Senior Software Engineer
The Jackson Laboratory
(207) 288-6153
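To make that concrete, here is a rough sketch (not Galaxy's actual runner code) of the submission pattern: every step becomes its own cluster job, so where each one runs is entirely the scheduler's decision. It assumes a PBS/TORQUE-style `qsub` on the path, and the script names are hypothetical.

    import subprocess

    # Hypothetical per-step job scripts -- in Galaxy these would be generated
    # from the tool definitions, not hand-written.
    step_scripts = ["step1_groom.sh", "step2_map.sh", "step3_filter.sh"]

    job_ids = []
    for script in step_scripts:
        # Each qsub call creates an independent job; two steps of the same
        # workflow may therefore land on different nodes.
        result = subprocess.run(["qsub", script],
                                capture_output=True, text=True, check=True)
        job_ids.append(result.stdout.strip())

    print("Submitted jobs:", job_ids)

Galaxy itself tracks the dependencies between steps, so a step's job is not dispatched to the cluster until its inputs are ready; the sketch above leaves that ordering out.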
Hi Glen,
Ah, great. So if two connected tasks run on different nodes, how is data movement between those nodes handled? Is a distributed file system used to take care of that, or is it handled "manually"?

Thank you!
Kostas
The job should copy its output files into the Galaxy database/files directory before it terminates. Assuming you are not using data staging and the database directory is shared via NFS, the next tool will read the files from that location. There is an option to stage files if you use TORQUE, but I think most sites use the network shared storage approach.

--
Glen L. Beane
Senior Software Engineer
The Jackson Laboratory
(207) 288-6153
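As an illustration of that shared-storage assumption, here is a small Python sketch (paths are made up, not taken from a real instance) of a job copying its result into Galaxy's dataset directory on NFS before it exits, so the next step can read it from the same path regardless of which node it lands on.

    import shutil
    from pathlib import Path

    # Node-local scratch output and the NFS-mounted Galaxy dataset path.
    # Both paths are examples; the database/files layout varies per instance.
    scratch_output = Path("/scratch/job_42/output.bam")
    shared_dataset = Path("/galaxy/database/files/000/dataset_123.dat")

    # Copy the result to shared storage before the job terminates.
    shared_dataset.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(scratch_output, shared_dataset)

    # Because every compute node mounts the same database/files tree, the
    # next tool in the workflow reads dataset_123.dat from this location
    # no matter which node the scheduler picks for it.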
Hi Glen, All is clear now. Thanks for the info! :) K.
participants (3)

- Dannon Baker
- Glen Beane
- Kostas Karasavvas