Running multiple files automatically within workflows
We have a problem that is probably not unique and may have a solution already. We have a script that looks at a directory, identifies all pileup files in that directory, and performs a process on each of those files. We would like to integrate this script into Galaxy, but we're not sure how to handle a directory (rather than a single file) as the input to a workflow.

One way (I am guessing) would be to automate the generation of a data library, which could then somehow be imported into an analysis workflow automatically and run on every sample imported. I suppose I could manually import a directory of files and then run the workflow on each file, one by one, but in the case of having 400 or more samples, this just isn't practical.

Is there any way to have a single application simply use a directory as input, instead of a single file, and then have a workflow execute on each of those files automatically? Aside from doing this within our script, it would be nice to be able to do this for NGS alignments as well, in the case that we have an input set of multiplexed data, where the workflow needs to execute on (potentially) a few hundred samples repeatedly.

I'm sure we could probably write a Perl script wrapper that does this in the background somehow, but would prefer not to have to hack Galaxy into performing this type of function.

Thanks in advance.

-Juan Perin
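For reference, the wrapper-script idea described above can be sketched against Galaxy's API without touching Galaxy internals. The following is only a minimal illustration, not anything from this thread: it is written in Python rather than Perl and assumes the BioBlend client; the URL, API key, workflow id, and directory are placeholders, and the workflow is assumed to have a single input step at index "0".

    # Hypothetical per-file driver (a sketch, not an existing Galaxy feature):
    # glob the pileup files in a directory and launch a workflow once per file.
    import glob
    import os

    from bioblend.galaxy import GalaxyInstance

    GALAXY_URL = "http://localhost:8080"   # placeholder
    API_KEY = "your-api-key"                # placeholder
    WORKFLOW_ID = "your-workflow-id"        # placeholder
    PILEUP_DIR = "/data/pileups"            # placeholder

    gi = GalaxyInstance(url=GALAXY_URL, key=API_KEY)

    for path in sorted(glob.glob(os.path.join(PILEUP_DIR, "*.pileup"))):
        sample = os.path.splitext(os.path.basename(path))[0]

        # One history per sample keeps each run's outputs separated.
        history = gi.histories.create_history(name=sample)

        # Upload the pileup file and grab the resulting dataset id.
        upload = gi.tools.upload_file(path, history["id"], file_type="pileup")
        dataset_id = upload["outputs"][0]["id"]

        # Invoke the workflow with this dataset wired to its first input step.
        gi.workflows.invoke_workflow(
            WORKFLOW_ID,
            inputs={"0": {"id": dataset_id, "src": "hda"}},
            history_id=history["id"],
        )
        print("submitted workflow for", sample)

Run once from the command line, this kind of driver submits all of the per-sample jobs in one pass, which sidesteps the "one by one" problem for a few hundred samples.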
Hi Juan,

This feature -- automatically running a workflow repeatedly for a number of datasets -- is not yet implemented, but it is definitely on our todo list. Most likely these datasets will come from a library. I've created an issue for this feature, and you can follow and/or comment on it here:

http://bitbucket.org/galaxy/galaxy-central/issue/339/automatically-running-a...

Thanks,
J.
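For the library route mentioned in the reply, populating a data library from a directory can also be scripted against the API. Again, this is only a sketch under the same assumptions (BioBlend client, placeholder URL, key, library name, and directory), not a feature described in this thread.

    # Hypothetical sketch: load every pileup file in a directory into a new
    # data library via the Galaxy API, using the BioBlend library client.
    import glob
    import os

    from bioblend.galaxy import GalaxyInstance

    gi = GalaxyInstance(url="http://localhost:8080", key="your-api-key")  # placeholders

    library = gi.libraries.create_library(name="Pileup samples")  # placeholder name

    for path in sorted(glob.glob("/data/pileups/*.pileup")):      # placeholder directory
        # Each file becomes a library dataset that workflows can later consume.
        gi.libraries.upload_file_from_local_path(
            library["id"], path, file_type="pileup"
        )
        print("imported", os.path.basename(path))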
participants (2)
- Jeremy Goecks
- Juan Carlos Perin