Running multiple files automatically within workflows
We have a problem that is probably not unique and may have a solution already. We have a script that looks at a directory, identifies all pileup files in that directory, and performs a process on each of those files. We would like to integrate this script into Galaxy, but we're not sure how to handle a directory (rather than a single file) as the input to a workflow.

One way (I am guessing) would be to automate the generation of a data library, which could then somehow be imported into an analysis workflow automatically and run on every sample imported. I suppose I could manually import a directory of files and then run the workflow on each file, one by one, but in the case of having 400 or more samples, this just isn't practical.

Is there any way to have a single application simply use a directory as input, instead of a single file, and then have a workflow execute on each of those files automatically? Aside from doing this within our script, it would be nice to be able to do this for NGS alignments as well, in the case that we have an input set of multiplexed data, where the workflow needs to execute on (potentially) a few hundred samples repeatedly.

I'm sure we could probably write a Perl script wrapper that does this in the background somehow, but would prefer not to have to hack Galaxy into performing this type of function.

Thanks in advance.

-Juan Perin
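For reference, the wrapper-script idea described above can be sketched against Galaxy's API without touching Galaxy internals. The following is only a minimal illustration, not anything from this thread: it is written in Python rather than Perl and assumes the BioBlend client; the URL, API key, workflow id, and directory are placeholders, and the workflow is assumed to have a single input step at index "0".

    # Hypothetical per-file driver (a sketch, not an existing Galaxy feature):
    # glob the pileup files in a directory and launch a workflow once per file.
    import glob
    import os

    from bioblend.galaxy import GalaxyInstance

    GALAXY_URL = "http://localhost:8080"   # placeholder
    API_KEY = "your-api-key"                # placeholder
    WORKFLOW_ID = "your-workflow-id"        # placeholder
    PILEUP_DIR = "/data/pileups"            # placeholder

    gi = GalaxyInstance(url=GALAXY_URL, key=API_KEY)

    for path in sorted(glob.glob(os.path.join(PILEUP_DIR, "*.pileup"))):
        sample = os.path.splitext(os.path.basename(path))[0]

        # One history per sample keeps each run's outputs separated.
        history = gi.histories.create_history(name=sample)

        # Upload the pileup file and grab the resulting dataset id.
        upload = gi.tools.upload_file(path, history["id"], file_type="pileup")
        dataset_id = upload["outputs"][0]["id"]

        # Invoke the workflow with this dataset wired to its first input step.
        gi.workflows.invoke_workflow(
            WORKFLOW_ID,
            inputs={"0": {"id": dataset_id, "src": "hda"}},
            history_id=history["id"],
        )
        print("submitted workflow for", sample)

Run once from the command line, this kind of driver submits all of the per-sample jobs in one pass, which sidesteps the "one by one" problem for a few hundred samples.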
Hi Juan,

This feature -- automatically running a workflow repeatedly for a number of datasets -- is not yet implemented, but it is definitely on our todo list. Most likely these datasets will come from a library. I've created an issue for this feature, and you can follow and/or comment on it here:

http://bitbucket.org/galaxy/galaxy-central/issue/339/automatically-running-a...

Thanks,
J.
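For the library route mentioned in the reply, populating a data library from a directory can also be scripted against the API. Again, this is only a sketch under the same assumptions (BioBlend client, placeholder URL, key, library name, and directory), not a feature described in this thread.

    # Hypothetical sketch: load every pileup file in a directory into a new
    # data library via the Galaxy API, using the BioBlend library client.
    import glob
    import os

    from bioblend.galaxy import GalaxyInstance

    gi = GalaxyInstance(url="http://localhost:8080", key="your-api-key")  # placeholders

    library = gi.libraries.create_library(name="Pileup samples")  # placeholder name

    for path in sorted(glob.glob("/data/pileups/*.pileup")):      # placeholder directory
        # Each file becomes a library dataset that workflows can later consume.
        gi.libraries.upload_file_from_local_path(
            library["id"], path, file_type="pileup"
        )
        print("imported", os.path.basename(path))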
participants (2)
- Jeremy Goecks
- Juan Carlos Perin