On Apr 4, 2012, at 1:48 PM, Aaron Gallagher wrote:
On Apr 4, 2012, at 8:11 AM, Langhorst, Brad wrote:
I think I would approach the directory problem with a wrapper script that takes an argument for each of the components the tool needs. The script could lay out the various files in the working directory, in the structure the tool expects, and then call the tool. I think that's cleaner than expecting users to build a tar archive with the proper structure.
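A minimal sketch of that kind of staging wrapper, in Python. The component file names, the "refpkg" layout, and the "some_tool" command line are purely illustrative assumptions, not the actual reference-package format or tool:

#!/usr/bin/env python
"""Hypothetical staging wrapper: copy individually supplied component files
into the directory layout a directory-based tool expects, then run the tool
on that directory. File names, layout, and the 'some_tool' command are
illustrative assumptions only."""

import argparse
import os
import shutil
import subprocess
import sys


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--alignment", required=True, help="reference alignment")
    parser.add_argument("--tree", required=True, help="reference tree")
    parser.add_argument("--stats", required=True, help="model statistics file")
    parser.add_argument("--reads", required=True, help="per-run query input")
    parser.add_argument("--output", required=True, help="tool output destination")
    args = parser.parse_args()

    # Lay the components out inside the job working directory under the
    # names the directory-based tool expects to find.
    refpkg = os.path.join(os.getcwd(), "refpkg")
    os.makedirs(refpkg, exist_ok=True)
    for name, source in [("aln.fasta", args.alignment),
                         ("tree.nwk", args.tree),
                         ("phylo_model.stats", args.stats)]:
        shutil.copy(source, os.path.join(refpkg, name))

    # Hand the whole directory to the underlying tool.
    cmd = ["some_tool", "--refpkg", refpkg,
           "--reads", args.reads, "--out", args.output]
    sys.exit(subprocess.call(cmd))


if __name__ == "__main__":
    main()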
Sorry if I wasn't more clear: the tools take _the entire directory_ (which we call reference packages, to be less ambiguous in the rest of this e-mail) as the input, not parts of it passed separately. Building these reference packages is not a problem. They're a fundamental part of a lot of analyses we do, and as such, we have tools to build them easily. For the Galaxy instance I'm trying to set up, though, most of the reference packages that users need will be provided as shared data.
If the number of files in a reference package is small, then it's not unreasonable to ask users to specify each one in addition to their input. I have workflows where users specify a list of reads, a GTF file, a BED file, and an interval file. It's not terribly onerous, because Galaxy will automatically choose an appropriate file from the history when starting the workflow. We import "sets" of these reference-type files into histories by checking them off in the shared data area and importing them en masse.

I guess you could consider a tarball of a directory to be a distinct file type and proceed along your original path (sketched below), but I think you're right that something seems fishy about that. Maybe a more specific example, including all the files in the directory, the input data, and the specific tool that will do the analysis, would make this clearer.

Best wishes,

Brad

--
Brad Langhorst
langhorst@neb.com
978-380-7564
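A rough sketch of that tarball route: the Galaxy dataset is a tar archive of the reference-package directory, and the wrapper unpacks it into the job's working directory before calling the tool. The single-top-level-directory assumption and the "some_tool" command are illustrative guesses, not how these reference packages are actually structured:

#!/usr/bin/env python
"""Hypothetical tarball route: the dataset is a tar archive of the
reference-package directory; the wrapper unpacks it next to the job and
passes the resulting directory to the tool."""

import os
import subprocess
import sys
import tarfile


def run(refpkg_tar, reads, output):
    staging = os.path.realpath(os.path.join(os.getcwd(), "refpkg_staging"))
    os.makedirs(staging)

    with tarfile.open(refpkg_tar) as tar:
        # Refuse archive members that would escape the staging directory.
        for member in tar.getmembers():
            target = os.path.realpath(os.path.join(staging, member.name))
            if target != staging and not target.startswith(staging + os.sep):
                raise ValueError("unsafe path in archive: %s" % member.name)
        tar.extractall(staging)

    # Assume the archive holds a single top-level directory: the reference package.
    (top,) = os.listdir(staging)
    refpkg = os.path.join(staging, top)

    return subprocess.call(["some_tool", "--refpkg", refpkg,
                            "--reads", reads, "--out", output])


if __name__ == "__main__":
    sys.exit(run(*sys.argv[1:4]))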