I suggested augmenting the tool_conf syntax as part of the DataCollection development. To replace the need for multiple outputs determined at runtime, I suggest being able to declare data collections within the outputs tags, and being able to use regular expressions in the from_work_dir param to populate the collections. In workflows, one would want to be able to hook a data collection output to a data input.

Use case: a Mothur metagenomics tool that has an output per distance label and calculator method.

An example of declaring a list of outputs, which will be determined at run time based on the from_work_dir regular expression:

    <tool id="mothur_classify_otu" name="Classify.otu" version="1.20.0" force_history_refresh="True">
      ...
      <outputs>
        <dataset_collection type="list" label="${tool.name} on ${on_string} consensus taxonomies">
          <data format="cons.taxonomy" name="splicing_diff" label="${tool.name} on ${on_string}: ${file_name}" from_work_dir="^\S+?\.(unique|[0-9.]*\.cons\.taxonomy)$" />
        </dataset_collection>
        <dataset_collection type="list" label="${tool.name} on ${on_string} taxonomy summaries">
          <data format="cons.taxonomy" name="splicing_diff" label="${tool.name} on ${on_string}: ${file_name}" from_work_dir="^\S+?\.(unique|[0-9.]*\.cons\.tax\.summary)$" />
        </dataset_collection>
      </outputs>
    </tool>
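To make the proposal a bit more concrete, here is a rough Python sketch of how file names in a job working directory might be sorted into collections using the two from_work_dir patterns above. This is only an illustration of the idea, not existing Galaxy code: the function name collect_outputs, the dictionary labels, and the sample file names are all made up for the example.

```python
import re

# Patterns copied from the from_work_dir attributes in the example above.
# The dictionary keys are hypothetical labels chosen for this sketch.
COLLECTION_PATTERNS = {
    "consensus taxonomies": r"^\S+?\.(unique|[0-9.]*\.cons\.taxonomy)$",
    "taxonomy summaries": r"^\S+?\.(unique|[0-9.]*\.cons\.tax\.summary)$",
}


def collect_outputs(file_names, patterns=COLLECTION_PATTERNS):
    """Group file names into one list per collection pattern.

    Hypothetical helper: a file is added to every collection whose
    regular expression matches its name; unmatched files are ignored.
    """
    compiled = {label: re.compile(p) for label, p in patterns.items()}
    collections = {label: [] for label in patterns}
    for name in sorted(file_names):
        for label, regex in compiled.items():
            if regex.match(name):
                collections[label].append(name)
    return collections


if __name__ == "__main__":
    # Made-up file names standing in for a job working directory listing.
    work_dir_files = [
        "final.0.03.cons.taxonomy",
        "final.0.03.cons.tax.summary",
        "final.0.10.cons.taxonomy",
        "mothur.log",
    ]
    for label, members in collect_outputs(work_dir_files).items():
        print(label, members)
```

Note that, with the patterns exactly as written, a file name ending in ".unique" would match both expressions and so land in both collections; the grouping of the alternation may need adjusting depending on what the tool actually writes.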
Hey Peter,
Have you seen this solution?
https://wiki.galaxyproject.org/Admin/Tools/Multiple%20Output%20Files#Number_...
It always seems to get mentioned when this topic is brought up. It has serious limitations in terms of running workflows, but it can probably be made to work for individual tool executions.
Otherwise I would wait for future feature sets; maybe other people have some good ideas, however.
-John
On Wed, Feb 19, 2014 at 7:16 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
Hi all,
I'm looking for examples of tools which take multiple input files (one or more, determined at run time) and produce multiple output files (one for each input file). Any specific suggestions?
I have a number of sequence filtering/renaming tools where this might be useful. In some cases taking multiple input files and producing a single output is fine, but in general I'd like to know how to preserve a one-to-one mapping from input files to output files.
I realise this may overlap slightly with the work John is doing on dataset collections, but for now I'd like to target the current Galaxy feature set.
In some of the simpler cases, if I have N input datasets and want N output files, I can just run the tool N times. This means more steps in the Galaxy GUI, but it isn't very complicated.
However, for the current problem I need access to all the inputs at once for setting overall data derived parameters.
Regards,
Peter
-- James E. Johnson, Minnesota Supercomputing Institute, University of Minnesota