Number of outputs = number of inputs
Hi all!
I have a tool which takes one or more input files. For each input file, one output is created, i.e. 1 input file -> 1 output file, 2 input files -> 2 output files, etc.
What is the best way to handle this? I used the directions for handling multiple output files where the "Number of Output datasets cannot be determined until tool run", which in my opinion is a bit inappropriate. BTW: the input files are added via the <repeat> tag, so maybe there is a similar thing for outputs?
Thanks in advance!
Cheers,
Sascha
I don't believe this is possible in Galaxy right now. Are the outputs independent, or is information from all inputs used to produce all outputs? If they are independent, you can create a workflow containing just your tool with 1 input and 1 output and use the batch workflow mode to run it on multiple files and get multiple outputs. This is not a beautiful solution, but it gets the job done in some cases.
Another thing to look at might be the discussion we are having on the thread "pass more information on a dataset merge". We have a fork (it's all work from Jorrit Boekel) of Galaxy that creates composite datatypes for each explicitly defined type that can hold collections of a single type.
https://bitbucket.org/galaxyp/galaxy-central-homogeneous-composite-datatypes...
This would hopefully let you declare that you can accept a collection of whatever your input type is and produce a collection of whatever your output is. There are lots of downsides to this approach: it is not fully implemented, it is not included in Galaxy proper, and your outputs would be wrapped up in a composite datatype, so they wouldn't be easily processable by downstream tools. It would be good to have additional people hacking on it though :)
-John
------------------------------------------------
John Chilton
Senior Software Developer
University of Minnesota Supercomputing Institute
Office: 612-625-0917
Cell: 612-226-9223
Bitbucket: https://bitbucket.org/jmchilton
Github: https://github.com/jmchilton
Web: http://jmchilton.net
On Tue, Oct 16, 2012 at 7:13 AM, Sascha Kastens <s.kastens@gatc-biotech.com> wrote:
Hi all!
I have a tool which takes one or more input files. For each input file one output is created,
i.e. 1 input file -> 1 output file, 2 input files -> 2 output files, etc.
What is the best way to handle this? I used the directions for handling multiple output files where
the ’Number of Output datasets cannot be determined until tool run’ which in my opinion is a bit
inappropriate. BTW: The input files are added via the <repeat> tag, so maybe there is a similar
thing for outputs?
Thanks in advance!
Cheers,
Sascha
___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at:
I tried the galaxy-central-homogeneous-composite-datatypes fork, and it works great. I have a similar problem, where the number of output files varies; it seems that your approach might work for output files as well (not only inputs). Currently I'm trying to work out how to implement it, so any help is appreciated.
Alex
___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
Like most days, JJ very politely pointed out that I am wrong this morning. You can have variable numbers of outputs at runtime; see the last section ("Number of Output datasets cannot be determined until tool run") of this page:
http://wiki.g2.bx.psu.edu/Admin/Tools/Multiple%20Output%20Files
Sorry about that.
-John
On Tue, Oct 16, 2012 at 8:48 AM, John Chilton <chil0060@umn.edu> wrote:
I don't believe this is possible in Galaxy right now. Are the outputs independent or is information from all inputs used to produce all outputs? If they are independent, you can create a workflow containing just your tool with 1 input and 1 output and use the batch workflow mode to run it on multiple files and get multiple outputs. This is not a beautiful solution but it gets the job done in some cases.
Another thing to look at might be the discussion we are having on the thread "pass more information on a dataset merge". We have a fork (it's all work from Jorrit Boekel) of Galaxy that creates composite datatypes for each explicitly defined type that can hold collections of a single type.
https://bitbucket.org/galaxyp/galaxy-central-homogeneous-composite-datatypes...
This would hopefully let you declare that you can accept a collection of whatever your input type is and produce a collection of whatever your output is. Lots of downsides to this approach - not fully implemented, and not included in Galaxy proper, your outputs would be wrapped up in a composite datatype so they wouldn't be easily processable by downstream tools. It would be good to have additional people hacking on it though :)
-John
------------------------------------------------ John Chilton Senior Software Developer University of Minnesota Supercomputing Institute Office: 612-625-0917 Cell: 612-226-9223 Bitbucket: https://bitbucket.org/jmchilton Github: https://github.com/jmchilton Web: http://jmchilton.net
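For the "one output per input" case in this thread, the runtime-discovery approach from the wiki page could look roughly like the Python sketch below. It is only an illustration under assumptions: it assumes that extra output files are picked up when they are written into the directory Galaxy passes as `$__new_file_path__`, with names of the form `primary_<output id>_<designation>_visible_<ext>`; check the wiki page for the exact convention in your Galaxy version. The names `discovered_name`, `process` and `run_tool` are hypothetical helpers, not part of any Galaxy API.

```python
import os
import shutil


def discovered_name(output_id, designation, ext):
    """Build a file name in the ASSUMED 'primary_<id>_<designation>_visible_<ext>' form."""
    # The fields are separated by underscores, so the designation must not contain one.
    if "_" in designation:
        raise ValueError("designation must not contain '_'")
    return "primary_%d_%s_visible_%s" % (output_id, designation, ext)


def process(in_path, out_path):
    """Placeholder for the real per-file work; here it simply copies the input."""
    shutil.copyfile(in_path, out_path)


def run_tool(inputs, first_output, output_id, new_file_path, ext="txt"):
    """Write one output per input and return the list of paths written.

    The first result goes to the output declared in the tool config; every
    further result is written into new_file_path under a discoverable name.
    """
    written = []
    for index, in_path in enumerate(inputs):
        if index == 0:
            out_path = first_output
        else:
            name = discovered_name(output_id, "input%d" % (index + 1), ext)
            out_path = os.path.join(new_file_path, name)
        process(in_path, out_path)
        written.append(out_path)
    return written
```

In a tool wrapper, such a script would be handed the path of the declared output, that output's id, and the new-file directory on its command line, followed by the files collected from the <repeat> block.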
participants (4)
- Alex.Khassapov@csiro.au
- John Chilton
- John Chilton
- Sascha Kastens