Problem with format="input" when using multiple inputs

21 Jul 2010

Hi,

I came across a small problem in the tool definition parameters. It's rare, but it happens.

The problem arises when a tool has more than one input file, AND sets its output format from an input file, AND the input files have different formats. In that case there is no way to say which specific input the output format should be taken from.

Example:
Assume an imaginary tool that accepts 2 inputs: one input can be BED/GFF/BAM, the other input is tabular.
The output should have the same format as the interval file (BED/GFF/BAM).

The XML definition would be:
==========
  <inputs>
    <param format="bed,gff,bam" name="input1" type="data" label="Intervals" />
    <param format="tabular"     name="input2" type="data" label="filter" />
  </inputs>

  <outputs>
    <data format="input" name="output" metadata_source="input1" />
  </outputs>
===========

The problem is with 'format="input"'. Any fixed format (e.g. "fasta" or "bam") would work fine.
The special value "input" is hard-coded (lib/galaxy/tools/actions/__init__.py:226) to take the extension of "one" of the input files, but there's no way to specify which input (due to the loop at lib/galaxy/tools/actions/__init__.py:159).
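
To illustrate why the result is ambiguous, here is a simplified Python sketch of that behaviour. It is not the actual Galaxy code: the function name pick_input_ext, the "data" default, and the plain dict of parameter names to extensions are assumptions made for illustration.

```python
def pick_input_ext(inp_data, default_ext="data"):
    """Simplified model (an assumption, not the Galaxy source) of how
    format="input" ends up with the extension of "one" of the inputs.

    inp_data maps parameter names to dataset extensions; None means the
    parameter has no dataset.
    """
    input_ext = default_ext
    for name, ext in inp_data.items():
        if ext is not None:
            # Each input overwrites the previous value, so whichever input
            # the loop visits last decides the output format.
            input_ext = ext
    return input_ext


# The tool author wants the extension of input1, but nothing in the loop
# lets the tool definition say so.
print(pick_input_ext({"input1": "bed", "input2": "tabular"}))
```

In this model the result depends only on iteration order, which the tool definition cannot control.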

I guess it's not a very common scenario, because most tools that accept multiple inputs (e.g. concatenate queries) implicitly assume all inputs are the same format.

I propose the attached patch. It's ugly and not the cleanest solution, but it allows specifying an input source without breaking any existing tool definition.

With it, I can specify the output definition as:
===
  <outputs>
    <data format="input:input1" name="output" metadata_source="input1" />
  </outputs>
===

And then the output dataset format will be copied from 'input1'.
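
The lookup this enables could be sketched in Python roughly as follows. This is not the attached patch itself: resolve_output_format and its arguments are hypothetical names used only to show the idea.

```python
def resolve_output_format(fmt, inp_data, input_ext):
    """Hypothetical helper (not the attached patch).

    fmt       -- the format attribute of the <data> output tag
    inp_data  -- dict mapping input parameter names to dataset extensions
    input_ext -- extension chosen by the existing format="input" logic
    """
    if fmt == "input":
        # Existing behaviour is left unchanged, so current tools keep working.
        return input_ext
    if fmt.startswith("input:"):
        # New behaviour: copy the extension of the named input.
        source = fmt.split(":", 1)[1]
        if source not in inp_data:
            raise ValueError("unknown input %r in format %r" % (source, fmt))
        return inp_data[source]
    # Any fixed format (e.g. "fasta" or "bam") is used as-is.
    return fmt


inputs = {"input1": "gff", "input2": "tabular"}
print(resolve_output_format("input:input1", inputs, "tabular"))  # gff
```

Because a plain "input" still goes through the old path, only tool definitions that opt in to the "input:NAME" form see any change.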

Comments are welcome,
 -gordon

Assaf Gordon
