Determining datatype inheritance in tool XML Cheetah
Hi all,

I've just uploaded a simple sequence composition tool to the Test Tool Shed:

https://testtoolshed.g2.bx.psu.edu/view/peterjc/seq_composition
https://github.com/peterjc/pico_galaxy/commit/45669446f5a14fd90a8a0d9d743049...

This accepts multiple inputs in FASTA, FASTQ, or SFF format - and allows a mixture of these:

    <inputs>
        <param name="input_file" type="data" format="fasta,fastq,sff" multiple="true" label="Sequence file" help="FASTA, FASTQ, or SFF format." />
    </inputs>

In order to build the command line string, I am currently using this for loop:

    <command interpreter="python">
    seq_composition.py -o "$output_file"
    ##For loop over inputs
    #for i in $input_file
    --$i.ext "${i}"
    #end for
    </command>

This results in things like this being run:

    seq_composition.py -o XXX.dat --fastqsanger XXX.dat --sff XXX.dat

This works, but means my Python script has to know about not just the core data types that I specified in my input parameter XML (fasta,fastq,sff) but also any subclasses (e.g. fastqsanger).

It seems what I want/need would be something along these lines in pseudo-code, to map any datatype which is a subclass of fastq to a single command line option:

    <command interpreter="python">
    seq_composition.py -o "$output_file"
    ##For loop over inputs
    #for i in $input_file
    #if isinstance($i.datatype, fastq):
    --fastq "${i}"
    #else
    --$i.ext "${i}"
    #end if
    #end for
    </command>

This mock example borrows from the Python isinstance function, but of course some Galaxy datatypes are defined as subclasses at the XML level rather than literally at the Python class level.

This should result in getting the following regardless of which flavour of FASTQ the input dataset had assigned:

    seq_composition.py -o XXX.dat --fastq XXX.dat --sff XXX.dat

Does anyone have any Tool XML examples probing an input file's datatype in this way?

Peter
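For illustration only, the script-side burden described above (the Python script having to know every FASTQ subclass extension) might look roughly like the sketch below. It is not taken from seq_composition.py: the list of FASTQ extensions is an assumption, and `tagged` and `build_parser` are hypothetical helper names.

```python
import argparse

# Assumption: FASTQ-family extensions the script would have to enumerate itself.
FASTQ_EXTENSIONS = ("fastq", "fastqsanger", "fastqsolexa",
                    "fastqillumina", "fastqcssanger")


def tagged(core_format):
    """Return a converter that pairs a filename with its core format."""
    def convert(filename):
        return (core_format, filename)
    return convert


def build_parser():
    # allow_abbrev=False so that e.g. --fastq is never treated as a prefix
    # of one of the longer FASTQ options.
    parser = argparse.ArgumentParser(prog="seq_composition.py",
                                     allow_abbrev=False)
    parser.add_argument("-o", dest="output", required=True)
    # Every option appends to the same list, so input order is preserved.
    for ext in ("fasta", "sff"):
        parser.add_argument("--" + ext, dest="inputs", action="append",
                            type=tagged(ext))
    # Every FASTQ subclass extension is folded into the core "fastq" format.
    for ext in FASTQ_EXTENSIONS:
        parser.add_argument("--" + ext, dest="inputs", action="append",
                            type=tagged("fastq"))
    return parser


args = build_parser().parse_args(
    ["-o", "out.dat", "--fastqsanger", "a.dat", "--sff", "b.dat"])
```

The drawback is the one raised in the message: any FASTQ subclass missing from the hard-coded list would be rejected by the script, which is why resolving the parent type on the Galaxy side is more attractive.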
Fun question! I have opened a pull request with my answer - https://bitbucket.org/galaxy/galaxy-central/pull-request/457/allow-cheetah-t....

There are three different hacks you can use right now... here is a diff against tools/filters/catWrapper.xml that I was using to test them - all of them require knowing more about the internals of Galaxy than I really think should be exposed to the tool (or tool author).

    diff --git a/tools/filters/catWrapper.xml b/tools/filters/catWrapper.xml
    index ec52ba8..060362b 100644
    --- a/tools/filters/catWrapper.xml
    +++ b/tools/filters/catWrapper.xml
    @@ -7,6 +7,11 @@
             #for $q in $queries
                 ${q.input2}
             #end for
    +        #import galaxy.datatypes.sequence
    +        ; echo "${isinstance($input1.datatype, galaxy.datatypes.sequence.Fastq )}"
    +        ; echo "$input1.datatype.matches_any([galaxy.datatypes.sequence.Fastq])"
    +        ; echo "$input1.datatype.matches_any([ $__app__.datatypes_registry.get_datatype_by_extension( 'fastq' )])"
    +        ; echo "$input1.is_of_type( 'fastq' )" <!-- Doesn't work yet -->
         </command>
         <inputs>
             <param name="input1" type="data" label="Concatenate Dataset"/>

I think the last variant is what you want though: $input.is_of_type( ext ). You don't need to know the full module path to the parent type - you refer to it using the same extension the rest of the tool uses, and it doesn't require the use of $__app__ which... well, we shouldn't be exposing to tools - it is not safe and is a hindrance to ensuring backward compatibility.

Hope this helps.

-John

On Tue, Aug 12, 2014 at 11:53 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
> Does anyone have any Tool XML examples probing an input file's
> datatype in this way?
On Tue, Aug 12, 2014 at 5:31 PM, John Chilton <jmchilton@gmail.com> wrote:
> I think the last variant of this is what you want though
> $input.is_of_type( ext ).

That looks good John :) I had considered something like your first hack using isinstance, but much prefer your proposed $input.is_of_type(ext) solution :)

Peter
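As a rough illustration of why an extension-based check such as $input.is_of_type( ext ) is more convenient than calling isinstance with a full module path, here is a toy model in plain Python. It is a sketch under stated assumptions, not Galaxy's implementation: `ToyRegistry`, `ToyDataset`, and `flag_for` are hypothetical names, and the way subclasses are created here only mimics the idea of datatypes declared as subclasses outside of Python class definitions.

```python
class Data:
    """Toy base datatype."""


class Fastq(Data):
    """Toy FASTQ datatype."""


class Sff(Data):
    """Toy SFF datatype."""


class ToyRegistry:
    """Hypothetical registry mapping extensions to datatype instances."""

    def __init__(self):
        self._by_ext = {}

    def register(self, ext, cls, subclass=False):
        # subclass=True mimics a datatype declared as a subclass in
        # configuration only: no Python class is written by hand for it.
        if subclass:
            cls = type(ext, (cls,), {})
        self._by_ext[ext] = cls()

    def get(self, ext):
        return self._by_ext[ext]


class ToyDataset:
    """Hypothetical stand-in for an input dataset inside a template."""

    def __init__(self, ext, registry):
        self.ext = ext
        self.registry = registry
        self.datatype = registry.get(ext)

    def is_of_type(self, *exts):
        # Resolve each extension to its class, then fall back on isinstance.
        parents = tuple(type(self.registry.get(e)) for e in exts)
        return isinstance(self.datatype, parents)


def flag_for(dataset):
    """Pick the command line flag, folding FASTQ subclasses into --fastq."""
    return "--fastq" if dataset.is_of_type("fastq") else "--" + dataset.ext


registry = ToyRegistry()
registry.register("fastq", Fastq)
registry.register("fastqsanger", Fastq, subclass=True)
registry.register("sff", Sff)

flags = [flag_for(ToyDataset(ext, registry))
         for ext in ("fastqsanger", "sff")]
```

In this toy model the caller only ever names the extension 'fastq', the same string already used in the tool's format attribute, and never touches the registry or a module path directly.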