Begin forwarded message:

From: Kelly Vincent <kpvincent@bx.psu.edu>
Date: August 18, 2010 11:46:32 AM EDT
To: btimm@wisc.edu
Cc: galaxy-user@bx.psu.edu
Subject: Re: [galaxy-user] [galaxy-dev] Tool Integration: SOAPaligner/soap2

Branden,

Are you currently calling a script file (Python, Perl, etc.) in your <command> tag, or calling the tool directly? If you are running the tool with only an XML file, you cannot do what you want here. The command that is run is generated by replacing the variables you put in the <command> tag with the names of the actual input and output files that Galaxy creates when the tool is run.

However, if you are calling a script in the <command> tag, this is not particularly difficult, though it is a little convoluted. Several tools deal with this situation (the NGS tools again, in particular). There are a couple of basic cases. If you just need a single file with a certain extension, simply create a temp file with that extension (in Python, "tempfile.mkstemp( suffix='.ext' )" will do it). It's slightly more complicated if you need associated files in the same directory. For instance, Samtools expects a fasta file like "hg19.fa" and its index "hg19.fa.fai" to be in the same place; when you call the indexing command with the name of the fasta file, it automatically creates the index file with that name plus ".fai". In this case, you should symlink the original file to a temp file (with or without a specified extension, as appropriate) and then get the name of that temp file. The name of the other file is that name PLUS the appropriate extension. For example, sam_to_bam.py does this in lines 73 through 76.
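
The symlink approach described above could be sketched roughly as follows. This is not the code from sam_to_bam.py; the function names, the one-temp-directory-per-run layout, and the ".fa"/".fai" suffixes are illustrative assumptions.

```python
import os
import tempfile


def link_with_suffix(dataset_path, suffix=".fa"):
    """Symlink a dataset into a fresh temp directory under a name that
    ends with `suffix`. Returns (temp_dir, link_path)."""
    # Assumption: one private temp directory per tool run.
    tmp_dir = tempfile.mkdtemp()
    link_path = os.path.join(tmp_dir, "input" + suffix)
    os.symlink(os.path.abspath(dataset_path), link_path)
    return tmp_dir, link_path


def companion_path(link_path, extra_suffix=".fai"):
    """Name of the associated file: the link name PLUS the extra suffix."""
    return link_path + extra_suffix
```

A tool that writes its index next to link_path would then place it in the temp directory, which the script can remove once it has finished.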

Note that the reason for symlinking to a temp file is that otherwise the tool will create a new file in a database/files subdirectory, which is not the right way to do things. So in the Samtools case, if the uploaded fasta file is "database/files/000/dataset_23.dat", a new file "database/files/000/dataset_23.dat.fai" will be created, but the database won't know about it, so it will not be deleted after the tool finishes (and it won't be available for future use, either).
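
For a tool that does not take output names on its command line (the question quoted below), a wrapper script can receive the output paths from the <command> tag and move the generated files onto them. The sketch below assumes the tool names each output as a common prefix plus a suffix; the function name and the dictionary argument are hypothetical.

```python
import os
import shutil


def collect_outputs(generated_prefix, outputs_by_suffix):
    """Move each generated file to the output path supplied by Galaxy.

    generated_prefix: path the tool appended its suffixes to (assumed naming).
    outputs_by_suffix: dict mapping a suffix such as ".fai" to an output path.
    Returns the suffixes for which no generated file was found.
    """
    missing = []
    for suffix, galaxy_path in outputs_by_suffix.items():
        generated = generated_prefix + suffix
        if os.path.exists(generated):
            shutil.move(generated, galaxy_path)
        else:
            missing.append(suffix)
    return missing
```

Returning the missing suffixes lets the wrapper report a clear error instead of leaving an empty dataset in the history.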

The paster.log contains the command as generated by the <command> tag, but it's not possible to print to that log from a tool script. You can simply print the relevant command to stdout or stderr from the script; when you run the tool in the browser, you will see it in the info section.
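
A minimal sketch of printing the command from a wrapper script before running it; the function name and the "Command:" prefix are arbitrary choices, not a Galaxy convention.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Print the command line to stdout, run it, and return its exit code."""
    sys.stdout.write("Command: %s\n" % " ".join(cmd))
    # Flush so the line appears before any output from the child process.
    sys.stdout.flush()
    return subprocess.call(cmd)
```

Here cmd is a list of arguments, for example the tool name followed by the path of the symlinked input.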

Hopefully this makes sense--let us know if you need further assistance.

Regards,
Kelly


On Aug 17, 2010, at 1:21 PM, Branden Timm wrote:

Just a follow up here ... if the <command> tag needs to list all output files, but the tool itself does not specify output files on the command line, does that mean I need to write a wrapper for my tool that accepts each output file as a command-line parameter?

Currently I've tried specifying all of the output files on the <command> line, where each variable corresponds to a <data> line under outputs, but then SOAPaligner/soap2 tries to read those non-existent files as FASTA files.

As a more general question, is there a way to see the exact command(s) that Galaxy is dispatching on my behalf when I execute a tool? I checked paster.log but it wasn't there. It seems like that would be a great debugging feature.

Branden

On Aug 17, 2010, at 10:06 AM, Hans-Rudolf Hotz wrote:

Hi Branden

Hello,
I'm very new to Galaxy, and trying to use SOAPaligner/soap2 as a test
integration case.

soap2 includes two executables, 2bwt-builder and soap. 2bwt-builder
takes a FASTA file and generates a set of 13 different index files,
which soap needs in order to do its alignment.

I have started by just creating the tool XML configuration for
2bwt-builder. The configuration follows:

<tool id="2bwt-builder" name="2bwt-Builder">
<description>build index files for the SOAPaligner/soap2</description>
<command>2bwt-builder $input</command>

the "command line" needs all output files listed, see:
http://bitbucket.org/galaxy/galaxy-central/wiki/AddToolTutorial


However, in your case: Do you really want to make an extra tool for the indexing step? Wouldn't it make more sense to have the indices pre-built for some genomes?

Your soap Galaxy tool can then re-use the indices again and again. This is also much more space efficient, as all users share the same index files.

Regards, Hans


<inputs>
<param type="data" format="fasta" name="input" label="Source file"/>
</inputs>

<outputs>
<data format="tabular" name=".amb Index File"/>
<data format="tabular" name=".ann Index File"/>
<data format="tabular" name=".bwt Index File"/>
<data format="tabular" name=".fmv Index File"/>
<data format="tabular" name=".hot Index File"/>
<data format="tabular" name=".lkt Index File"/>
<data format="tabular" name=".pac Index File"/>
<data format="tabular" name=".rev.bwt Index File"/>
<data format="tabular" name=".rev.fmv Index File"/>
<data format="tabular" name=".rev.lkt Index File"/>
<data format="tabular" name=".rev.pac Index File"/>
<data format="tabular" name=".sa Index File"/>
<data format="tabular" name=".sai Index File"/>
</outputs>
</tool>

I've used the tabular data type for the output files, which I'm not sure
is correct. When the script runs, it generates 13 output files in my
history, but they are all empty according to galaxy. When I look at
galaxy_dist/database/files/.../, the output files have been generated
correctly and are non-empty.

Where am I going wrong? Thank you in advance for any advice.

--
Branden Timm
System Administrator
Great Lakes Bioenergy Research Center
University of Wisconsin
btimm@glbrc.wisc.edu
_______________________________________________
galaxy-dev mailing list
galaxy-dev@lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-dev

_______________________________________________
galaxy-user mailing list
galaxy-user@lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-user