Tool Integration: SOAPaligner/soap2
Hello,

I'm very new to Galaxy and am trying to use SOAPaligner/soap2 as a test integration case. soap2 includes two executables, 2bwt-builder and soap. 2bwt-builder takes a FASTA file and generates a set of 13 index files, which soap needs in order to do its alignment.

I have started by creating just the tool XML configuration for 2bwt-builder. The configuration follows:

<tool id="2bwt-builder" name="2bwt-Builder">
  <description>build index files for the SOAPaligner/soap2</description>
  <command>2bwt-builder $input</command>
  <inputs>
    <param type="data" format="fasta" name="input" label="Source file"/>
  </inputs>
  <outputs>
    <data format="tabular" name=".amb Index File"/>
    <data format="tabular" name=".ann Index File"/>
    <data format="tabular" name=".bwt Index File"/>
    <data format="tabular" name=".fmv Index File"/>
    <data format="tabular" name=".hot Index File"/>
    <data format="tabular" name=".lkt Index File"/>
    <data format="tabular" name=".pac Index File"/>
    <data format="tabular" name=".rev.bwt Index File"/>
    <data format="tabular" name=".rev.fmv Index File"/>
    <data format="tabular" name=".rev.lkt Index File"/>
    <data format="tabular" name=".rev.pac Index File"/>
    <data format="tabular" name=".sa Index File"/>
    <data format="tabular" name=".sai Index File"/>
  </outputs>
</tool>

I've used the tabular data type for the output files, which I'm not sure is correct. When the script runs, it generates 13 output files in my history, but they are all empty according to Galaxy. When I look at galaxy_dist/database/files/.../, the output files have been generated correctly and are non-empty.

Where am I going wrong? Thank you in advance for any advice.

--
Branden Timm
System Administrator
Great Lakes Bioenergy Research Center
University of Wisconsin
btimm@glbrc.wisc.edu
Hi Branden
The "command line" (the <command> tag) needs to list all output files; see:
http://bitbucket.org/galaxy/galaxy-central/wiki/AddToolTutorial

However, in your case: do you really want to make a separate tool for the indexing step? Wouldn't it make more sense to have the indices pre-built for some genomes? Your soap Galaxy tool can then reuse the indices again and again. This is also much more space-efficient, as all users share the same index files.

Regards, Hans
Hi Hans,

Thanks for the advice. I agree it does make more sense to generate the indexes once. Would I create those indexes in the same directory as the FASTA file and follow the DataIntegration document here?
http://bitbucket.org/galaxy/galaxy-central/wiki/DataIntegration

Cheers
--
Branden Timm
btimm@glbrc.wisc.edu
Hi Branden,

If you look at some of the NGS tools (see SAM-to-BAM, for instance, because it's simple), you can see how we have handled this: we offer several pre-built indexes but also give the user the option, through the use of conditionals, to use a FASTA file for which there is no pre-built index. So there is definitely no need to make an entirely separate tool for the indexing.

Also, note that handling files used for input, output, or indexes for external tools can sometimes be tricky if those tools expect the files to have particular extensions. SAM-to-BAM also handles this situation with some temp-file renaming trickery.

For the pre-built indexes, once you have them built, have a look at the DataIntegration wiki page, but http://bitbucket.org/galaxy/galaxy-central/wiki/NGSLocalSetup will also be helpful to you (start with the "Setting Up the Reference Genomes for NGS Tools" section).

Let us know if you run into any more issues.

Regards,
Kelly
Galaxy Team
galaxy-dev mailing list galaxy-dev@lists.bx.psu.edu http://lists.bx.psu.edu/listinfo/galaxy-dev
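As an illustration of the conditional approach Kelly describes, the inputs section of a soap tool might be sketched as follows. This is a minimal sketch and is not taken from SAM-to-BAM or any shipped Galaxy tool: the parameter names (ref_source, source, index, own_file) and the assumed layout of soap_index.loc (tab-separated, display name in column 0, index path in column 1) are hypothetical.

```xml
<inputs>
  <!-- Hypothetical names throughout; adapt to your own tool. -->
  <conditional name="ref_source">
    <!-- The select decides which "when" branch is shown to the user. -->
    <param name="source" type="select" label="Reference index source">
      <option value="indexed">Use a pre-built index</option>
      <option value="history">Use a FASTA file from the history</option>
    </param>
    <when value="indexed">
      <!-- Assumes soap_index.loc has the display name in column 0
           and the index path in column 1. -->
      <param name="index" type="select" label="Select a reference genome">
        <options from_file="soap_index.loc">
          <column name="value" index="1"/>
          <column name="name" index="0"/>
        </options>
      </param>
    </when>
    <when value="history">
      <!-- No pre-built index: the tool would have to index this file itself. -->
      <param name="own_file" type="data" format="fasta" label="Reference FASTA"/>
    </when>
  </conditional>
</inputs>
```

In the <command> line, the chosen values would then be referenced through the conditional's name, for example $ref_source.index or $ref_source.own_file.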
On 08/17/2010 05:35 PM, Branden Timm wrote:
Hi Hans, Thanks for the advice. I agree it does make more sense to generate the indexes once. Would I create those indexes in the same directory as the FASTA and follow the DataIntegration document here? http://bitbucket.org/galaxy/galaxy-central/wiki/DataIntegration
As you are writing your own tool, you can put them wherever you want, as long as the files are accessible to your Galaxy instance and you provide the correct path on the command line.

Eventually, once it is working, you might want to set up a "soap_index.loc" file listing all your genomes (i.e., the indices for the individual genomes) and write a wrapper for the soap binary. As an example, you might want to look at megablast_wrapper.py.

Hans
Just a follow-up here: if the <command> tag needs to list all output files, but the tool itself does not take output files on the command line, does that mean I need to write a wrapper for my tool that accepts each output file as a command-line parameter? So far I've tried specifying all of the output files on the <command> line, where each variable corresponds to a <data> line under <outputs>, but then SOAPaligner/soap2 tries to read those non-existent files as FASTA files.

As a more general question, is there a way to see the exact command(s) that Galaxy dispatches on my behalf when I execute a tool? I checked paster.log, but they weren't there. It seems like that would be a great debugging feature.

Branden
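One way the wrapper idea raised in this follow-up could look on the tool-config side is sketched below. It assumes a hypothetical script, soap2_index_wrapper.py, that runs 2bwt-builder in a scratch directory and then moves each generated index file to the output path Galaxy passes in. Neither that script nor the output names (out_amb, out_ann, out_bwt) come from the thread, and format="data" is only a guess at a suitable type for binary index files. Only three of the 13 outputs are shown.

```xml
<!-- Sketch only: soap2_index_wrapper.py is a hypothetical script,
     not part of Galaxy or SOAPaligner. -->
<tool id="soap2_index_builder" name="2bwt-builder">
  <description>build index files for SOAPaligner/soap2</description>
  <!-- The wrapper receives the input path followed by one path per output,
       so 2bwt-builder itself never sees the output paths as FASTA inputs. -->
  <command interpreter="python">
    soap2_index_wrapper.py $input $out_amb $out_ann $out_bwt
  </command>
  <inputs>
    <param type="data" format="fasta" name="input" label="Source file"/>
  </inputs>
  <outputs>
    <!-- Each name is a plain identifier so it can be used as $name above;
         the readable text goes in label. -->
    <data format="data" name="out_amb" label=".amb index file"/>
    <data format="data" name="out_ann" label=".ann index file"/>
    <data format="data" name="out_bwt" label=".bwt index file"/>
    <!-- The remaining ten index files would follow the same pattern. -->
  </outputs>
</tool>
```

The point of the sketch is the naming: an output called ".amb Index File" cannot be referenced as a variable in <command>, whereas an identifier such as out_amb can. Whether label is honored on outputs may depend on your Galaxy version.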
participants (3)
- Branden Timm
- Hans-Rudolf Hotz
- Kelly Vincent