[galaxy-dev] Wrappers for NCBI BLAST+ blastdbcmd

9 May 2011

Hi all,

I've written a couple of wrappers for the NCBI BLAST+
tool blastdbcmd, which replaces fastacmd from the NCBI
legacy BLAST suite.

The first wrapper lets you get a FASTA file of sequences
from a database by their ID (which works best if your
database was built with -parse_seqids), while the second
just shows information about a database, such as the number
of sequences and total length (human readable text).
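
For anyone who hasn't used blastdbcmd directly, the two
underlying calls look roughly like this. This is only a sketch
in Python: the function names, database name and IDs are
made up for illustration, and the actual wrapper XML files
may build the command line differently.

```python
def fetch_cmd(db, ids, out_fasta):
    """Build a blastdbcmd call that writes the given IDs as FASTA.

    Hypothetical helper for illustration, not code from the wrappers.
    """
    # -entry takes a comma separated list of sequence identifiers
    return ["blastdbcmd", "-db", db, "-entry", ",".join(ids), "-out", out_fasta]


def info_cmd(db):
    """Build a blastdbcmd call that prints a summary of the database."""
    return ["blastdbcmd", "-db", db, "-info"]


# Example with placeholder names:
print(" ".join(fetch_cmd("my_genome", ["seq1", "seq2"], "wanted.fasta")))
print(" ".join(info_cmd("my_genome")))
```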

Branch here:
https://bitbucket.org/peterjc/galaxy-central/src/blastdbcmd

Two files:
tools/ncbi_blast_plus/ncbi_blastdbcmd_info.xml
tools/ncbi_blast_plus/ncbi_blastdbcmd_wrapper.xml

Is anyone interested in helping to test these before I
ask the Galaxy team to merge them into the trunk?

Those of you familiar with the command line tool's
options will know you can use -entry all to get all the
sequences in the database. This is fine for a small
database (e.g. a single genome), but would be a
really bad idea for something like the NCBI NR
database. Currently there is no safety check for this
(but it could be done with a wrapper script that asks via
the -info switch how many sequences there are). Do
you think some defensive code is a good idea here,
e.g. a limit of 5000 sequences when "all" is used?
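
To make the idea concrete, here is a rough sketch of what such
a check could look like in Python. It is not in the branch. It
assumes the -info output contains a line along the lines of
"12,345 sequences; 6,789,012 total residues", which should be
verified against your BLAST+ version, and the function names
and the 5000 limit are just placeholders.

```python
import re
import subprocess

MAX_ALL_SEQUENCES = 5000  # placeholder limit, as suggested above


def count_sequences(info_text):
    """Extract the sequence count from 'blastdbcmd -info' text.

    Assumes a line like '12,345 sequences; 6,789,012 total residues'.
    """
    match = re.search(r"([\d,]+)\s+sequences;", info_text)
    if match is None:
        raise ValueError("No sequence count found in -info output")
    # Strip the thousands separators before converting
    return int(match.group(1).replace(",", ""))


def check_entry_all(db, limit=MAX_ALL_SEQUENCES):
    """Raise ValueError if '-entry all' would return too many sequences."""
    info_text = subprocess.check_output(
        ["blastdbcmd", "-db", db, "-info"], universal_newlines=True
    )
    count = count_sequences(info_text)
    if count > limit:
        raise ValueError(
            "Database %s has %i sequences, over the limit of %i for 'all'"
            % (db, count, limit)
        )
    return count
```

The wrapper would call check_entry_all() only when the user
asks for "all", and otherwise run blastdbcmd as normal.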

Thanks,

Peter

Peter Cock