On Mon, Mar 12, 2012 at 6:28 PM, John Major <john.e.major.jr@gmail.com> wrote:
A small warning regarding the current cloud BLAST+ config.
To use the metagenomic tools properly with the BLAST+ Galaxy tool, make sure to export in BLAST XML. You'll then need a script to parse out the read ID and the Hit_def (as the hit ID). It appears that the Hit_def field contains the correct key into the taxonomy database: it is in the format #_#, where the first # is the 'gi' ID. The tabular output (both standard and extended) does not contain this info.
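For illustration, a minimal sketch of such a parsing script, using only Python's standard library. It assumes, per the observation above, that every Hit_def starts with the gi followed by an underscore, and (as a further assumption) that the read ID is the first whitespace-separated token of Iteration_query-def. The function name `read_gi_pairs` is hypothetical, and this has not been checked against the Galaxy wrapper's actual output.

```python
import xml.etree.ElementTree as ET


def read_gi_pairs(xml_text):
    """Return a list of (read_id, gi) tuples, one per hit in a BLAST XML report.

    Assumption: Hit_def looks like '<gi>_<rest>', so the gi is the part
    before the first underscore.
    """
    root = ET.fromstring(xml_text)
    pairs = []
    # Each <Iteration> holds the results for one query (one read).
    for iteration in root.iter("Iteration"):
        query_def = iteration.findtext("Iteration_query-def", default="").strip()
        # Assumption: the read ID is the first token of the query definition.
        read_id = query_def.split(" ", 1)[0]
        # Each <Hit> under this iteration is one database match.
        for hit in iteration.iter("Hit"):
            hit_def = hit.findtext("Hit_def", default="")
            gi = hit_def.split("_", 1)[0]
            pairs.append((read_id, gi))
    return pairs
```

To produce a two-column read/gi table, something like `for read_id, gi in read_gi_pairs(open("blast.xml").read()): print(read_id, gi, sep="\t")` would do; for very large reports, `ET.iterparse` would avoid loading the whole file into memory.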
I noticed this after attempting to use the tabular data with a trimmed col[1] (supposedly the hit seqID), but my results always came back as a ranked list of the most-sequenced genomes in nt... basically keyed at random.
j
Hi John,

Can you expand on that with a specific example (ideally on the galaxy-dev list, CC'd, since BLAST+ isn't even available on the public Galaxy)? Also, which version of BLAST+ are you using? I recall some changes to the tabular output IDs prior to 2.2.25 (which is what the wrappers were tested on; I've not tried 2.2.26 yet).

Thanks,

Peter