Peter,

You requested an example, so here are the first five hits for my first query sequence (OTU#0):

0 324034994 527 93.23 266 13 5 1 265 22 283 7e-102 379.0
0 56181650 513 93.26 267 10 8 1 265 25 285 7e-102 379.0
0 314913953 582 91.79 268 13 9 1 265 24 285 2e-92 347.0
0 305670062 281 92.52 254 14 5 4 256 32 281 2e-92 347.0
0 310814066 1180 91.73 266 14 7 1 265 24 282 9e-92 345.0

You will notice there are 13 columns, one more than the 12 column titles you explained. This is because there is an extra column between sseqid and pident. In the metagenomic tutorial the first four columns are explained, and column 3 is described as the length of the sequence in the database (i.e., the length of the subject sequence). This is the problem column: only one of the subject GI numbers above has a length that matches the subject length in NCBI. This has made me wonder whether I can trust the hit information. In every case I've checked where this happens, the correct match is the listed GI value minus 1 (i.e., in NCBI, gi|324034994 is not 527 nt long, but gi|324034993 IS 527 nt long).

On Mon, Apr 23, 2012 at 11:05 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
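As a minimal sketch, assuming the 13 columns are the 12 standard tabular columns with the subject length inserted as column 3 (labelled `slen` here, the name BLAST+ uses for subject sequence length), the rows above can be parsed and checked for internal consistency: the subject alignment coordinates should never exceed the reported subject length. The function names are hypothetical, written only for this illustration.

```python
# Assumed layout: the 12 standard BLAST tabular columns, with the subject
# length ("slen") inserted as column 3, as the metagenomic tutorial describes.
FIELDS_13 = [
    "qseqid", "sseqid", "slen", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_FIELDS = {"slen", "length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_FIELDS = {"pident", "evalue", "bitscore"}


def parse_hit(line):
    """Split one whitespace-separated hit line into a dict keyed by FIELDS_13."""
    values = line.split()
    if len(values) != len(FIELDS_13):
        raise ValueError(f"expected {len(FIELDS_13)} columns, got {len(values)}")
    hit = {}
    for name, value in zip(FIELDS_13, values):
        if name in INT_FIELDS:
            hit[name] = int(value)
        elif name in FLOAT_FIELDS:
            hit[name] = float(value)
        else:
            hit[name] = value  # IDs stay as strings
    return hit


def coords_fit_subject(hit):
    """Consistency check: subject alignment coordinates must lie within slen."""
    return max(hit["sstart"], hit["send"]) <= hit["slen"]


# The five hits for OTU#0 quoted above.
rows = """\
0 324034994 527 93.23 266 13 5 1 265 22 283 7e-102 379.0
0 56181650 513 93.26 267 10 8 1 265 25 285 7e-102 379.0
0 314913953 582 91.79 268 13 9 1 265 24 285 2e-92 347.0
0 305670062 281 92.52 254 14 5 4 256 32 281 2e-92 347.0
0 310814066 1180 91.73 266 14 7 1 265 24 282 9e-92 345.0
"""

hits = [parse_hit(line) for line in rows.splitlines() if line.strip()]
for hit in hits:
    print(hit["sseqid"], hit["slen"], coords_fit_subject(hit))
```

For these five rows every check passes (the fourth hit, for example, ends at subject position 281 with a reported length of 281), so the length column is at least consistent with the alignment coordinates. A check like this cannot, by itself, tell you which GI number that length actually belongs to.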
On Mon, Apr 23, 2012 at 5:56 PM, Sarah Hicks <garlicscape@gmail.com> wrote:
I am having trouble finding information on the MegaBLAST output columns. What is each column for? I can't figure this out by comparing the columns with NCBI directly, because the GI numbers don't match the correct entries in NCBI. I've seen that others have posted about that problem, so I'm also waiting on details on that question, but for now I'd just like to know what to make of the output...

Best,
Sarah
I've not tried to track down this reported possible bug in GI numbers, or whether it affects BLAST+ as well as the legacy NCBI BLAST (which has now been discontinued). Do you have a specific example?
As for the 12 columns, they are the standard BLAST tabular output and should match the BLAST+ tabular defaults, which are:
Column  NCBI name  Description
1       qseqid     Query Seq-id (ID of your sequence)
2       sseqid     Subject Seq-id (ID of the database hit)
3       pident     Percentage of identical matches
4       length     Alignment length
5       mismatch   Number of mismatches
6       gapopen    Number of gap openings
7       qstart     Start of alignment in query
8       qend       End of alignment in query
9       sstart     Start of alignment in subject (database hit)
10      send       End of alignment in subject (database hit)
11      evalue     Expectation value (E-value)
12      bitscore   Bit score
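A minimal Python sketch of reading these 12 columns into named records, assuming one hit per line with whitespace-separated fields (BLAST tabular output is tab-separated, so `str.split()` handles it) and skipping blank or `#` comment lines. The `BlastHit` and `parse_blast_tab` names are hypothetical, not part of any BLAST tool.

```python
from typing import NamedTuple


class BlastHit(NamedTuple):
    """One row of standard 12-column BLAST tabular output."""
    qseqid: str
    sseqid: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


# Type converters in the same order as the BlastHit fields.
CONVERTERS = (str, str, float, int, int, int, int, int, int, int, float, float)


def parse_blast_tab(lines):
    """Yield a BlastHit for each data line; reject rows without exactly 12 columns."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        values = line.split()
        if len(values) != len(BlastHit._fields):
            raise ValueError(f"expected 12 columns, got {len(values)}: {line!r}")
        yield BlastHit(*(conv(v) for conv, v in zip(CONVERTERS, values)))


# The first OTU#0 hit from this thread, with its extra length column removed.
example = "0\t324034994\t93.23\t266\t13\t5\t1\t265\t22\t283\t7e-102\t379.0"
hit = next(parse_blast_tab([example]))
print(hit.sseqid, hit.pident, hit.sstart, hit.send, hit.evalue)
```

The strict column count is a deliberate choice: a 13-column row, like those earlier in this thread, raises a `ValueError` instead of being silently misread, since the extra column would shift every value after `sseqid` by one position.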
Peter