Richard:

This beauty was mine. Thanks for pointing this out. It is now fixed.

Thanks,

anton


On Dec 9, 2010, at 10:04 PM, Richard Bruskiewich wrote:

Galaxy Colleagues,

I don't know who maintains the Galaxy wiki page at http://bitbucket.org/galaxy/galaxy-central/wiki/NGSLocalSetup, but I noticed that the Python script under the Megablast instructions has an error: the "defline" assignment that follows the "line.startswith( '>' )" test should be moved *after* the "if length > 0:" block. Otherwise, defline is overwritten with the new header's identifier before the previous sequence is written out. This causes a "frameshift" in the FASTA header identifiers: each sequence is written out under the identifier of the sequence that follows it.
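To make the frameshift concrete, here is a minimal Python 3 sketch that wraps the wiki script's logic in a hypothetical function, relabel(), with a "buggy" switch that reproduces the misplaced defline assignment. The three-record input and its gi|...|ref| style headers are made up for illustration; like the wiki script, it assumes the identifier is the second "|"-separated field of each header.

```python
def relabel(lines, buggy=False):
    """Hypothetical helper: rewrite FASTA headers as '>ID_LENGTH'.

    ID is the second '|'-separated field of the original header (the same
    assumption the wiki script makes). Returns the output lines as a list.
    """
    defline = ''
    length = 0
    seq = []
    out = []
    for line in lines:
        line = line.rstrip('\r\n')
        if line.startswith('>'):
            if buggy:
                # Misplaced: overwrites the ID of the record not yet written out.
                defline = line.split('|')[1]
            if length > 0:
                # Flush the previous record under its own ID.
                out.append('>%s_%s' % (defline, length))
                out.extend(seq)
                length = 0
                seq = []
            if not buggy:
                # Correct: take the new ID only after the flush.
                defline = line.split('|')[1]
        else:
            seq.append(line)
            length += len(line)
    if length > 0:
        # Flush the final record.
        out.append('>%s_%s' % (defline, length))
        out.extend(seq)
    return out


# Made-up example input with three records of lengths 4, 6 and 2.
records = """>gi|111|ref|A
ACGT
>gi|222|ref|B
ACGTAC
>gi|333|ref|C
AC"""

for label, buggy in (("buggy", True), ("fixed", False)):
    headers = [l for l in relabel(records.splitlines(), buggy) if l.startswith('>')]
    print(label, headers)
# buggy ['>222_4', '>333_6', '>333_2']
# fixed ['>111_4', '>222_6', '>333_2']
```

With the misplaced assignment, the lengths stay with the right sequences but the identifiers shift by one record: 111 is lost entirely and 333 appears twice.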

In the script below, I've commented out the erroneous defline assignment and added it in the right place:

import sys

length = 0
defline = ''
seq = []

for line in sys.stdin:
    line = line.rstrip( '\r\n' )
    if line.startswith( '>' ):
        # defline = line.split( "|" )[1]  # defline should NOT be here
        if length > 0:
            # write out the previous record before taking the new header
            print( ">%s_%s" % ( defline, length ) )
            print( "\n".join( seq ) )
            length = 0
            seq = []
        defline = line.split( "|" )[1]  # defline should be here
    else:
        seq.append( line )
        length += len( line )

# write out the last record (skipped if the input held no sequence)
if length > 0:
    print( ">%s_%s" % ( defline, length ) )
    print( "\n".join( seq ) )

While we're on the topic of this page, the software versions it lists may need revisiting: Megablast has already been superseded by BLAST+. Perhaps new Galaxy releases should reflect this?

BTW, when is the new Galaxy release (and the CloudMan AMI) coming out? I'd heard it was due this week.

Cheers
Richard Bruskiewich

--
Richard Bruskiewich, PhD
Senior Scientist, Computational and Systems Biology
Applications Team for Computational Genomics
T.T. Chang Genetic Resources Center
International Rice Research Institute

_______________________________________________
galaxy-dev mailing list
galaxy-dev@lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-dev

Anton Nekrutenko
http://nekrut.bx.psu.edu
http://usegalaxy.org