Galaxy Colleagues,
I don't know who maintains the Galaxy wiki page at http://bitbucket.org/galaxy/galaxy-central/wiki/NGSLocalSetup, but I noticed that the Python script under the Megablast instructions has an error. The "defline" assignment that follows "line.startswith" should be moved *after* the "if length > 0" block; otherwise defline is overwritten before the previous sequence is written out. This shifts the FASTA header identifiers by one record (i.e. each sequence is written with the identifier of the next sequence).
I've commented out the erroneous defline below and added the right one:
import sys

length = 0
defline = ''
seq = []
for line in sys.stdin:
    line = line.rstrip( '\r\n' )
    if line.startswith( '>' ):
        # defline = line.split( "|" )[1] # defline should NOT be here
        if length > 0:
            # write out the previous record under its own identifier
            print( ">%s_%s" % ( defline, length ) )
            print( "\n".join( seq ) )
        length = 0
        seq = []
        defline = line.split( "|" )[1] # defline should be here
    else:
        seq.append( line )
        length += len( line )
# write out the last record
print( ">%s_%s" % ( defline, length ) )
print( "\n".join( seq ) )

While on the topic of this page, perhaps the software versions should be revisited. Megablast has already been superseded by BLAST+. Perhaps new releases of Galaxy should update this?
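
For anyone who wants to check the defline fix without piping a real database through the script, here is a minimal sketch of the same logic wrapped in a function. The function name relabel_fasta and the two sample records are made up for illustration; they are not part of the wiki script. The only deliberate difference is that the last record is written only when it has sequence data.

```python
def relabel_fasta(lines):
    """Rewrite each FASTA header as >id_length, where id is the second
    "|"-separated field of the original header (hypothetical helper)."""
    out = []
    defline = ''
    length = 0
    seq = []
    for line in lines:
        line = line.rstrip('\r\n')
        if line.startswith('>'):
            if length > 0:
                # Flush the previous record under its OWN identifier.
                out.append('>%s_%s' % (defline, length))
                out.extend(seq)
            length = 0
            seq = []
            # Only now switch to the identifier of the new record.
            defline = line.split('|')[1]
        else:
            seq.append(line)
            length += len(line)
    # Flush the final record (assumption: skip it if it has no sequence).
    if length > 0:
        out.append('>%s_%s' % (defline, length))
        out.extend(seq)
    return out


# Made-up sample input: two records with identifiers 111 and 222.
sample = [
    '>gi|111|ref|A.1| first\n',
    'ACGT\n',
    'AC\n',
    '>gi|222|ref|B.1| second\n',
    'GGG\n',
]
result = relabel_fasta(sample)
print('\n'.join(result))
```

With the defline assignment in its original position, the first record would come out labelled with 222 instead of 111, which is the shift described above.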