[galaxy-dev] NCBI Blast+ "BLAST XML to tabular" tool question

20 Nov 2013

I'm writing a one-step generic reporting tool along the lines of Peter's "BLAST XML to tabular" script.  I was just about to ask about this line, which looked very much like a bug:

	sallseqid = ";".join(name.split(None,1)[0] for name in hit_def.split(" >"))
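To illustrate why the line looked fragile, here is a minimal sketch. The function name first_ids and the sample strings are made up for illustration and are not taken from the tool; the sketch assumes each " >"-separated entry starts with an identifier followed by a description.

```python
def first_ids(hit_def):
    """Join the first whitespace-delimited token of each " >"-separated entry."""
    return ";".join(name.split(None, 1)[0] for name in hit_def.split(" >"))


# Hypothetical combined string: identifier then description, per entry.
sample = "gi|1|ref|A.1| first description >gi|2|ref|B.1| second description"
print(first_ids(sample))

# An empty entry makes "".split(None, 1) return [], so [0] raises IndexError.
try:
    first_ids("gi|1|ref|A.1| first description >")
except IndexError as e:
    print("IndexError:", e)
```

Any empty entry (for instance a trailing " >" or an empty string) is enough to trigger the IndexError.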

Then I found the patch from Nov 7th 2013:

	https://github.com/peterjc/galaxy_blast/blob/master/tools/ncbi_blast_plus/bl...

	try:
		sallseqid = ";".join(name.split(None,1)[0] for name in hit_def.split(" >"))
	except IndexError as e:
		stop_err("Problem splitting multuple hits?\n%r\n--> %s" % (hit_def, e))

Yay!  But what I've seen in recent XML output reports is that the ">" content has been changed to "&gt;".  E.g.

	https://github.com/biopython/biopython/blob/master/Tests/Blast/mirna.xml

	<Hit>
		<Hit_num>66</Hit_num>
		<Hit_id>gi|195029385|ref|XR_047134.1|</Hit_id>
		<Hit_def>Drosophila grimshawi miR-7-RA (Dgri\mir-7), ncRNA &gt;gi|195336156|ref|XR_048470.1| Drosophila sechellia miR-7-RA (Dsec\mir-7), ncRNA &gt;gi|195585143|ref|XR_050309.1| Drosophila simulans miR-7-RA (Dsim\mir-7), ncRNA</Hit_def>
		<Hit_accession>XR_047134</Hit_accession>
		...

So perhaps a stop_err() could be avoided if the test is for "&gt;" instead?  I assume that no variants of Python's ElementTree.iterparse() will unescape content when it is returned via the iterator?
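One way to check that assumption is a small test with the standard-library xml.etree.ElementTree. This is only a sketch: the XML fragment and the variable names are made up for illustration, and it exercises just the one parser.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down fragment in the style of BLAST XML output.
xml_bytes = b"""<Hit>
  <Hit_id>gi|1|ref|A.1|</Hit_id>
  <Hit_def>first description &gt;gi|2|ref|B.1| second description</Hit_def>
</Hit>"""

hit_def = None
for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
    if elem.tag == "Hit_def":
        # Text as handed back by the iterator.
        hit_def = elem.text

print(repr(hit_def))
print("escaped form present:", "&gt;" in hit_def)
print("literal ' >' present:", " >" in hit_def)
```

With the standard-library parser the predefined entity is decoded before the text reaches the caller, so elem.text holds a literal ">" and a test for " >" still matches the parsed text. Other ElementTree implementations would be expected to behave the same way, since entity decoding is part of XML parsing, but that is worth confirming on whichever Python versions the tool has to support.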

Damion

Dooley, Damion