
On Thu, Nov 21, 2013 at 5:59 PM, Dooley, Damion <Damion.Dooley@bccdc.ca> wrote:
I hear you, re. guessing about data - it just sounded like this would be a rare case. Is it happening on particular database searches? Now that I look at it I'm wondering in what situation the IndexError would be triggered. I'm diving into the details here just because I don't want to discover later on that I'd made some assumptions about the id parsing.
Yes, it is rare - but the fix was triggered by falling over the following example from a BLAST search against the NR database, shown in the commit comment:

https://github.com/peterjc/galaxy_blast/commit/5210af6622bf905ecb09ffbf6d7d3...

    <Hit>
      <Hit_num>146</Hit_num>
      <Hit_id>gi|157832142|pdb|1NKD|A</Hit_id>
      <Hit_def>Chain A, Atomic Resolution (1.07 Angstroms) Structure Of The Rop Mutant <2aa> >gi|157833740|pdb|1RPO|A Chain A, Restored Heptad Pattern Continuity Does Not Alter The Folding Of A 4- Alpha-Helical Bundle</Hit_def>
      <Hit_accession>1NKD_A</Hit_accession>
      <Hit_len>65</Hit_len>

Splitting on just the greater than sign broke on the <2aa> comment. Splitting on a space followed by the greater than sign is slightly less fragile.

Ideally this multi-entry field would be presented explicitly in the XML, something I suggested in passing on this related blog post:

http://blastedbio.blogspot.co.uk/2012/05/blast-tabular-missing-descriptions....

You can see the problem entry like this:

    $ blastdbcmd -entry 157832142 -db nr -outfmt "%t"
    Chain A, Atomic Resolution (1.07 Angstroms) Structure Of The Rop Mutant <2aa>
    Chain A, Restored Heptad Pattern Continuity Does Not Alter The Folding Of A 4- Alpha-Helical Bundle

To see if there are any more naughty entries in the NR database, I am trying this command (no output yet, it might take a while though):

    $ time blastdbcmd -entry all -db nr -outfmt "%t" | grep ">"

...

Regards,

Peter
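
P.S. As a minimal sketch of the difference between the two separators, using the Hit_def text from the example above. The function name is made up for illustration and this is not the code from the commit; it assumes the XML parser has already unescaped the text, so the string contains literal < and > characters.

```python
# Illustrative sketch only: split_hit_def is a hypothetical helper,
# not a function from the Galaxy BLAST wrapper.
def split_hit_def(hit_def, sep):
    """Split a merged <Hit_def> string into one chunk per database entry."""
    return hit_def.split(sep)


# The merged description from the NR hit quoted above.
hit_def = (
    "Chain A, Atomic Resolution (1.07 Angstroms) Structure Of The Rop Mutant <2aa> "
    ">gi|157833740|pdb|1RPO|A Chain A, Restored Heptad Pattern Continuity "
    "Does Not Alter The Folding Of A 4- Alpha-Helical Bundle"
)

# Splitting on ">" alone also cuts at the end of "<2aa>", leaving a
# whitespace-only middle chunk with no identifier in it.
naive = split_hit_def(hit_def, ">")

# Splitting on " >" only cuts where a new entry starts after a space.
safer = split_hit_def(hit_def, " >")

for label, parts in (("'>'", naive), ("' >'", safer)):
    print("split on %s gives %d chunks" % (label, len(parts)))
    for part in parts:
        print("    %r" % part)
```

With the bare ">" separator the middle chunk is just whitespace, so any code that expects each later chunk to start with an identifier has nothing to index into. A description that contains " >" in its own free text would still defeat the second approach, which is why I only call it slightly less fragile.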