Howdy, Brant, The method you describe below is reasonable. You could potentially run into memory problems if you have too many reads, or if you have too many hits for a particular read (python dictionaries, worst case, contain three times as many hash-bins as keys). You could avoid this by only keeping a dictionary of read-name to best-hit. Depending on your scoring scheme, highest score may or may not be the right choice for "best". For example, we have used the number of matches as the score for such comparisons. Also, depending on your application, you might want the "best" hit only in those cases where it is significantly better than the second best hit. Bob H On Mar 25, 2010, at 4:42 PM, Brant Faircloth wrote:
On Mar 25, 2010, at 1:09 PM, rsharris@bx.psu.edu wrote:
Howdy, Brant,
hi bob,
I'm at a conference this week and my internet connectivity is spotty.
no problem, thanks for the fast response.
Short answer is "no", lastz doesn't provide a best-only filter. I am not sure what your input files are (one-to-one, many-to-many, one-to- many?).
it's basically many to many right now, as long my RAM holds and i don't go one-to-many.
For one-to-one you could use sort and head to solve this (at command line-- not sure specifically how you would do this in galaxy). For many-to-one, such as mapping a large number of reads to a genome, it's a little more difficult because you want to sort to occur separately per read.
i've basically been parsing the lastz output to build a dictionary like so:
{ read1: { score1:match1, score2:match2 },
read2: { score1:match1, } }
where score1…scoreN is an integer, then:
for read in matches: m = matches[read] bestMatch = m[max(m.keys())]
… do stuff …
this seems fast enough with a few tens of thousands of lines returned from lastz.
It isn't an overly difficult thing to do though. ... laptop battery going down. Gotta sign off.
i basically just wanted to see if i had missed anything and maybe give the old +1 for --bestmatch as a potential filter parameter in the future ; ). (given enough time, i'll poke around in the source and see what i can do with my meager C skills)
thanks again for the extremely rapid response! enjoy your conference...
best, b
< * ) (_ \\ _ ||