LASTZ: Controlling Length of Hits
On Mar 21, 2011, at 1:45 PM, JASON G. BANKERT wrote:
We're trying to only get hits of certain lengths. Is there a setting to use that sets the minimum length for each hit?
Howdy, Jason,

LASTZ (the underlying program) has some options that are geared toward filtering by length, though none uses length exactly. In the LASTZ wrapper for Galaxy, the only length-relevant filtering option is "Do not report matches that cover less than this percentage of each read". If your reads are all the same length, or close to the same length, this could meet your needs. If the length distribution of your reads is pretty wide (as can occur with 454), then probably not.

I'm not familiar with the rest of the Galaxy toolset, but there's bound to be a tool that can compute interval length from the interval's start and end, and then filter on that.

Bob H
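For anyone doing this step outside Galaxy, the "compute interval length from start and end, then filter" idea can be sketched in a few lines of Python. This is only an illustration: the (name, start, end) row layout, the function name `filter_by_length`, and the 0-based half-open coordinate convention are assumptions, not something LASTZ or Galaxy guarantees for your output.

```python
def filter_by_length(rows, min_length):
    """Keep (name, start, end) intervals whose length is at least min_length.

    Assumes 0-based, half-open coordinates, so length = end - start.
    For 1-based, closed coordinates, use end - start + 1 instead.
    """
    kept = []
    for name, start, end in rows:
        length = end - start
        if length >= min_length:
            kept.append((name, start, end, length))
    return kept


# Hypothetical hits with a wide spread of lengths, as can happen with 454 reads.
hits = [
    ("read1", 0, 60),     # 60 bases
    ("read2", 100, 400),  # 300 bases
    ("read3", 10, 45),    # 35 bases
]

for row in filter_by_length(hits, min_length=50):
    print(*row, sep="\t")
```

Unlike a percentage-of-read cutoff, this applies the same absolute threshold to every hit, regardless of how long the read is.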
Hi,

There is an example in the windshield splatter analysis using Galaxy where, for metagenomics, they filter their data on hit length relative to the initial length of each individual subject sequence in megablast. In simple steps (from memory, so don't shoot me :) ), all in Galaxy, assuming FASTA input:

1) upload the FASTA file
2) compute sequence lengths on 1
3) on set 1, run megablast (or whatever) that gives a hit length
4) combine 2 and 3 on the basis of the unique sequence name
5) use the filter tool to filter on the hit length column divided by the original length column (in the example, > 50% hit length)
6) strip the additional length columns to return a valid megablast or LASTZ file

You can save the history as a workflow for repeated use. Is something like this what you were looking for? The video is in the screencasts section, using 454 data and megablast, but it looks similar to your question.

Alex
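The join-and-filter workflow described in this thread (compute sequence lengths, join them to the hits by sequence name, keep hits above a fraction of the original length, then drop the helper column) can also be sketched as a small Python script. The column layout of the hits (query name, subject name, hit length), the helper names `fasta_lengths` and `filter_hits`, and the sample data are all assumptions for illustration; real megablast or LASTZ tabular output will have more columns in a different order.

```python
import csv
import io


def fasta_lengths(fasta_text):
    """Return {sequence name: length} for FASTA text (step 2)."""
    lengths = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # name is the first word of the header
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def filter_hits(hits_text, lengths, min_fraction=0.5):
    """Keep hits covering more than min_fraction of their sequence (steps 4-6).

    Assumes tab-separated rows of: query name, subject name, hit length.
    The length ratio is used only for the test, so the rows that are
    returned contain the original columns and nothing else.
    """
    kept = []
    for row in csv.reader(io.StringIO(hits_text), delimiter="\t"):
        query, hit_length = row[0], int(row[2])
        if hit_length / lengths[query] > min_fraction:
            kept.append(row)
    return kept


# Hypothetical input: two sequences of 10 and 20 bases, one hit each.
fasta = ">seqA\nACGTACGTAC\n>seqB\nACGTACGTACGTACGTACGT\n"
hits = "seqA\tsubj1\t8\nseqB\tsubj2\t6\n"

lengths = fasta_lengths(fasta)
for row in filter_hits(hits, lengths):
    print("\t".join(row))
```

Here the seqA hit covers 8 of 10 bases and is kept, while the seqB hit covers 6 of 20 and is dropped.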
The short answer is no, but I expect there are other tools in Galaxy that could do that filtering.

There are two reasons LASTZ doesn't provide filtering based on length. First, there are three possible interpretations of what "length" is, all equally valid. Should it be the length of the hit in the reference, or in the read? Or should it be the number of positions in the alignment? Second, even if there is no difference in the three lengths, length is a poorer discriminator than the number of matches. For example, a strict length cutoff of 100 would reject an exact match of length 99 but keep a 90-match-10-mismatch hit.

I'm not familiar enough with Galaxy to give you specific details of how to filter by length. But if you choose tabular output from LASTZ, you should be able to use Galaxy's "text manipulation" tools to compute the length, then one of the "filter and sort" tools to discard short alignments. Or, if you are using SAM output, it looks like you could use "convert SAM to interval" in the "NGS: SAM Tools" group, then compute the length and filter as above.

Hope that is helpful,
Bob H
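To make the three meanings of "length" concrete, here is a minimal Python sketch that computes all three, plus the match count, from a gapped pairwise alignment. The representation (two equal-length strings with "-" for gaps) and the function name `alignment_stats` are assumptions for illustration; this is not how LASTZ stores or reports alignments.

```python
def alignment_stats(ref_row, read_row):
    """Summarize a pairwise alignment given as two equal-length gapped strings."""
    if len(ref_row) != len(read_row):
        raise ValueError("alignment rows must have the same length")
    return {
        # Bases of the reference covered by the hit.
        "ref_length": sum(1 for c in ref_row if c != "-"),
        # Bases of the read covered by the hit.
        "read_length": sum(1 for c in read_row if c != "-"),
        # Number of positions (columns) in the alignment, gaps included.
        "columns": len(ref_row),
        # Columns where both rows carry the same base.
        "matches": sum(1 for a, b in zip(ref_row, read_row) if a == b and a != "-"),
    }


# A gapped alignment where all three lengths differ.
print(alignment_stats("ACGT-ACGT", "ACGTTA--T"))

# The example from the message: a length cutoff of 100 keeps the noisy hit
# and rejects the exact one, while a match-count cutoff does the opposite.
exact = alignment_stats("A" * 99, "A" * 99)
noisy = alignment_stats("A" * 100, "A" * 90 + "C" * 10)
for label, stats in (("exact", exact), ("noisy", noisy)):
    print(label, "passes length>=100:", stats["columns"] >= 100,
          "passes matches>=95:", stats["matches"] >= 95)
```

The cutoff of 95 matches is an arbitrary value chosen for the demonstration.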
participants (3)
- Bob Harris
- Bossers, Alex
- JASON G. BANKERT