Hi,
I used the Megablast tool (under NGS: Mapping\ROCHE-454\) to search my FASTA sequences against the nt database, and it worked fine. However, it generated 56,804 hits although my query has only 1,000 sequences. Is there any way to limit the reported alignments to just the one best hit per sequence? (In local BLAST there are parameters such as -K1 -v 1 -b 1 to do so, but I can't find similar options in Galaxy.)
Many thanks!
Sam
Hello Sam,
When running Megablast, filtering by identity or e-value can help reduce the number of hits. The default values are all fairly permissive if you are running the query against a target genome of the same species and the query has already been filtered for base-calling quality. Filtering out low-complexity sequence would probably also help a great deal, judging by the number of hits generated from your initial data.
There is also the "Parse blast XML output" tool. Converting the data into interval format would allow you to use "Operate on Genomic Intervals -> Cluster the intervals of a dataset". This is based on coverage, if that is one of your criteria (it could be, if the identity threshold defines a range you consider candidates for "best"). Identity and coverage are commonly combined to identify "best", but this is just a suggestion. The same logic could be applied to top-scoring e-value matches combined with coverage (the result would likely be similar to using e-value alone if the identity threshold is set high).
The idea of adding a "single best" filter is a good one, but it has some complexity associated with it. I will pass it along to the team as an enhancement request to consider.
Hopefully this helps!
Jen
Galaxy team
On 4/11/11 1:43 PM, Hsin-l (Sam) Chiang wrote:
Hi,
I used the Megablast function (in the NGS: Mapping\ROCHE-454\) to analyze my FASTA sequences against nt database and it worked fine for me. However, it generated 56,804 hits although my query has only 1000 sequences. I am wondering is there any way to suppress the number of reported alignments to just one best hit per sequence? (In the local BLAST there are parameters such as -K1 -v 1 -b 1 to do so, but I can't find similar options in Galaxy).
Many thanks!
Sam
-- Jennifer Jackson http://usegalaxy.org http://galaxyproject.org
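For readers who download the tabular hits and want to try the "filter, then keep one best hit per query" idea from the reply outside Galaxy, here is a minimal Python sketch. It is not a Galaxy tool or an official method. The column positions assume the common 12-column BLAST tabular layout (query id first, % identity third, e-value and bit score last); the Galaxy Megablast output may order its columns differently, so check the indices against your own file. The function names, the thresholds, and the choice of bit score as the "best" criterion are all illustrative assumptions.

```python
import csv
import io

# ASSUMPTION: zero-based column positions for a 12-column BLAST tabular file.
# Adjust these to match the columns in your own output.
QUERY_COL = 0
IDENTITY_COL = 2
EVALUE_COL = 10
BITSCORE_COL = 11


def read_hits(handle):
    """Yield each tab-separated line of the hit table as a list of strings."""
    return csv.reader(handle, delimiter="\t")


def best_hit_per_query(rows, min_identity=0.0, max_evalue=float("inf")):
    """Return {query_id: row}, keeping the highest bit score hit per query.

    Hits failing the identity or e-value thresholds are dropped first.
    When two hits tie on bit score, the one seen first is kept.
    """
    best = {}
    for row in rows:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        identity = float(row[IDENTITY_COL])
        evalue = float(row[EVALUE_COL])
        if identity < min_identity or evalue > max_evalue:
            continue
        query = row[QUERY_COL]
        score = float(row[BITSCORE_COL])
        if query not in best or score > float(best[query][BITSCORE_COL]):
            best[query] = row
    return best


# Made-up example data: two hits for q1, one weaker hit for q2.
sample = (
    "q1\ts1\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-50\t180.0\n"
    "q1\ts2\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-40\t150.0\n"
    "q2\ts3\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-5\t60.0\n"
)
hits = best_hit_per_query(read_hits(io.StringIO(sample)), min_identity=90.0)
for query, row in hits.items():
    print(query, row[1], row[BITSCORE_COL])
```

To run it on a real file, replace `io.StringIO(sample)` with an open file handle. Ranking by bit score is only one possible definition of "best"; as the reply notes, identity combined with coverage is another common choice, and that would need a different comparison.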