Hi Galaxy users,

When running megablast:

1) What does the "identity value -p" mean? Is it percent identity? I want my megablast results to report only 100% matches. I do not see a place for % alignment concordance.

2) For my Illumina HiSeq reads, are the adaptor sequences filtered during the filter step?

Scott Tighe

--
Scott Tighe
Advanced Genome Technology Lab
Vermont Cancer Center at the University of Vermont
149 Beaumont Avenue
Health Science Research Bd RM 305
Burlington Vermont USA 05405
lab 802-656-AGTC (2482)
cell 802-999-6666
Hello Scott,

For #1, option "-p": Here is a link to some megablast parameter documentation online:
http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/megablast.html#3
(the primary paper for the Galaxy tool is noted at the bottom of the tool form, but this is convenient)

Quote:

Table 3.30
Parameter: -p
Function: Specifies the percentage identity cut-off
Default: 0
Input format: [Real]
Example: To set percent id cutoff to 75%, use: -p 75
Note: The input value range is between 0 and 100, with 0 meaning no cutoff. It only works on the aligned region or individual HSPs.

So to report only 100% matches, set the value to 100.

For #2, there are a few ways to interpret "filter". If you mean whether megablast considers the adapter part of the sequence in its calculations, the answer is that it does for some calculations and doesn't for others. The part of the sequence that is adapter wouldn't align to the genome, and percent identity is based only on HSPs (high-scoring segment pairs: one part of the pair is the DNA query and the other is the genome target, for that alignment region only). So, adapter sequence wouldn't be involved in percent identity calculations (or be expected to!). But these unaligned regions could become a problem if coverage or certain other statistics were part of your analysis. Learning about the statistics you choose to use, to see if query length is part of the calculation, will let you know if clipping is necessary. If it is important, removing adapters can be done with tools in "NGS: QC and manipulation" (perform a tool search on the keywords "trim" or "clip").

Best,

Jen
Galaxy team

On 2/20/12 4:59 PM, Scott Tighe wrote:
Hi Galaxy users,
When running megablast:
1) What does the "identity value -p" mean? Is it percent identity? I want my megablast results to report only 100% matches. I do not see a place for % alignment concordance. 2) For my Illumina HiSeq reads, are the adaptor sequences filtered during the filter step?
-- Jennifer Jackson http://usegalaxy.org http://galaxyproject.org/wiki/Support
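
To make the point about percent identity versus query length concrete, here is a minimal Python sketch. It assumes the hits are in the standard 12-column BLAST tabular layout (query id, subject id, percent identity, alignment length, mismatches, gaps, query start, query end, subject start, subject end, e-value, bit score); the Galaxy megablast output may use different columns, so the indices would need adjusting. The two hit lines, the read lengths, and the helper names (parse_hit, query_coverage) are invented for illustration only.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query_id: str
    percent_identity: float
    query_start: int
    query_end: int


def parse_hit(line: str) -> Hit:
    # Assumed layout: standard 12-column BLAST tabular output.
    fields = line.rstrip("\n").split("\t")
    return Hit(fields[0], float(fields[2]), int(fields[6]), int(fields[7]))


def query_coverage(hit: Hit, query_length: int) -> float:
    # Percent of the full read (adapter included) covered by this HSP.
    aligned = abs(hit.query_end - hit.query_start) + 1
    return 100.0 * aligned / query_length


# Invented example data: read1 aligns perfectly over bases 1-70 only,
# as if the last 30 bases were unaligned adapter sequence.
example_lines = [
    "read1\tchr1\t100.00\t70\t0\t0\t1\t70\t5001\t5070\t1e-30\t139",
    "read2\tchr1\t97.00\t100\t3\t0\t1\t100\t9001\t9100\t1e-40\t170",
]
read_lengths = {"read1": 100, "read2": 100}

hits = [parse_hit(line) for line in example_lines]

# Same effect as an identity cutoff of 100: keep only perfect HSPs.
perfect = [h for h in hits if h.percent_identity >= 100.0]

for h in perfect:
    cov = query_coverage(h, read_lengths[h.query_id])
    print(h.query_id, h.percent_identity, cov)
```

In this made-up case read1 passes a 100% identity cutoff even though only 70 of its 100 bases aligned: the identity is computed on the HSP alone, while a coverage statistic that uses the full query length is pulled down by the unaligned (adapter) portion.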