Hello Scott,
For #1, option "-p":
Here is a link to some megablast parameter documentation online:
http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/megablast.html#3
(the primary paper for the Galaxy tool is noted at the bottom of the
tool form, but this is convenient)
Quote:
Table 3.30 Parameter -p
  Function:     Specifies the percentage identity cut-off
  Default:      0
  Input format: [Real]
  Example:      To set percent id cutoff to 75%, use: -p 75
  Note:         The input value range is between 0 and 100, with 0 meaning no
                cutoff. It only works on the aligned region or individual HSPs.
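
To illustrate what such a cutoff does, here is a minimal Python sketch that applies the same kind of threshold to tabular hits after the search has run. It assumes tab-separated output with percent identity in the third column, as in standard BLAST tabular output; the column order in your own results may differ, so the column is a parameter. The function name filter_by_identity and the sample rows are made up for illustration and are not part of megablast or Galaxy.

```python
def filter_by_identity(lines, min_identity, identity_col=2):
    """Keep tab-separated hit lines whose percent identity is >= min_identity.

    identity_col is the 0-based column holding percent identity. This is an
    assumption; check the column order of your own output before using it.
    """
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        if float(fields[identity_col]) >= min_identity:
            kept.append(line)
    return kept


# Made-up example rows: query, subject, percent identity, alignment length
sample_hits = [
    "read1\tchr1\t100.00\t50",
    "read2\tchr1\t98.00\t50",
    "read3\tchr2\t100.00\t36",
]

# Keep only hits whose aligned region matches at 100% identity
perfect = filter_by_identity(sample_hits, 100.0)
print(perfect)
```

With a threshold of 0 every row is kept, which mirrors the "0 means no cutoff" note in the quoted table.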
For #2, there are a few ways to interpret "filter". If you mean whether
megablast considers the adapter part of the sequence in its calculations,
the answer is that it does for some statistics and not for others. The
adapter part of the sequence would not align to the genome, and percent
identity is based only on HSPs (high-scoring segment pairs: one member of
the pair is the DNA query and the other is the genome target, for that
alignment region only). So adapter sequence would not be involved in
percent identity calculations (or be expected to be!). But these unaligned
regions could become a problem if coverage or certain other statistics
are part of your analysis. Checking whether query length enters into the
statistics you choose to use will tell you whether clipping is necessary.
If it is, adapters can be removed with tools in "NGS: QC and manipulation"
(perform a tool search on the keywords "trim" or "clip").
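
The difference between a statistic that ignores unaligned adapter and one that does not can be sketched with a little arithmetic. The Python below is only an illustration: the two helper functions, their names, and the example read (100 bases, of which the first 80 align perfectly and the last 20 are adapter) are assumptions made for this example, not megablast's own code or output.

```python
def percent_identity(matches, alignment_length):
    """Percent identity over the aligned region (the HSP) only."""
    return 100.0 * matches / alignment_length


def query_coverage(q_start, q_end, query_length):
    """Percent of the full query that falls inside the aligned region.

    q_start and q_end are 1-based, inclusive query coordinates of the HSP.
    """
    aligned = abs(q_end - q_start) + 1
    return 100.0 * aligned / query_length


# Hypothetical read: 100 bases long, bases 1-80 align with no mismatches,
# bases 81-100 are adapter and do not align at all.
identity = percent_identity(matches=80, alignment_length=80)
coverage = query_coverage(q_start=1, q_end=80, query_length=100)

print(identity)  # 100.0: the adapter does not lower percent identity
print(coverage)  # 80.0: the adapter does lower query coverage
```

So a read carrying adapter can still pass a 100% identity cutoff, while any statistic that divides by the full query length will reflect the untrimmed bases.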
Best,
Jen
Galaxy team
On 2/20/12 4:59 PM, Scott Tighe wrote:
Hi Galaxy users
When Megablasting:
1) What does the "identity value -p" mean? Is it percent identity?
I want my megablast results to be reported from only a 100% match. I do
not see a place for % alignment concordance.
2) From my Illumina HiSeq reads, are the adaptor sequences filtered
during the filter step?
Scott Tighe
--
Scott Tighe
Advanced Genome Technology Lab
Vermont Cancer Center at the University of Vermont
149 Beaumont Avenue
Health Science Research Bd RM 305
Burlington Vermont USA 05405
lab 802-656-AGTC (2482)
cell 802-999-6666
___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server at
usegalaxy.org. Please keep all replies on the list by
using "reply all" in your mail client. For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists,
please use the interface at:
http://lists.bx.psu.edu/
--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/wiki/Support