Hello,
I recently figured out how to filter the SAM file output by BWA by flag in order to count the reads that were successfully mapped. My question is this: I previously thought I could generate a pileup, sum the counts at each base of the reference, and divide that total by the read length. That would not be exact if some reads hang over the edge of the reference sequence, but it should be close, shouldn't it? When I compare the two methods, however, I get drastically different results. What am I missing?
Thanks, Austin
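The two methods compared in the question can be sketched in Python as follows. This is a minimal illustration, not how BWA or SAMtools compute anything. It assumes plain-text SAM input, counts a primary record as mapped when FLAG bit 0x4 is unset, and assumes a tab-separated pileup whose fourth column is the read depth. The function names and the `depth_column` parameter are hypothetical.

```python
# FLAG bits from the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_mapped_by_flag(sam_lines):
    """Return (mapped, total) over primary records; mapped means bit 0x4 is unset."""
    mapped = total = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        flag = int(line.split("\t")[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, not once per alignment
        total += 1
        if not (flag & UNMAPPED):
            mapped += 1
    return mapped, total


def estimate_mapped_from_pileup(pileup_lines, read_length, depth_column=3):
    """Sum the per-position depth column and divide by the read length.

    depth_column=3 is an assumption (fourth tab-separated field); check it
    against the pileup tool and version actually used.
    """
    total_depth = 0
    for line in pileup_lines:
        if line.strip():
            total_depth += int(line.split("\t")[depth_column])
    return total_depth / read_length
```

On a toy input where every mapped read is fully aligned and lies inside the reference, the two numbers agree; a large gap means that many read bases never show up in the pileup's depth.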
Just to emphasize the point, the difference between the two methods is an order of magnitude. In one example, filtering by flag shows about 60% of reads mapped, but estimating mapped reads from the pileup gives only ~5%.
Thanks, Austin
On Tue, Sep 6, 2011 at 8:40 AM, Austin Paul <austinpa@usc.edu> wrote:
Hi Austin,
As you mention, the difference could be related to how much of each query sequence is actually aligned and how much of it passes the summary tool's filters. It all depends on the query data, the reference genome, and the parameters used for the alignment and summary tools. If you do think that a problem exists with a SAMtools utility itself, or if you have a detailed question about the exact algorithm used, you may want to email the tool authors and see what they think: http://samtools.sourceforge.net/
Thanks!
Jen
Galaxy team
On 9/6/11 1:02 PM, Austin Paul wrote:
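Jen's point about how much of each query sequence is actually aligned, and how much passes a summary tool's filters, can be illustrated with a minimal Python sketch. It assumes single-end reads of a fixed length in plain-text SAM and counts only M, = and X CIGAR bases toward depth (how a particular pileup tool treats deletions may differ). The `min_mapq` threshold is a hypothetical stand-in for whatever filtering a pileup tool applies, and `aligned_bases` and `compare_estimates` are illustrative names, not SAMtools functions.

```python
import re

# CIGAR operations as defined in the SAM specification.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_bases(cigar):
    """Count query bases aligned to the reference (M, = and X operations).

    Soft clips (S), hard clips (H) and insertions (I) add no depth at any
    reference position, so they are excluded.
    """
    if cigar == "*":
        return 0
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "M=X")


def compare_estimates(sam_lines, read_length, min_mapq=0):
    """Return (mapped reads by FLAG, pileup-style estimate of mapped reads).

    min_mapq is hypothetical; real pileup tools have their own options and
    defaults, which vary by tool and version.
    """
    mapped = 0
    depth_sum = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.split("\t")
        flag, mapq, cigar = int(fields[1]), int(fields[4]), fields[5]
        if flag & 0x904:  # unmapped (0x4), secondary (0x100) or supplementary (0x800)
            continue
        mapped += 1
        if mapq >= min_mapq:
            depth_sum += aligned_bases(cigar)
    return mapped, depth_sum / read_length
```

For instance, a 10-base read with CIGAR `5S5M` is counted once by FLAG but contributes only 5 bases to the depth sum, so the pileup-style estimate counts it as half a read; a read removed by a mapping-quality filter contributes nothing at all.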
___________________________________________________________ The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
-- Jennifer Jackson http://usegalaxy.org http://galaxyproject.org/Support