Using BWA to map without any mismatches
Hello,

We have a sample containing several bacterial species, and we want to uniquely map RNA-seq reads to the genome of each organism to get the expression pattern of each organism separately. We tried to use BWA in Galaxy with the "edit distance" (aln -n in the command-line version) set to 0, but none of the reads were mapped (all had the SAM flag set to 4, i.e. unmapped). This looks like an artifact: running BLAST with some of the sequences showed that they have 100% identity to one of our genomes and not to any of the others, so they should map uniquely.

When running BWA with the number of mismatches set to between 1 and 5, >90% of our reads were mapped, and the number of mapped reads increased with the mismatch number, so that seems to be working OK.

Does the "aln -n" option really determine the number of mismatches? Any ideas why BWA will not run well in Galaxy using -n 0?

Thanks,
Daniel
Hi Daniel,

Yes, "aln -n" is a type of mismatch parameter. Would you like to share a history so we can look at the exact settings and provide feedback? From the history panel (far right, top corner), click on the gear icon, select "Share or Publish" from the menu, then click on the share button (the first one). Copy the link and send it back in an email to just me, not the entire list, to keep your data private.

If you are running this on a local instance, try to see if you can duplicate it on the public Main server, both to rule out local install issues and to help with sharing. Small sample tests that demonstrate the issue would be fine.

http://usegalaxy.org

I will watch for your email. Hopefully we can help!

Jen
Galaxy team

On 3/2/13 11:44 AM, Daniel Sher wrote:
Does the "aln -n" option really determine the number of mismatches? Any ideas why BWA will not run well in Galaxy using -n 0? Thanks, Daniel
--
___________________________________________________________ The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
-- Jennifer Hillman-Jackson Galaxy Support and Training http://galaxyproject.org
Hi Daniel,

Thanks for reporting this issue. We have done some testing here and can duplicate your results. We are still reviewing options for how to address this, including Galaxy wrapper modifications and possibly more. The issue has to do with how certain variables are passed in the wrapper and interpreted in the command string. If you are interested, write back to let me know and I can send you a link to the development ticket to track (once it is created).

Back to how to do this sort of analysis: yes, "aln -n" is a type of mismatch parameter, and ideally setting it to 0 would result in a mismatch-free alignment. Instead, these settings return no results:

Maximum edit distance (aln -n): "0"
Fraction of missing alignments given 2% uniform base error rate (aln -n): "0"

As a work-around, we have found that making the Fraction variable very small achieves a close approximation of mismatch-free alignments on our test sets. This is by no means a guarantee, but pending future changes, these are the recommended form settings:

Maximum edit distance (aln -n): "0"
Fraction of missing alignments given 2% uniform base error rate (aln -n): "0.00001"

Thank you for your patience while we worked out exactly what was going on. Hopefully the temporary work-around will allow you to continue with your research.

Jen
Galaxy team
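Since the work-around is described as an approximation rather than a guarantee, it may be worth checking the resulting SAM output directly. The following is only a rough sketch, not part of Galaxy or BWA: it assumes the SAM records carry the standard optional NM:i tag (edit distance to the reference) and uses FLAG bit 0x4 to detect unmapped reads. The function names classify_sam_line and summarize_sam are made up for illustration.

```python
from collections import Counter


def classify_sam_line(line):
    """Classify one SAM line as 'header', 'unmapped', 'exact', 'inexact' or 'no_nm'."""
    if line.startswith("@"):
        return "header"
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])          # column 2 is the bitwise FLAG
    if flag & 0x4:                 # bit 0x4 set: the read is unmapped
        return "unmapped"
    for tag in fields[11:]:        # optional fields start at column 12
        if tag.startswith("NM:i:"):
            # NM is the edit distance; 0 means no mismatches or indels
            return "exact" if int(tag[5:]) == 0 else "inexact"
    return "no_nm"                 # mapped, but no NM tag to judge by


def summarize_sam(lines):
    """Count alignment records per category, ignoring header and blank lines."""
    counts = Counter(classify_sam_line(l) for l in lines if l.strip())
    counts.pop("header", None)
    return counts
```

For example, with a SAM dataset downloaded from the history (the file name aligned.sam is hypothetical), summarize_sam(open("aligned.sam")) returns a count per category. Any reads reported as 'inexact' were aligned with a non-zero edit distance and could be filtered out before counting expression.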
participants (2)
- Daniel Sher
- Jennifer Jackson