Are reads of 36nt in length long enough to map accurately across splicing junctions?
Hi All, I have a very basic question. I have RNA-seq datasets from several cell types and want to compare alternative splicing events between the cell types. The reads are 36nt long. Are these reads long enough to map accurately across splicing junctions when I run TopHat with stringent parameters (no mismatches)? Thanks. Best, Jianguang Du
36bp reads will map across splice junctions, but at a relatively low rate. You can try changing the segment length to get better mapping, but you'll want to evaluate the results carefully to make sure the additional alignments are reliable. Good luck, J.
[In reply to Du, Jianguang, Apr 8, 2013, 5:45 PM]
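Jeremy's suggestion to change the segment length can be made concrete. According to the TopHat manual, each read is cut into segments of at least `--segment-length` bases (default 25) that are mapped independently, so a 36nt read with the default yields only a single segment; that is why a value around half the read length (such as 18) is commonly tried. The sketch below assumes the number of segments is the integer quotient of read length and segment length, with leftover bases folded into the last segment. It is an illustration of the idea, not TopHat's own code.

```python
from typing import List


def split_into_segments(read: str, segment_length: int) -> List[str]:
    """Cut a read into segments (assumed scheme: integer division,
    remainder folded into the last segment)."""
    num_segs = len(read) // segment_length
    if num_segs <= 1:
        # Too short to split: the read is handled as a single piece.
        return [read]
    segments = [read[i * segment_length:(i + 1) * segment_length]
                for i in range(num_segs - 1)]
    # The final segment absorbs any leftover bases.
    segments.append(read[(num_segs - 1) * segment_length:])
    return segments


read36 = "ACGT" * 9  # a made-up 36nt read
for seg_len in (25, 18, 12):
    segs = split_into_segments(read36, seg_len)
    print(seg_len, len(segs), [len(s) for s in segs])
```

With a segment length of 25 the 36nt read stays whole, while 18 gives two 18nt segments; shorter segments allow more split placements but each one is less unique in the genome, which is the trade-off Jeremy asks to be evaluated.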
___________________________________________________________ The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
Hi Jeremy, Thank you for the information. In addition to reducing "Minimum length of read segments", do I also need to reduce "Anchor length" to get more reads mapped across splicing junctions? It looks like the "Anchor length" setting only affects the number of mapped splicing junctions reported in the splice junctions output. Is my understanding correct? Does "regions" mean the number of mapped splicing junctions? Thanks. Best, Jianguang
________________________________ From: Jeremy Goecks [jeremy.goecks@emory.edu] Sent: Tuesday, April 09, 2013 9:03 AM To: Du, Jianguang Cc: galaxy-user@lists.bx.psu.edu
In addition to reducing "Minimum length of read segments", do I also need to reduce "Anchor length" to get more reads mapped across splicing junctions?
Definitely worth a try.
It looks like the "Anchor length" setting only affects the number of mapped splicing junctions reported in the splice junctions output. Is my understanding correct?
No, it will affect mapped reads as well.
Does "regions" mean the number of mapped splicing junctions?
Yes. Best, J.
Hi Jeremy, Thank you very much for the reply. I have some more questions on the same topic.
1) My reads are 36nt long. What should I set "Minimum length of read segments" to in order to get the most reliable output with the most reads mapped across splicing junctions? In my previous TopHat run, I set it to 18. Can I reduce it further to get better mapping across splicing junctions?
2) I do not understand exactly how TopHat uses "Anchor length", although I have read the TopHat manual. Suppose I set "Anchor length" to 8 and "Maximum number of mismatches that can appear in the anchor region of spliced alignment" to 0 when I run TopHat. Does that mean that, for a read spanning two adjacent exons, TopHat will report the alignment in the "accepted hits" and "splice junctions" outputs if either end of the read has 8 or more nucleotides mapped to one exon?
3) Is there any disadvantage to setting "Anchor length" to the lowest value, for example 3? My understanding is that, with 0 mismatches allowed, if 3 nucleotides at one end of a read map to one exon, the rest of the read will map to the adjacent exon (in my case, the remaining 33 nucleotides). So I think setting "Anchor length" to 3 does not reduce the accuracy of the alignment. Am I correct?
Best, Jianguang
________________________________ From: Jeremy Goecks [jeremy.goecks@emory.edu] Sent: Tuesday, April 09, 2013 1:57 PM To: Du, Jianguang Cc: galaxy-user@lists.bx.psu.edu
1) My reads are 36nt long. What should I set "Minimum length of read segments" to in order to get the most reliable output with the most reads mapped across splicing junctions? In my previous TopHat run, I set it to 18. Can I reduce it further to get better mapping across splicing junctions?
You'll need to define for yourself what you mean by "better/best mapping" and experiment to find the parameters that give you the best results.
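One way to run the experiment Jeremy suggests, for example on a local installation outside Galaxy, is to sweep a few values of segment length and anchor length, write each run to its own output directory, and compare the runs against whatever criterion you settle on (for instance, the number of reads in accepted_hits.bam or of junctions in junctions.bed). The sketch below only builds TopHat command lines using its documented options (`-o`, `--segment-length`, `--min-anchor-length`, `--splice-mismatches`); the index name, read file and directory naming are hypothetical placeholders.

```python
import itertools
import shlex
from typing import List


def tophat_commands(index: str, reads: str,
                    segment_lengths: List[int],
                    anchor_lengths: List[int],
                    splice_mismatches: int = 0) -> List[str]:
    """Build one TopHat command per (segment length, anchor length) pair."""
    cmds = []
    for seg, anchor in itertools.product(segment_lengths, anchor_lengths):
        outdir = f"tophat_seg{seg}_anchor{anchor}"  # hypothetical naming scheme
        args = ["tophat",
                "-o", outdir,
                "--segment-length", str(seg),
                "--min-anchor-length", str(anchor),
                "--splice-mismatches", str(splice_mismatches),
                index, reads]
        cmds.append(shlex.join(args))
    return cmds


# "genome_index" and "cells_A.fastq" are placeholders for your own files.
for cmd in tophat_commands("genome_index", "cells_A.fastq", [18, 16], [8, 5]):
    print(cmd)
```

Keeping every other option fixed across the sweep means any change in mapped reads or junctions can be attributed to the two parameters being varied.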
2) I do not understand exactly how TopHat uses "Anchor length", although I have read the TopHat manual. Suppose I set "Anchor length" to 8 and "Maximum number of mismatches that can appear in the anchor region of spliced alignment" to 0 when I run TopHat. Does that mean that, for a read spanning two adjacent exons, TopHat will report the alignment in the "accepted hits" and "splice junctions" outputs if either end of the read has 8 or more nucleotides mapped to one exon?
I think that's correct.
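For reference, the TopHat manual describes `-a/--min-anchor-length` as reporting junctions spanned by reads with at least that many bases on each side of the junction (it must be at least 3; the default is 8), and `-m/--splice-mismatches` as the maximum number of mismatches allowed in that anchor region (default 0). The manual also notes that individual spliced alignments may have fewer bases on one side, as long as the junction itself is supported by at least one read meeting the anchor rule. The sketch below encodes that rule for a single read; the single mismatch count for the anchor region is a simplification, and the function is hypothetical, not TopHat code.

```python
def can_anchor_junction(left_bases: int, right_bases: int,
                        anchor: int = 8,
                        anchor_mismatches: int = 0,
                        max_splice_mismatches: int = 0) -> bool:
    """True if a read split left_bases + right_bases across a junction has at
    least `anchor` bases on EACH side and few enough anchor mismatches."""
    if anchor < 3:
        raise ValueError("anchor length must be at least 3")
    shorter_side = min(left_bases, right_bases)
    return shorter_side >= anchor and anchor_mismatches <= max_splice_mismatches


# A 36nt read spanning a junction, split at different offsets:
print(can_anchor_junction(8, 28))            # 8 bases on the shorter side
print(can_anchor_junction(5, 31))            # too short for the default anchor of 8
print(can_anchor_junction(5, 31, anchor=5))  # accepted once the anchor is lowered
print(can_anchor_junction(10, 26, anchor_mismatches=1))  # mismatch in the anchor
```

Under this reading, "either end" in question 2 is best understood as "each end": the shorter side of the split is what has to reach the anchor length.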
3) Is there any disadvantage to setting "Anchor length" to the lowest value, for example 3? My understanding is that, with 0 mismatches allowed, if 3 nucleotides at one end of a read map to one exon, the rest of the read will map to the adjacent exon (in my case, the remaining 33 nucleotides). So I think setting "Anchor length" to 3 does not reduce the accuracy of the alignment. Am I correct?
Setting the anchor length especially small reduces the constraints on mapping, so more reads will map but there are likely to be more false positives as well. Good luck, J.
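The "more reads will map" half of this answer can be quantified with simple counting. A read of length L that crosses a junction can place the junction at L - 1 offsets (at least one base on each side); only offsets leaving at least `a` bases on each side satisfy the anchor, which is L - 2a + 1 of them. The sketch below assumes every offset is equally likely and ignores segment splitting and mismatches; it says nothing about the false-positive half, which depends on the genome.

```python
def anchored_offsets(read_length: int, anchor: int) -> int:
    """Junction-crossing offsets that leave >= anchor bases on each side."""
    return max(0, read_length - 2 * anchor + 1)


READ_LENGTH = 36
crossing = READ_LENGTH - 1  # offsets with at least 1 base on each side
for a in (8, 5, 3):
    n = anchored_offsets(READ_LENGTH, a)
    print(f"anchor={a}: {n} of {crossing} offsets ({n / crossing:.0%})")
```

For 36nt reads this gives 21, 27 and 31 of the 35 possible offsets for anchors of 8, 5 and 3, so lowering the anchor admits more junction reads, but each extra read is anchored by a shorter, less unique stretch of sequence.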
Hi Jeremy, Thank you very much for your reply. I have one more question about "Anchor length". For an RNA-seq read mapped across a splicing junction with 0 mismatches, if 5 nucleotides at one end map to one exon, does that mean the rest of the read must map to the adjacent exon? What I want to understand is this: although reducing "Anchor length" may reduce the reliability of mapping at one end/exon, the larger number of nucleotides mapped to the adjacent exon may increase the reliability of mapping. Does that mean the overall reliability of mapping is unchanged? Best, Jianguang
________________________________ From: Jeremy Goecks [jeremy.goecks@emory.edu] Sent: Wednesday, April 10, 2013 3:16 PM To: Du, Jianguang Cc: galaxy-user@lists.bx.psu.edu
I have one more question about "Anchor length". For an RNA-seq read mapped across a splicing junction with 0 mismatches, if 5 nucleotides at one end map to one exon, does that mean the rest of the read must map to the adjacent exon? What I want to understand is this: although reducing "Anchor length" may reduce the reliability of mapping at one end/exon, the larger number of nucleotides mapped to the adjacent exon may increase the reliability of mapping. Does that mean the overall reliability of mapping is unchanged?
No. In general, the probability of incorrectly mapping 5 bases + (N-5) remaining bases is higher than that of incorrectly mapping 8 bases + (N-8) remaining bases, because (a) any given 5-mer occurs at many more places in a genome than any given 8-mer and (b) there can be mismatches when mapping the remainder. J.
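Point (a) can be checked with back-of-the-envelope arithmetic. Under a deliberately crude model in which the genome is a uniformly random sequence (real genomes are repetitive, so many k-mers occur far more often), a specific k-mer is expected to occur about (G - k + 1) / 4^k times per strand. The genome size G below is an assumption, roughly that of the human genome.

```python
def expected_random_hits(genome_length: float, k: int,
                         both_strands: bool = True) -> float:
    """Expected exact matches of one k-mer in a uniform random genome
    (a simplification; real genomes are not random)."""
    hits = (genome_length - k + 1) / 4 ** k
    return 2 * hits if both_strands else hits


G = 3.1e9  # assumed genome size, roughly human
for k in (3, 5, 8):
    print(f"k={k}: about {expected_random_hits(G, k):,.0f} chance matches")
```

Under this model a 5-base anchor has about 64 (4^3) times as many chance matches as an 8-base anchor: roughly 6 million versus roughly 95 thousand positions across both strands, and a 3-base anchor roughly 97 million. A longer remainder on the other exon does not undo this, because the short anchor is what has to pin down where the read continues after the junction.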
participants (2)
- Du, Jianguang
- Jeremy Goecks