Hi Jeremy,
Thank you very much for your reply.
I have one more question about the "Anchor length". For an RNA-seq read mapped across a splice junction with 0 mismatches allowed, if 5 nucleotides at one end map to one exon, does that mean the rest of the read must map to the adjacent exon? What I want to understand is this: reducing the "Anchor length" may reduce the reliability of the mapping on the short end, but the larger number of nucleotides mapped to the adjacent exon may increase it. Does that mean the overall reliability of the mapping is unchanged?
Best,
Jianguang
1) My reads are 36 nt long. What value should I set for "Minimum length of read segments" to get the most reliable output with the highest mapping to splice junctions? In my previous TopHat run I set it to 18. Can I reduce it further to get better mapping to splice junctions?
2) I do not understand exactly how TopHat uses the "Anchor length", although I have read the TopHat manual. Suppose I set the "Anchor length" to 8 and the "Maximum number of mismatches that can appear in the anchor region of spliced alignment" to 0 when I run TopHat. Does that mean that, for a read mapping to two adjacent exons, TopHat will report the alignment in the outputs ".accepted hits" and ".splicing junctions" if either end of the read has 8 or more nucleotides mapping to one exon?
3) Is there any disadvantage or negative effect if I set the "Anchor length" to the lowest value, for example 3? My understanding is that, with 0 mismatches allowed, if 3 nucleotides at one end of a read map to one exon, the rest of the read will map to the adjacent exon (in my case, the remaining 33 nucleotides). So I think that setting the "Anchor length" to 3 does not make the alignment less accurate. Am I correct?
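To put rough numbers on questions 2) and 3), here is a minimal Python sketch. It is only back-of-the-envelope arithmetic, not TopHat code: it assumes independent, uniformly distributed bases, the window of 10,000 candidate positions is a made-up figure, and the function names `expected_chance_hits` and `spans_with_anchor` are hypothetical. The second function simply spells out the two readings of "Anchor length" I am unsure about (enough bases on either side versus on both sides of the junction).

```python
def expected_chance_hits(anchor_len, window):
    """Expected number of exact chance matches of an anchor_len-nt sequence
    among `window` candidate positions, assuming random uniform bases."""
    return window / 4 ** anchor_len


def spans_with_anchor(left, right, anchor_len, both_sides=True):
    """Check a read split as left/right nucleotides around a junction.

    both_sides=True : the shorter side must reach anchor_len.
    both_sides=False: it is enough that the longer side reaches anchor_len.
    Which reading matches TopHat is exactly what I am asking.
    """
    shorter, longer = min(left, right), max(left, right)
    return shorter >= anchor_len if both_sides else longer >= anchor_len


if __name__ == "__main__":
    window = 10_000  # hypothetical number of candidate acceptor positions
    for k in (3, 5, 8):
        print(f"anchor {k} nt: ~{expected_chance_hits(k, window):.2f} chance hits")

    # A 36 nt read split 5/31 with "Anchor length" 8, under both readings.
    print(spans_with_anchor(5, 31, 8, both_sides=True))
    print(spans_with_anchor(5, 31, 8, both_sides=False))
```

Under these assumptions a 3 nt anchor would match by chance at many positions in such a window, whereas an 8 nt anchor would rarely do so, which is why I wonder whether the long side of the read really compensates for a very short anchor.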