How to decide "Mean Inner Distance between Mate Pairs"?
Dear All,

I am analyzing downloaded RNA-seq datasets, but I am not sure what value to use for the Mean Inner Distance between Mate Pairs for these paired-end datasets.

Take one paired-end RNA-seq dataset as an example. Its description in the NCBI SRA database reads: "Layout: PAIRED, Orientation: 5'-3'-3'-5', Nominal length: 400, Nominal Std Dev: 20".

At first I thought the Mean Inner Distance between Mate Pairs should be about 325 bp, because the reads on both ends are 36 bp long. However, when I aligned the paired reads to transcripts and to the genome using BLASTn, the distance between the paired reads was about 200 bp. How should I decide the Mean Inner Distance between Mate Pairs in my case?

Thanks.
Jianguang Du
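The arithmetic behind that first estimate can be written out as a small sketch. It assumes that the SRA "Nominal length" is the full fragment length (both reads plus the gap between them, adapters excluded), which the thread does not confirm. The function name is hypothetical.

```python
def inner_distance(fragment_length: int, read_length: int) -> int:
    """Gap between the two mates: fragment length minus both read lengths.

    Assumes both mates have the same length and that fragment_length
    covers read 1 + gap + read 2 (an assumption, not stated by SRA).
    """
    return fragment_length - 2 * read_length


# Values taken from the SRA record quoted in the question.
nominal = inner_distance(fragment_length=400, read_length=36)
print(nominal)  # 328
```

Under that assumption the nominal value comes out at 328 bp, close to the figure in the question and well above the roughly 200 bp seen in the BLASTn alignments.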
On Wed, Aug 15, 2012 at 11:13 AM, Du, Jianguang <jiandu@iupui.edu> wrote:
> How should I decide the Mean Inner Distance between Mate Pairs in my case?
The information from SRA is likely only an approximation. SRA does not validate these details, I do not think. You can probably use the distribution from your data as the best estimate.

Sean
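A rough sketch of how the distribution could be taken from the data, assuming the reads have already been aligned and the alignments are available as SAM text. It is most meaningful for alignments to transcript sequences, since genome alignments that span introns inflate the template length. TLEN is column 9 of the SAM format; the function names and the toy records are illustrative and not from the thread.

```python
import statistics


def inner_distances(sam_lines):
    """Yield one inner distance per properly paired fragment in SAM text."""
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag = int(fields[1])
        tlen = int(fields[8])             # column 9: signed template length
        seq = fields[9]
        # Keep properly paired (0x2) primary records (not 0x100 / 0x800).
        # The positive-TLEN mate is used so each pair is counted once.
        if not (flag & 0x2) or (flag & 0x900) or tlen <= 0 or seq == "*":
            continue
        # Assumption: both mates have the same length as this read.
        yield tlen - 2 * len(seq)


def summarize(distances):
    """Return (mean, standard deviation); needs at least two pairs."""
    values = list(distances)
    return statistics.mean(values), statistics.stdev(values)


def _toy_pair(name, pos, tlen, read_len=36):
    """Build two made-up SAM records for one pair (illustration only)."""
    seq, qual = "A" * read_len, "I" * read_len
    mate_pos = pos + tlen - read_len
    cigar = f"{read_len}M"
    first = [name, "99", "tx1", str(pos), "60", cigar,
             "=", str(mate_pos), str(tlen), seq, qual]
    second = [name, "147", "tx1", str(mate_pos), "60", cigar,
              "=", str(pos), str(-tlen), seq, qual]
    return ["\t".join(first), "\t".join(second)]


# Three made-up pairs with template lengths 270, 272 and 274,
# giving inner distances of 198, 200 and 202.
toy_sam = ["@HD\tVN:1.6"]
for i, tlen in enumerate([270, 272, 274]):
    toy_sam += _toy_pair(f"read{i}", 100 + i, tlen)

mean, sd = summarize(inner_distances(toy_sam))
print(mean, sd)
```

On real data the same two numbers, computed over a sample of a few thousand confidently mapped pairs, would be the values to enter for the mean inner distance and its standard deviation.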
Great advice Sean!

Jianguang, this is the correct analysis: mapping the data to test the actual insert size of the library as sequenced. The experimental notes at SRA are just a starting place; the data is truth. Running a sample through TopHat itself might produce more precise results.

I suspect the coverage on your top BLASTn HSP is not complete, breaking off where it hits a splice, and that you have some bias toward sequences/hits that cross junctions near the ends. But overall, none of this would likely make much of a difference in the analysis as a whole.

Good luck!

Jen
Galaxy team

On 8/15/12 8:39 AM, Sean Davis wrote:
> The information from SRA is likely only an approximation. SRA does not validate these details, I do not think.
> You can probably use the distribution from your data as the best estimate.
>
> Sean
-- Jennifer Jackson http://galaxyproject.org