Hi, After mapping RNA-Seq paired-end reads with TopHat, I can see that most of the reads fall into the right regions. However, I can still see lots of reads mapped to non-coding regions (the locations where the reads are mapped don't contain exons). I am wondering if these "non-coding reads" will be included when cufflinks calculates transcript/gene expression. Dying to know your opinion. And another question is: how to know the number of reads mapped to a certain exon? Thanks
I am wondering if these "non-coding reads" will be included when cufflinks calculates transcript/gene expression.
Reads will only be included if they map to assembled/known transcripts.
And another question is: how to know the number of reads mapped to a certain exon?
This isn't possible because a single read may map to multiple exons and/or transcripts. Cufflinks assigns reads probabilistically when their mapping cannot be uniquely determined. See http://cufflinks.cbcb.umd.edu/faq.html#count and http://cufflinks.cbcb.umd.edu/howitworks.html for details.
Best, J.
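To illustrate why a per-exon or per-transcript count is ambiguous, here is a minimal sketch in Python. It is a toy and not Cufflinks' actual model: the transcript names, coordinates, and the function `compatible_transcripts` are all hypothetical, intervals are half-open, and splicing is ignored.

```python
# Toy annotation (hypothetical): two transcripts that share their first exon.
# Coordinates are half-open [start, end) on a single chromosome.
TRANSCRIPTS = {
    "tx_A": [(100, 200), (300, 400)],
    "tx_B": [(100, 200), (500, 600)],
}


def compatible_transcripts(read_start, read_end, transcripts):
    """Return the names of transcripts with at least one exon overlapping the read."""
    hits = []
    for name, exons in transcripts.items():
        if any(start < read_end and end > read_start for start, end in exons):
            hits.append(name)
    return hits


# A read inside the shared exon cannot be assigned to one transcript uniquely.
shared = compatible_transcripts(120, 170, TRANSCRIPTS)
# A read inside an exon found in only one transcript is unambiguous.
unique = compatible_transcripts(320, 370, TRANSCRIPTS)
# A read outside every exon is compatible with no transcript.
outside = compatible_transcripts(700, 750, TRANSCRIPTS)
print(shared, unique, outside)
```

A read like the first one is the case where a tool has to split or probabilistically assign the read rather than count it once for a single feature.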
Dear all,

I need some (many) suggestions and help. First and most important: I am working on bacterial RNA-seq (Illumina reads). My analysis steps are as follows:

Step 1. The FASTQ sequence data was groomed.
Step 2. I mapped the reads with Bowtie using default parameters. I am using a reference genome FASTA file from my history, because the reference genome is not available on Galaxy.
Step 3. I sorted the Bowtie output file using Jeremy Goecks's workflow, linked below: https://main.g2.bx.psu.edu/u/jeremy/p/transcriptome-analysis-faq#faq2
Step 4. This sorting gave me concatenated files.
Step 5. The concatenated files were used to run Cufflinks, which gave me an assembled transcripts file.
Step 6. The assembled transcripts files from Step 5 were used for Cuffmerge.
Step 7. For Cuffdiff, the transcript GTF file generated in Step 6 and the concatenated files from Step 4 were used.

Now my questions are:
Is this workflow acceptable for bacterial transcriptome analysis?
Should I filter the SAM file? If yes, at which step?
Should I convert the SAM file to a BAM file? If yes, at which step?
Is it OK to use the FASTA of the reference genome for mapping, or should it be converted to another format? If so, what should the workflow be?

If anyone has experience with Bowtie parameters for mapping bacterial RNA-seq data, suggestions are very welcome.

Thanking you all
On Wed, Apr 18, 2012 at 8:37 AM, Jeremy Goecks <jeremy.goecks@emory.edu> wrote:
I am wondering if these "non-coding reads" will be included when cufflinks calculates transcript/gene expression.
Reads will only be included if they map to assembled/known transcripts.
Well, it depends on what transcript annotation file you pass to cuffdiff. If you run cufflinks with --GTF: "Tells Cufflinks to use the supplied reference annotation (a GFF file) to estimate isoform expression. It will not assemble novel transcripts, and the program will ignore alignments not structurally compatible with any reference transcript."[1] In Galaxy language, this is the option "Use Reference Annotation:" with "Use reference annotation" selected. The two other options, "No" or "Use reference annotation as guide", will allow cufflinks to estimate unknown transcripts. If you later use cuffmerge to produce the transcript annotation from your cufflinks runs and use it for cuffdiff, the "non-coding reads" will almost certainly pollute your transcript expression estimates.

[1] http://cufflinks.cbcb.umd.edu/manual.html

Jeremy, do you have a workflow to estimate what percent of the reads are mapping to unknown expressed regions? I would like to be able to produce this estimate before I decide which transcript annotation to pass to cuffdiff. I would expect a small percent of reads to map outside of known expressed regions, but if this number is too big, then I would like to check for potential problems with my library.

Regards, Carlos
Jeremy, do you have a workflow to estimate what percent of the reads are mapping to unknown expressed regions?
Here's a simple approach, assuming mapped reads are in BAM format:
1. BAM --> SAM
2. SAM --> Interval
3. Intersect reads as intervals with known annotation, not allowing for any overlap.
Best, J.
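The interval approach above can be sketched outside Galaxy as well. The Python below is a minimal illustration, not the Galaxy tools themselves. It assumes reads and annotation have already been converted to (chrom, start, end) tuples with 0-based half-open coordinates, and it reads "not allowing for any overlap" as counting the reads that overlap no annotated interval. The function names and the example data are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(annotation):
    """Merge annotated intervals per chromosome into sorted, disjoint blocks."""
    by_chrom = defaultdict(list)
    for chrom, start, end in annotation:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        merged = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps_annotation(index, chrom, start, end):
    """True if the half-open read [start, end) overlaps any annotated block."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Blocks before position i begin before the read ends; since blocks are
    # disjoint and sorted, the last of them has the largest end coordinate.
    i = bisect_left(starts, end)
    return i > 0 and ends[i - 1] > start


def fraction_outside(reads, annotation):
    """Fraction of reads that overlap no annotated interval."""
    index = build_index(annotation)
    reads = list(reads)
    if not reads:
        return 0.0
    outside = sum(1 for c, s, e in reads if not overlaps_annotation(index, c, s, e))
    return outside / len(reads)


# Hypothetical example data.
annotation = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
reads = [("chr1", 90, 110), ("chr1", 300, 350), ("chr2", 60, 100), ("chr3", 0, 10)]
print(fraction_outside(reads, annotation))
```

With real data, multi-mapped and spliced reads would need extra handling (for example, treating each aligned block of a spliced read as its own interval), which this sketch leaves out.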
participants (4): Ateequr Rehman, Carlos Borroto, Jeremy Goecks, 杨继文