On Wed, Apr 18, 2012 at 8:37 AM, Jeremy Goecks <jeremy.goecks@emory.edu> wrote:
> > I am wondering if these "non-coding reads" will be included when cufflinks calculates transcript/gene expression.
>
> Reads will only be included if they map to assembled/known transcripts.
Well, it depends on which transcript annotation file you pass to cuffdiff.

If you run cufflinks with --GTF: "Tells Cufflinks to use the supplied reference annotation (a GFF file) to estimate isoform expression. It will not assemble novel transcripts, and the program will ignore alignments not structurally compatible with any reference transcript."[1]

In Galaxy terms, this is the option "Use Reference Annotation:" with "Use reference annotation" selected. The two other choices, "No" or "Use reference annotation as guide", will allow cufflinks to assemble and estimate unknown transcripts. If you later use cuffmerge to produce the transcript annotation from your cufflinks runs and pass it to cuffdiff, the "non-coding reads" will almost certainly pollute your transcript expression estimates.

[1] http://cufflinks.cbcb.umd.edu/manual.html

Jeremy, do you have a workflow to estimate what percent of the reads are mapping to unknown expressed regions? I would like to be able to produce this estimate before I decide which transcript annotation to pass to cuffdiff. I would expect a small percent of reads to map outside of known expressed regions, but if this number is too big, I would like to check for potential problems with my library.

Regards,
Carlos