Re: [galaxy-user] Tophat mapping

18 Apr 2012

On Wed, Apr 18, 2012 at 8:37 AM, Jeremy Goecks <jeremy.goecks@emory.edu> wrote:
> ...
>> I am wondering if these "non-coding reads" will be included when cufflinks
>> calculates transcript/gene expression.
> Reads will only be included if they map to assembled/known transcripts.
Well, it depends on which transcript annotation file you pass to
cuffdiff. If you run cufflinks with --GTF:

"Tells Cufflinks to use the supplied reference annotation (a GFF file)
to estimate isoform expression. It will not assemble novel
transcripts, and the program will ignore alignments not structurally
compatible with any reference transcript."[1]

In Galaxy terms, this is the "Use Reference Annotation:" option with
"Use reference annotation" selected. The two other choices, "No" or
"Use reference annotation as guide", allow cufflinks to assemble
unknown transcripts. If you later use cuffmerge to build the
transcript annotation from your cufflinks runs and pass it to
cuffdiff, the "non-coding reads" will almost certainly pollute your
transcript expression estimates.

[1]http://cufflinks.cbcb.umd.edu/manual.html
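
To make the three Galaxy choices above concrete, here is a rough Python
sketch that builds the cufflinks command line for each one. The helper
name and file names are made up for illustration, and the mapping of
"Use reference annotation as guide" to --GTF-guide is my assumption
based on the cufflinks manual, not something taken from the Galaxy
wrapper itself.

```python
# Hypothetical helper: maps the Galaxy "Use Reference Annotation:" choice
# to a cufflinks argument list. File names passed in are placeholders.
def cufflinks_args(mode, bam_path, gtf_path=None):
    args = ["cufflinks"]
    if mode == "No":
        # No annotation supplied: cufflinks assembles transcripts de novo.
        pass
    elif mode in ("Use reference annotation",
                  "Use reference annotation as guide"):
        if gtf_path is None:
            raise ValueError("this mode needs a reference annotation file")
        if mode == "Use reference annotation":
            # Quantify known transcripts only; no novel assembly.
            args += ["--GTF", gtf_path]
        else:
            # Assumed mapping: annotation guides assembly, novel
            # transcripts are still allowed.
            args += ["--GTF-guide", gtf_path]
    else:
        raise ValueError("unknown mode: %r" % mode)
    args.append(bam_path)
    return args


if __name__ == "__main__":
    for mode in ("No", "Use reference annotation",
                 "Use reference annotation as guide"):
        print(" ".join(cufflinks_args(mode, "accepted_hits.bam", "genes.gtf")))
```

Only the "Use reference annotation" line restricts quantification to
known transcripts; the other two can introduce novel transcripts into
whatever cuffmerge produces later.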

Jeremy, do you have a workflow to estimate what percentage of the reads
map to unknown expressed regions? I would like to be able to produce
this estimate before I decide which transcript annotation to pass to
cuffdiff. I would expect a small percentage of reads to map outside of
known expressed regions, but if this number is too big, then I would
like to check for potential problems with my library.
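
To show the kind of estimate I have in mind, here is a minimal Python
sketch. It assumes the read alignments and the annotated exons have
already been reduced to (chrom, start, end) tuples with half-open
coordinates; all function names and the toy data are made up for
illustration. It also treats each read as a single interval, so spliced
reads would need to be split into their aligned blocks first. A real
workflow would read the BAM and GTF files with dedicated tools.

```python
from bisect import bisect_right
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def build_index(annotation):
    """Per chromosome: sorted starts and ends of the merged intervals."""
    by_chrom = defaultdict(list)
    for chrom, start, end in annotation:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        merged = merge_intervals(intervals)
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps(index, chrom, start, end):
    """True if the read [start, end) overlaps any annotated interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Last merged interval that begins before the read ends.
    i = bisect_right(starts, end - 1) - 1
    return i >= 0 and ends[i] > start


def percent_outside(reads, annotation):
    """Percentage of reads that overlap no annotated interval."""
    if not reads:
        return 0.0
    index = build_index(annotation)
    outside = sum(1 for c, s, e in reads if not overlaps(index, c, s, e))
    return 100.0 * outside / len(reads)


# Toy data, not real results.
annotation = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
reads = [("chr1", 120, 170), ("chr1", 290, 340),
         ("chr1", 300, 350), ("chr2", 60, 110)]

if __name__ == "__main__":
    print("%.1f%% of reads fall outside annotated regions"
          % percent_outside(reads, annotation))
```

Counting a read as "inside" on any overlap is a lenient choice;
requiring the read to be fully contained in an exon would give a higher
outside percentage.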

Regards,
Carlos

Carlos Borroto