Can someone help me understand the quartile normalization in Cufflinks? I have read several threads reporting that FPKM values were inflated after normalization (-N), but most people didn't report their values, so I don't know how big the inflation should be...
In my case, the difference is huge! The FPKM values for the first four genes without normalization are in the range [61 - 184], while after normalization they are in the range [2.4e+6 - 7.4e+6]. Even though this inflation does not seem to affect the calculation of gene expression changes [ log (FPKM2/FPKM1) ], I'm wondering if something is wrong with my dataset.
Is this what I should expect? Is it always better to use the normalization?
Thanks,
David
David,
Quartile normalization is explained in the Cufflinks manual: http://cufflinks.cbcb.umd.edu/manual.html -- "With this option, Cufflinks normalizes by the upper quartile of the number of fragments mapping to individual loci instead of the total number of sequenced fragments. This can improve robustness of differential expression calls for less abundant genes and transcripts."
My reading of this is that the "M" in FPKM is taken from the upper quartile rather than from the total number of fragments; if the FPKM numbers for highly expressed isoforms change substantially, that suggests many of your reads are mapping to minimally expressed isoforms.
Without knowing more about your experiment, it's not possible to say whether you should be doing quartile normalization. However, given that it's designed for differential expression (DE) calls for less abundant isoforms, you'll want to see whether this holds true for your dataset(s) and whether Cuffdiff DE tests make sense in the context of your research questions.
Good luck,
J.
On Aug 25, 2011, at 1:49 PM, David Joly wrote:
Can someone help me understand the quartile normalization in Cufflinks? I have read several threads reporting that FPKM values were inflated after normalization (-N), but most people didn't report their values, so I don't know how big the inflation should be...
In my case, the difference is huge! The FPKM values for the first four genes without normalization are in the range [61 - 184], while after normalization they are in the range [2.4e+6 - 7.4e+6]. Even though this inflation does not seem to affect the calculation of gene expression changes [ log (FPKM2/FPKM1) ], I'm wondering if something is wrong with my dataset.
Is this what I should expect? Is it always better to use the normalization?
Thanks,
David
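To make the reading in the reply above concrete, here is a minimal Python sketch of what happens to FPKM when the denominator switches from the total fragment count to the upper quartile of the per-locus fragment counts. It is an illustration of that interpretation only, not Cufflinks' actual implementation: the fragment counts and gene lengths are made up, the helper names `fpkm` and `upper_quartile` are hypothetical, and the choice of a linearly interpolated 75th percentile over all loci is an assumption.

```python
import math


def fpkm(fragments, length_bp, norm_count):
    """Fragments per kilobase per million, with norm_count as the 'M' term."""
    return fragments * 1e9 / (length_bp * norm_count)


def upper_quartile(counts):
    """75th percentile with linear interpolation (an assumed definition)."""
    ordered = sorted(counts)
    pos = 0.75 * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


# Made-up data: fragments mapped to five loci, and the length of each locus.
locus_counts = [10, 20, 40, 80, 1_000_000]
gene_lengths = [1000, 2000, 1500, 3000, 2500]

total = sum(locus_counts)          # denominator without -N
uq = upper_quartile(locus_counts)  # denominator with -N (assumed)

fpkm_total = [fpkm(c, l, total) for c, l in zip(locus_counts, gene_lengths)]
fpkm_uq = [fpkm(c, l, uq) for c, l in zip(locus_counts, gene_lengths)]

# Ratio of quartile-normalized to total-normalized FPKM for each locus.
inflation = [b / a for a, b in zip(fpkm_total, fpkm_uq)]

for c, a, b, r in zip(locus_counts, fpkm_total, fpkm_uq, inflation):
    print(f"fragments={c:>9}  FPKM(total)={a:.3g}  FPKM(UQ)={b:.3g}  x{r:.3g}")
```

Under these assumptions every locus in a sample is inflated by the same factor, total / upper quartile, which can span several orders of magnitude because a per-locus count is far smaller than the library total. A constant factor per sample would also be consistent with log (FPKM2/FPKM1) changing little: the ratio shifts only by the ratio of the two samples' factors.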
participants (2)
- David Joly
- Jeremy Goecks