a question about cuffdiff "values"
Hello:

I am a Galaxy-naive molecular and developmental biologist studying repression/derepression of early embryonic gene expression in zebrafish embryos. After attending the Galaxy meeting I returned home and worked up two mRNAseq files to determine RNA expression differences between a treated and an untreated sample using cuffdiff (i.e. the cuffdiff output titled "gene differential expression testing"). I downloaded the data, opened it in Excel and captured all the "significant" rows. If I look at the "value 1" and "value 2" columns, I find that many of the numbers are single digits. I expect the numbers in one of the columns to be very low (that is, less than 1) because the treatment should be inducing expression of a subfamily of genes that are repressed.

My questions are:

1) what do these numbers represent?
2) In the "value" column where I expect the higher number, does a value of 10 or less mean anything, or should one be selecting for values higher than these single-digit numbers?
3) In the column for genes that might be repressed, is there really a difference between a value of 0.1 and something like 0.01, since that can change my log ratios significantly? This, of course, goes back to my first question.

I would appreciate any help I could get,

sincerely,
el linney
Professor of Molecular Genetics and Microbiology
Duke University Medical Center
Hi El,
1) what do these numbers represent?
They are the FPKM values (fragments per kilobase of transcript per million mapped fragments) for sample 1 and sample 2. The Cufflinks documentation is the place to get definitions for all columns: http://cufflinks.cbcb.umd.edu/manual.html#gene_exp_diff
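To make the unit concrete, here is a minimal sketch of the textbook FPKM definition. It is only an illustration: Cufflinks estimates abundances with a statistical model rather than by simple counting, and the function name and the example numbers below are hypothetical.

```python
def fpkm(fragments_on_gene, gene_length_bp, total_mapped_fragments):
    """Textbook FPKM: fragments per kilobase of transcript per million mapped fragments.

    Simplified definition for illustration only; Cufflinks itself uses a
    model-based estimate, so its reported values will not match a raw count exactly.
    """
    kilobases = gene_length_bp / 1_000
    millions_mapped = total_mapped_fragments / 1_000_000
    return fragments_on_gene / (kilobases * millions_mapped)


# Hypothetical gene: 500 fragments on a 2 kb transcript in a 25-million-fragment library.
example_value = fpkm(500, 2_000, 25_000_000)
```

Under this definition a single-digit value is not unusual: in the hypothetical library above, a few hundred fragments on a gene of typical length already gives an FPKM of about 10.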
2) In the "value" column where I expect the higher number, does a value of 10 or less mean anything, or should one be selecting for values higher than these single-digit numbers?
3) In the column for genes that might be repressed, is there really a difference between a value of 0.1 and something like 0.01, since that can change my log ratios significantly? This, of course, goes back to my first question.
These questions get at the challenge of interpreting FPKM values. One thing to look at is the confidence intervals (CIs) that Cufflinks/Cuffdiff produce for each FPKM estimate. In my experience, an estimate whose CI overlaps 0 is unreliable no matter how large the FPKM. Genes with FPKM values near 0 most likely have CIs that overlap 0, so a difference such as 0.1 versus 0.01 is probably not meaningful. However, genes with low FPKM values (e.g. < 10) but tight CIs that exclude 0 should probably be included for further analysis.

Another thing to look at is whether a few highly expressed genes are depressing the FPKM values of everything else; because FPKM is scaled by the total number of mapped fragments, a handful of very abundant genes can dominate that total. If so, using the upper-quartile normalization option can help you get better resolution for genes expressed at low levels.

Good luck,
J.
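As an illustration of the confidence-interval check described in the reply, here is a rough sketch in Python. The gene names, the numbers, and the field names (`fpkm_1`, `lo_1`, and so on) are all hypothetical; they are not the actual Cufflinks/Cuffdiff column headers, so map them to whatever your tracking file provides. Keeping a gene when at least one sample has a CI that excludes 0 is an assumption made for this sketch, not a rule taken from the Cufflinks documentation.

```python
import math

# Hypothetical records: an FPKM estimate and the lower CI bound for each sample.
genes = [
    {"gene": "geneA", "fpkm_1": 0.01, "lo_1": 0.0, "fpkm_2": 8.0, "lo_2": 5.5},
    {"gene": "geneB", "fpkm_1": 0.10, "lo_1": 0.0, "fpkm_2": 8.0, "lo_2": 5.5},
    {"gene": "geneC", "fpkm_1": 0.02, "lo_1": 0.0, "fpkm_2": 0.3, "lo_2": 0.0},
]


def ci_excludes_zero(conf_lo):
    """True when the lower confidence bound is strictly above 0."""
    return conf_lo > 0.0


def log2_ratio(fpkm_1, fpkm_2):
    """log2(sample 2 / sample 1); undefined when either value is 0."""
    return math.log2(fpkm_2 / fpkm_1)


# Assumption for this sketch: keep a gene if at least one sample has a CI above 0.
kept = [g["gene"] for g in genes
        if ci_excludes_zero(g["lo_1"]) or ci_excludes_zero(g["lo_2"])]

# geneA and geneB differ only in a near-zero sample-1 value whose CI touches 0,
# yet their log2 ratios differ by log2(10), roughly 3.3 units.
shift = (log2_ratio(genes[0]["fpkm_1"], genes[0]["fpkm_2"])
         - log2_ratio(genes[1]["fpkm_1"], genes[1]["fpkm_2"]))
```

The point of the last two lines is that a tenfold change in a near-zero value moves the log ratio by more than three units even though neither low estimate can be distinguished from 0, which is why the CI is a better guide than the ratio alone.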
participants (2): Elwood Linney, Jeremy Goecks