Hi all, I have been analyzing my RNA-seq data from mouse tissues. My reads are single-end and 51 bp long. I ran TopHat/Cufflinks/Cuffdiff to test for differential gene expression.
In the Cuffdiff output, I got very high RPKM values for some miRNAs and some other short genes (less than 100 bp). These genes are among the top genes by RPKM, and I think their RPKM values are probably too high to be true:
test_id | gene_id | gene | locus | sample_1 | sample_2 | status | value_1 | value_2 | log2(fold_change) | test_stat | p_value | q_value | significant
ENSMUSG00000093077 | ENSMUSG00000093077 | Mir5105 | 5:146231229-146302874 | Epithelium | Fiber | OK | 1.53E+06 | 445558 | -1.78097 | -355.367 | 0.00715 | 0.016986 | yes
ENSMUSG00000093098 | ENSMUSG00000093098 | Gm22641 | 7:130162450-133124354 | Epithelium | Fiber | OK | 87894.1 | 36474.7 | -1.26887 | -0.59863 | 0.4913 | 0.587174 | no
ENSMUSG00000089855 | ENSMUSG00000089855 | Gm15662 | 10:105187662-105583874 | Epithelium | Fiber | OK | 42868.9 | 21566.5 | -0.99114 | -20.7066 | 0.0186 | 0.039568 | yes
ENSMUSG00000092984 | ENSMUSG00000092984 | Mir5115 | 2:73012853-73012927 | Epithelium | Fiber | OK | 21104.8 | 8317.49 | -1.34335 | -447.314 | 0.0001 | 0.000354 | yes
ENSMUSG00000086324 | ENSMUSG00000086324 | Gm15564 | 16:35926510-36037131 | Epithelium | Fiber | OK | 6443.35 | 3664.15 | -0.81433 | -1.52095 | 0.2129 | 0.301429 | no
ENSMUSG00000092981 | ENSMUSG00000092981 | Mir5125 | 17:23803186-23824739 | Epithelium | Fiber | OK | 5974.14 | 2390.75 | -1.32127 | -0.34111 | 0.5746 | 0.661937 | no
I checked some forums, and people there say this is a known drawback of TopHat/Cufflinks/Cuffdiff when dealing with short genes, but I am still not clear on why. Has anyone run into the same problem? What can I do about it?
Can anyone suggest other good tools to test for (1) differential gene expression only, or (2) both differential gene expression and gene discovery?
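To make the short-gene concern concrete, here is a rough sketch of the textbook RPKM arithmetic (reads per kilobase of gene per million mapped reads). It is not Cufflinks' actual estimator, which as far as I understand also applies its own length corrections. The read count and library size are made-up numbers, not from my data; only the 75 bp length is taken from the Mir5115 locus span in the table above.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Textbook RPKM: reads * 1e9 / (gene length in bp * total mapped reads)."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical values for illustration only.
TOTAL_MAPPED = 20_000_000   # assumed library size
READS = 100                 # assumed number of reads hitting each gene

short_gene = rpkm(READS, 75, TOTAL_MAPPED)     # 75 bp, like the Mir5115 locus span
long_gene = rpkm(READS, 3000, TOTAL_MAPPED)    # an assumed 3 kb gene

print(f"75 bp gene:   {short_gene:.2f} RPKM")
print(f"3000 bp gene: {long_gene:.2f} RPKM")
print(f"ratio:        {short_gene / long_gene:.1f}x")
```

With the same number of reads, the 75 bp gene comes out 40 times higher than the 3 kb gene simply because of the length in the denominator, so a handful of reads (possibly mis-mapped ones, since my reads are 51 bp) on a very short feature could produce a huge value.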
Thank you
Thanh