How should I include biological replicates in Cufflinks/Cuffdiff?
Dear team,
I have a few questions regarding RNA-seq data analysis. I want to compare the transcriptome profile of a bacterium across 5 time points, and for each time point I have three biological replicates. So far I have aligned each replicate's reads with Bowtie and assembled transcripts with Cufflinks. I understand that if I were comparing just two time points, each with a single sample and no replicates, I would only need to run Cuffmerge on the two assemblies and then run Cuffdiff with the merged file.
My question is, if I need to compare 5 time points, should I do the comparisons pairwise? And how should I include the three biological replicates for each time point? I can only find an 'include replicates' option in Cuffdiff.
Could you check whether the following methodology is correct? Let's say I have samples at 0 hours, 1 hour, 2 hours, 3 hours and 4 hours. I will name the replicates 0hour-1, 0hour-2, 0hour-3, etc. I will first compare expression between 0hour and 1hour. I will use Cuffmerge to merge 0hour-1, 0hour-2, 0hour-3, 1hour-1, 1hour-2 and 1hour-3 into one merged file. Then I will run Cuffdiff using the merged file with two groups: group 1 is 0hour (replicates 0hour-1 to 0hour-3) and group 2 is 1hour (replicates 1hour-1 to 1hour-3). Does this sound reasonable?
Thank you very much,
Qian
-- Qian Dong Bauer Lab, MCBD Simon Hall: 313-317 212 S. Hawthorne Dr. Bloomington, IN 47405 Email:dong3@indiana.edu Lab Phone:812-855-8443
My question is, if I need to compare between 5 time points, should I do comparison pairwise?
No, do them all at once with Cuffdiff: (a) set 'Perform Replicate Analysis' to 'Yes'; (b) create 5 replicate conditions, one for each time point; (c) add your replicates for each time point. There's a Cuffdiff flag to do time series analysis, but it isn't implemented yet in Galaxy, so you'll get pairwise comparisons for all conditions. You can use the filtering tool to reduce Cuffdiff outputs to only the timepoint comparisons.
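For anyone running Cuffdiff outside Galaxy, the same setup can be sketched on the command line: one label per time point, and one comma-separated list of replicate alignments per time point. The snippet below only builds the argument list and does not run anything. The file names (`merged.gtf`, `0hour-1.sam`, and so on), the output directory, and the helper name `build_cuffdiff_command` are hypothetical. The `-L`, `-o` and `-T` (time series) options are the ones Cuffdiff documents, but check them against the version you have installed.

```python
def build_cuffdiff_command(merged_gtf, replicates_by_condition,
                           output_dir="cuffdiff_out", time_series=False):
    """Build a cuffdiff argument list (hypothetical helper, nothing is executed).

    replicates_by_condition maps a condition label to its replicate
    alignment files; the insertion order of the dict sets the condition order.
    """
    labels = list(replicates_by_condition)
    cmd = ["cuffdiff", "-o", output_dir, "-L", ",".join(labels)]
    if time_series:
        # -T asks cuffdiff to treat the conditions as an ordered time series
        cmd.append("-T")
    cmd.append(merged_gtf)
    # One positional argument per condition: its replicates joined by commas
    for label in labels:
        cmd.append(",".join(replicates_by_condition[label]))
    return cmd


# Hypothetical file names: 5 time points, 3 replicates each
conditions = {
    f"{t}hour": [f"{t}hour-{r}.sam" for r in (1, 2, 3)]
    for t in range(5)
}
cmd = build_cuffdiff_command("merged.gtf", conditions)
print(" ".join(cmd))
```

Passing `time_series=True` adds `-T` to the command; without it Cuffdiff reports every pairwise comparison between the five conditions, as described above.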
I will use cuffmerge to merge 0hour-1, 0hour-2, 0hour-3, 1hour-1, 1hour-2, 1hour-3 to generate one cuffmerge file.
Correct.
Then I will run cuffdiff using the merged file, with two groups: group 1 is 0hour (add 0hour 1-3 to group 1) and group 2 is 1hour (add 1hour 1-3 to group 2).
Use the process I described above to do all pairwise comparisons in one run. Good luck, J.
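Once a single run has produced every pairwise comparison, the output can be cut down to the consecutive time point comparisons outside Galaxy as well. The sketch below assumes the differential expression table is tab-separated with `sample_1`, `sample_2` and `significant` columns, as in Cuffdiff's `gene_exp.diff`, and that each pair is reported with the earlier label first. Please check both assumptions against your own output. The function name and the example rows are made up for illustration.

```python
import csv
import io


def consecutive_comparisons(diff_lines, ordered_labels, only_significant=True):
    """Keep rows comparing adjacent time points (hypothetical helper)."""
    # Pairs such as (0hour, 1hour), (1hour, 2hour), ...
    wanted = set(zip(ordered_labels, ordered_labels[1:]))
    kept = []
    for row in csv.DictReader(diff_lines, delimiter="\t"):
        if (row["sample_1"], row["sample_2"]) not in wanted:
            continue
        if only_significant and row["significant"] != "yes":
            continue
        kept.append(row)
    return kept


# Made-up rows that mimic the assumed column layout; not real results
HEADER = ["test_id", "gene_id", "gene", "locus", "sample_1", "sample_2",
          "status", "value_1", "value_2", "log2(fold_change)", "test_stat",
          "p_value", "q_value", "significant"]
ROWS = [
    ["geneA", "geneA", "-", "chr:1-900", "0hour", "1hour", "OK",
     "10", "40", "2", "3.1", "0.0001", "0.002", "yes"],
    ["geneA", "geneA", "-", "chr:1-900", "0hour", "2hour", "OK",
     "10", "80", "3", "4.0", "0.0001", "0.002", "yes"],
    ["geneB", "geneB", "-", "chr:950-1800", "1hour", "2hour", "OK",
     "20", "21", "0.07", "0.2", "0.6", "0.8", "no"],
    ["geneB", "geneB", "-", "chr:950-1800", "2hour", "3hour", "OK",
     "21", "84", "2", "3.0", "0.0002", "0.003", "yes"],
]
example = io.StringIO("\n".join("\t".join(r) for r in [HEADER] + ROWS))

labels = ["0hour", "1hour", "2hour", "3hour", "4hour"]
hits = consecutive_comparisons(example, labels)
for row in hits:
    print(row["gene_id"], row["sample_1"], row["sample_2"])
```

With a real file, open `gene_exp.diff` and pass the file handle in place of the `io.StringIO` example.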
Dear Jeremy,
Thank you for your advice! However, when I tried this out I ran into some more questions. I am dealing with a bacterium that has about 4000 genes. When I used Cuffmerge to merge everything with the reference annotation, I got a merged file of only 50 lines. When I left out the reference annotation file, Cuffmerge returned a merged file of 4000 lines (which is more reasonable).
However, this difference did not occur when I used Cuffcompare to merge all the files: with or without the reference annotation, the merged files both have 4000 lines. If I continue to Cuffdiff with this Cuffcompare file, I get over 1000 significantly changed genes.
Could you give me some suggestions on this? Should I just trust the Cuffcompare file? Is it possible that there is a problem with my reference annotation file?
Thank you very much,
Qian
(In reply to Jeremy Goecks <jeremy.goecks@emory.edu>, Sun, Mar 3, 2013 at 9:59 AM.)
I am dealing with a bacterium that has about 4000 genes. When I used Cuffmerge to merge everything with the reference annotation, I got a merged file of only 50 lines. When I left out the reference annotation file, Cuffmerge returned a merged file of 4000 lines (which is more reasonable).
However, this difference did not occur when I used Cuffcompare to merge all the files: with or without the reference annotation, the merged files both have 4000 lines. If I continue to Cuffdiff with this Cuffcompare file, I get over 1000 significantly changed genes.
Could you give me some suggestions on this? Should I just trust the Cuffcompare file?
Cuffmerge attempts to remove incomplete or spurious transcripts. My best guess is that bacterial transcripts, which have few or no introns, are being filtered out because they appear incomplete to Cuffmerge. So, in your case, Cuffcompare could be the better option. You might want to verify my guess by discussing the issue with the Cufflinks developers directly: tophat.cufflinks@gmail.com. Please feel free to post anything you learn to this list.
Best, J.
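One rough way to look into this guess before writing to the developers is to count how many transcripts in each GTF file consist of a single exon, and to compare the Cufflinks assemblies with the Cuffmerge and Cuffcompare outputs. The sketch below assumes standard nine-column GTF lines with `exon` features and a `transcript_id` attribute. The function names and the example lines are hypothetical. A drop in single-exon transcripts after merging would be consistent with the guess, but it would not prove why they were removed.

```python
import re
from collections import Counter

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def exon_counts(gtf_lines):
    """Count exon records per transcript_id in GTF lines (hypothetical helper)."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon features are counted
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            counts[match.group(1)] += 1
    return counts


def summarize(gtf_lines):
    """Return the number of transcripts and how many have exactly one exon."""
    counts = exon_counts(gtf_lines)
    single = sum(1 for n in counts.values() if n == 1)
    return {"transcripts": len(counts), "single_exon": single}


# Made-up GTF lines for illustration only
example_gtf = [
    'chr\tCufflinks\texon\t1\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'chr\tCufflinks\texon\t1000\t1500\t.\t+\t.\tgene_id "G2"; transcript_id "T2";\n',
    'chr\tCufflinks\texon\t1600\t2000\t.\t+\t.\tgene_id "G2"; transcript_id "T2";\n',
]
summary = summarize(example_gtf)
print(summary)
```

With real data, pass an open file handle for each GTF and compare the summaries side by side.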