Hi Lizex,
I've started analyzing my RNA-Seq data for two time points, Day0 and Day4, for control and treated samples. I've aligned the data to the reference genome using Tophat and removed duplicates from the data sets. Could somebody please tell me how important it is to remove duplicates, and how it will influence my results if I don't remove them?
This depends on whether you are removing duplicate reads from your fastq data and/or multi-mapping reads, either with Tophat or in post-processing steps. In any case, this approach will affect the quantitation outputs from Cufflinks, and likely the transcript assemblies as well.
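To see why removing duplicates changes quantitation, here is a toy Python sketch. It assumes single-end reads and a simplified duplicate definition (same chromosome, start position and strand), roughly the rule position-based duplicate-removal tools apply; the function names and the read data are hypothetical, not part of Tophat or Cufflinks. For a highly expressed gene, many reads legitimately start at the same positions, so collapsing them caps its count and can shrink real expression differences.

```python
from collections import Counter


def collapse_duplicates(reads):
    """Keep one read per (chrom, start, strand) key.

    Simplified single-end duplicate definition, assumed for illustration only.
    """
    seen = set()
    kept = []
    for read in reads:
        key = (read["chrom"], read["start"], read["strand"])
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


def count_by_gene(reads):
    """Count reads per gene label."""
    return Counter(read["gene"] for read in reads)


# Hypothetical toy data: geneA is highly expressed, so its 50 reads pile up on
# only 5 distinct start positions; geneB has 5 reads at 5 distinct positions.
reads = (
    [{"chrom": "chr1", "start": 100 + (i % 5), "strand": "+", "gene": "geneA"}
     for i in range(50)]
    + [{"chrom": "chr2", "start": 500 + 10 * i, "strand": "+", "gene": "geneB"}
       for i in range(5)]
)

before = count_by_gene(reads)
after = count_by_gene(collapse_duplicates(reads))
print(before["geneA"], before["geneB"])  # 50 5
print(after["geneA"], after["geneB"])    # 5 5
```

In this toy case a 10:1 difference between the two genes becomes 1:1 after duplicate removal, which is the kind of distortion that propagates into Cufflinks' expression estimates.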
I want to start with Cufflinks and go all the way through to Cuffdiff. Where do I start, given that there are so many options in the manual to choose from? What should I look for?
Here's a tutorial that will help you get started with RNA-seq analysis: http://main.g2.bx.psu.edu/u/jeremy/p/galaxy-rna-seq-analysis-exercise

Galaxy makes it easy to experiment with different parameter values, so you'll want to read the Cufflinks/Cuffcompare/Cuffdiff manual and adjust the parameters that are relevant to your data: http://cufflinks.cbcb.umd.edu/manual.html

In general, RNA-seq studies look at (a) the assembled transcripts, (b) expression values, and (c) differential expression estimates.

Good luck,
J.
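For points (b) and (c), a minimal Python sketch of the two quantities you will see most often: FPKM (fragments per kilobase of transcript per million mapped fragments), which Cufflinks reports as its expression value, and a log2 fold change between conditions, which Cuffdiff reports. This is a simplification under stated assumptions: Cufflinks estimates fragment counts and effective transcript lengths with its own statistical model, and Cuffdiff also runs a significance test that is not shown here. The function names and numbers are hypothetical.

```python
import math


def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Plain FPKM formula (simplified; not Cufflinks' actual estimator).

    Normalizes a raw fragment count by transcript length (per kilobase)
    and by library size (per million mapped fragments).
    """
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


def log2_fold_change(fpkm_treated, fpkm_control):
    """log2(treated / control); both values must be positive."""
    if fpkm_treated <= 0 or fpkm_control <= 0:
        raise ValueError("FPKM values must be positive to take a log ratio")
    return math.log2(fpkm_treated / fpkm_control)


# Hypothetical counts for one 2 kb transcript at Day4, 10 million fragments per library.
control = fpkm(fragment_count=200, transcript_length_bp=2000, total_mapped_fragments=10_000_000)
treated = fpkm(fragment_count=800, transcript_length_bp=2000, total_mapped_fragments=10_000_000)
print(control, treated)                     # 10.0 40.0
print(log2_fold_change(treated, control))   # 2.0
```

Because FPKM divides by library size, the same transcript can be compared across samples with different sequencing depths; any change to the read set upstream (such as duplicate removal) changes both the numerator and the denominator.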