[galaxy-user] Clustering with cuffcompare or cuffdiff results

14 Feb 2012

Dear Sir or Madam,

I am planning to run a clustering analysis on several libraries based on the output of cuffcompare or cuffdiff, since either one lets me construct a matrix whose columns represent the libraries and whose rows hold the counts for transcripts or genes. I want to construct this matrix because it is the input format required by many RNA-seq clustering packages, e.g. baySeq and HTSCluster. However, the answer to the question "I want to find differentially expressed genes. Can I use Cufflinks in conjunction with count-based differential expression packages?" in the Cufflinks FAQ advises against converting FPKM values to count data.
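To make the target format concrete, here is a minimal Python sketch of the matrix I have in mind. The function name `build_matrix`, the library and gene names, and all values are made up for illustration; filling a feature that is missing from a library with 0.0 is my own assumption, not something the tools prescribe.

```python
def build_matrix(per_library):
    """Turn {library: {feature_id: value}} into a features x libraries matrix.

    Rows are features (genes or transcripts), columns are libraries.
    A feature absent from a library is filled with 0.0 (an assumption).
    """
    libraries = sorted(per_library)
    features = sorted({f for values in per_library.values() for f in values})
    matrix = [
        [per_library[lib].get(feature, 0.0) for lib in libraries]
        for feature in features
    ]
    return features, libraries, matrix


# Hypothetical values, for illustration only.
example = {
    "libA": {"gene1": 12.5, "gene2": 0.0},
    "libB": {"gene1": 3.0, "gene3": 7.25},
}

features, libraries, matrix = build_matrix(example)
for feature, row in zip(features, matrix):
    print(feature, row)
```

The per-library values would have to come from parsing the cuffcompare or cuffdiff output, which this sketch does not attempt.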

My questions are:
1. It seems better to run everything up to cuffdiff, but does cuffdiff support multi-sample comparison? I read somewhere that even with multiple samples it still compares them pairwise. Since clustering needs a quantitative data source for the merging step, will cuffdiff give me quantitative measures, rather than only the test statistic and p-value, which are too qualitative to use?
2. If I really need to get count data from the FPKM values, how do I obtain the "effective length" mentioned there? Would it be better to treat each assembled transcript, rather than each gene, as an object in the clustering? What does "you'd be throwing away Cufflinks' uncertainty" mean, even when isoforms are used as the objects? How should I incorporate that uncertainty into my clustering?
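For question 2, this is a rough Python sketch of the back-conversion I am asking about. It only inverts the definition of FPKM (fragments per kilobase of transcript per million mapped fragments). The function name and the numbers are hypothetical, and it assumes I already know the effective length and the total number of mapped fragments for the library. It produces a single point estimate, so it ignores the uncertainty of the FPKM estimate entirely, which I take to be what the FAQ warns about.

```python
def fpkm_to_approx_count(fpkm, effective_length, total_fragments):
    """Invert the FPKM definition to get an approximate fragment count.

    fpkm             -- FPKM value of the transcript
    effective_length -- effective transcript length in bases (assumed known)
    total_fragments  -- mapped fragments in the library (assumed known)

    Returns a point estimate only; any uncertainty in the FPKM is lost.
    """
    if effective_length <= 0 or total_fragments <= 0:
        raise ValueError("effective_length and total_fragments must be positive")
    kilobases = effective_length / 1e3
    millions_of_fragments = total_fragments / 1e6
    return fpkm * kilobases * millions_of_fragments


# Hypothetical numbers: FPKM 10, effective length 1000 bases, 2 million fragments.
print(fpkm_to_approx_count(10.0, 1000, 2_000_000))
```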

Best,
Sherry

Zhang Xiaoyu