1. It seems better to run everything up to cuffdiff, but does cuffdiff allow multiple-sample comparison? I read somewhere that even with multiple samples it still compares them pairwise.

Cuffdiff supports replicate analysis: each condition can be given several replicates.

Since I want to do clustering, which needs a quantitative data source for the merging step, will cuffdiff give me quantitative measures rather than just the test score and p-value, which are too qualitative to include?

Take a look at the Cuffdiff documentation for outputs: http://cufflinks.cbcb.umd.edu/manual.html#cuffdiff_output
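As a rough illustration of turning such output into a matrix for clustering, here is a minimal Python sketch. It assumes a tab-delimited table with a tracking_id column and one <sample>_FPKM column per sample; those column names, the toy values, and the helper load_fpkm_matrix are assumptions made for this example, so check them against the manual and your own files before relying on it.

```python
import csv
import io
import math

# Toy stand-in for a tab-delimited FPKM table. The column names
# (tracking_id, <sample>_FPKM) and the values are assumptions for
# illustration only; verify them against your actual output files.
TOY_TABLE = (
    "tracking_id\tq1_FPKM\tq1_status\tq2_FPKM\tq2_status\n"
    "geneA\t10.0\tOK\t40.0\tOK\n"
    "geneB\t0.0\tOK\t3.0\tOK\n"
)


def load_fpkm_matrix(handle, id_column="tracking_id", suffix="_FPKM"):
    """Return (sample_names, {feature_id: [log2(FPKM + 1) per sample]})."""
    reader = csv.DictReader(handle, delimiter="\t")
    # Keep only the per-sample FPKM columns, in file order.
    fpkm_columns = [name for name in reader.fieldnames if name.endswith(suffix)]
    samples = [name[: -len(suffix)] for name in fpkm_columns]
    matrix = {}
    for row in reader:
        # log2(FPKM + 1) is a common transform before clustering; it is a
        # choice made for this sketch, not something Cuffdiff prescribes.
        matrix[row[id_column]] = [
            math.log2(float(row[name]) + 1.0) for name in fpkm_columns
        ]
    return samples, matrix


samples, matrix = load_fpkm_matrix(io.StringIO(TOY_TABLE))
for feature_id, values in matrix.items():
    print(feature_id, dict(zip(samples, values)))
```

With a real file you would pass an open file handle instead of the io.StringIO wrapper. The resulting dictionary of vectors can be handed to whatever clustering routine you prefer.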

2. If I really need to get count data from the FPKM values, how do I obtain the mentioned "effective length"? Would it be better to treat each assembled transcript, rather than each gene, as an object in clustering? What does "you'd be throwing away Cufflinks' uncertainty" mean, even when using isoforms as objects? How should I include the uncertainty in my clustering?

These FAQs from http://cufflinks.cbcb.umd.edu/faq.html address your questions:

--
I want to find differentially expressed genes. Can I use Cufflinks in conjunction with count-based differential expression packages?

It's possible, but we strongly advise against this. Current count-based differential expression tools are poorly suited to differential expression analysis in genomes with alternatively spliced genes. The main reason for this is that when a gene has multiple isoforms, a change in the total number of reads or fragments from that gene doesn't always correspond to a change in expression for that gene. Conversely, a gene's expression may change, but the total number of fragments generated by its isoforms may be very similar. In order to detect changes accurately, it's necessary to estimate how many fragments came from each individual splice variant in each sample. Current count-based tools don't do this (to our knowledge - please send us email if you know of one!). Even if they did, fragments that come from parts of genes that are shared by more than one splice variant can't generally be assigned to a single isoform, so the fragment counts for each isoform are only estimates, and there is some uncertainty in the counts. Isoforms that are very similar will have a great deal of uncertainty surrounding their fragment counts. This uncertainty needs to be accounted for when testing for differential expression. So while you could use Cufflinks to estimate isoform-level counts, you'd be throwing away Cufflinks' uncertainty, and thus have more confidence in the differences you see than you really should. This will probably lead to many false positives in your analysis. Furthermore, we do not normalize simply by the length to calculate FPKM but an effective length, as explained in our publications. Calculating counts from FPKM by multiplying by the length will give incorrect results. We strongly encourage you to consider using Cuffdiff to find differentially expressed genes and transcripts.

Will you please report how many fragments come from each transcript in a future release?

For the foreseeable future, we will not be reporting the number of fragments we think originated from each transcript. People who have asked for this almost always want to use Cufflinks in conjunction with count-based differential expression packages, which is not a good idea. We're trying to keep our output formats as simple as possible.
--
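
To make the FAQ's point about isoforms concrete, here is a small toy calculation in Python. The fragment counts and effective lengths are invented numbers, the per-isoform counts are treated as known exactly (which they never are in practice), and library-size normalization is left out. It is only an arithmetic sketch of the argument, not Cufflinks' model.

```python
# Toy numbers, invented for illustration: one gene with two isoforms.
# Effective lengths are simply taken as given here (in kilobases).
EFFECTIVE_LENGTH_KB = {"short": 1.0, "long": 3.0}

# Hypothetical fragment counts per isoform in two conditions.
FRAGMENTS = {
    "condition_A": {"short": 300, "long": 300},
    "condition_B": {"short": 150, "long": 450},
}


def total_fragments(per_isoform):
    """Gene-level count: what a count-based tool would see."""
    return sum(per_isoform.values())


def gene_abundance(per_isoform, lengths_kb):
    """Sum of per-isoform fragments per kilobase of effective length."""
    return sum(count / lengths_kb[iso] for iso, count in per_isoform.items())


for condition, per_isoform in FRAGMENTS.items():
    print(
        condition,
        "total fragments:", total_fragments(per_isoform),
        "fragments per kb:", gene_abundance(per_isoform, EFFECTIVE_LENGTH_KB),
    )
```

Both conditions yield 600 fragments for the gene, yet the length-normalized abundance drops from 400 to 300 fragments per kb because expression shifted toward the longer isoform. A gene-level count comparison would report no change here.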

Best,
J.