I am not sure about cuffcompare, but cuffdiff doesn't generate any extra files if you add more groups and replicates to the command line. It adds columns to the output files, but the number of files stays the same.

For a workflow for Martin for now, I would suggest doing this for making calls with no novel genes:

1) upload your reads
2) run FASTQ Groomer on them to convert them to Sanger format
3) run tophat on each lane individually
4) run cuffcompare with the GTF file you downloaded from UCSC (or wherever) against itself; this puts it in a nice format to use with cuffdiff
5) merge the BAM files from tophat for the 10 lanes of each group into one file
6) run cuffdiff using the transcript GTF output file from cuffcompare and the two merged BAM files

Merging is kind of crappy because you lose the replicate variation information, but it's the best you can do now. I have patched Galaxy so that cuffdiff handles replicates and does normalization. When that gets merged into the main branch, your workflow will be the same, except that you won't have to merge all of the BAM files from each condition together to use cuffdiff.

-rory

On Jan 21, 2011, at 9:40 AM, Jeremy Goecks wrote:
Hi David,
Cuffcompare and Cuffdiff generate many more outputs than most other tools; specifically, both generate multiple output files for each additional input given. While Galaxy can easily handle an arbitrary number of inputs, handling this many outputs is challenging and requires extending the framework.
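The sketch below is a rough illustration of steps 3 to 6 of the workflow Rory suggests in the reply above. It only builds the command lines as argument lists and runs nothing. It assumes the command-line TopHat, Cuffcompare, samtools and Cuffdiff rather than the Galaxy tool wrappers. The helper `build_workflow`, the index name `hg19` and all file names are hypothetical. The output names `accepted_hits.bam` (TopHat) and `cuffcmp.combined.gtf` (Cuffcompare's default prefix) are assumptions to check against your installed versions. Steps 1 and 2 (upload and FASTQ Groomer) are Galaxy-specific and are left out.

```python
from typing import Dict, List


def build_workflow(
    bowtie_index: str,
    ref_gtf: str,
    lanes: Dict[str, List[str]],
    use_replicates: bool = False,
) -> List[List[str]]:
    """Return the command lines (argv lists) for steps 3-6, in order.

    Hypothetical helper: nothing is executed here; each list could be
    passed to subprocess.run() if desired.
    """
    cmds: List[List[str]] = []
    lane_bams: Dict[str, List[str]] = {}

    # Step 3: one TopHat run per lane; -o names the output directory.
    for group, fastqs in lanes.items():
        lane_bams[group] = []
        for i, fastq in enumerate(fastqs, start=1):
            outdir = f"{group}_lane{i}"
            cmds.append(["tophat", "-o", outdir, bowtie_index, fastq])
            # Assumption: TopHat writes its alignments to accepted_hits.bam.
            lane_bams[group].append(f"{outdir}/accepted_hits.bam")

    # Step 4: compare the reference annotation against itself.
    cmds.append(["cuffcompare", "-r", ref_gtf, ref_gtf])
    # Assumption: with no -o prefix, the combined transcripts go to this file.
    combined_gtf = "cuffcmp.combined.gtf"

    condition_args: List[str] = []
    for group, bams in lane_bams.items():
        if use_replicates:
            # Replicate-aware cuffdiff: one comma-separated BAM list per condition.
            condition_args.append(",".join(bams))
        else:
            # Step 5: pool all lanes of a condition into a single BAM.
            merged = f"{group}_merged.bam"
            cmds.append(["samtools", "merge", merged, *bams])
            condition_args.append(merged)

    # Step 6: the transcript GTF, then one argument per condition.
    cmds.append(["cuffdiff", combined_gtf, *condition_args])
    return cmds


if __name__ == "__main__":
    demo = build_workflow(
        "hg19", "genes.gtf",
        {"ctrl": ["c1.fq", "c2.fq"], "treat": ["t1.fq", "t2.fq"]},
    )
    for cmd in demo:
        print(" ".join(cmd))
```

With the default `use_replicates=False`, each condition is pooled into one merged BAM, as in the current workflow. Setting `use_replicates=True` drops the merge step and passes each condition's lane BAMs as a comma-separated list. This roughly matches what the workflow would look like once replicate handling in cuffdiff is available, assuming your cuffdiff version accepts replicates in that form.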