Hello!

 

I have 10 human RNA-Seq samples consisting of 3 groups (2 replicates per group).  I have already run each of them through TopHat and Cufflinks on the Penn State Galaxy instance, and I am now at a head-scratching moment.  I want to run CuffCompare next (and ultimately CuffDiff, so that I can compare gene/isoform expression among these 3 groups), but I am unsure of the best way to do this.  After reading several Galaxy posts, I’ve come across a couple of ideas:

1.       Run CuffCompare on two Cufflinks output files.  When that is finished, take the CuffCompare output file and run CuffCompare again with the third Cufflinks output file, then again with the fourth Cufflinks output file, and so on.  In a nutshell, I would be repeatedly merging Cufflinks outputs in CuffCompare.  Once all 10 have been put through CuffCompare, I can run CuffDiff and set up the 3 groups with their corresponding BAM files from TopHat.

2.       Add all 10 Cufflinks output files in CuffCompare using the “add new GTF input file” option.

 

I chose option 2 because it looked the simplest and, from the posts I read, it sounded like a fully functional option.  I also supplied a reference annotation file (one that has worked before on non-replicate analyses).  However, I ran into this error:

Error running cuffcompare. You are using Cufflinks v1.0.3, which is the most recent release.

No fasta index found for ./input1. Rebuilding, please wait..

Error: sequence lines in a FASTA record must have the same length!

 

cuffcompare v1.0.3 (2403)
cuffcompare -o cc_output  -r /galaxy/main_database/files/002/678/dataset_2678888.dat  -R  -s  ./input1 ./input2 ./input3 ./input4 ./input5 ./input6
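
In case it helps anyone reproduce the check behind the FASTA message above, here is a minimal Python sketch that tests a file against that line-length rule.  It assumes the rule is that every sequence line in a record, except the last, has the same length, and that the last line is no longer than the others.  The function name is hypothetical and is not part of Cufflinks or Galaxy.

```python
def find_uneven_fasta_records(lines):
    """Return the names of FASTA records whose sequence lines differ in length.

    Assumed rule: within a record, all sequence lines except the last share
    one length, and the last line may be shorter but not longer.
    Blank lines are skipped rather than flagged.
    """
    bad = []
    name = None      # name of the record currently being read
    lengths = []     # lengths of its sequence lines, in order

    def flush():
        # Evaluate the record collected so far, if there is one.
        if name is None or not lengths:
            return
        body, last = lengths[:-1], lengths[-1]
        if body and (len(set(body)) > 1 or last > body[0]):
            bad.append(name)

    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            flush()
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths = []
        elif line and name is not None:
            lengths.append(len(line))
    flush()
    return bad
```

It could be run on a file with something like `with open(path) as fh: print(find_uneven_fasta_records(fh))`, where `path` is whatever file is being checked.  A file with no `>` header lines at all (a GTF, for example) yields an empty list, since the sketch only inspects lines that follow a header.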

 

Any suggestions as to why this is happening?  Am I trying something that shouldn’t be attempted yet?  Is there a better way to analyze replicates?  Any suggestions, ideas, or workflows would be greatly appreciated!

               

Thanks,

David