Thanks for the reply. I tried to use the script provided in a previous Galaxy thread for adding 'chr' to the GTF file, running it in the Mac terminal, but I keep getting this error:

awk: can't open file ensembl.gtf
 source line number 1

I am very new to using the terminal, so please let me know if there is something basic that I am not doing right.

Thanks!
Kurinji

On Tue, Jun 28, 2011 at 6:13 AM, Jeremy Goecks <jeremy.goecks@emory.edu> wrote:
Hello Kurinji,

I was at your USC Galaxy seminar last week, which I found very helpful - thank you!

Glad to hear that you found the workshop helpful. As a reminder, please email questions about using Galaxy and its tools to the galaxy-user mailing list (which I've cc'd). You may get quicker and different responses from community members, and everyone will benefit from the discussion.

I used my recently generated RNAseq data in Galaxy (which was pre-aligned using tophat and already had cufflinks run on it) - I ran cuffcompare with all the gtf files and then cuffdiff for the three pairs (there is 1 control and 3 different drug treatments - no replicates). I got several output files, as expected, but decided just to look at the gene differential expression as a start. Some questions I have are -

1. (very basic question!) Which is sample 1 (and corresponding value 1) and which is sample 2 (and corresponding value 2) in my output file? This is what my output file is called -

90: Cuffdiff on data 37, data 38, and data 60: gene differential expression testing 33,969 lines

Is 37 sample one or sample two? Given the data - I would expect sample 37 to correspond to "value 2" - but I could be wrong. Please let me know!

The best way to figure out which dataset corresponds with Cuffdiff's labels is to click the rerun button in the dataset: sample names correspond directly to the reads datasets (i.e. BAM files) provided as input to Cuffdiff.

2. How do I find the UCSC gene names corresponding with start/end sites? I did input the hg18 UCSC gtf file as a reference.

You'll need to use a reference annotation (GTF file) that has the gene_name attribute as input for Cufflinks/Cuffcompare/Cuffdiff. Typically Ensembl annotations have this attribute; however, you'll need to prepend 'chr' to each line (really, to each chromosome name) in order to bring Ensembl notation in line with UCSC/Galaxy notation.
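
For illustration, here is a minimal Python sketch of the 'chr' step. It is not the script from the earlier thread; the function names and the file names in the comments are hypothetical. It assumes a standard tab-delimited GTF in which the chromosome name is the first column and header lines start with '#'.

```python
def prepend_chr(lines):
    """Yield GTF lines with 'chr' prepended to the chromosome name (column 1)."""
    for line in lines:
        # Leave comment/header lines and blank lines untouched.
        if line.startswith("#") or not line.strip():
            yield line
        # Leave lines that already use UCSC-style names untouched.
        elif line.startswith("chr"):
            yield line
        else:
            # The chromosome name is the first field, so prefixing the
            # whole line is the same as prefixing the chromosome name.
            yield "chr" + line


def add_chr_to_file(in_path, out_path):
    """Read a GTF from in_path and write the 'chr'-prefixed copy to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(prepend_chr(src))


# Example call (hypothetical file names; the input must exist in the
# directory you run this from, or be given as a full path):
# add_chr_to_file("ensembl.gtf", "ensembl_chr.gtf")
```

Note that this sketch only adds the prefix. It does not rename the Ensembl mitochondrial chromosome (MT) to the UCSC name (chrM), so that contig would come out as chrMT.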

Actually, I noticed that value 1 in this particular output file is all 0 - no idea why. It is not this way in the other files, making me wonder if there is an error somewhere. I am sure the bam file is okay as I viewed it on IGV and saw the patterns I would expect for some candidate genes I looked at.

It's difficult for me to comment without seeing your analysis. Some output files depend on particular attributes being set correctly in the annotation file. You may want to search through our mailing list archives and see if your question has already been answered: http://gmod.827538.n3.nabble.com/Galaxy-Users-f815892.html

Good luck,
J.