Re: [galaxy-user] Cuffdiff Question

28 Jun 2011

Thanks for the reply. I tried to use the script provided in a previous
Galaxy thread for adding 'chr' to the GTF file in the Mac terminal, but
I keep getting this error:

awk: can't open file ensembl.gtf
 source line number 1

I am very new to using the terminal, so please let me know if there is
something basic that I am not doing right.

Thanks!
Kurinji

On Tue, Jun 28, 2011 at 6:13 AM, Jeremy Goecks <jeremy.goecks@emory.edu> wrote:
...
Hello Kurinji,
> I was at your USC Galaxy seminar last week, which I found very helpful -
> thank you!
Glad to hear that you found the workshop helpful. As a reminder, please
email questions about using Galaxy and its tools to the galaxy-user mailing
list (which I've cc'd). You may get quicker and different responses from
community members, and everyone will benefit from the discussion.
> I used my recently generated RNAseq data in Galaxy (which was pre-aligned
> using tophat and already had cufflinks run on it) - I ran cuffcompare with
> all the gtf files and then cuffdiff for the three pairs (there is 1 control
> and 3 different drug treatments - no replicates). I got several output
> files, as expected, but decided just to look at the gene differential
> expression as a start. Some questions I have are -
> 1. (very basic question!) Which is sample 1 (and corresponding value 1) and
> which is sample 2 (and corresponding value 2) in my output file? This is
> what my output file is called -
> 90: Cuffdiff on data 37, data 38, and data 60: gene differential expression
> testing 33,969 lines
> Is 37 sample one or sample two? Given the data - I would expect sample 37
> to correspond to "value 2" - but I could be wrong. Please let me know!
The best way to figure out which dataset corresponds with Cuffdiff's labels
is to click the rerun button in the dataset: sample names correspond
directly to the reads datasets (i.e. BAM files) provided as input to
Cuffdiff.
> 2. How do I find the UCSC gene names corresponding to the start/end sites? I
> did input the hg18 UCSC GTF file as a reference.
You'll need to use a reference annotation (GTF file) that has the gene_name
attribute as input for Cufflinks/Cuffcompare/Cuffdiff. Typically Ensembl
annotations have this attribute; however, you'll need to prepend 'chr' to
each line--really, to each chromosome name--in order to bring Ensembl
notation in line with UCSC/Galaxy notation.
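A minimal Python sketch of the 'chr'-prepending step described above, offered
as one possible approach and not the script from the earlier thread. The
function names prepend_chr and convert_gtf are hypothetical. It assumes a
tab-delimited GTF whose first column is the chromosome name, leaves comment
lines starting with '#' untouched, and additionally assumes that Ensembl's
'MT' should become UCSC's 'chrM'.

```python
def prepend_chr(line):
    """Return a GTF line with 'chr' prepended to the chromosome name (column 1)."""
    # Leave comment/header lines and blank lines unchanged.
    if line.startswith("#") or not line.strip():
        return line
    chrom, sep, rest = line.partition("\t")
    # Assumption: Ensembl names the mitochondrial genome 'MT'; UCSC uses 'chrM'.
    if chrom == "MT":
        chrom = "M"
    # Do not double-prefix names that already follow UCSC notation.
    if not chrom.startswith("chr"):
        chrom = "chr" + chrom
    return chrom + sep + rest


def convert_gtf(in_path, out_path):
    """Write a copy of in_path with UCSC-style chromosome names to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(prepend_chr(line))
```

Calling convert_gtf("ensembl.gtf", "ensembl_chr.gtf") would write the converted
copy; both file names are placeholders. Relative paths are resolved against the
current working directory, so either run it from the folder that contains the
file or pass a full path.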
> Actually, I noticed that value 1 in this particular output file is all 0 -
> no idea why. It is not this way in the other files, making me wonder if
> there is an error somewhere. I am sure the bam file is okay as I viewed it
> on IGV and saw the patterns I would expect for some candidate genes I looked
> at.
It's difficult for me to comment without seeing your analysis. Some output
files depend on particular attributes being set correctly in the annotation
file. You may want to search through our mailing list archives and see if
your question has already been answered:
http://gmod.827538.n3.nabble.com/Galaxy-Users-f815892.html
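On the point about annotation attributes, one rough way to see which
attributes a GTF file actually carries is sketched below in Python. The
function name count_attributes is hypothetical, and the sketch assumes the
usual nine tab-separated GTF columns with quoted attribute values in the ninth
(for example gene_name "ABC";). Unquoted values are not counted.

```python
import re
from collections import Counter

# Matches attribute keys in the ninth GTF column, e.g. gene_id "X"; gene_name "Y";
ATTR_KEY = re.compile(r'(\w+)\s+"[^"]*"')


def count_attributes(lines):
    """Count how many feature lines carry each quoted attribute key."""
    counts = Counter()
    for line in lines:
        # Skip comment/header lines.
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Skip lines that do not have the nine expected columns.
        if len(fields) < 9:
            continue
        # Count each key at most once per line.
        counts.update(set(ATTR_KEY.findall(fields[8])))
    return counts
```

Passing an open file handle, as in count_attributes(open("annotation.gtf")),
returns the per-attribute counts (the file name is a placeholder). A count of
zero for gene_name would indicate that the file lacks that attribute.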
Good luck,
J.

Kurinji Pandiyan