I was at your USC Galaxy seminar last week, which I found very
helpful - thank you!
Glad to hear that you found the workshop helpful. As a reminder, please email questions
about using Galaxy and its tools to the galaxy-user mailing list (which I've
cc'd). You may get quicker and different responses from community members, and
everyone will benefit from the discussion.
I used my recently generated RNAseq data in Galaxy (which was
pre-aligned using tophat and already had cufflinks run on it) - I ran cuffcompare with all
the gtf files and then cuffdiff for the three pairs (there is 1 control and 3 different
drug treatments - no replicates). I got several output files, as expected, but decided
just to look at the gene differential expression as a start. Some questions I have are -
1. (very basic question!) Which is sample 1 (and corresponding value 1) and which is
sample 2 (and corresponding value 2) in my output file? This is what my output file is called -
90: Cuffdiff on data 37, data 38, and data 60: gene differential expression testing
Is 37 sample one or sample two? Given the data - I would expect sample 37 to correspond
to "value 2" - but I could be wrong. Please let me know!
The best way to figure out which dataset corresponds with Cuffdiff's labels is to
click the rerun button in the dataset. The rerun form shows the inputs that were used, and
sample names correspond directly to the reads datasets (i.e. BAM files) provided as input
to Cuffdiff.
2. How do I find the UCSC gene names corresponding with start/end
sites? I did input the hg18 UCSC GTF file as a reference.
You'll need to use a reference annotation (GTF file) that has the gene_name attribute
as input for Cufflinks/Cuffcompare/Cuffdiff. Typically Ensembl annotations have this
attribute; however, you'll need to prepend 'chr' to the chromosome name at the start of
each line in order to bring Ensembl notation in line with UCSC/Galaxy notation.
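As a rough sketch of the 'chr' step (this is not a Galaxy tool, and the function name
add_chr_prefix is made up for illustration), something like the following Python would
work. It assumes a tab-delimited GTF whose first column is the chromosome name, passes
comment lines through unchanged, and also maps Ensembl's MT to UCSC's chrM.

```python
import sys


def add_chr_prefix(lines):
    """Yield GTF lines with 'chr' prepended to the chromosome (first) column.

    Hypothetical helper for illustration only; assumes tab-delimited GTF input.
    """
    for line in lines:
        # Comment/header lines and blank lines are passed through unchanged.
        if line.startswith("#") or not line.strip():
            yield line
            continue
        chrom, sep, rest = line.partition("\t")
        # Ensembl names the mitochondrial chromosome MT; UCSC calls it chrM.
        if chrom == "MT":
            chrom = "M"
        # Leave names that already carry the prefix alone.
        if not chrom.startswith("chr"):
            chrom = "chr" + chrom
        yield chrom + sep + rest


# Small in-memory demonstration with made-up annotation lines.
sample = [
    "#!example header\n",
    '1\tensembl\texon\t100\t200\t.\t+\t.\tgene_id "G1"; gene_name "ABC";\n',
    'MT\tensembl\texon\t10\t50\t.\t+\t.\tgene_id "G2"; gene_name "XYZ";\n',
]
sys.stdout.writelines(add_chr_prefix(sample))
```

To convert a real file, pass an open file handle to add_chr_prefix and write the result
to a new file, then upload that file to Galaxy as the reference annotation.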
Actually, I noticed that value 1 in this particular output file is
all 0 - no idea why. It is not this way in the other files, making me wonder if there is
an error somewhere. I am sure the bam file is okay as I viewed it on IGV and saw the
patterns I would expect for some candidate genes I looked at.
It's difficult for me to comment without seeing your analysis. Some output files
depend on particular attributes being set correctly in the annotation file. You may want
to search through our mailing list archives and see if your question has already been