Hi,
I have run some RNA-Seq analysis and I am trying to get the Ensembl gene annotations to show in the cuffcompare files. I have done the following:
1. Ran cufflinks analysis on the .bam files.
2. I got the .gtf file for hg19 from Ensembl. Based on the email below, I changed the chromosome names from 1, 2, 3, etc. to chr1, chr2, chr3, etc. Then I ran the processed .gtf file against itself through cuffcompare, as recommended below, but I am getting an error.

3. If I try to run cuffcompare on two of my cufflinks data files and use the processed gtf file as is, I get the same error.

Any input on what I am doing wrong is appreciated. I will be happy to share the history if needed.
Thanks and Regards,
Aarti


On 1/24/2011 9:24 PM, Rory Kirchner wrote:
For the ensembl annotation, you can download the gtf file from ensembl for your organism here:

http://uswest.ensembl.org/info/data/ftp/index.html

To use this file, you first need to fix the chromosome names, because they do not match the bowtie index ids (this depends on your organism; they do not match for mouse and rat at least). If you are on a Mac or a Unix machine, run this from the terminal (assuming your downloaded gtf file is named ensembl.gtf):

awk 'BEGIN { FS = OFS = "\t" } /^#/ { print; next } { $1 = ($1 == "MT") ? "chrM" : "chr" $1; print }' ensembl.gtf > ensembl_cleaned.gtf

This changes the Ensembl chromosome names from 1, 2, 3, 4, X, MT to chr1, chr2, chr3, chr4, chrX, chrM to match the bowtie index ids.
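Before feeding the cleaned file to cuffcompare, it may help to see which sequence names it actually contains. The sketch below is only a sanity check, not part of the original recipe: the function name list_seqnames is made up for illustration, and it assumes a tab-separated GTF whose comment lines start with '#'. Any name in its output that is not in your bowtie index (for example a prefixed scaffold or patch name, if your file has such records) is a candidate cause of a mismatch.

```shell
# Print each distinct sequence name (column 1) and its record count,
# one "name<TAB>count" pair per line, ignoring '#' comment lines.
# list_seqnames is a hypothetical helper name, not a cufflinks tool.
list_seqnames() {
    awk -F '\t' '!/^#/ { n[$1]++ } END { for (s in n) print s "\t" n[s] }' "$1" |
        LC_ALL=C sort
}

# Example call on the file written by the awk command above:
# list_seqnames ensembl_cleaned.gtf
```

Comparing this list with the names in the BAM header is one way to narrow down a naming problem.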

This file is unsorted, so it won't work with SAM files but it will work with the BAM files that tophat outputs. If you need to work with SAM files for some reason, this might work:

sort -k 1,1 -k 4,4n infile > outfile
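If the GTF carries '#' header lines, a plain sort would mix them into the records. A minimal variant of the sort above that keeps such lines at the top could look like this; sort_gtf is a hypothetical helper name, and it assumes tab-separated columns with the start coordinate in column 4.

```shell
# Sort a GTF by sequence name, then numerically by start coordinate,
# writing any '#' header lines first so they are not sorted into the data.
# sort_gtf is a hypothetical helper name.
sort_gtf() {
    awk '/^#/' "$1"
    awk '!/^#/' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k4,4n
}

# Example call (the output file name is only an illustration):
# sort_gtf ensembl_cleaned.gtf > ensembl_sorted.gtf
```

LC_ALL=C makes the name ordering byte-wise and reproducible across machines; whether a downstream tool expects exactly that order is something to check for your setup.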

Sorted or unsorted, if you run the reformatted gtf file through cuffcompare against itself (using it as both the reference gtf and the 'test' gtf), the GTF file that cuffcompare outputs will have the CDS, TSS and related attributes needed when you use it as the reference for cuffdiff.
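On the command line, the self-comparison might be sketched as follows. This is an illustration rather than a tested recipe: self_cuffcompare and the default prefix selfcmp are made-up names, and it assumes cuffcompare is on your PATH and accepts -r for the reference annotation and -o for the output prefix, as described in the Cufflinks manual.

```shell
# Run cuffcompare with the same GTF as both the reference (-r) and the input,
# writing its output files under the given prefix (-o).
# self_cuffcompare and the default prefix "selfcmp" are hypothetical names.
self_cuffcompare() {
    gtf=$1
    prefix=${2:-selfcmp}
    if ! command -v cuffcompare >/dev/null 2>&1; then
        echo "cuffcompare not found on PATH" >&2
        return 127
    fi
    cuffcompare -r "$gtf" -o "$prefix" "$gtf"
}

# Example call:
# self_cuffcompare ensembl_cleaned.gtf
```

With the default prefix, the combined annotation should end up in selfcmp.combined.gtf, which is the file you would then try as the cuffdiff reference.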

-rory

On Jan 24, 2011, at 10:02 AM, Jeremy Goecks wrote:

Hi Matteo and Vasu,

There are different ways to refer to genes. Names that start with NM_ are termed 'accession numbers,' and they are a valid way to refer to genes. 

Matteo, what you may want is the canonical gene name (e.g. Xkr4). If so, you'll want to use a gene annotation/reference file from UCSC; when you are getting the file, you'll want to select the table with the word 'canonical' in it. E.g. for hg19/UCSC genes, there is a table called knownCanonical that provides the canonical gene names.

Thanks,
J.

On Jan 24, 2011, at 9:38 AM, vasu punj wrote:

This is a known issue with the GTF file from Ensembl.


--- On Mon, 1/24/11, Matteo Bovolenta <bvlmtt@unife.it> wrote:

From: Matteo Bovolenta <bvlmtt@unife.it>
Subject: [galaxy-user] Gene Name in Cufflink/compare/diff
To: galaxy-user@bx.psu.edu
Date: Monday, January 24, 2011, 5:05 AM

Hi all,

when I run an RNA-Seq analysis using tophat, cufflinks, cuffcompare and
cuffdiff by aligning my data to the RefSeq genes, I obtain tables from
cufflinks/cuffcompare/cuffdiff which do not include the gene name, but only
the NM_ accession.
Does anyone know how I can obtain all the tables with the gene name?

Thank you all very much for the support,

Best Regards,

Matteo

--
Matteo Bovolenta, PhD
Dipartimento di Medicina Sperimentale e Diagnostica
Sezione di Genetica Medica
Università di Ferrara
Via Fossato di Mortara, 74
44100 Ferrara
tel +39 0532 974449(office)
tel +39 0532 974502 (lab)
fax +39 0532 236157
email bvlmtt@unife.it
http://www.unife.it/medicina/geneticamedica
http://www.bio-nmd.eu
registered in ORPHANET
http://www.orpha.net

CONFIDENTIALITY NOTICE: this message together with its annexes may
contain confidential, proprietary or legally privileged information
and is intended only for the use of the addressee named above. No
confidentiality or privilege is waived or lost by any mistransmission.
If you are not the intended recipient of this message you are hereby
notified that you must not use, disseminate, copy it in any form or
take any action in reliance on it. If you have received this message
in error please delete it and any copies of it and kindly inform the
sender of this e-mail by bvlmtt@unife.it  Thank you

_______________________________________________
galaxy-user mailing list
galaxy-user@lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-user
