How to replace Ensembl gene IDs with gene names in Cuffdiff output?
Hi all,

I had the following Cuffdiff output from gene differential expression testing:

test_id gene_id gene locus sample_1 sample_2 status value_1 value_2 log2(fold_change) test_stat p_value q_value
XLOC_000001 XLOC_000001 ENST00000450305,ENST00000456328,ENST00000515242,ENST00000518655 chr1:11868-31109 NORMAL 1 OK 0.558797 0.84004 0.588134 -0.44598 0.655611 0.767628

The four transcript IDs in the "gene" column belong to the gene DDX11L1. My question is: how do I replace these IDs with official gene names?

Thanks,

--
Wei Liao
Research Scientist, Brentwood Biomedical Research Institute
16111 Plummer St. Bldg 7, Rm D-122
North Hills, CA 91343
818-891-7711 ext 7645
Hi Wei,

BioMart has tools to extract tabular data that maps Ensembl transcript identifiers to alternate identifiers, gene symbols, etc. See the tool under "Get Data -> BioMart Central server". You'll likely have to map from Ensembl transcript ID -> HGNC transcript -> HGNC gene:

http://uswest.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG0000022397...

Other sources also have this type of data:

http://www.genenames.org/data/hgnc_data.php?hgnc_id=37102

The Cuffdiff file will also need to be prepared: the Ensembl transcript IDs need to be in tabular format. 'Convert Delimiters to Tab' is the correct tool choice. Then, once both files have the data you wish to use in tabular format, join the data on common keys using tools in 'Join, Subtract and Group -> Column Join'.

To finish, use tools in 'Text Manipulation', 'Filter and Sort', and 'Join, Subtract and Group' to format the data so that it is useful for your purposes.

If you need help, we have screencasts that cover most of these text manipulation operations:

Galaxy 101 + Tool tutorials (top 6)
http://wiki.g2.bx.psu.edu/Learn/Screencasts

Hopefully this helps,

Jen
Galaxy team

On 4/19/12 1:43 PM, Wei Liao wrote:
> [...] my question is how to replace these IDs with official gene names?
-- Jennifer Jackson http://galaxyproject.org
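
For readers working outside Galaxy, the same split-and-join idea from the reply above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions, not the Galaxy tools themselves: the function names are hypothetical, the mapping is assumed to be a two-column, tab-separated export (transcript ID, gene name) such as one obtained from BioMart, and the Cuffdiff file is assumed to be tab-separated with the transcript list in the "gene" column. The example mapping simply encodes the poster's statement that all four transcripts belong to DDX11L1.

```python
import csv
import io


def load_transcript_to_gene(mapping_tsv):
    """Parse a two-column, tab-separated table: transcript ID, gene name."""
    reader = csv.reader(io.StringIO(mapping_tsv), delimiter="\t")
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def gene_names_for(transcript_field, mapping):
    """Turn a comma-separated transcript list into unique gene names, in order."""
    names = []
    for transcript_id in transcript_field.split(","):
        name = mapping.get(transcript_id.strip())
        if name and name not in names:
            names.append(name)
    # Keep the original IDs when none of them appear in the mapping.
    return ",".join(names) if names else transcript_field


def annotate_cuffdiff(cuffdiff_tsv, mapping, column="gene"):
    """Return the table with the chosen column replaced by gene names."""
    reader = csv.DictReader(io.StringIO(cuffdiff_tsv), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        row[column] = gene_names_for(row[column], mapping)
        writer.writerow(row)
    return out.getvalue()


# Illustrative mapping (assumption: taken from the poster's statement).
mapping_tsv = (
    "ENST00000450305\tDDX11L1\n"
    "ENST00000456328\tDDX11L1\n"
    "ENST00000515242\tDDX11L1\n"
    "ENST00000518655\tDDX11L1\n"
)

# The header and row from the original question, tab-separated.
cuffdiff_tsv = (
    "test_id\tgene_id\tgene\tlocus\tsample_1\tsample_2\tstatus\tvalue_1\t"
    "value_2\tlog2(fold_change)\ttest_stat\tp_value\tq_value\n"
    "XLOC_000001\tXLOC_000001\t"
    "ENST00000450305,ENST00000456328,ENST00000515242,ENST00000518655\t"
    "chr1:11868-31109\tNORMAL\t1\tOK\t0.558797\t0.84004\t0.588134\t"
    "-0.44598\t0.655611\t0.767628\n"
)

mapping = load_transcript_to_gene(mapping_tsv)
annotated = annotate_cuffdiff(cuffdiff_tsv, mapping)
print(annotated)
```

Collapsing duplicates means a locus whose transcripts all map to one gene gets a single name, while a locus spanning several genes would get a comma-separated list. Transcript IDs missing from the mapping are left untouched so that no row loses its identifier.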