Hello Karthik,

Because it is sourced from UCSC, the GRCh37 genome is available in Galaxy as "hg19". The full name is:

Human Feb. 2009 (GRCh37/hg19) (hg19)

This aligns with how Ensembl also understands the content:
http://lists.ensembl.org/pipermail/dev/2012-February/002180.html

For RNA-seq analysis (and sometimes other types of analysis) you may need to adjust the chromosome names in your other input data to match the UCSC format (for example, "chr1" rather than "1"). This is explained in the RNA-seq FAQ:
https://main.g2.bx.psu.edu/u/jeremy/p/transcriptome-analysis-faq#faq5

The data in the FTP link you provide is annotation data. To use annotation data with the RNA-seq pipeline, a GTF file would be a good format. The RNA-seq tool pages have a link out to Ensembl, but any source based on the same genome is OK (UCSC, etc.).

Best,

Jen
Galaxy team

On 4/2/12 4:14 AM, Karthik Srinivasan wrote:
Hi,
How do I run TopHat and RNA-seq analysis using the GRCh37 (embl 66) genome? I noticed there is no input for this genome version.
Can I construct a reference genome from the following embl format source sequences: ftp://ftp.ensembl.org/pub/release-66/embl/homo_sapiens/, and map it against my RNAseq data?
Regards,
Karthik
Karthik Srinivasan | Senior Application Engineer
P: +912242554282 | M: +919987014704
Oracle Health Sciences Global Business Unit
6th Floor, Silver Metropolis, W.E. Highway, Goregaon (E) | 400063 Mumbai
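The chromosome renaming mentioned in the reply above can be sketched roughly as follows. This is only a minimal illustration, not a Galaxy tool: it assumes a tab-separated GTF file whose first column holds Ensembl-style names ("1", "X", "MT") and rewrites the primary chromosomes to UCSC-style names ("chr1", "chrX", "chrM"). The function names and file names are hypothetical. Unplaced scaffolds and patches are left unchanged, because their UCSC names do not follow a simple prefix rule and would need a separate lookup.

```python
# Sketch: convert Ensembl-style chromosome names in a GTF file to UCSC style.
# Assumption: column 1 of each non-comment line is the chromosome name.

# Primary chromosomes that differ only by the "chr" prefix.
PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y"}


def ensembl_to_ucsc(name):
    """Return the UCSC-style name for a primary chromosome, else the input."""
    if name in PRIMARY:
        return "chr" + name
    if name == "MT":
        # Ensembl calls the mitochondrial genome "MT"; UCSC calls it "chrM".
        return "chrM"
    # Scaffolds, patches, and already-converted names pass through unchanged.
    return name


def convert_gtf_line(line):
    """Rewrite the chromosome column of one GTF line."""
    if line.startswith("#") or not line.strip():
        return line  # keep comment and blank lines as they are
    fields = line.split("\t")
    fields[0] = ensembl_to_ucsc(fields[0])
    return "\t".join(fields)


def convert_gtf(in_path, out_path):
    """Copy a GTF file, converting chromosome names line by line."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(convert_gtf_line(line))
```

A call such as convert_gtf("annotation.ensembl.gtf", "annotation.ucsc.gtf") (both file names are placeholders) would write the converted copy, which can then be uploaded to Galaxy and assigned to the hg19 database. Features on names that were not converted will not match any hg19 sequence and may need to be filtered out before use.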
___________________________________________________________ The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at: