Re: [galaxy-user] Uploading Genome from Ensembl
Hello Sheena,
It is odd that this particular genome exists in two versions, with different content, for the same release number/date. Ensembl will update annotation with new releases, but not the reference genome itself unless they also increment the genome build number. The Zebrafish project page at Ensembl states that UCSC has the latest release, meaning that the genome labeled as "Zebrafish Jul. 2010 (Zv9/danRer7) (danRer7)" in Galaxy is expected to be the same as the one you would create by combining the files in their download area: http://uswest.ensembl.org/Danio_rerio/Info/Index
But if there are known differences (perhaps you want different masking or haplotypes/chrY PAR inclusion/exclusion), then the data can be combined either before or after upload into Galaxy (both invoke a similar tool):
-- If before, the Unix shell "cat" command is one option.
-- If after, load all chromosomes into your history and use the tool "Text Manipulation -> Concatenate datasets tail-to-head".
Best regards,
Jen Galaxy team
On 12/6/11 2:31 PM, Scroggins, Sheena wrote:
Thanks for responding. I would like to upload the genome to Galaxy, but I'm not sure how to combine all the fa files into one file. The Zebrafish genome from Ensembl comes in 27 separate .fa files. How do I combine these so that when I upload them to Galaxy, I can use the whole genome as my reference genome?
Thanks, Sheena
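For readers following the "combine before upload" route, the concatenation can be sketched in a few lines of Python. This is only an illustration of what the shell "cat" command does, not a Galaxy or Ensembl tool. The function names and the file names in the usage note are hypothetical, and the sketch assumes the per-chromosome files are already decompressed plain FASTA.

```python
import shutil


def ends_with_newline(path):
    """Return True if the file is empty or its last byte is a newline."""
    with open(path, "rb") as handle:
        handle.seek(0, 2)  # jump to the end of the file
        if handle.tell() == 0:
            return True
        handle.seek(-1, 2)  # step back one byte from the end
        return handle.read(1) == b"\n"


def concatenate_fasta(parts, combined):
    """Append each FASTA file in `parts` to `combined`, in the order given."""
    with open(combined, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream the file so a whole chromosome is never held in memory.
                shutil.copyfileobj(src, out)
            # Guard against a missing final newline, so the next ">" header
            # never ends up glued to the previous sequence line.
            if not ends_with_newline(part):
                out.write(b"\n")
```

A call such as concatenate_fasta(["chr1.fa", "chr2.fa"], "zv9_ensembl.fa") (placeholder names) would produce a single multi-FASTA file that can then be uploaded as a custom reference genome. The order of the list determines the order of the sequences in the output.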
-----Original Message-----
From: Jennifer Jackson [mailto:jen@bx.psu.edu]
Sent: Tuesday, November 08, 2011 6:58 AM
To: Scroggins, Sheena
Cc: galaxy-user@bx.psu.edu
Subject: Re: [galaxy-user] Uploading Genome from Ensembl
Hello Sheena,
If your concern with using the UCSC version of the database has to do with the chromosome naming and downstream Cufflinks analysis using Ensembl's reference GTF files, please see #5 on our FAQ, which demonstrates how to modify an Ensembl GTF file to be compatible with the UCSC chromosome naming (slight changes may be needed for each particular genome): http://main.g2.bx.psu.edu/u/jeremy/p/transcriptome-analysis-faq
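As a rough illustration of the kind of renaming involved, here is a minimal Python sketch that rewrites the first column of a GTF line from Ensembl-style to UCSC-style names. It is not the procedure from the FAQ, which is not reproduced in this thread. The function names are hypothetical, and the mapping rules (numeric, X and Y names gain a "chr" prefix, "MT" becomes "chrM", everything else is left unchanged) are assumptions that should be checked against the sequence names actually present in both the genome and the GTF file.

```python
def ensembl_to_ucsc_name(name):
    """Map an Ensembl-style sequence name to a UCSC-style one (assumed rules)."""
    if name == "MT":
        return "chrM"
    if name.isdigit() or name in ("X", "Y"):
        return "chr" + name
    # Scaffolds and any other names are passed through untouched.
    return name


def convert_gtf_line(line):
    """Rename the sequence column of one GTF line; keep comments and blanks."""
    if line.startswith("#") or not line.strip():
        return line
    fields = line.split("\t")
    fields[0] = ensembl_to_ucsc_name(fields[0])
    return "\t".join(fields)
```

Applied to every line of a GTF file, this leaves all columns other than the first exactly as they were, including the trailing newline.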
If there is another reason, please know that custom reference genomes (in fasta format) can be uploaded using FTP following this method: http://wiki.g2.bx.psu.edu/Learn/Upload%20via%20FTP
Hopefully this helps,
Best,
Jen Galaxy team
On 11/4/11 2:50 PM, Scroggins, Sheena wrote:
How do I upload the Zebrafish genome from Ensembl to my user history in Galaxy? I'm trying to map my RNA-Seq data using TopHat and need to map it to the Ensembl version of ZFv9, but Galaxy only has the UCSC version built in. The Ensembl version is slightly different. Thanks, Sheena
___________________________________________________________ The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
-- Jennifer Jackson http://usegalaxy.org http://galaxyproject.org/wiki/Support
Good Afternoon Sheena:
Can you please explain what is different between the Ensembl and UCSC Zebrafish Zv9 genome sequences?
--Hiram
On 11/4/11 2:50 PM, Scroggins, Sheena wrote:
How do I upload the Zebrafish genome from Ensembl to my user history in Galaxy? I'm trying to map my RNA-Seq data using TopHat and need to map it to the Ensembl version of ZFv9, but Galaxy only has the UCSC version built in. The Ensembl version is slightly different. Thanks, Sheena
participants (2)
- Hiram Clawson
- Jennifer Jackson