Galaxy wrapper for GATK2

This wrapper is copyright 2013 by Björn Grüning, Jim Johnson & the Galaxy Team.

The Genome Analysis Toolkit (GATK) is a software package developed at the Broad Institute to analyse next-generation resequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as a strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

http://www.broadinstitute.org/gatk
http://www.broadinstitute.org/gatk/about/citing-gatk

GATK is free for academic use; commercial use requires a licence fee. Please study the GATK licensing website: http://www.broadinstitute.org/gatk/about/#licensing

Installation

The recommended installation is by means of the toolshed. Galaxy should be able to install the samtools dependencies automatically for you. The GATK2 licence model does not allow us to distribute the GATK binaries. As a consequence, you need to install GATK2 on your own; please see the GATK website for more information: http://www.broadinstitute.org/gatk/download

Once you have installed GATK2, you need to edit the env.sh files that are installed together with the wrappers. You must edit the GATK2_PATH environment variable in the file:

<tool_dependency_dir>/environment_settings/GATK2_PATH/iuc/gatk2/<hash_string>/env.sh

to point to the folder where you have installed GATK2.

Optionally, you may also want to edit the GATK2_SITE_OPTIONS environment variable in the file:

<tool_dependency_dir>/environment_settings/GATK2_SITE_OPTIONS/iuc/gatk2/<hash_string>/env.sh

to deactivate the 'call home' feature of GATK with something like:

GATK2_SITE_OPTIONS='-et NO_ET -K /data/gatk2_key_file'

GATK2_SITE_OPTIONS can also be used to insert other specific options into every GATK2 wrapper at runtime, without changing the actual wrapper. Read more about the "Phone Home" problem at: http://www.broadinstitute.org/gatk/guide/article?id=1250

Optionally, you may also want to add some commands to be executed before GATK (e.g. to load modules) to the file:

<tool_dependency_dir>/gatk2/default/env.sh

Finally, you should fill in additional information about your genomes and annotations in gatk2_picard_index.loc and gatk2_annotations.txt. You can find them in the tool-data/ Galaxy directory.
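A quick sanity check of the GATK2_PATH setting can catch a wrong folder before any tool is run. The following is only a minimal sketch: the function name check_gatk2_path is hypothetical, and it assumes that the GATK2 download unpacks a jar named GenomeAnalysisTK.jar into the folder that GATK2_PATH points to. It would need to run in a shell where the edited env.sh has been sourced.

```python
import os


def check_gatk2_path(env=None, jar_name="GenomeAnalysisTK.jar"):
    """Return a list of problems with the GATK2_PATH setting (empty if none)."""
    env = os.environ if env is None else env
    path = env.get("GATK2_PATH", "")
    if not path:
        return ["GATK2_PATH is not set"]
    if not os.path.isdir(path):
        return ["GATK2_PATH is not a directory: " + path]
    # Assumption: the installed GATK2 folder contains a jar with this name.
    if not os.path.isfile(os.path.join(path, jar_name)):
        return [jar_name + " not found in " + path]
    return []


if __name__ == "__main__":
    for problem in check_gatk2_path():
        print(problem)
```

An empty result only means the folder and jar were found; it says nothing about whether the GATK2 version matches what the wrappers expect.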
It looks like I need to generate the dict file for the mm10 reference as well as add the reference to the srma_index.loc. My question is: where do these need to exist? Do they belong in the repo directory structure or in the primary tool-data directory? The hg19.fa, hg19.fa.fai and hg19.dict files exist, as do the same files for mm9 and GRCh37. However, the .dict does not exist for mm10. Even so, none of the references appear in the gatk2 tools.
Any ideas?
Thanks,
Iry
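The file check described in this message can be scripted. This is a minimal sketch, assuming the naming pattern used for hg19 above (the FASTA itself, a samtools faidx index with .fai appended, and a sequence dictionary with .fa replaced by .dict); the function name missing_reference_files is hypothetical.

```python
import os


def missing_reference_files(fasta_path):
    """List the expected reference files that are absent for one FASTA path."""
    stem, _ext = os.path.splitext(fasta_path)
    expected = [
        fasta_path,           # the sequence itself, e.g. mm10.fa
        fasta_path + ".fai",  # samtools faidx index, e.g. mm10.fa.fai
        stem + ".dict",       # sequence dictionary, e.g. mm10.dict
    ]
    return [path for path in expected if not os.path.isfile(path)]
```

Calling it on each FASTA path listed in the .loc file shows which genomes still need an index or dictionary generated.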
From: Daniel Blankenberg <dan@bx.psu.edu>
Date: Thursday, October 2, 2014 1:57 PM
To: Iry Witham <iry.witham@jax.org>
Cc: "galaxy-dev@lists.bx.psu.edu" <galaxy-dev@lists.bx.psu.edu>
Subject: Re: [galaxy-dev] Cloudman indices installation/configuration
Hi Iry,
First thing to check is that your fields are tab delimited — they appear to be spaces instead of tabs in this email, but copy and pasting into email can munge things sometimes (also “gh19.fa” is probably a typo, but that wouldn’t prevent the selection option from showing up).
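A tab check like this is easy to automate. The sketch below is illustrative only: find_untabbed_lines is a hypothetical helper, and the expected column count is passed in by the caller (4 for gatk2_picard_index.loc and 3 for sam_fa_indices.loc are assumptions read off the lines quoted in this thread, not taken from the Galaxy documentation).

```python
def find_untabbed_lines(lines, expected_columns):
    """Return (line_number, column_count) for entries whose tab-separated
    field count differs from expected_columns."""
    problems = []
    for number, line in enumerate(lines, start=1):
        stripped = line.rstrip("\n")
        if not stripped.strip() or stripped.startswith("#"):
            continue  # skip blank lines and comments
        columns = stripped.split("\t")
        if len(columns) != expected_columns:
            problems.append((number, len(columns)))
    return problems


# Usage (path and column count are assumptions):
#   with open("tool-data/gatk2_picard_index.loc") as handle:
#       print(find_untabbed_lines(handle, 4))
```

An entry typed with spaces instead of tabs is reported with a column count of 1, because the whole line is read as a single field.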
Thanks for using Galaxy,
Dan
On Oct 2, 2014, at 1:49 PM, Iry Witham <Iry.Witham@jax.org> wrote:
Hi Team,
I have a new instance of Galaxy CloudMan running and have added tools from the toolshed to it. When I attempt to run tools like sam-to-bam or any gatk tool, I am prompted for a reference genome. However, no indices/references are available for these tools. I have added the following line to sam_fa_indices.loc, but that did nothing:
index hg19 /mnt/galaxyIndices/genomes/Hsapiens/hg19/seq/gh19.fa
I have also added the following three lines to the gatk2_picard_index.loc:
hg19 hg19 Human (hg19) /mnt/galaxyIndices/genomes/Hsapiens/hg19/seq/hg19.fa
GRCh37 GRCh37 Human (GRCh37) /mnt/galaxyIndices/genomes/Hsapiens/GRCh37/seq/GRCh37.fa
mm10 mm10 Mouse (mm10) /mnt/galaxyIndices/genomes/Mmusculus/mm10/seq/mm10.fa
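Since .loc entries are expected to be tab delimited, one way to avoid editor or copy-and-paste problems is to generate the lines programmatically. This is a minimal sketch: loc_line is a hypothetical helper, and the split of each entry into four fields (build id, dbkey, display name, FASTA path) is an assumption based on the three entries quoted here.

```python
def loc_line(*fields):
    """Join the fields of one .loc entry with single tab characters."""
    for field in fields:
        if "\t" in field:
            raise ValueError("field already contains a tab: %r" % field)
    return "\t".join(fields)


# Field layout is assumed from the entries in this message.
entries = [
    ("hg19", "hg19", "Human (hg19)",
     "/mnt/galaxyIndices/genomes/Hsapiens/hg19/seq/hg19.fa"),
    ("GRCh37", "GRCh37", "Human (GRCh37)",
     "/mnt/galaxyIndices/genomes/Hsapiens/GRCh37/seq/GRCh37.fa"),
    ("mm10", "mm10", "Mouse (mm10)",
     "/mnt/galaxyIndices/genomes/Mmusculus/mm10/seq/mm10.fa"),
]

for entry in entries:
    # repr() makes the tab characters visible as \t when printed.
    print(repr(loc_line(*entry)))
```

Note that the display name keeps its internal space ("Human (hg19)"), which is why splitting on arbitrary whitespace would not work for this file.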
I know I have missed something, but can't seem to figure it out. Could someone point me in the right direction?
Regards,
Iry T. Witham
Scientific Applications Administrator
Computational Sciences Group
The Jackson Laboratory
600 Main Street
Bar Harbor, ME 04609
Phone: 207-288-6744
email: iry.witham@jax.org
The information in this email, including attachments, may be confidential and is intended solely for the addressee(s). If you believe you received this email by mistake, please notify the sender by return email as soon as possible.
Please keep all replies on the list by using "reply all"
in your mail client. To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at:
http://galaxyproject.org/search/mailinglists/