Hi Enis,
Thanks for that information. Now I am getting an error: the Unified_Genotyper fails to locate GenomeAnalysisTK.jar. I discovered that gatk2 needs to be downloaded and installed separately. I have done that, but I can't figure out where the env.sh file referenced below lives. Can you point me to the correct location of that file? Or do I need to create the file, and if so, where?
Thanks,
Iry
Galaxy wrapper for GATK2
This wrapper is copyright 2013 by Björn Grüning, Jim Johnson & the Galaxy Team.
The Genome Analysis Toolkit or GATK is a software package developed at the Broad Institute to analyse next-generation resequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
http://www.broadinstitute.org/gatk http://www.broadinstitute.org/gatk/about/citing-gatk
GATK is free for academic use and fee-based for commercial use. Please study the GATK licensing website: http://www.broadinstitute.org/gatk/about/#licensing
Installation
The recommended installation is by means of the toolshed.
Galaxy should be able to install samtools dependencies automatically for you. GATK2, with its new licence model, does not allow us to distribute the GATK binaries. As a consequence, you need to install GATK2 on your own; please see the GATK website for more information:
http://www.broadinstitute.org/gatk/download
Once you have installed GATK2, you need to edit the env.sh files that are installed together with the wrappers. You must edit the GATK2_PATH environment variable in the file:
<tool_dependency_dir>/environment_settings/GATK2_PATH/iuc/gatk2/<hash_string>/env.sh
to point to the folder where you have installed GATK2.
Optionally, you may also want to edit the GATK2_SITE_OPTIONS environment variable in the file:
<tool_dependency_dir>/environment_settings/GATK2_SITE_OPTIONS/iuc/gatk2/<hash_string>/env.sh
to deactivate the 'call home feature' of GATK with something like:
GATK2_SITE_OPTIONS='-et NO_ET -K /data/gatk2_key_file'
GATK2_SITE_OPTIONS can also be used to insert other specific options into every GATK2 wrapper at runtime, without changing the actual wrapper.
Read more about the "Phone Home" problem at: http://www.broadinstitute.org/gatk/guide/article?id=1250
Optionally, you may also want to add some commands to be executed before GATK (e.g. to load modules) to the file:
<tool_dependency_dir>/gatk2/default/env.sh
Finally, you should fill in additional information about your genomes and annotations in the gatk2_picard_index.loc and gatk2_annotations.txt. You can find them in the tool-data/ Galaxy directory.
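For locating the env.sh files described in the README above, a minimal Python sketch follows. It assumes the directory layout quoted in the README (environment_settings/GATK2_PATH/... under the tool dependency directory) and assumes the file sets the variable on a line of the form `GATK2_PATH=/some/dir; export GATK2_PATH`. The function names are hypothetical helpers, not part of Galaxy or the wrapper.

```python
import os
import re


def find_gatk2_env_files(tool_dependency_dir):
    """Return env.sh paths found under environment_settings/GATK2_PATH."""
    root = os.path.join(tool_dependency_dir, "environment_settings", "GATK2_PATH")
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        if "env.sh" in files:
            hits.append(os.path.join(dirpath, "env.sh"))
    return sorted(hits)


def read_gatk2_path(env_file):
    """Return the value assigned to GATK2_PATH in env_file, or None.

    Assumes a simple assignment without spaces in the path.
    """
    with open(env_file) as handle:
        for line in handle:
            match = re.search(r"GATK2_PATH=([^;\s]+)", line)
            if match:
                return match.group(1).strip("'\"")
    return None


def has_gatk_jar(gatk_dir):
    """True if GenomeAnalysisTK.jar sits directly inside gatk_dir."""
    return os.path.isfile(os.path.join(gatk_dir, "GenomeAnalysisTK.jar"))
```

Calling `find_gatk2_env_files()` with your own tool dependency directory, then passing each result through `read_gatk2_path()` and `has_gatk_jar()`, shows whether the variable points at the folder that actually holds the jar. If no env.sh turns up, the wrapper's dependencies may not have been installed from the toolshed yet.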
From: Enis Afgan <afgane@gmail.com>
Date: Saturday, October 4, 2014 6:10 AM
To: Iry Witham <iry.witham@jax.org>
Cc: "galaxy-dev@lists.bx.psu.edu" <galaxy-dev@lists.bx.psu.edu>
Subject: Re: [galaxy-dev] Cloudman indices installation/configuration
Hi Iry,
Try adding the following to your /mnt/galaxy/galaxy-app/tool_data_table_conf.xml, populating the referenced files (tool-data/gatk2_picard_index.loc and tool-data/gatk2_annotations.txt) as desired and restarting Galaxy:
<!-- Location of Picard dict files valid for GATK -->
<table name="gatk2_picard_indexes" comment_char="#">
    <columns>value, dbkey, name, path</columns>
    <file path="tool-data/gatk2_picard_index.loc" />
</table>
<!-- Available of GATK references -->
<table name="gatk2_annotations" comment_char="#">
    <columns>value, name, gatk_value, tools_valid_for</columns>
    <file path="tool-data/gatk2_annotations.txt" />
</table>
Hope this gets you going. Let us know if it doesn't,
Enis
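To confirm that the table entries were picked up as intended after editing tool_data_table_conf.xml, a short Python sketch along these lines could list each table's columns and .loc file. It uses only the standard library XML parser. The function name is hypothetical, and wrapping the entries in a `<tables>` root element is an assumption about the surrounding file.

```python
import xml.etree.ElementTree as ET


def list_data_tables(xml_text):
    """Map each <table> name to (list of column names, .loc file path)."""
    root = ET.fromstring(xml_text)
    tables = {}
    for table in root.iter("table"):
        # <columns> holds a comma-separated list of column names.
        columns = [c.strip() for c in (table.findtext("columns") or "").split(",")]
        file_elem = table.find("file")
        path = file_elem.get("path") if file_elem is not None else None
        tables[table.get("name")] = (columns, path)
    return tables
```

Reading the file's contents and passing them to `list_data_tables()` should show gatk2_picard_indexes with four columns and gatk2_annotations with four columns, each pointing at its file under tool-data/.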
On Fri, Oct 3, 2014 at 1:36 PM, Iry Witham <Iry.Witham@jax.org> wrote:
It looks like I need to generate the dict file for the mm10 reference as well as add the reference to the srma_index.loc. My question is where do these need to exist? Do they belong in the repo directory structure or in the primary tool-data directory? The hg19.fa, hg19.fa.fai, and hg19.dict files exist, as do the same files for mm9 and GRCh37. However, the .dict does not exist for mm10. Even so, the references do not appear in the gatk2 tools.
Any ideas?
Thanks,
Iry
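GATK generally expects two companion files next to each reference FASTA: the samtools index (for example mm10.fa.fai) and the Picard sequence dictionary (for example mm10.dict, with .dict replacing the .fa extension). A small Python sketch for spotting which ones are missing is shown below. The function name is a hypothetical helper, and the check covers only these two files.

```python
import os


def missing_companions(fasta_path):
    """Return the companion files GATK would look for that do not exist.

    For /path/ref.fa this checks /path/ref.fa.fai and /path/ref.dict.
    """
    base, _ext = os.path.splitext(fasta_path)
    expected = [fasta_path + ".fai", base + ".dict"]
    return [path for path in expected if not os.path.isfile(path)]
```

Running it on each path listed in the .loc file, such as /mnt/galaxyIndices/genomes/Mmusculus/mm10/seq/mm10.fa, reports what still has to be generated. The dictionary is typically created with Picard's CreateSequenceDictionary tool and the index with samtools faidx.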
From: Daniel Blankenberg <dan@bx.psu.edu>
Date: Thursday, October 2, 2014 1:57 PM
To: Iry Witham <iry.witham@jax.org>
Cc: "galaxy-dev@lists.bx.psu.edu" <galaxy-dev@lists.bx.psu.edu>
Subject: Re: [galaxy-dev] Cloudman indices installation/configuration
Hi Iry,
First thing to check is that your fields are tab delimited — they appear to be spaces instead of tabs in this email, but copy and pasting into email can munge things sometimes (also “gh19.fa” is probably a typo, but that wouldn’t prevent the selection option from showing up).
Thanks for using Galaxy,
Dan
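The tab check suggested above can be sketched in a few lines of Python. This assumes four columns for gatk2_picard_index.loc (value, dbkey, name, path) and `#` as the comment character. The function name is hypothetical.

```python
def check_loc_lines(lines, expected_columns=4, comment_char="#"):
    """Return (line_number, message) for lines without the expected tab-separated fields."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        # Blank lines and comment lines are skipped.
        if not line.strip() or line.startswith(comment_char):
            continue
        fields = line.split("\t")
        if len(fields) != expected_columns:
            problems.append(
                (number, "expected %d tab-separated fields, found %d"
                 % (expected_columns, len(fields)))
            )
    return problems
```

A line whose fields are separated by spaces comes back as a single field, which is the symptom to look for. For sam_fa_indices.loc, whose entries in this thread have three fields, pass `expected_columns=3`.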
On Oct 2, 2014, at 1:49 PM, Iry Witham <Iry.Witham@jax.org> wrote:
Hi Team,
I have a new instance of Galaxy CloudMan running and have added tools from the toolshed to it. When I attempt to run tools like sam-to-bam or any gatk tool, I am prompted for a reference genome. However, no indices/references are available for these tools. I have added the following line to the sam_fa_indices.loc, but that did nothing:
index hg19 /mnt/galaxyIndices/genomes/Hsapiens/hg19/seq/gh19.fa
I have also added the following three lines to the gatk2_picard_index.loc:
hg19 hg19 Human (hg19) /mnt/galaxyIndices/genomes/Hsapiens/hg19/seq/hg19.fa
GRCh37 GRCh37 Human (GRCh37) /mnt/galaxyIndices/genomes/Hsapiens/GRCh37/seq/GRCh37.fa
mm10 mm10 Mouse (mm10) /mnt/galaxyIndices/genomes/Mmusculus/mm10/seq/mm10.fa
I know I have missed something, but can't seem to figure it out. Could someone point me in the right direction?
Regards,
__________________________________
Iry T. Witham
Scientific Applications Administrator
Computational Sciences Group
The Jackson Laboratory
600 Main Street
Bar Harbor, ME 04609
Phone: 207-288-6744
email: iry.witham@jax.org
The information in this email, including attachments, may be confidential and is intended solely for the addressee(s). If you believe you received this email by mistake, please notify the sender by return email as soon as possible.
Please keep all replies on the list by using "reply all"
in your mail client. To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at:
http://galaxyproject.org/search/mailinglists/