Hi,
I am trying to incorporate GATK into my pipeline but have not been able to make it work. I aligned my data against hg19, then ran the SAMtools filter, and then Picard duplicate removal. I uploaded dbSNP and the hg19 reference FASTA file to Galaxy to run this pipeline, but for some reason the GATK base recalibration tool will not accept the output file. I wonder whether there is a sorting or indexing issue and, if so, how to fix it in Galaxy.
An error occurred running this job: Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/space/g2main [Mon Dec 10 10:30:42 EST 2012] net.sf.picard.sam.CreateSequenceDictionary REFERENCE=/space/g2main/tmp-gatk-tKp41A/gatk_input.fasta OUTPUT=/space/g2main/tmp-gatk-tKp41A/dict3503196447953523717.tmp
Thanks, Umar
Hi,
I have run into the same kind of errors. When I update the loc files linked to GATK, some of the tools display the reference genomes I added and some do not. It seems that the Galaxy wrapper for GATK 1.6 is not very functional. The GATK team doesn't really care, because they no longer support it; even the documentation has disappeared. And I understand that the Galaxy developers have other things to do than support a tool that will disappear because it is no longer open source. I don't know which tool could replace the recalibration step done by GATK, and I don't know how to fix the bugs either. Any suggestions?
Philippe
___________________________________________________________ The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
I'm having some problems with GATK as well, but I do have a functional pipeline that uses the following GATK tools in Galaxy:
- Realigner Target Creator
- Indel Realigner
- Unified Genotyper
- Variant Filtration
The main problem I'm having is that I seem to need to run the FASTA/FASTQ groomer on all inputs before starting, and if I try to use the 'advanced' options on either of the last two steps above, the job fails immediately every time with a command-line option parsing error.
I plan on digging into the wrapper script, which is currently attributed to Dan Blankenberg, in the coming days to try to correct this. I'm relatively new to Galaxy development, though, and don't know where to submit my updates should I fix any of these problems.
Joshua
Joshua and Philippe, thanks for the input.
I was able to run GATK in another pipeline, but it does not serve my purpose. It was FASTQ Groomer followed by BWA alignment against human_g1k_v37 with read groups. That produced a SAM file, which I put into the GATK pipeline with the option "use as BAM(SAM file)". Once it started running, it kept going, but there are several issues with it.
First, the GATK best practices recommend removing duplicates with Picard before this step, but after running Picard the output won't work in the Galaxy GATK tools. Second, if I align against hg19, which is what I have been working with, GATK does not work even if I provide the hg19 reference from my history (uploaded).
All I needed was base recalibration and indel realignment, so I decided to do these before running the Picard duplicate removal tool and to run everything with human_g1k_v37. That worked: base recalibration, then indel realignment, and then Picard duplicate removal. But I needed to run mpileup on this output, and I found something very interesting: even though everything in this pipeline used human_g1k_v37, mpileup will run only with hg19. I cannot rely on such a twisted pipeline, so essentially I cannot use GATK in Galaxy.
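For context on the hg19 versus human_g1k_v37 mix described above: UCSC-style hg19 names its chromosomes chr1, chr2, and so on (with chrM), while human_g1k_v37 uses 1, 2, and so on (with MT), so a file built against one will not match the other by contig name. The thread does not establish that this is what makes mpileup behave as described. The Python sketch below is only a quick way to tell which convention a list of contig names follows; the function names are made up for illustration.

```python
def naming_style(contig_names):
    """Classify contig names as 'chr-prefixed', 'unprefixed', 'mixed' or 'empty'."""
    if not contig_names:
        return "empty"
    prefixed = [name.startswith("chr") for name in contig_names]
    if all(prefixed):
        return "chr-prefixed"
    if not any(prefixed):
        return "unprefixed"
    return "mixed"


def same_style(names_a, names_b):
    """True when both lists follow the same naming convention."""
    return naming_style(names_a) == naming_style(names_b)


# Toy contig lists, not read from any real file.
hg19_like = ["chr1", "chr2", "chrM"]
g1k_like = ["1", "2", "MT"]

print(naming_style(hg19_like))          # chr-prefixed
print(naming_style(g1k_like))           # unprefixed
print(same_style(hg19_like, g1k_like))  # False
```

A "mixed" result on a single file would suggest that data from both builds ended up in the same dataset.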
Hi Joshua,
Is this on the main public site? If so, can you share your history with me so that I can take a look? If this is on a local instance, can you provide additional information, such as the GATK version that you are using?
Thanks for using Galaxy,
Dan
Hi Philippe,
The GATK wrappers provided with the Galaxy distribution are for GATK version 1.4. A set of 1.6/GATK-lite wrappers has been developed by the team but is not yet available. There may also be other options in the Tool Shed that have been contributed by the community.
Thanks for using Galaxy,
Dan
Hi Umar,
Can you click the eye icon to view the contents of the 'log' dataset for the GATK run? The end of the log should show the actual error encountered (the text you provided is a bit of a red herring).
Since you are using hg19, the most likely cause of the error is that the reference FASTA file you are using is not ordered properly, or that your alignments were made using a different genome (e.g. alignment with BWA using the built-in hg19 [not ordered properly] and then GATK using a different hg19 FASTA from your history). If you are using a custom genome, make sure that it is GATK-ordered and that the same one is used in all steps; there is an hg19 GATK-ordered FASTA file available in a Data Library ('GATK') on Main.
Thanks for using Galaxy,
Dan
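The ordering mismatch Dan describes can be checked by hand outside Galaxy. Below is a minimal Python sketch, assuming you have the header text of the alignment file (for example from `samtools view -H`) and the text of the reference's `.fai` index (as written by `samtools faidx`). The function names and the toy contig lengths are invented for illustration; this is not what the Galaxy wrapper or GATK itself runs.

```python
def fai_contigs(fai_text):
    """Return contig names in the order they appear in a .fai index."""
    return [line.split("\t")[0] for line in fai_text.splitlines() if line.strip()]


def sam_header_contigs(header_text):
    """Return contig names from the @SQ lines of a SAM/BAM header, in order."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def first_mismatch(bam_contigs, ref_contigs):
    """Return (position, bam_name, ref_name) for the first difference, or None."""
    for i, (b, r) in enumerate(zip(bam_contigs, ref_contigs)):
        if b != r:
            return (i, b, r)
    if len(bam_contigs) != len(ref_contigs):
        # One list is a strict prefix of the other: report the first extra contig.
        i = min(len(bam_contigs), len(ref_contigs))
        b = bam_contigs[i] if i < len(bam_contigs) else None
        r = ref_contigs[i] if i < len(ref_contigs) else None
        return (i, b, r)
    return None


# Toy inputs with made-up lengths: same contigs, different order.
example_header = (
    "@HD\tVN:1.0\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:100\n"
    "@SQ\tSN:chr10\tLN:100\n"
    "@SQ\tSN:chr2\tLN:100\n"
)
example_fai = (
    "chr1\t100\t6\t50\t51\n"
    "chr2\t100\t114\t50\t51\n"
    "chr10\t100\t222\t50\t51\n"
)

print(first_mismatch(sam_header_contigs(example_header), fai_contigs(example_fai)))
# (1, 'chr10', 'chr2')
```

A result of `None` means the two files list the same contigs in the same order; anything else points at the first contig where the alignment and the reference disagree.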
Thanks, Daniel, for the reply. I got the hg19 file from the GATK bundle. After your reply I realigned the FASTQ with BWA against the same hg19 I was using from GATK. The error log follows. Please advise.
Thanks, Umar
INFO 08:47:28,627 HelpFormatter - ---------------------------------------------------------------------------------
INFO 08:47:28,630 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.4-18-g80a4ce0, Compiled 2012/01/23 15:33:58
INFO 08:47:28,630 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 08:47:28,630 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki
INFO 08:47:28,630 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
INFO 08:47:28,631 HelpFormatter - Program Args: -T CountCovariates --num_threads 4 -et NO_ET --recal_file /galaxy/main_pool/pool3/tmp/job_working_directory/004/779/4779763/galaxy_dataset_5444683.dat --standard_covs --run_without_dbsnp_potentially_ruining_quality -I /space/g2main/tmp-gatk-Ib7JbA/gatk_input.bam -R /space/g2main/tmp-gatk-Ib7JbA/gatk_input.fasta
INFO 08:47:28,631 HelpFormatter - Date/Time: 2012/12/13 08:47:28
INFO 08:47:28,631 HelpFormatter - ---------------------------------------------------------------------------------
INFO 08:47:28,632 HelpFormatter - ---------------------------------------------------------------------------------
INFO 08:47:28,647 GenomeAnalysisEngine - Strictness is SILENT
INFO 08:47:28,667 ReferenceDataSource - Index file /space/g2main/tmp-gatk-Ib7JbA/gatk_input.fasta.fai does not exist. Trying to create it now.
PROGRESS UPDATE: file is 7 percent complete
PROGRESS UPDATE: file is 15 percent complete
PROGRESS UPDATE: file is 22 percent complete
PROGRESS UPDATE: file is 28 percent complete
PROGRESS UPDATE: file is 33 percent complete
PROGRESS UPDATE: file is 39 percent complete
PROGRESS UPDATE: file is 44 percent complete
PROGRESS UPDATE: file is 49 percent complete
PROGRESS UPDATE: file is 53 percent complete
PROGRESS UPDATE: file is 57 percent complete
PROGRESS UPDATE: file is 62 percent complete
PROGRESS UPDATE: file is 66 percent complete
PROGRESS UPDATE: file is 73 percent complete
PROGRESS UPDATE: file is 79 percent complete
PROGRESS UPDATE: file is 86 percent complete
PROGRESS UPDATE: file is 91 percent complete
PROGRESS UPDATE: file is 96 percent complete
INFO 08:55:15,430 ReferenceDataSource - Dict file /space/g2main/tmp-gatk-Ib7JbA/gatk_input.dict does not exist. Trying to create it now.
INFO 08:56:07,262 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 08:56:07,336 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.06
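Dan's advice to look at the end of the log for the actual error can be applied to a pasted log like the one above. The Python sketch below assumes that a failing run marks its error lines with the word ERROR, which is an assumption rather than something shown in this thread; the function names are hypothetical and the sample text is three lines taken from the log above.

```python
def find_error_lines(log_text):
    """Return the log lines that mention ERROR, in their original order."""
    return [line for line in log_text.splitlines() if "ERROR" in line]


def last_lines(log_text, n=5):
    """Return the last n non-empty lines, where a failure would normally show up."""
    lines = [line for line in log_text.splitlines() if line.strip()]
    return lines[-n:]


# Three lines copied from the log posted in the thread.
sample_log = (
    "INFO 08:47:28,647 GenomeAnalysisEngine - Strictness is SILENT\n"
    "PROGRESS UPDATE: file is 96 percent complete\n"
    "INFO 08:56:07,336 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.06\n"
)

print(find_error_lines(sample_log))  # []
print(last_lines(sample_log, n=1))
```

On the posted log this finds no error lines at all: the output stops right after the BAM readers are initialized, so the failure message itself does not appear in what was pasted.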
participants (4)
- Daniel Blankenberg
- Farooq,Umar (res)
- Joshua Orvis
- Philipe Moncuquet