All,
I have noticed that Galaxy/GATK re-indexes the reference FASTA and dbSNP files every time it runs. Re-indexing takes time (~10 min), so it adds noticeably to the overall run time when the tool is used repeatedly. This could be avoided by reusing the
existing indexes. Here is a snapshot of the log:
INFO 11:43:57,365 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.4-21-g30b937d, Compiled 2012/02/01 19:01:14
INFO 11:43:57,365 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 11:43:57,365 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki
INFO 11:43:57,366 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
INFO 11:43:57,367 HelpFormatter - ---------------------------------------------------------------------------------
INFO 11:43:57,429 GenomeAnalysisEngine - Strictness is STRICT
INFO 11:43:57,432 ReferenceDataSource - Index file /tmp/tmp-gatk-6jlUfH/gatk_input.fasta.fai does not exist. Trying to create it now.
PROGRESS UPDATE: file is 15 percent complete
PROGRESS UPDATE: file is 28 percent complete
PROGRESS UPDATE: file is 91 percent complete
INFO 11:45:32,231 ReferenceDataSource - Dict file /tmp/tmp-gatk-6jlUfH/gatk_input.dict does not exist. Trying to create it now.
INFO 11:45:54,262 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 11:45:54,280 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02
INFO 11:45:54,304 RMDTrackBuilder - Creating Tribble index in memory for file /tmp/tmp-gatk-6jlUfH/input_dbsnp_0.vcf
INFO 11:48:05,910 RMDTrackBuilder - Writing Tribble index to disk for file /tmp/tmp-gatk-6jlUfH/input_dbsnp_0.vcf.idx
Is there an option or alternative in Galaxy to avoid this re-indexing in /tmp? I have already built the indexes for the reference and dbSNP files.
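To illustrate the kind of reuse I mean, here is a minimal Python sketch. It is not how the Galaxy wrapper currently works. The target names (gatk_input.fasta.fai, gatk_input.dict, input_dbsnp_0.vcf.idx) are taken from the log above. The function name and the assumption that my pre-built indexes sit next to the originals as <ref>.fasta.fai, <ref>.dict and <dbsnp>.vcf.idx are hypothetical. I have not verified that GATK accepts symlinked index files.

```python
import os


def stage_with_indexes(ref_fasta, dbsnp_vcf, workdir):
    """Symlink the inputs, plus any pre-built index files found beside them,
    into workdir under the names that appear in the GATK log."""
    # Left: names GATK looks for in the temp directory (from the log).
    # Right: assumed locations of the originals and their pre-built indexes.
    links = {
        "gatk_input.fasta": ref_fasta,
        "gatk_input.fasta.fai": ref_fasta + ".fai",
        "gatk_input.dict": os.path.splitext(ref_fasta)[0] + ".dict",
        "input_dbsnp_0.vcf": dbsnp_vcf,
        "input_dbsnp_0.vcf.idx": dbsnp_vcf + ".idx",
    }
    staged = []
    for name, src in links.items():
        # Skip anything that does not exist; GATK would then build it as before.
        if os.path.exists(src):
            os.symlink(os.path.abspath(src), os.path.join(workdir, name))
            staged.append(name)
    return staged
```

Calling stage_with_indexes(ref, vcf, tmpdir) before GATK starts would return the list of names it managed to link, so any missing index would simply fall back to the current behaviour.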
I look forward to any suggestions.
Thanks,
Raj