
I'm curious: what is this genome called 'hg_g1k_v37', and how does it correspond to NCBI GRCh37, which is identical to UCSC hg19?
--Hiram

Jennifer Jackson wrote:
UCSC does not contain the genome 'hg_g1k_v37' - the genome available from UCSC is 'hg19'.
Even though these are technically the same human release, on a practical level they have a different arrangement for some of the chromosomes. You can compare NCBI GRCh37 <http://www.ncbi.nlm.nih.gov/genome/assembly/2758/> with UCSC hg19 <http://genome.ucsc.edu> for an explanation. Reference genomes must match /exactly/, base for base, in order to be used with tools. When they match, the identifier will be identical between Galaxy and the source (UCSC, Ensembl), or the full build name will provide enough information to make the connection to NCBI or another source.
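One way to check whether two reference builds really match base for base is to compare every sequence by name, length, and checksum. The following is only a minimal sketch, not something Galaxy or GATK provides: the function names are hypothetical, the toy sequences are made up, and it assumes plain-text FASTA input and that soft-masking (lowercase bases) should be ignored.

```python
import hashlib
import io

def fasta_digests(handle):
    """Return {sequence_name: (length, md5_hex)} for a FASTA text stream."""
    digests = {}
    name, length, md5 = None, 0, None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Close out the previous record before starting a new one.
            if name is not None:
                digests[name] = (length, md5.hexdigest())
            name = line[1:].split()[0]  # first word of the header line
            length, md5 = 0, hashlib.md5()
        elif name is not None:
            seq = line.upper()  # assumption: ignore soft-masking
            length += len(seq)
            md5.update(seq.encode("ascii"))
    if name is not None:
        digests[name] = (length, md5.hexdigest())
    return digests

def compare_references(a, b):
    """Group sequence names as identical, differing, or present on one side only."""
    same = sorted(n for n in a if n in b and a[n] == b[n])
    differ = sorted(n for n in a if n in b and a[n] != b[n])
    only_a = sorted(n for n in a if n not in b)
    only_b = sorted(n for n in b if n not in a)
    return same, differ, only_a, only_b

# Toy data standing in for two builds (made up, not real genome sequence).
ref_a = io.StringIO(">chr1\nACGT\nACGT\n>chrM\nGATTACA\n")
ref_b = io.StringIO(">chr1\nacgtACGT\n>chrM\nGATTAGA\n>chrUn\nNNNN\n")

same, differ, only_a, only_b = compare_references(
    fasta_digests(ref_a), fasta_digests(ref_b)
)
print("identical:", same)
print("same name, different sequence:", differ)
print("only in first:", only_a)
print("only in second:", only_b)
```

Note that a sequence named 'chr1' in one file and '1' in the other would be reported as missing from each side, even if the bases are identical, so naming differences show up as well as sequence differences.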
Sometimes genomes are similar enough that a dataset sourced from one can be used with another, if the database attribute is changed and the data from the regions that differ is removed. This may be possible in your case, but only trying will show how difficult it actually is for your analysis. The GATK pipeline is very sensitive to exact inputs, so you will need to be careful with genome database assignments and similar settings. If this is something you are going to try, following the links on the tool forms to the GATK help pages can provide more detail about expected inputs.
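As a rough illustration of removing the data from regions that differ, the sketch below renames the sequence column of tab-delimited interval records and drops any record on a sequence that has no counterpart. The function name and the mapping are hypothetical and deliberately incomplete; this is not a verified hg19 to hg_g1k_v37 mapping, and the database attribute in Galaxy would still need to be changed separately.

```python
def convert_intervals(lines, name_map):
    """Rename the first column using name_map; drop records on unmapped sequences.

    Returns (kept_lines, dropped_count). Header lines are passed through as-is.
    """
    kept, dropped = [], 0
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if line.startswith(("#", "track", "browser")):
            kept.append(line)  # keep header/comment lines untouched
            continue
        fields = line.split("\t")
        new_name = name_map.get(fields[0])
        if new_name is None:
            dropped += 1  # sequence differs or is absent in the target build
            continue
        fields[0] = new_name
        kept.append("\t".join(fields))
    return kept, dropped

# Hypothetical mapping: only sequences believed identical in both builds.
NAME_MAP = {"chr1": "1", "chr2": "2"}

records = [
    "chr1\t100\t200\tfeatA",
    "chrM\t5\t50\tfeatB",
    "chr2\t10\t20\tfeatC",
]
kept, dropped = convert_intervals(records, NAME_MAP)
print("\n".join(kept))
print("dropped records:", dropped)
```

Leaving a sequence out of the mapping is the conservative choice here: anything not confirmed identical between the two builds is discarded rather than carried over.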