The genome build 'hg_g1k_v37' is build "b37" in the GATK
documentation. Hg19 is also included (as a distinct build). I
encourage you to examine these if you are interested in crossing
over between genomes or identifying other projects that have data
based on the same genome build.
"
GATK resource bundle: A collection of standard files for working
with human resequencing data with the GATK.
The standard reference sequence we use in the GATK is the b37
edition from the Human Genome Reference Consortium. All of the key
GATK data files are available against this reference sequence.
Additionally, we used to use UCSC-style (chr1, not 1) for build
hg18, and provide lifted-over files from b37 to hg18 for those still
using those files.
b37 resources: the standard data set
* Reference sequence (standard 1000 Genomes fasta) along with fai and
dict files
<more, please follow link for details ...>
hg19 resources: lifted over from b37
* Includes the UCSC-style hg19 reference along with all lifted over
VCF files."
Hopefully this helps,
Jen
Galaxy team
On 6/27/12 7:09 AM, Lilach Friedman
wrote:
May I join Carlos's question? What exactly is
hg_g1k_v37, and how can I get the intervals of specific
genes in this format?
Hi Jennifer,
Is there a way to upload my files directly from the public
Galaxy to my cloud Galaxy instance (in AWS)? Or should I
download them to my computer first and then upload them?
(That takes a long time because of the slow upload
speed.)
Currently, the human reference genome indexed for
the GATK-beta tools is 'hg_g1k_v37'. The GATK-beta
tools are under active revision by our team, so we
expect there to be little to no change to the beta
version on the main public instance until this is
completed.
Attempting to convert data between different
builds is not recommended. These tools are very
sensitive to exact inputs, which extends to naming
conventions, etc. The best practice path is to
start and continue an analysis project with the
same exact genome build throughout.
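One way to check this before running a job is to compare the contig
names in an interval file against the reference index. The following is
a minimal Python sketch, not part of Galaxy or the GATK; the function
names and the sample data are hypothetical. It assumes a tab-separated
BED-like interval file and a standard .fai index, where the first column
of each line is the contig name.

```python
def contig_names_from_fai(fai_text):
    """Return contig names, in order, from the text of a .fai index."""
    names = []
    for line in fai_text.splitlines():
        if line.strip():
            names.append(line.split("\t")[0])
    return names


def contig_names_from_intervals(interval_text):
    """Return the set of contig names used in a BED-like interval file."""
    names = set()
    for line in interval_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and UCSC track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        names.add(line.split("\t")[0])
    return names


def unknown_contigs(interval_text, fai_text):
    """Contigs named in the intervals that the reference does not define."""
    reference = set(contig_names_from_fai(fai_text))
    return sorted(contig_names_from_intervals(interval_text) - reference)


# Hypothetical inputs: a b37-style index and UCSC-style (hg19) intervals.
# The offsets in the index lines are illustrative only.
fai = "1\t249250621\t52\t60\t61\n2\t243199373\t253404903\t60\t61\n"
bed = "chr1\t100\t200\nchr2\t300\t400\n"
print(unknown_contigs(bed, fai))
```

A non-empty result means the intervals and the reference do not share a
naming convention. An empty result only shows that the names agree; it
does not prove that the underlying sequences are the same build.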
If you want to use the hg19 indexes provided by
the GATK project, a cloud instance is the current
option (using a hg19 genome as a 'custom genome'
will exceed the processing limits available on the
public Galaxy instance). Following the links on
the GATK tool forms provides more information about
sources, including links to the GATK web site,
which describes the exact contents of both of
these genome versions, downloads, and other
resources.
Hopefully this helps to clear up any confusion,
Best,
Jen
Galaxy team
On 6/21/12 7:50 AM, Lilach Friedman
wrote:
Hi Jennifer,
Thank you for this reply.
I made a new BWA file, this time using the
hg19(full) genome.
However, when I try to use
DepthOfCoverage, the reference genome is
stuck on hg_g1k_v37 (this is the
only option to select), and I cannot
change it to hg19(full). Most probably
this is because I selected hg_g1k_v37
the previous time I tried to use
DepthOfCoverage.
Is this a bug? How can I change it?
The problem with this analysis
probably has to do with a mismatch
between the genomes: the intervals
obtained from UCSC (hg19) and the
BAM from your BWA (hg_g1k_v37) run.
UCSC does not contain the genome
'hg_g1k_v37' - the genome available
from UCSC is 'hg19'.
Even though these are technically
the same human release, on a
practical level, they have a
different arrangement for some of
the chromosomes. You can compare
NCBI GRCh37 with UCSC hg19 for an
explanation. Reference genomes must
match exactly, base for base, in
order to be used with tools. When
they are exact, the identifier will
be exact between Galaxy and the
source (UCSC, Ensembl) or the full
Build name will provide enough
information to make a connection to
NCBI or other.
Sometimes genomes are similar enough
that a dataset sourced from one can
be used with another, if the
database attribute is changed and
the data from the regions that
differ is removed. This may be
possible in your case; only trying
will show how difficult it
actually is with your analysis. The
GATK pipeline is very sensitive to
exact inputs. You will need to be
careful with genome database
assignments, etc. Following the
links on the tool forms to the GATK
help pages can provide some more
detail about expected inputs, if
this is something that you are going
to try.
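For anyone who does want to try it, the renaming part could look
something like the sketch below. This is only an illustration under
stated assumptions, not a tested or recommended conversion: the helper
name `ucsc_to_b37` is hypothetical, and it assumes that only the primary
chromosomes (1-22, X, Y) can be carried over by dropping the "chr"
prefix. chrM is skipped because hg19 and b37 use different mitochondrial
sequences, and all other contigs are skipped as well. The database
attribute in Galaxy would still need to be changed separately.

```python
# Assumption: only primary chromosomes are renamed. chrM and any
# random, unplaced, or haplotype contigs are dropped, not converted.
PRIMARY = {f"chr{n}": str(n) for n in range(1, 23)}
PRIMARY.update({"chrX": "X", "chrY": "Y"})


def ucsc_to_b37(interval_lines):
    """Rename UCSC-style contigs to b37 style and skip everything else."""
    kept = []
    for line in interval_lines:
        fields = line.rstrip("\n").split("\t")
        new_name = PRIMARY.get(fields[0])
        if new_name is None:
            continue  # header line, chrM, or a non-primary contig
        kept.append("\t".join([new_name] + fields[1:]))
    return kept


# Hypothetical interval lines for illustration.
lines = [
    "chr1\t100\t200\n",
    "chrM\t1\t50\n",
    "chr6_apd_hap1\t10\t20\n",
    "chrX\t5\t9\n",
]
for converted in ucsc_to_b37(lines):
    print(converted)
```

Skipping unknown contigs silently keeps the sketch short; in practice it
would be safer to report what was dropped so that missing intervals are
not a surprise later.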
Good luck with the re-run!
Jen
Galaxy team
On 6/18/12 4:42 AM, Lilach
Friedman wrote:
Hi,
I am trying to use Depth
of Coverage to see the
coverage in specific
intervals.
The intervals were taken
from UCSC (exons of 2
genes), loaded to Galaxy
and the file type was
changed to intervals.
I gave Depth of
Coverage two BAM files
(produced by BWA, then
selecting only rows
matching the pattern
XT:A:U, and then
SAM-to-BAM)
and the intervals file (in
advanced GATK options).
The consensus genome is
hg_g1k_v37.
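For reference, the filtering step described above (keeping only
alignments that BWA tagged as unique with XT:A:U) can be sketched in
Python as follows. This is an illustration only, not the Galaxy tool
that was actually used; the function name and sample lines are
hypothetical, and it assumes tab-separated SAM text in which optional
tags begin at the 12th column.

```python
def keep_unique_reads(sam_lines):
    """Keep header lines and alignments that carry the XT:A:U tag."""
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            kept.append(line)  # header lines pass through unchanged
            continue
        # The 11 mandatory SAM fields come first; optional tags follow.
        tags = line.rstrip("\n").split("\t")[11:]
        if "XT:A:U" in tags:
            kept.append(line)
    return kept


# Hypothetical SAM lines for illustration.
sam = [
    "@SQ\tSN:1\tLN:249250621\n",
    "r1\t0\t1\t100\t37\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:U\tNM:i:0\n",
    "r2\t0\t1\t200\t0\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:R\tNM:i:0\n",
]
for line in keep_unique_reads(sam):
    print(line, end="")
```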
I got the following error
message:
An error occurred running this job:
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/space/g2main
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 1.4-18-g80a4ce0):
##### ERROR The invalid argume
Is it a bug, or did I
do something wrong?
I will be grateful for any
help.
Thanks!
Lilach