Hi Jennifer,
Is there a way to upload my files directly from the public Galaxy to my cloud Galaxy instance (in AWS)? Or do I need to download them to my computer first and then upload them? (That takes a lot of time because of the low upload speed.)
Currently, the human reference genome indexed for the GATK-beta
tools is 'hg_g1k_v37'. The GATK-beta tools are under active revision
by our team, so we expect little or no change to the
beta version on the main public instance until this work is completed.
Attempting to convert data between different builds is not
recommended. These tools are very sensitive to exact inputs, down
to details such as chromosome naming conventions. The best practice
is to use exactly the same genome build for an analysis project
from start to finish.
If you want to use the hg19 indexes provided by the GATK project, a
cloud instance is the current option (using hg19 as a 'custom
genome' would exceed the processing limits of the public Galaxy
instance). The links on the GATK tool forms lead to more
information about sources, including pages on the GATK web site
that describe the exact contents of both of these genome versions,
downloads, and other resources.
Hopefully this helps to clear up any confusion,
Best,
Jen
Galaxy team
On 6/21/12 7:50 AM, Lilach Friedman wrote:
Hi Jennifer,
Thank you for this reply.
I generated a new BWA file, this time using the hg19 (full) genome.
However, when I try to use DepthOfCoverage, the reference genome
is stuck on hg_g1k_v37 (it is the only option available), and I
cannot change it to hg19 (full). This is probably because I
selected hg_g1k_v37 the previous time I used DepthOfCoverage.
Is this a bug? How can I change it?
The problem with this analysis probably comes from a mismatch
between the genomes: the intervals obtained from UCSC (hg19) and
the BAM from your BWA run (hg_g1k_v37).
UCSC does not provide the genome 'hg_g1k_v37'; the genome
available from UCSC is 'hg19'.
Even though these are technically the same human release, in
practice they differ in how some chromosomes are arranged and
named (for example, 'chr1' in hg19 versus '1' in hg_g1k_v37).
You can compare NCBI GRCh37 with UCSC hg19 for an explanation.
Reference genomes must match exactly, base for base, to be used
with these tools. When they do, the identifier will be identical
between Galaxy and the source (UCSC, Ensembl), or the full build
name will provide enough information to make the connection to
NCBI or another source.
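One quick way to spot this kind of mismatch is to compare the chromosome names used in the intervals file with the @SQ sequence names in the BAM header (for example, the output of 'samtools view -H'). The following is a minimal sketch, not part of Galaxy or GATK: the helper names are hypothetical, and it assumes tab-separated intervals with the chromosome in the first column.

```python
from __future__ import annotations


def reference_contigs(sam_header: str) -> set[str]:
    """Collect sequence names (the SN: field) from @SQ lines of a SAM header."""
    contigs: set[str] = set()
    for line in sam_header.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    contigs.add(field[3:])
    return contigs


def unknown_interval_contigs(interval_lines: list[str], contigs: set[str]) -> set[str]:
    """Return chromosome names used in the intervals but absent from the reference."""
    missing: set[str] = set()
    for line in interval_lines:
        # Skip blank lines, comments, and UCSC 'track'/'browser' lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom = line.split("\t", 1)[0]
        if chrom not in contigs:
            missing.add(chrom)
    return missing


if __name__ == "__main__":
    # Toy header with hg_g1k_v37-style names (no "chr" prefix).
    header = "@HD\tVN:1.0\n@SQ\tSN:1\tLN:249250621\n@SQ\tSN:MT\tLN:16569\n"
    # Toy UCSC hg19-style intervals (made-up coordinates).
    intervals = ["chr1\t1000\t2000\texample_exon", "chrM\t0\t100\texample_mito"]
    print(sorted(unknown_interval_contigs(intervals, reference_contigs(header))))
    # prints ['chr1', 'chrM']
```

Any name reported here is a sign that the intervals and the BAM were built against different genomes.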
Sometimes genomes are similar enough that a dataset sourced from
one can be used with another, if the database attribute is changed
and the data from the regions that differ is removed. This may be
possible in your case; only trying will tell you how difficult it
actually is for your analysis. The GATK pipeline is very sensitive
to exact inputs, so you will need to be careful with genome
database assignments and similar details. If this is something you
are going to try, the links on the tool forms to the GATK help
pages provide more detail about expected inputs.
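If you do try reusing UCSC hg19 intervals against hg_g1k_v37, one part of the work is renaming chromosomes and discarding records on sequences that do not correspond. The sketch below is hypothetical and rests on stated assumptions: hg19 'chr1' to 'chr22', 'chrX' and 'chrY' are taken to differ from their hg_g1k_v37 counterparts only by name; 'chrM' is dropped rather than renamed because the hg19 mitochondrial sequence is not the one used as 'MT' in hg_g1k_v37; unplaced and haplotype contigs are dropped. It only handles names and does not verify the sequences themselves.

```python
from __future__ import annotations

# Assumption: for the primary chromosomes, hg19 and hg_g1k_v37 differ only in
# the name ("chr1" vs "1", ..., "chrX" vs "X", "chrY" vs "Y").
PRIMARY = [str(n) for n in range(1, 23)] + ["X", "Y"]
HG19_TO_B37 = {f"chr{name}": name for name in PRIMARY}


def convert_intervals(lines: list[str]) -> tuple[list[str], list[str]]:
    """Rename hg19 chromosomes to hg_g1k_v37 style and drop records on other sequences.

    Returns (kept, dropped). chrM is deliberately not in the mapping, so it is
    dropped. Unplaced/haplotype contigs (e.g. chrUn_..., chr6_..._hap1) are
    dropped too, because they have no entry in the mapping.
    """
    kept: list[str] = []
    dropped: list[str] = []
    for line in lines:
        # Pass blank lines, comments, and track/browser lines through unchanged.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            kept.append(line)
            continue
        chrom, sep, rest = line.partition("\t")
        new_name = HG19_TO_B37.get(chrom)
        if new_name is None:
            dropped.append(line)
        else:
            kept.append(new_name + sep + rest)
    return kept, dropped


if __name__ == "__main__":
    toy = [
        "chr7\t1000\t2000\texample_exon",
        "chrM\t100\t200\texample_mito",
        "chr6_apd_hap1\t0\t50\texample_hap",
    ]
    kept, dropped = convert_intervals(toy)
    print(kept)
    print(f"dropped {len(dropped)} record(s)")
```

After a conversion like this, the dataset's database attribute in Galaxy would still need to be changed, as described above, and the dropped records are worth reviewing before running the GATK tools.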
Good luck with the re-run!
Jen
Galaxy team
On 6/18/12 4:42 AM, Lilach Friedman wrote:
Hi,
I am trying to use Depth of Coverage to see the coverage in
specific intervals.
The intervals were taken from UCSC (exons of 2 genes), loaded into
Galaxy, and the file type was changed to interval.
I gave Depth of Coverage two BAM files (produced by BWA, filtered
to keep only rows matching the pattern XT:A:U, and then converted
with SAM-to-BAM) and the intervals file (in the advanced GATK
options).
The reference genome is hg_g1k_v37.
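For readers following along, the XT:A:U selection keeps alignments that BWA's aln-based tools (samse/sampe) tag as unique hits. A minimal Python sketch of an equivalent filter follows; the function name is hypothetical and it is not the Galaxy tool used in this message. Unlike a plain text match, it checks only the optional-tag columns and keeps header lines, so the result can still be converted to BAM.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator


def keep_unique_bwa_hits(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield SAM header lines plus alignments carrying the optional tag XT:A:U."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines (@HD, @SQ, @PG, ...) are kept for SAM-to-BAM conversion.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # SAM has 11 mandatory columns; optional TAG:TYPE:VALUE fields follow.
        if "XT:A:U" in fields[11:]:
            yield line


if __name__ == "__main__":
    toy_sam = [
        "@HD\tVN:1.0",
        "r1\t0\t1\t100\t37\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tXT:A:U\tNM:i:0",
        "r2\t0\t1\t200\t0\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tXT:A:R\tNM:i:0",
    ]
    for kept_line in keep_unique_bwa_hits(toy_sam):
        print(kept_line.split("\t")[0])
    # prints "@HD" then "r1"
```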
I got the following error message:
An error occurred running this job: Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/space/g2main
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 1.4-18-g80a4ce0):
##### ERROR The invalid argume
Is it a bug, or did I do something wrong?
I will be grateful for any help.
Thanks!
Lilach
___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org. Please keep all replies on the list by
using "reply all" in your mail client. For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists,
please use the interface at:
http://lists.bx.psu.edu/