Regarding the cloud instance, you can load data from the public main
instance of Galaxy just as you would from any other URL. On the "Get Data ->
Upload Data" form on your cloud instance, paste in the URLs of the
datasets from main. To capture a URL, right-click on a
dataset's disk icon and choose "Copy link location" (on a Mac; do the
equivalent if using a PC).
It is generally better to transfer one URL per job if the data is
large, since each job has a limited amount of time to complete. If you
lump several large file URLs together into one job, the job
could time out. It is fine to run several jobs
concurrently.
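As a rough illustration of the one-URL-per-job advice, the sketch below turns a pasted block of dataset URLs into separate upload batches of one URL each. The function name `one_url_per_job` and the example URLs are hypothetical; nothing here calls Galaxy itself, and each resulting entry would still be pasted into its own "Upload Data" form submission.

```python
def one_url_per_job(pasted_text):
    """Split pasted text into a list of single-URL jobs.

    Blank lines and surrounding whitespace are dropped, and repeated
    URLs are kept only once, in first-seen order.
    """
    jobs = []
    seen = set()
    for line in pasted_text.splitlines():
        url = line.strip()
        if not url or url in seen:
            continue
        seen.add(url)
        jobs.append([url])  # one URL per upload job
    return jobs


if __name__ == "__main__":
    # Hypothetical dataset links, for illustration only.
    pasted = """
    https://example.org/datasets/aaa/display?to_ext=bam
    https://example.org/datasets/bbb/display?to_ext=bam
    """
    for number, job in enumerate(one_url_per_job(pasted), start=1):
        print(f"job {number}: {job[0]}")
```

Since several jobs can run concurrently, the separate submissions do not have to wait for one another.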
Best,
Jen
Galaxy team
On 6/27/12 6:51 AM, Lilach Friedman wrote:
Hi Jennifer,
Is there a way to directly upload my files from the public
Galaxy to my cloud Galaxy instance (in AWS)? Or should I
download them to my computer first and then upload them? (That
takes a lot of time because of the low upload speed.)
Currently, the human reference genome indexed for the
GATK-beta tools is 'hg_g1k_v37'. The GATK-beta tools are
under active revision by our team, so we expect there to
be little to no change to the beta version on the main
public instance until this is completed.
Attempting to convert data between different builds is not
recommended. These tools are very sensitive to exact
inputs, which extends to naming conventions, etc. The best
practice path is to start and continue an analysis project
with the same exact genome build throughout.
If you want to use the hg19 indexes provided by the GATK
project, a cloud instance is the current option (using a
hg19 genome as a 'custom genome' will exceed the
processing limits available on the public Galaxy
instance). Following the links on the GATK tools can
provide more information about sources, including links on
the GATK web site that note the exact contents of
both of these genome versions, downloads, and other
resources.
Hopefully this helps to clear up any confusion,
Best,
Jen
Galaxy team
On 6/21/12 7:50 AM, Lilach Friedman wrote:
Hi Jennifer,
Thank you for this reply.
I made a new BWA file, this time using the
hg19(full) genome.
However, when I try to use DepthOfCoverage,
the reference genome is stuck on hg_g1k_v37
(this is the only option to select), and I cannot
change it to hg19(full). Most probably this is because I
selected hg_g1k_v37 the previous time I tried
to use DepthOfCoverage.
Is this a bug? How can I change it?
The problem with this analysis probably has
to do with a mismatch between the genomes:
the intervals obtained from UCSC (hg19) and
the BAM from your BWA (hg_g1k_v37) run.
UCSC does not contain the genome
'hg_g1k_v37' - the genome available from
UCSC is 'hg19'.
Even though these are technically the same
human release, on a practical level they
arrange some of the chromosomes differently.
You can compare NCBI GRCh37
with UCSC hg19
for an explanation. Reference genomes must
match exactly, base for base, in order to be
used with tools. When they match exactly,
the identifier will be the same between Galaxy
and the source (UCSC, Ensembl), or the full
Build name will provide enough information
to make a connection to NCBI or another source.
Sometimes genomes are similar enough that a
dataset sourced from one can be used with
another, if the database attribute is
changed and the data from the regions that
differ is removed. This may be possible in
your case; only trying will show how
difficult it actually is with your analysis.
The GATK pipeline is very sensitive to exact
inputs. You will need to be careful with
genome database assignments, etc. Following
the links on the tool forms to the GATK help
pages can provide some more detail about
expected inputs, if this is something that
you are going to try.
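One quick way to spot this kind of mismatch before running a tool is to compare the chromosome names in the interval file with the sequence names of the reference the BAM was aligned to. The sketch below is a minimal example under stated assumptions: the interval file is tab-separated with the chromosome name in the first column (as in BED output from UCSC), and the reference names are supplied as a plain list. The function name and the sample values are hypothetical. Matching names are necessary but not sufficient, since the sequences themselves must also agree base for base.

```python
def unmatched_contigs(interval_lines, reference_names):
    """Return interval chromosome names that are absent from the reference."""
    known = set(reference_names)
    missing = set()
    for line in interval_lines:
        line = line.strip()
        # Skip blanks and common non-data lines in BED-style files.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom = line.split("\t")[0]
        if chrom not in known:
            missing.add(chrom)
    return sorted(missing)


if __name__ == "__main__":
    # Illustrative values: UCSC-style names in the intervals,
    # unprefixed names in the reference.
    intervals = ["chr1\t100\t200", "chr2\t300\t400"]
    reference = ["1", "2", "MT"]
    print(unmatched_contigs(intervals, reference))
```

An empty result only means the names line up; it does not show that the two builds are interchangeable.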
Good luck with the re-run!
Jen
Galaxy team
On 6/18/12 4:42 AM, Lilach Friedman wrote:
Hi,
I am trying to use Depth of
Coverage to see the coverage in
specific intervals.
The intervals were taken from UCSC
(exons of 2 genes), loaded into
Galaxy, and the file type was
changed to intervals.
I gave Depth of Coverage two
BAM files (produced by BWA, then
selection of only the rows with the
Matching pattern: XT:A:U, and then
SAM-to-BAM)
and the intervals file (in
advanced GATK options).
The consensus genome is
hg_g1k_v37.
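For reference, the "Matching pattern: XT:A:U" selection described above can be sketched outside Galaxy as a small filter over SAM text. This is only an illustration, assuming the XT:A:U tag that BWA writes for uniquely aligned reads appears as its own tab-separated optional field; the function name is hypothetical. Header lines (starting with "@") are kept so the result can still be converted to BAM. A plain pattern match over the whole line, as done in Galaxy, may behave slightly differently from this stricter field check.

```python
def keep_unique_alignments(sam_lines):
    """Yield SAM header lines plus records tagged XT:A:U."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        # Optional tags start at the 12th column of a SAM record.
        if "XT:A:U" in fields[11:]:
            yield line
```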
I got the following error message:
An error occurred running this job:
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/space/g2main
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 1.4-18-g80a4ce0):
##### ERROR The invalid argume
Is it a bug, or did I do
something wrong?
I would be grateful for any help.
Thanks!
Lilach
___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org. Please keep all replies on the list by
using "reply all" in your mail client. For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists,
please use the interface at:
http://lists.bx.psu.edu/