Your approach of terminating the cluster and starting it back up when it's needed is fine for your purposes; it's the best, and pretty much the only, way to minimize cost.

The reason 45 EBS volumes were created is that each time you start an instance, a root EBS volume is created from snapshot 'snap-f3a64f99' to serve as the root file system. When you terminate that instance, its root volume is no longer needed and can be deleted (in the next AMI we build, we will enable automatic deletion of that volume upon instance termination). In other words, feel free to delete all EBS volumes that were created from a snapshot; they can be, and are, recreated when needed. The only volume that must not be deleted is your data volume. Its ID can be found in your cluster's bucket (cm-<HASH>) in your S3 account, in a file named persistent_data.txt.

Also note: don't attach/detach EBS volumes manually on running Galaxy Cloud instances, because the application will lose track of them and not be able to recover. And always click 'Terminate cluster' in the Galaxy Cloud main UI and wait for it to shut down all of the services; only then *terminate* the master instance from the AWS console (don't *stop* the instance).
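To make the "safe to delete" rule above concrete, here is a minimal sketch in Python. It assumes volume descriptions shaped roughly like what the AWS APIs return (VolumeId, SnapshotId, State); the function name and the sample volume IDs are hypothetical, purely for illustration:

```python
# Hypothetical sketch: given EBS volume descriptions (shaped like those
# the AWS APIs return), pick out volumes that were restored from the
# root snapshot and are no longer attached, i.e. safe to delete.
# The persistent data volume is always kept.

def deletable_volumes(volumes, root_snapshot_id, data_volume_id):
    """Return IDs of detached volumes cloned from the root snapshot."""
    deletable = []
    for vol in volumes:
        if vol["VolumeId"] == data_volume_id:
            continue  # never delete the persistent data volume
        if vol["SnapshotId"] != root_snapshot_id:
            continue  # only volumes restored from the root snapshot
        if vol["State"] == "available":  # "available" == detached
            deletable.append(vol["VolumeId"])
    return deletable

# Illustrative data, not real volume IDs:
volumes = [
    {"VolumeId": "vol-111", "SnapshotId": "snap-f3a64f99", "State": "available"},
    {"VolumeId": "vol-222", "SnapshotId": "snap-f3a64f99", "State": "in-use"},
    {"VolumeId": "vol-333", "SnapshotId": "", "State": "available"},
]
print(deletable_volumes(volumes, "snap-f3a64f99", "vol-333"))  # → ['vol-111']
```

Note that vol-222 is skipped because it is still attached to a running instance, and vol-333 (the data volume from persistent_data.txt) is skipped unconditionally.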
As for uploading 200 GB of data to a cloud instance and processing it there: in principle, it should work. However, Amazon imposes a 1 TB limit on EBS volumes, and considering the multiple transformation steps your data will go through within Galaxy, I am concerned that you will reach that limit. We are working on going beyond it by composing a file system from multiple EBS volumes, but that is not available yet.
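A rough back-of-the-envelope sketch of why the 1 TB cap is a concern: if each Galaxy transformation step writes its output alongside the input, disk use grows with every step. The multipliers here are illustrative assumptions, not measured numbers:

```python
# Back-of-the-envelope estimate (assumed multipliers, not measurements):
# each transformation step is assumed to produce an output roughly the
# same size as its input, and Galaxy keeps all intermediates on disk.

def total_storage_gb(input_gb, steps, output_ratio=1.0):
    """Input plus one output of size input_gb * output_ratio per step."""
    return input_gb * (1 + steps * output_ratio)

EBS_LIMIT_GB = 1000  # Amazon's per-volume EBS cap at the time (1 TB)

needed = total_storage_gb(200, steps=5)
print(needed, needed > EBS_LIMIT_GB)  # → 1200.0 True
```

Under these assumptions, a 200 GB input exceeds a single 1 TB volume after about four same-sized intermediate datasets, which is why deleting intermediates (or waiting for multi-volume file systems) matters for a run of this size.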
Hope this helps; let us know if you have any more questions, Enis
On Tue, Nov 23, 2010 at 3:17 PM, David Martin dmarti@lsuhsc.edu wrote:
Hello,
We are about to get about 200 GB of Illumina reads (43 bp) from 20 samples, two groups of 10 animals. We are hoping to use Galaxy on the Cloud to compare gene expression between the two groups. First of all, do you think this is possible with the current state of Galaxy Cloud development? Secondly, we are currently practicing with small Drosophila datasets (4 sets of 2 GB each), and over the course of a few days of doing relatively little besides grooming and filtering the data, we had already been charged $60 by Amazon, which we thought was a bit inefficient. What is the best way to proceed working from one day to the next? Should one terminate the cluster at the Cloud Console, then stop (pause) the cluster at the AWS console, and restart the instance the next day? Does one have to reattach all of the EBS volumes before restarting the cluster? We were just terminating the instance and then bringing it back up, and all the data was still there, i.e., it worked fine; but when we looked after a couple of days there were 45 EBS volumes, much of which was surely redundant as our data wasn't very large. Perhaps we need to take a snapshot and reboot the instance from this? Thank you for any hints regarding this matter; this is all very new to me. Let me know if you need clarification or more information.
David Martin dmarti@lsuhsc.edu
galaxy-user mailing list galaxy-user@lists.bx.psu.edu http://lists.bx.psu.edu/listinfo/galaxy-user