Galaxy on the Cloud/RNA-Seq

23 Nov 2010

Hello,

We are about to receive roughly 200 GB of Illumina reads (43 bp) from 20
samples, two groups of 10 animals. We are hoping to use Galaxy on the Cloud
to compare gene expression between the two groups. First of all, do you
think this is possible with the current state of Galaxy Cloud development?

Secondly, we are currently practicing with small Drosophila datasets (4 sets
of 2 GB each). Over the course of a few days of doing relatively little
besides grooming and filtering the data, we had already been charged $60 by
Amazon, which we thought was a bit inefficient. What is the best way to
proceed when working from one day to the next? Should one terminate the
cluster at the Cloud Console, then stop (pause) the cluster at the AWS
console, and then restart the instance the next day? Does one have to
reattach all of the EBS volumes before restarting the cluster? We were just
terminating the instance and then bringing it back up, and all the data was
still there, i.e., it worked fine. But when we looked after a couple of days
there were 45 EBS volumes, many of them surely redundant, as our data wasn't
very large. Perhaps we need to take a snapshot and reboot the instance from
it? Thank you for any hints regarding this matter; this is all very new to
me. Let me know if you need clarification or more information.

David Martin
dmarti@lsuhsc.edu
