Hey Andy,

Financially, it's probably best to start small cluster-wise. For your particular project I'd recommend using a single m1.xlarge instance as the head node, seeing how that goes, and then adding workers as you find it useful. Should you find that it isn't enough, it's trivially easy with CloudMan to shut the cluster down and restart it on a much larger instance type.

Regarding extra nodes: given that you're a single user, extra instances aren't useful for serial pipelines and workflows where each job depends on the previous one; you'd only waste money. If your analysis can be done in parallel, however (say you have multiple samples all requiring the same basic preliminary steps), then extra nodes can definitely help get the work done much faster. You could also use CloudMan's autoscaling to handle this; it automatically scales the cluster up (within your min/max parameters) as needed to process jobs as fast as possible, and trims idle nodes to prevent waste.

Lastly, depending on the analysis you need to do, you may find you need a high-memory instance. In that case (given your m1.xlarge head node) you can either restart the cluster on a larger instance type or, even simpler, disable job running on the head node in the interface and temporarily add a high-memory worker instance to handle the special demand.

Let me know if there's anything else I can do to help!

-Dannon

On Jan 23, 2013, at 3:40 PM, Andrew Norman <anorman07@gmail.com> wrote:
Hi all
I'd like to use an AWS EC2 cluster to run Galaxy TopHat to analyze the RNA-seq data for my project. I'm trying to price out the resources I'll need, but I don't have any experience setting up clusters, virtual or real, so I'd like to get the insight of someone who has done this. I have studied the wiki page dedicated to this topic (http://wiki.galaxyproject.org/CloudMan/AWS/CapacityPlanning), but I still don't know how many worker nodes I'll need. I will just be doing my personal analysis with this cluster (only one TopHat analysis at a time).
Can anyone help me out with this? Thanks!
Andy