Hi Clare,
Jeremy (from the team) ran a similar workshop several months ago and used some resource-intensive tools (e.g., TopHat). We were concerned about the same scalability issues, so we simply started 4 separate clusters and divided the users across them. The approach worked well, and in the end we did not see any scalability issues. I think we went way overboard with 4 clusters, but the approach did demonstrate an additional 'coolness' of the project: one can spin up 4 complete, identical clusters in a matter of minutes...
So I think you could take a similar approach, though 2 clusters would probably be enough. Jeremy can hopefully provide some first-hand comments as well.
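
As a rough illustration of dividing users across clusters, here is a minimal Python sketch of a round-robin split. It is only an assumption about how one might do the bookkeeping: the function name and the user labels are made up, and the figure of 50 is the approximate workshop size from the original question. It does not talk to CloudMan or Amazon in any way.

```python
def assign_to_clusters(participants, n_clusters):
    """Split participants across n_clusters groups in round-robin order."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    groups = [[] for _ in range(n_clusters)]
    for i, person in enumerate(participants):
        # Participant i goes to cluster i mod n_clusters, so group
        # sizes never differ by more than one.
        groups[i % n_clusters].append(person)
    return groups


# Hypothetical labels for roughly 50 workshop participants.
participants = [f"user{i:02d}" for i in range(1, 51)]
groups = assign_to_clusters(participants, 2)

for idx, group in enumerate(groups, start=1):
    print(f"cluster {idx}: {len(group)} users")
```

Each group would then be given the address of its own cluster's Galaxy front end.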

When it comes to the instance types, especially for the master node, I would strongly suggest an instance with a lot of memory. In my experience this greatly helps cluster responsiveness, and BWA can be a bit memory hungry.

Please let us know how the workshop goes (because, like you said, it's hard to test such environments),
Enis


On Thu, Nov 17, 2011 at 5:27 AM, Clare Sloggett <sloc@unimelb.edu.au> wrote:
Hi all (especially Enis :) ),

We are planning to use Amazon (Galaxy CloudMan) to run a workshop for
about 50 people. We won't need to transfer any data during the
workshop, but need the virtual cluster to be reasonably responsive and
cope with:
a) the load on the front end
b) the workshop participants each trying to run a bwa alignment - at
the moment each alignment would be of about 2.8M reads, but we could
cut it down
c) any other scalability issues I may not have thought of?

I wanted to ask if anyone has used CloudMan for a similar purpose, or
has an understanding, based on running a Galaxy cluster, of any
problems we might encounter? I can add enough nodes to the cluster on
the day to cope with the computational load (I assume) but I'm not
sure if I should be expecting any other problems.

Is the size of the node (e.g. Amazon's 4-core vs 8-core nodes) very
important? I can scale out by adding more nodes, but should I be
concerned about the capacity of the master node which handles the
traffic?

Also, is there any sensible way for me to test it in advance (in terms
of the user load)?

Many thanks for any advice!

Clare

--
E: sloc@unimelb.edu.au
P: 03 903 53357
M: 0414 854 759
___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

 http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

 http://lists.bx.psu.edu/