Using Galaxy CloudMan for a workshop
Hi all (especially Enis :) ),

We are planning to use Amazon (Galaxy CloudMan) to run a workshop for about 50 people. We won't need to transfer any data during the workshop, but we need the virtual cluster to be reasonably responsive and cope with:

a) the load on the front end
b) the workshop participants each trying to run a BWA alignment - at the moment each alignment would be of about 2.8M reads, but we could cut it down
c) any other scalability issues I may not have thought of?

I wanted to ask if anyone has used CloudMan for a similar purpose, or has an understanding, based on running a Galaxy cluster, of any problems we might encounter. I can add enough nodes to the cluster on the day to cope with the computational load (I assume), but I'm not sure if I should be expecting any other problems.

Is the size of the node (e.g. Amazon's 4-core vs 8-core nodes) very important? I can scale out by adding more nodes, but should I be concerned about the capacity of the master node, which handles the traffic?

Also, is there any sensible way for me to test it in advance (in terms of the user load)?

Many thanks for any advice!

Clare

--
E: sloc@unimelb.edu.au
P: 03 903 53357
M: 0414 854 759
Hi Clare,

Jeremy (from the team) ran a similar workshop several months ago and used some resource-intensive tools (e.g., TopHat). We were concerned about the same scalability issues, so we just started 4 separate clusters and divided the users across those. The approach worked well and it turned out we did not see any scalability issues. I think we went way overboard with 4 clusters, but the approach did demonstrate an additional 'coolness' of the project, allowing one to spin up 4 complete, identical clusters in a matter of minutes... So, I feel you could replicate a similar approach but could probably go with 2 clusters only? Jeremy can hopefully provide some first-hand comments as well.

When it comes to the instance types, especially for the master node, I would strongly suggest an instance with a lot of memory. This is one thing I've noticed that greatly aids cluster responsiveness, plus BWA can be a bit memory hungry.

Please let us know how the workshop goes (because, like you said, it's hard to test such environments),

Enis
Yes, this approach worked well when I did it. I created all the data and workflows needed for the workshop on one Galaxy instance, shared/cloned that instance and set up 3 additional instances, and divided users evenly amongst the instances. 2-3 clusters will probably meet your needs with 50 participants.

Scalability issues are more likely to arise on the back end than the front end, so you'll want to ensure that you have enough compute nodes. BWA uses four nodes by default--Enis, does the cloud config change this parameter?--so you'll want 4x50 or 200 total nodes if you want everyone to be able to run a BWA job simultaneously.

Good luck,
J.
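Reading 'nodes' here as cores/SGE slots (as Jeremy clarifies later in the thread), the arithmetic works out as in the rough sketch below. The per-job thread count comes from Jeremy's figure of four; the 8-core worker size and the two-cluster split are assumptions for illustration only, not CloudMan defaults.

```python
# Rough, illustrative capacity planning for the workshop cluster(s).
import math

participants = 50
threads_per_bwa_job = 4     # from Jeremy's "four ... by default" figure
cores_per_worker = 8        # assumption: one of Amazon's 8-core instance types
clusters = 2                # assumption: Enis's suggested split across two clusters

total_slots = participants * threads_per_bwa_job            # 200 slots
workers_total = math.ceil(total_slots / cores_per_worker)   # 25 worker instances
workers_per_cluster = math.ceil(workers_total / clusters)   # 13 per cluster

print(f"{total_slots} slots -> {workers_total} workers total, "
      f"~{workers_per_cluster} per cluster across {clusters} clusters")
```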
Hi Enis, Jeremy,

Thanks, that's really helpful! Jeremy, do you remember what kind of instance you used per node, e.g. Amazon's 'large' (2 core / 4 ECU / 7.5GB) or 'xlarge' (4 core / 8 ECU / 15GB)? This would be a good sanity check for me.

Enis, when you say a lot of memory, would you consider the 15GB nodes 'a lot'? I would generally consider that a lot in the context of a web server but not so much in NGS, so it's all relative! Amazon does have double-memory instances available.

Also, when you say a lot of memory especially for the master node - does this imply that I can choose a different specification for the master node than for the compute nodes? I thought they all had to be identical, but just checking.

Thanks,
Clare
Clare,

I wonder if you would be open to sharing the details of your setup and what you had to do for this once you are done. I think it would be incredibly useful to the community (at least it would be for me ;-)).

Thanks!
Ravi Madduri madduri@mcs.anl.gov
On Fri, Nov 18, 2011 at 12:56 AM, Jeremy Goecks <jeremy.goecks@emory.edu> wrote:
Scalability issues are more likely to arise on the back end than the front end, so you'll want to ensure that you have enough compute nodes. BWA uses four nodes by default--Enis, does the cloud config change this parameter?--so you'll want 4x50 or 200 total nodes if you want everyone to be able to run a BWA job simultaneously.
Actually, one other question - this paragraph makes me realise that I don't really understand how Galaxy is distributing jobs. I had thought that each job would only use one node, and in some cases take advantage of multiple cores within that node. I'm taking a "node" to be a set of cores with their own shared memory, so in this case a VM instance - is this right? If some types of jobs can be distributed over multiple nodes, can I configure, in Galaxy, how many nodes they should use?

Thanks again,
Clare

--
E: sloc@unimelb.edu.au
P: 03 903 53357
M: 0414 854 759
You're right -- my word choices were poor. Replace 'node' with 'core' in my paragraph to get an accurate suggestion for resources.

Galaxy uses a job scheduler--SGE on the cloud--to distribute jobs to different cluster nodes. Jobs that require multiple cores typically run on a single node. Enis can chime in on whether CloudMan supports job submission over multiple nodes; this would require setup of an appropriate parallel environment and a tool that can make use of this environment.

Good luck,
J.
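To illustrate the distinction Jeremy is drawing, here is a minimal drmaa-python sketch of submitting an SGE job that requests multiple slots through a parallel environment. The PE name "smp", the hypothetical "mpi" PE, and the script path are assumptions: PE names are site-configured, and whether CloudMan's SGE ships with such PEs preconfigured is exactly the question left for Enis above.

```python
# Minimal sketch (not CloudMan's actual configuration): asking SGE for
# multiple slots via a parallel environment, using the drmaa-python library.
import drmaa

s = drmaa.Session()
s.initialize()

jt = s.createJobTemplate()
jt.remoteCommand = "/path/to/bwa_job.sh"   # hypothetical wrapper script
# "-pe smp 4" requests 4 slots on a single node (shared memory), which is
# how multi-threaded tools like BWA are usually run. A hypothetical
# "-pe mpi 16" could spread 16 slots across several nodes, but only if such
# a PE exists on the cluster and the tool can actually exploit it.
jt.nativeSpecification = "-pe smp 4"

job_id = s.runJob(jt)
print("Submitted job", job_id)

s.deleteJobTemplate(jt)
s.exit()
```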
Hi Ravi,

Yes, of course - it should all be finished by the 8th of December, so if I forget please feel free to ask me about the details after that!

Cheers,
Clare
participants (4)
- Clare Sloggett
- Enis Afgan
- Jeremy Goecks
- Ravi Madduri