A Galaxy environment that can support up to 35 users?
Hi, Galaxy Developers, Is anybody out there managing a Galaxy environment that was designed and or has been tested to support 35 concurrent users? The reason why I am asking this is because we [the U of C] have a training session coming up this Thursday, and the environment we have deployed needs to support this number of users. We have put the server under as high as stress as possible with 6 users, and Galaxy has performed fine, however it has proven somewhat challenging to do load testing for all 35 concurrent users prior to the workshop. I can't help but feel we are rolling the dice a little bit as we've never put the server under anything close to this load level, so I figured I would try to dot my i's by sending an email to this list. Here are the configuration changes that are currently implemented (in terms of trying to performance tune and web scale our galaxy server): 1) Enabled proxy load balancing with six web front-ends (the number six pulled from Galaxy wiki) (Apache): <Proxy balancer://galaxy/> BalancerMember http://127.0.0.1:8080 BalancerMember http://127.0.0.1:8081 BalancerMember http://127.0.0.1:8082 BalancerMember http://127.0.0.1:8083 BalancerMember http://127.0.0.1:8084 BalancerMember http://127.0.0.1:8085 </Proxy> 2) Rewrite static URLs for static content (Apache): RewriteRule ^/static/style/(.*) /group/galaxy/galaxy-dist/static/uchicago_cri_august_2012_style/blue/$1 [L] RewriteRule ^/static/scripts/(.*) /group/galaxy/galaxy-dist/static/scripts/packed/$1 [L] RewriteRule ^/static/(.*) /group/galaxy/galaxy-dist/static/$1 [L] RewriteRule ^/robots.txt /group/galaxy/galaxy-dist/static/robots.txt [L] RewriteRule ^(.*) balancer://galaxy$1 [P] 3) Enabled compression and caching (Apache): <Location "/"> SetOutputFilter DEFLATE SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ no-gzip dont-vary SetEnvIfNoCase Request_URI \.(?:t?gz|zip|bz2)$ no-gzip dont-vary </Location> <Location "/static"> ExpiresActive On ExpiresDefault "access plus 6 hours" </Location> 4) Configured web scaling (universe_wsgi.ini) : a) six web server processes (threadpool_workers = 7) b) a single job manager (threadpool_workers = 5) c) two job handlers (threadpool_workers = 5) 5) Configured a pbs_mom external job runner (our cluster), and commented out the default tool runners (to use pbs) (we are not using the other tools for the workshop). #ucsc_table_direct1 = local:/// #ucsc_table_direct_archaea1 = local:/// #ucsc_table_direct_test1 = local:/// #upload1 = local:/// 6) Changed the following database parameters (universe_wsgi.ini): database_engine_option_pool_size = 10 database_engine_option_max_overflow = 20 7) Disable the developer settings (universe_wsgi.ini): debug = False use_interactive = False #filter-with = gzip The server I have is a VM with the following resources: 2GB of RAM 4CPU Cores I feel that it is also worthwhile to mention that users will not be downloading datasets during the workshop, so as of now, the implementation of "XSendFile" as specified in the Apache Proxy documentation is not of immediate concern. Does anybody see any blaring mistakes where they think this configuration might fall short with respect to capacity planning for an environment of 35 concurrent users, or additional tuning that could potentially assist in ensuring the availability of the server during the workshop? Thank-you so much for your opinion(s), and please wish us luck this Thursday :-) Dan Sullivan
Hello Dan,

A couple of lessons we learned from setting up similar workshop Galaxies:

Dan Sullivan wrote, On 09/17/2012 01:04 PM:
Hi, Galaxy Developers,
Is anybody out there managing a Galaxy environment that was designed and/or has been tested to support 35 concurrent users? The reason why I am asking is that we [the U of C] have a training session coming up this Thursday, and the environment we have deployed needs to support this number of users. We have put the server under as high a stress as possible with 6 users, and Galaxy has performed fine; however, it has proven somewhat challenging to do load testing for all 35 concurrent users prior to the workshop. I can't help but feel we are rolling the dice a little bit, as we've never put the server under anything close to this load level, so I figured I would try to dot my i's by sending an email to this list.
Here are the configuration changes that are currently implemented (in terms of trying to performance tune and web scale our galaxy server):
1) Enabled proxy load balancing with six web front-ends (the number six pulled from Galaxy wiki) (Apache):
When configured correctly, 3 or 4 web-front-ends seemed sufficient. (When configured incorrectly, it doesn't matter how many you have; performance will suffer :) ). Given that you only have 4 CPUs/cores for your machine, having six front-ends seems too much.
2) Rewrite static URLs for static content (Apache):
3) Enabled compression and caching (Apache):
This might sound obvious, but test that it actually works (e.g. check in the apache logs that the static files were served by apache, not by galaxy). Typos and other minor incompatibilities can cause the URLs to be served by galaxy, which will waste resources.
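[Beyond comparing the apache access log against the galaxy process logs, a quick client-side spot check is to fetch a static URL and confirm that the compression and expiry headers from the Apache rules above actually appear. A minimal sketch, assuming Python 2 and a placeholder hostname and asset path:

    #!/usr/bin/env python
    # Spot-check that /static/ responses carry the headers Apache is supposed to add.
    # The URL is a placeholder -- point it at a real file under your /static/ tree.
    import urllib2

    url = "http://localhost/static/scripts/galaxy.base.js"   # hypothetical static asset
    req = urllib2.Request(url, headers={"Accept-Encoding": "gzip"})
    resp = urllib2.urlopen(req, timeout=30)
    info = resp.info()
    # With the DEFLATE filter and mod_expires rules in place, both headers
    # should normally be present on a /static/ response.
    print "Content-Encoding:", info.getheader("Content-Encoding")
    print "Expires:         ", info.getheader("Expires")
]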
4) Configured web scaling (universe_wsgi.ini):
    a) six web server processes (threadpool_workers = 7)
    b) a single job manager (threadpool_workers = 5)
    c) two job handlers (threadpool_workers = 5)
Again, with a system of only 4 CPUs, you might overload your server.
5) Configured a pbs_mom external job runner (our cluster), and commented out the default tool runners (to use pbs) (we are not using the other tools for the workshop).
    #ucsc_table_direct1 = local:///
    #ucsc_table_direct_archaea1 = local:///
    #ucsc_table_direct_test1 = local:///
    #upload1 = local:///
Unless your workshop is *tightly* scripted, you can't really tell which tool users will use. If this is an introduction to galaxy, users will experiment with some tools (even if you don't tell them to). (also, I'm not sure if those data import tools can run on your cluster node).
6) Changed the following database parameters (universe_wsgi.ini):

    database_engine_option_pool_size = 10
    database_engine_option_max_overflow = 20
Assuming you're using PostgreSQL (and you shouldn't use anything else, in practice), add the following:

    database_engine_option_server_side_cursors = True

And I would set "pool_size" to 50 and "max_overflow" to 100 - seems excessive, but under the load of 20 users hammering at galaxy at the same time in a short time window, I got the "database connection pool size" errors within 10 minutes.
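[Taken together, Gordon's suggested values would look roughly like this in universe_wsgi.ini (section name per the stock sample config; the numbers are his workshop-derived suggestion, not a universal recommendation):

    # universe_wsgi.ini
    [app:main]
    database_engine_option_server_side_cursors = True
    database_engine_option_pool_size = 50
    database_engine_option_max_overflow = 100
]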
The server I have is a VM with the following resources:
2 GB of RAM, 4 CPU cores
IMHO, that's too little memory and too few CPUs. Some ball-park figures for our servers:

Memory-wise: each web-front-end python process takes ~300MB (and you plan for 6 of them), and you also have 3 more python processes (1 job manager + 2 job handlers).

CPU-wise: in addition to 9 python processes, you will have several PostgreSQL processes, a few apache threads, and some other system processes running. Even when each python process doesn't run at full capacity (i.e. 100% CPU), your system already sounds overloaded. When jobs are running (at least on our system) the job-handlers consume some CPU time by just monitoring the jobs. When users submit large workflows with many jobs, the job-handlers take 100% CPU for a short time. With all of the above combined, I would say 4 CPUs sounds a bit weak.
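[Putting Gordon's ball-park numbers against the VM's 2 GB of RAM makes the shortfall concrete (the ~300MB figure is his approximation, not a measurement of this VM):

    6 web processes + 1 job manager + 2 job handlers = 9 python processes
    9 x ~300 MB ≈ 2.7 GB for Galaxy alone
    2.7 GB > 2 GB total RAM -- before PostgreSQL, Apache, and the OS are counted
]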
I feel that it is also worthwhile to mention that users will not be downloading datasets during the workshop, so as of now, the implementation of "XSendFile" as specified in the Apache Proxy documentation is not of immediate concern.
IMHO, this is a wrong assumption. You cannot fully control what users in a workshop are doing. Imagine that only two users out of your 35 click (even accidentally) on the download icon and start downloading a big file - if downloads are handled by the python processes, then immediately two of your six web-front-ends are busy and can't serve other users. Also - regardless of how big the downloaded files are, Apache+XSendFile will be more efficient at sending files to the user (than python), and with just 4 CPUs you definitely want to conserve as many resources as possible.
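[For reference, the XSendFile wiring is small. A sketch, assuming mod_xsendfile is installed and that the dataset directory below (the stock database/files path under this Galaxy install) is adjusted to the real file store:

    # Apache (inside the virtual host that proxies Galaxy)
    XSendFile on
    XSendFilePath /group/galaxy/galaxy-dist/database/files    # assumed dataset directory

    # universe_wsgi.ini, [app:main]
    apache_xsendfile = True
]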
Does anybody see any glaring mistakes where they think this configuration might fall short with respect to capacity planning for an environment of 35 concurrent users, or additional tuning that could potentially assist in ensuring the availability of the server during the workshop? Thank-you so much for your opinion(s), and please wish us luck this Thursday :-)
Another important lesson we learned for workshops is to carefully plan each example, and especially measure the upload, download and processing time.

It's hard to give specific details without knowing more about your workshop, but generally:

1. work with small datasets (e.g. if you show-case NGS workflows, take a tiny subset of a HiSeq run).

2. work with small genomes (e.g. yeast). If you must work with bigger genomes, work with a single chromosome, and ensure (beforehand) that the input files contain reads / intervals that would map to that chromosome and would give meaningful output.

3. Rehearse the workflow you are going to present - measure how long it takes to submit it, and for it to complete. Try to submit the same workflow in parallel from 10 different machines (at the same time) - see if your server can handle it, and how long it takes to complete. (The reason being - if you plan it wrong, the instructor might tell the users to do something, and it can take 20 minutes for all the jobs of all the users to complete, before the workshop can go on - very frustrating). Try a workflow of 10 tools at least (or better, similar to the one actually presented in the workshop) - submitting large workflows is somewhat of a bottle-neck for the galaxy processes (at least on our server).

4. Publish the workflow in a way the users can easily import it (e.g. put the URL in a place users can click and import it) - for users who don't want to build it themselves.

5. Publish *the results* of running the workflow on your example input files (the actual input files used in the workflow) - will save *a lot* of time for users who don't want to run things, or (embarrassingly) if something goes wrong with the server, and you must show the results and keep the workshop going (speaking from experience). We even reproduced the exact workflow and history with full results on the public Galaxy server, and we could easily divert the users to the public server and tell them that these are the results they would get (with an apology that our local server couldn't handle their load) - so they could still explore the resulting files (and learn the file formats) when our server couldn't handle it.

6. Test "input" methods (how users will put data in your galaxy). Best way (IMHO) is uploading through URL - publish your input files somewhere, and give users simple URLs they can paste in the "get data" tool. Uploading from a local computer is error prone (especially with big files), and uploading with FTP is confusing and complicated to explain in a workshop (imagine telling new users they need to install an FTP program to send files... too distracting). Then, measure how long it takes to upload those files into galaxy, and (if possible) how long it takes for 30 users to upload the same file at the same time - the instructor must be prepared to stall while files are uploading and jobs are running :)

7. Test "output" methods (how users will view the results). If your tool outputs HTML, make sure your galaxy can display HTML properly (look for "sanitize_all_html" in universe_wsgi.ini; see the short sketch below). If the output is BAM/BigWig/VCF/etc, make sure your galaxy can easily send tracks to the UCSC Genome browser (or another browser) - and make sure your server can handle the load of sending BAM files to UCSC (which brings up XSendFile again). If you plan on using IGV - better prepare the users in advance to have java working properly.

Hope this helps,
-gordon
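[On point 7 above, the knob Gordon names lives in universe_wsgi.ini. A one-line sketch, with the caveat that relaxing HTML sanitization is a security trade-off best confined to a locked-down workshop instance:

    # universe_wsgi.ini, [app:main]
    # Default is True (tool-generated HTML is sanitized); False renders it as-is.
    sanitize_all_html = False
]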
Hi, Assaf,

Thank-you for your very detailed, thorough, and thoughtful reply. I have responses to some of what you said; my comments are in-line.

On Sep 17, 2012, at 1:11 PM, Assaf Gordon <gordon@cshl.edu> wrote:
Hello Dan,
A couple of lessons we learned from setting up similar workshop Galaxies:
Dan Sullivan wrote, On 09/17/2012 01:04 PM:
Hi, Galaxy Developers,
Is anybody out there managing a Galaxy environment that was designed and/or has been tested to support 35 concurrent users? The reason why I am asking is that we [the U of C] have a training session coming up this Thursday, and the environment we have deployed needs to support this number of users. We have put the server under as high a stress as possible with 6 users, and Galaxy has performed fine; however, it has proven somewhat challenging to do load testing for all 35 concurrent users prior to the workshop. I can't help but feel we are rolling the dice a little bit, as we've never put the server under anything close to this load level, so I figured I would try to dot my i's by sending an email to this list.
Here are the configuration changes that are currently implemented (in terms of trying to performance tune and web scale our galaxy server):
1) Enabled proxy load balancing with six web front-ends (the number six pulled from Galaxy wiki) (Apache):
When configured correctly, 3 or 4 web-front-ends seemed sufficient. (When configured incorrectly, it doesn't matter how many you have; performance will suffer :) ).
Given that you only have 4 CPUs/cores for your machine, having six front-ends seems too much.
Since we are not running on bare metal hardware, I can definitely increase memory and CPU count on the Galaxy VM. I am going to increase these to 8 Cores w/8GB of RAM for the purpose of the workshop, based on the rough numbers you provided.
2) Rewrite static URLs for static content (Apache):
3) Enabled compression and caching (Apache):
This might sound obvious, but test that it actually works (e.g. check in the apache logs that the static files were served by apache, not by galaxy). Typos and other minor incompatibilities can cause the URLs to be served by galaxy, which will waste resources.
This is a very good idea. Thank-you.
4) Configured web scaling (universe_wsgi.ini):
    a) six web server processes (threadpool_workers = 7)
    b) a single job manager (threadpool_workers = 5)
    c) two job handlers (threadpool_workers = 5)
Again, with a system of only 4 CPUs, you might overload your server.
As I said, I am going to increase the CPU core count to 8 based on your recommendations.
5) Configured a pbs_mom external job runner (our cluster), and commented out the default tool runners (to use pbs) (we are not using the other tools for the workshop).
    #ucsc_table_direct1 = local:///
    #ucsc_table_direct_archaea1 = local:///
    #ucsc_table_direct_test1 = local:///
    #upload1 = local:///
Unless your workshop is *tightly* scripted, you can't really tell which tool users will use. If this is an introduction to galaxy, users will experiment with some tools (even if you don't tell them to).
(also, I'm not sure if those data import tools can run on your cluster node).
Based on some limited testing, these data import tools can run on our cluster node. We have NAT configured with outbound HTTP from the cluster. I think we're alright on this one, although I will report back if I find any new meaningful lessons learned using this configuration.
6) Changed the following database parameters (universe_wsgi.ini):

    database_engine_option_pool_size = 10
    database_engine_option_max_overflow = 20
Assuming you're using PostgreSQL (and you shouldn't use anything else, in practice), add the following:

    database_engine_option_server_side_cursors = True

And I would set "pool_size" to 50 and "max_overflow" to 100 - seems excessive, but under the load of 20 users hammering at galaxy at the same time in a short time window, I got the "database connection pool size" errors within 10 minutes.
This is good information from your experience. Thank-you for sharing this. I will implement this as you suggested.
The server I have is a VM with the following resources:
2 GB of RAM, 4 CPU cores
IMHO, that's too little memory and too few CPUs. Some ball-park figures for our servers:

Memory-wise: each web-front-end python process takes ~300MB (and you plan for 6 of them), and you also have 3 more python processes (1 job manager + 2 job handlers).

CPU-wise: in addition to 9 python processes, you will have several PostgreSQL processes, a few apache threads, and some other system processes running. Even when each python process doesn't run at full capacity (i.e. 100% CPU), your system already sounds overloaded. When jobs are running (at least on our system) the job-handlers consume some CPU time by just monitoring the jobs. When users submit large workflows with many jobs, the job-handlers take 100% CPU for a short time. With all of the above combined, I would say 4 CPUs sounds a bit weak.
I am going to increase this to 8GB of memory w/8CPU cores, as per your recommendation.
I feel that it is also worthwhile to mention that users will not be downloading datasets during the workshop, so as of now, the implementation of "XSendFile" as specified in the Apache Proxy documentation is not of immediate concern.
IMHO, this is a wrong assumption. You cannot fully control what users in a workshop are doing. Imagine that only two users out of your 35 click (even accidentally) on the download icon and start downloading a big file - if downloads are handled by the python processes, then immediately two of your six web-front-ends are busy and can't serve other users. Also - regardless of how big the downloaded files are, Apache+XSendFile will be more efficient at sending files to the user (than python), and with just 4 CPUs you definitely want to conserve as many resources as possible.
Fair enough. I fully plan on implementing Apache+XSendFile; whether or not I can get it done by Thursday is a different story :)
Does anybody see any glaring mistakes where they think this configuration might fall short with respect to capacity planning for an environment of 35 concurrent users, or additional tuning that could potentially assist in ensuring the availability of the server during the workshop? Thank-you so much for your opinion(s), and please wish us luck this Thursday :-)
Another important lesson we learned for workshops is to carefully plan each example, and especially measure the upload, download and processing time.
It's hard to give specific details without knowing more about your workshop, but generally:
1. work with small datasets (e.g. if you show-case NGS workflows, take a tiny subset of a HiSeq run).
Noted.
2. work with small genomes (e.g. yeast). If you must work with bigger genomes, work with a single chromosome, and ensure (beforehand) that the input files contain reads / intervals that would map to that chromosome and would give meaningful output.
Noted.
3. Rehearse the workflow you are going to present - measure how long it takes to submit it, and for it to complete. Try to submit the same workflow in parallel from 10 different machines (at the same time) - see if your server can handle it, and how long it takes to complete. (The reason being - if you plan it wrong, the instructor might tell the users to do something, and it can take 20 minutes for all the jobs of all the users to complete, before the workshop can go on - very frustrating). Try a workflow of 10 tools at least (or better, similar to the one actually presented in the workshop) - submitting large workflows is somewhat of a bottle-neck for the galaxy processes (at least on our server).
I didn't think of this. I will share this recommendation with the person running the workshop.
4. Publish the workflow in a way the users can easily import it (e.g. put the URL in a place users can click and import it) - for users who don't want to build it themselves.
5. Publish *the results* of running the workflow on your example input files (the actual input files used in the workflow) - will save *a lot* of time for users who don't want to run things, or (embarrassingly) if something goes wrong with the server, and you must show the results and keep the workshop going (speaking from experience). We even reproduced the exact workflow and history with full results on the public Galaxy server, and we could easily divert the users to the public server and tell them that these are the results they would get (with an apology that our local server couldn't handle their load) - so they could still explore the resulting files (and learn the file formats) when our server couldn't handle it.
This is a good recommendation. Thank-you.
6. Test "input" methods (how users will put data in your galaxy). Best way (IMHO) is uploading through URL - publish your input files somewhere, and give users simple URLs they can paste in the "get data" tool. Uploading from local computer is error prone (especially with big files), and uploading with FTP is confusing and complicated to explain in a workshop (imaging telling new users they need to install an FTP program to send files... too distracting). Then, measure how long it takes to upload those files into galaxy, and (if possible) how long it takes for 30 users to upload the same file at the same time - the instructor must be prepare to stall while files are uploading and jobs are running :)
We have, for the most part, done this already.
7. Test "output" methods (how users will view the results). If your tool outputs HTML, make sure your galaxy can display HTML properly (look for "sanitize_all_html" in universe_wsgi.ini). If the output is BAM/BigWig/VCF/etc, make sure your galaxy can easily send tracks to the UCSC Genome browser (or another browser) - and make sure your server can handle the load of sending BAM files to UCSC (which brings XSEndFile again). If you plan on using IGV - better prepare the users in advance to have java working properly.
Thank-you again, Gordon, for your reply. I greatly appreciate the time you took to draft this email. Cheers, and thank-you again.

Dan Sullivan
Hope this helps, -gordon