I need some advice on what type of Galaxy server to implement
Hi,

I have been tasked with getting a Galaxy server up and running for a group at work.

1. No one can tell me how many users (concurrent or otherwise) there will be.
2. Most of the analyses will be NGS.
3. Tools will be developed in-house, but we will also use public-domain tools.
4. There will be a guy running the server and developing tools pretty much full time.

I have two favoured solutions at the moment:

1. A pipeline processor (64 cores, 512 GB RAM, with DAS of about 150 TB), a web server to act as frontend and database server, and another, smaller box with a complete install of Galaxy used only for development work.
2. An all-in-one server with 128 cores, 1 TB RAM, and 150 TB of DAS storage, with development work done on a VM.

Any input would be helpful.

Thanks,
Shane
When you say NGS, is it genome assembly? If so, what type of genomes, and do you have experience with its memory and CPU requirements? We have noted that servers with large amounts of memory and many cores have a memory bus bottleneck. The other aspect is that heavy processing on the server will impact the performance of Galaxy unless Galaxy is given higher priority. Note that if you overcommit the server, it can destabilize and bring down the Galaxy web app and database.

My general approach is to run the Galaxy web app and proxy on a separate machine from the handlers. The analysis server then runs either a grid or the handlers.

I recommend multiple smaller servers if you can get away with it, as long as you have one that can accommodate your LARGE workloads. If you don't care about overall performance, large servers are the way to go, as they are more "versatile".

Regards,
Iyad Kandalaft
Acting Chief Bioinformatician in Biodiversity, Science and Technology Branch (STB)
Agriculture and Agri-Food Canada / Government of Canada
Iyad.Kandalaft@Agr.gc.ca / Tel: 613-759-1228 / TTY: 613-773-2600

-----Original Message-----
From: galaxy-dev [mailto:galaxy-dev-bounces@lists.galaxyproject.org] On Behalf Of Shane Kelly
Sent: July-28-15 8:23 AM
To: galaxy-dev@lists.galaxyproject.org
Subject: [galaxy-dev] I need some advice on what type of Galaxy server to implement
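The overcommit warning in Iyad's reply can be made concrete with a small capacity check. This is only a minimal sketch and not part of Galaxy: the `Server` and `Job` classes, the `fits` function, the example job sizes, and the headroom of 4 cores and 16 GB reserved for the Galaxy web app and database are all illustrative assumptions. Only the 64-core / 512 GB figures come from the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    cores: int
    ram_gb: int


@dataclass(frozen=True)
class Job:
    name: str
    cores: int
    ram_gb: int


def fits(server, jobs, reserved_cores=4, reserved_ram_gb=16):
    """Return True if all jobs fit while leaving headroom for Galaxy itself.

    The reserved values are assumed placeholders, not measured requirements.
    """
    used_cores = sum(job.cores for job in jobs)
    used_ram_gb = sum(job.ram_gb for job in jobs)
    return (used_cores <= server.cores - reserved_cores
            and used_ram_gb <= server.ram_gb - reserved_ram_gb)


# Option 1 from the thread: 64 cores, 512 GB RAM.
pipeline_box = Server(cores=64, ram_gb=512)

# Hypothetical workload: one large assembly plus one mapping job.
jobs = [Job("assembly", 32, 400), Job("mapping", 16, 64)]

print(fits(pipeline_box, jobs))  # True: 48 cores and 464 GB requested
print(fits(pipeline_box, jobs + [Job("second_assembly", 16, 100)]))  # False
```

A grid scheduler enforces this kind of limit automatically, which is one reason to put a scheduler in front of a shared analysis server instead of launching jobs on it directly.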
Hi Iyad,

Thanks for taking the time to get back to me. I am an IT guy, and would not know how to answer these questions, except to say that the server is to be used to facilitate biological and medical research in the areas of genomics, transcriptomics, epigenomics and metagenomics (straight from the mission statement :-) ).

I think that they would say that overall performance is less important than being able to run large datasets (1/2 to 3 TB).

I like the idea of a separate box for the web server, but I am not sure how the web server would communicate with the pipeline box. Is that ability built into Galaxy, or is it a well-worn path with plenty of examples that I could plagiarize?

Sorry to be such a newb, but I don't know much about Galaxy at all. Luckily I have 2-3 months to put this in place...

Again, thank you for your time.

Regards,
Shane
What you should do (in my opinion) is install a grid scheduler (SGE, Torque, etc.) on the big server. If you run Galaxy on a separate server, it can be configured to submit jobs to the scheduler. Galaxy also has the concept of Web App and Handler components. Essentially, handlers take care of talking to the scheduler, while the Web App serves pages to users. By default, the Web App and Handlers are combined in the same process. You can configure Galaxy to start multiple handler processes and multiple web app processes. Then you can use Apache or Nginx to load balance user requests across the various Galaxy Web Apps.

My recommendation is that you start small, to accommodate about 5 concurrent users without noticeable performance issues:

1 Galaxy web app
1 Galaxy handler

Recommended reading:
https://wiki.galaxyproject.org/Admin/Config/Performance/Scaling
https://wiki.galaxyproject.org/Admin/Config/Performance/Cluster

-----Original Message-----
From: Shane Kelly [mailto:skk@shanek54.co.uk]
Sent: July-29-15 7:32 AM
To: Kandalaft, Iyad
Cc: galaxy-dev@lists.galaxyproject.org
Subject: Re: [galaxy-dev] I need some advice on what type of Galaxy server to implement
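To illustrate what "submitting a job to the scheduler" amounts to, here is a minimal sketch that only builds an SGE-style `qsub` command line and does not run it. Galaxy performs the actual submission itself once it is configured as described on the linked pages, so this is not Galaxy's code. The function name `build_qsub_command`, the script name `run_mapping.sh`, the parallel environment name `smp`, and the use of the `h_vmem` resource are assumptions; parallel environments and memory resources are defined per site.

```python
import shlex


def build_qsub_command(script, job_name, slots, mem_per_slot_gb, pe_name="smp"):
    """Build (but do not execute) an SGE-style qsub command line.

    pe_name and the h_vmem resource are site-specific assumptions.
    """
    if slots < 1 or mem_per_slot_gb < 1:
        raise ValueError("slots and mem_per_slot_gb must be at least 1")
    return [
        "qsub",
        "-N", job_name,                        # job name shown in the queue
        "-cwd",                                # run in the current directory
        "-pe", pe_name, str(slots),            # parallel environment and slot count
        "-l", f"h_vmem={mem_per_slot_gb}G",    # memory limit request
        script,
    ]


cmd = build_qsub_command("run_mapping.sh", "mapping", slots=8, mem_per_slot_gb=4)
print(shlex.join(cmd))
# qsub -N mapping -cwd -pe smp 8 -l h_vmem=4G run_mapping.sh
```

In this picture the handler process is the component that issues such requests and watches for job completion, while the web app only serves pages, which is why the two can live on different machines.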
Hi Iyad, thanks for those links, and the sound advice. I'll read and absorb, and see what kind of a plan I can come up with. So good of you to take the time to answer, many thanks. Regards, Shane
___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: https://lists.galaxyproject.org/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
participants (3)
- Kandalaft, Iyad
- Shane Kelly
- Shane Kelly