Hello all,

Feel free to point it out if I have missed something obvious; I have done a fair bit of investigation and haven't quite found the solution yet.

We have some hardware that has been around for a while for the purpose of processing genetic data and other related tasks. To this end Galaxy fits the bill nicely in that it enables researchers to analyse data without being Linux geeks.

The problem I have is that while our hand-built Galaxy server (running on SUSE for historical reasons) works to a point, it is difficult to maintain, and installing new tools and reference genomes is fiddly at best, given that our server doesn't conform to the way the instructions for other systems expect it to work.

We have had success using CloudMan on AWS to run training on how to use Galaxy, and I would like to know more about how to customise an instance to contain all of the tools (mostly the NGS tools) we need by default.

Ultimately we won't be able to use AWS to process much of the "real" data we have, because we need to keep the data we are processing in-house due to ethics agreements. Fortunately we do have access to a modest pool of hardware (which is about to get bigger) to implement some kind of private cloud solution.

How would I go about setting up a "private cloud" version of the CloudMan-style "Galaxy instance on demand" system, where researchers can start an instance, have it connect to a shared storage volume, process some data and then terminate the instance? And is this even the best way to go?

I have found it should be possible to use the scripts to install and configure the Galaxy instances, but I have not found any information on how to set up the environment that is required to make this work as a private cloud.

Conversely, I have found information about some private cloud scenarios such as Eucalyptus and OpenStack, but have not been able to join the dots to determine whether, and how, I can make the CloudMan/Galaxy use case work on them.

I should mention that I'm primarily a Windows sysadmin (who dabbles in Linux) who is looking at this due to the lack of a dedicated Linux admin. At the end of the day I need to be able to set up this system and make it as low maintenance as possible whilst being useful and accessible to the researchers, who aren't Linux admins.

Any advice gratefully accepted.

Regards,
Alistair
On Tue, Nov 12, 2013 at 1:41 AM, Alistair Chilcott <Alistair.Chilcott@utas.edu.au> wrote:
Hello all,
Feel free to point it out if I have missed something obvious, I have done a fair bit of investigation and haven't quite found the solution yet.
We have some hardware that has been around for a while for the purpose of processing Genetic data and other related tasks. To this end Galaxy fits the bill nicely in that it enables researchers to analyse data without being Linux geeks.
The problem I have is that while our hand-built Galaxy server (running on SUSE for historical reasons) works to a point, it is difficult to maintain, and installing new tools and reference genomes is fiddly at best, given that our server doesn't conform to the way the instructions for other systems expect it to work.
We have had success using CloudMan on AWS to run training on how to use Galaxy, and I would like to know more about how to customise an instance to contain all of the tools (mostly the NGS tools) we need by default.
Ultimately we won't be able to use AWS to process much of the "real" data we have, because we need to keep the data we are processing in-house due to ethics agreements. Fortunately we do have access to a modest pool of hardware (which is about to get bigger) to implement some kind of private cloud solution.
How would I go about setting up a "private cloud" version of the CloudMan-style "Galaxy instance on demand" system, where researchers can start an instance, have it connect to a shared storage volume, process some data and then terminate the instance? And is this even the best way to go?
I have found it should be possible to use the scripts to install and configure the Galaxy instances, but I have not found any information on how to set up the environment that is required to make this work as a private cloud.
Conversely, I have found information about some private cloud scenarios such as Eucalyptus and OpenStack, but have not been able to join the dots to determine whether, and how, I can make the CloudMan/Galaxy use case work on them.
I think the way to do this would be to set up OpenStack and then install and configure CloudMan on your OpenStack cloud. I am not aware of anything like CloudMan that deploys Galaxy without an existing cloud infrastructure.

I have done this to some extent: I put together some (dated) scripts for bootstrapping CloudMan on OpenStack, which are still used by my old employer (MSI):

https://github.com/jmchilton/cloudman_openstack_bootstrap

A more modern starting point would be to take the CloudBioLinux deployer instructions and scripts, which target Amazon, and extend them to OpenStack (this will require some (small?) development effort though):

https://github.com/chapmanb/cloudbiolinux/blob/master/deploy/cloudman.md

Either path will take a lot of sys-admin-y work though.

I hope I don't get in trouble for saying this, but my personal opinion is that you should not do this :). If you consider managing Galaxy fiddly (a fair criticism), then redoing everything in a cloudy manner is going to be several orders of magnitude more fiddly and more work. I have deployed OpenStack and worked with others on another deployment; it will take a good system administrator months of effort to do this well, and then some amount of her or his time each week to maintain that layer of the infrastructure. On top of that, you will still have to install and maintain Galaxy, but with the added complexity of CloudMan.

CloudMan works well on Amazon because you have a corporation full of amazing engineers operating the infrastructure, you have Dannon and Enis doing a heroic job configuring Galaxy and CloudMan, and dozens of users finding and reporting problems. Trying to replicate all of that yourself is quite difficult...
I should mention that I'm primarily a Windows Sys Admin (who dabbles in Linux) who is looking at this due to a lack of a dedicated Linux admin.
This is going to require more than dabbling...
At the end of the day I need to be able to set up this system and make it as low maintenance as possible whilst being useful and accessible to the researchers, who aren't Linux admins.
Ekkk....
Any advice gratefully accepted.
It is all going to be on Amazon someday; work to make that happen sooner?

If you are more comfortable with Windows, you could buy some sort of VMware server solution, create a simple gold-standard VMware image that contains just Galaxy, and spin up single instances on a per-project basis. Ultimately though, the easiest number of Galaxy instances to manage is 1 :).

Sorry this e-mail is kind of pessimistic; hopefully more well-adjusted people will respond with happier advice.

-John
Regards,
Alistair
Participants (2): Alistair Chilcott, John Chilton