Re: [galaxy-dev] Galaxy environment on local resources

13 Nov 2013

On Tue, Nov 12, 2013 at 1:41 AM, Alistair Chilcott <Alistair.Chilcott@utas.edu.au> wrote:
...
Hello all,
Feel free to point it out if I have missed something obvious. I have done a fair bit of investigation and haven't quite found the solution yet.
We have some hardware that has been around for a while for the purpose of processing Genetic data and other related tasks.
To this end Galaxy fits the bill nicely in that it enables researchers to analyse data without being Linux geeks.
The problem I have is that while the hand-built Galaxy server we have (running on SUSE for historical reasons) works to a point, it is difficult to maintain, and installing new tools and reference genomes is fiddly at best, given that our server doesn't conform to the way the instructions for other systems expect it to work.
We have had success using CloudMan on AWS to run training on how to use Galaxy, and I would like to know more about how to customise an instance to contain all of the tools (mostly the NGS tools) we need by default.
Ultimately we won't be able to use AWS to process much of the "real" data we have, because of a need to keep the data we are processing in-house due to ethics agreements. Fortunately we do have access to a modest pool of hardware (which is about to get bigger) to implement some kind of private cloud solution.
How would I go about setting up a "private cloud" version of the cloudman style "galaxy instance on demand" system where researchers can start an instance, have it connect to a shared storage volume and process some data then terminate the instance? And is this even the best way to go?
I have found it should be possible to use the scripts to install and configure the Galaxy instances, but I have not found any information on how to set up the environment that is required to make this work as a private cloud.
Conversely, I have found information about some private cloud scenarios such as Eucalyptus and OpenStack, but have not been able to join the dots to determine how, and whether, I can make the CloudMan/Galaxy use case work on them.
I think the way to do this would be to set up OpenStack and then
install and configure CloudMan on your OpenStack cloud. I am not aware
of anything like CloudMan that deploys Galaxy without an existing
cloud infrastructure. I have done this to some extent: I put
together some (dated) scripts for bootstrapping CloudMan on OpenStack
that are still used by my old employer (MSI):

https://github.com/jmchilton/cloudman_openstack_bootstrap

A more modern starting point would be to extend the CloudBioLinux
deployer instructions and scripts targeting Amazon for OpenStack (this
will require some (small?) development effort though):

https://github.com/chapmanb/cloudbiolinux/blob/master/deploy/cloudman.md

Either path will take a lot of sys-admin-y work though.

I hope I don't get in trouble for saying this, but my personal opinion
is that you should not do this :). If you consider managing Galaxy
fiddly (a fair criticism), then redoing everything in a cloudy manner
is going to be several orders of magnitude more fiddly and more work.
I have deployed OpenStack and worked with others doing another
deployment. It will take a good system administrator months of effort
to do this well, and then it will take some amount of her or his time
each week to maintain that layer of the infrastructure. Then,
instead of just installing and maintaining Galaxy, you will still have
to do that, but with the added complexity of CloudMan.

CloudMan works well on Amazon because you have a corporation full of
amazing engineers operating the infrastructure, you have Dannon and
Enis doing a heroic job configuring Galaxy and CloudMan, and dozens of
users finding and reporting problems. Trying to replicate all of that
yourself is quite difficult...
...
I should mention that I'm primarily a Windows Sys Admin (who dabbles in Linux) who is looking at this due to a lack of a dedicated Linux admin.
This is going to require more than dabbling...
...
At the end of the day I need to be able to set up this system and make it as low maintenance as possible whilst being useful and accessible to the researchers who aren't Linux admins.
Ekkk....
...
Any advice gratefully accepted.
It is all going to be on Amazon someday; could you work to make that happen sooner?

If you are more comfortable with Windows, you could buy some sort of
VMware server solution and try to create a simple gold standard VMware
image of something that just contains Galaxy and just spin up single
instances on a per project basis. Ultimately though the easiest number
of Galaxy instances to manage is 1 :).

Sorry this e-mail is kind of pessimistic; hopefully more well-adjusted
people will respond with happier advice.

-John
...
Regards,
Alistair

John Chilton