Re: [galaxy-dev] Galaxy environment on local resources

15 Nov 2013

I am responding to galaxy-dev because others may have better answers than me.

On Wed, Nov 13, 2013 at 9:38 PM, Alistair Chilcott
<Alistair.Chilcott@utas.edu.au> wrote:
...
John,
Thanks for the advice, a dose of realism is always useful.  :D
I suspect that our current installation is fiddly primarily because it has been built on top of a Linux distro that was not installed initially just for Galaxy.
It is running SUSE, which varies enough in its behaviour from Ubuntu or BioLinux to make the job of interpreting and implementing the various "how to set up/maintain" instructions much more difficult.
As I suspected, a full-blown private cloud is optimistic at best, but the question still needed to be asked. I still need to come up with a solution of some kind for the researchers.
If I could pick your brain for a few minutes .....
In your experience:
What is the best (easiest to set up and maintain) way to deploy a Galaxy instance (including tools, the NGS suite in particular) onto a local VM or physical host?
The easiest way to set up everything is probably BioLinux (which you
mention below). I am not sure if it results in the easiest maintenance,
however.

There is not a turnkey solution yet, though with things like data
managers and the tool shed the Galaxy team is constantly trying to
make these things more automated. When I was working on the Galaxy-P
project I put together getgalaxyp.org/install.html, which is a fairly
automated way to configure a Galaxy-P environment on Ubuntu and CentOS
(dozens of OS packages, custom tools, etc.). It was built on
CloudBioLinux. This is a superset of Galaxy, so it could (and should)
be trimmed back for a genomics-focused setup. This install wouldn't
include the data, nginx server, or Galaxy itself, but all of this could
be added to the procedure; the recipes are available in CloudBioLinux
because CloudMan uses them. I just need to find a free day sometime to
put all the pieces together and document them.
...
What Linux distribution (and version) is most compatible with Galaxy?
My recommended stack would be Ubuntu / postgres / nginx. CloudMan runs
this stack, so you can always pop a cloud image and see what a
working configuration looks like, and the community will likely have
the most answers for that configuration as well. Even if you
don't use the automated configuration scripts in CloudBioLinux, you
can always look at them to see what to do.

There is also this: http://bioteam.net/slipstream/galaxy-edition/.

Hope this helps,
-John
...
Regards,
Alistair
-----Original Message-----
From: jmchilton@gmail.com [mailto:jmchilton@gmail.com] On Behalf Of John Chilton
Sent: Thursday, 14 November 2013 5:10 AM
To: Alistair Chilcott
Cc: Galaxy Dev
Subject: Re: [galaxy-dev] Galaxy environment on local resources
On Tue, Nov 12, 2013 at 1:41 AM, Alistair Chilcott <Alistair.Chilcott@utas.edu.au> wrote:
...
Hello all,
Feel free to point it out if I have missed something obvious; I have done a fair bit of investigation and haven't quite found the solution yet.
We have some hardware that has been around for a while for the purpose of processing genetic data and other related tasks.
To this end Galaxy fits the bill nicely in that it enables researchers to analyse data without being Linux geeks.
The problem I have is that, while our hand-built Galaxy server (running on SUSE for historical reasons) works to a point, it is difficult to maintain, and installing new tools and reference genomes is fiddly at best, given that our server doesn't conform to the way the instructions for other systems expect it to work.
We have had success using CloudMan on AWS to run training on how to use Galaxy, and I would like to know more about how to customise an instance to contain all of the tools (mostly the NGS tools) we need by default.
Ultimately we won't be able to use AWS to process much of the "real" data we have, because of a need to keep the data we are processing in-house due to ethics agreements. Fortunately we do have access to a modest pool of hardware (which is about to get bigger) to implement some kind of private cloud solution.
How would I go about setting up a "private cloud" version of the CloudMan-style "Galaxy instance on demand" system, where researchers can start an instance, have it connect to a shared storage volume, process some data, and then terminate the instance? And is this even the best way to go?
I have found it should be possible to use the scripts to install and configure the Galaxy instances, but I have not found any information on how to set up the environment that is required to make this work as a private cloud.
Conversely, I have found information about some private cloud scenarios such as Eucalyptus and OpenStack, but have not been able to join the dots to determine whether and how I can make the CloudMan/Galaxy use case work on them.
I think the way to do this would be to set up OpenStack and then install and configure CloudMan on your OpenStack cloud. I am not aware of something like CloudMan that deploys Galaxy without an existing cloud infrastructure. I have done this to some extent: I have put together some (dated) scripts for bootstrapping CloudMan on OpenStack, still used by my old employer (MSI):
https://github.com/jmchilton/cloudman_openstack_bootstrap
A more modern starting point would be to extend the CloudBioLinux deployer instructions and scripts targeting Amazon for OpenStack (this will require some (small?) development effort though):
https://github.com/chapmanb/cloudbiolinux/blob/master/deploy/cloudman.md
Either path will take a lot of sys-admin-y work though.
I hope I don't get in trouble for saying this, but my personal opinion is you should not do this though :). If you consider managing Galaxy fiddly (a fair criticism), then redoing everything in a cloudy manner is going to be several orders of magnitude more fiddly and more work.
I have deployed OpenStack and worked with others doing another deployment. It will take a good system administrator months of effort to do this well, and then it will take some amount of her or his time each week on an ongoing basis to maintain that layer of the infrastructure. Then, instead of just installing and maintaining Galaxy, you will still have to do that but with the added complexity of CloudMan.
CloudMan works well on Amazon because you have a corporation full of amazing engineers operating the infrastructure, you have Dannon and Enis doing a heroic job configuring Galaxy and CloudMan, and dozens of users finding and reporting problems. Trying to replicate that all yourself is quite difficult...
...
I should mention that I'm primarily a Windows Sys Admin (who dabbles in Linux) who is looking at this due to a lack of a dedicated Linux admin.
This is going to require more than dabbling...
...
At the end of the day I need to be able to set up this system and make it as low maintenance as possible whilst being useful and accessible to the researchers who aren't Linux admins.
Ekkk....
...
Any advice gratefully accepted.
It is all going to be on Amazon someday, work to make that happen sooner?
If you are more comfortable with Windows, you could buy some sort of VMware server solution and try to create a simple gold standard VMware image of something that just contains Galaxy and just spin up single instances on a per project basis. Ultimately though the easiest number of Galaxy instances to manage is 1 :).
Sorry this e-mail is kind of pessimistic; hopefully more well-adjusted people will respond with happier advice.
-John
...
Regards,
Alistair