I am responding to galaxy-dev because others may have better answers than me.
On Wed, Nov 13, 2013 at 9:38 PM, Alistair Chilcott
<Alistair.Chilcott(a)utas.edu.au> wrote:
John,
Thanks for the advice, a dose of realism is always useful. :D
I suspect that our current installation is fiddly primarily because it has been built on
top of a Linux distro that was not installed initially just for Galaxy.
It is running SUSE, which varies enough in its behaviour from Ubuntu or BioLinux to make
the job of interpreting/implementing the various instructions for "how to
setup/maintain" much more difficult.
As I suspected, a full-blown private cloud is optimistic at best, but the question still
needed to be asked. I still need to come up with a solution of some kind for the
researchers.
If I could pick your brain for a few minutes .....
In your experience:
What is the best (easiest to set up and maintain) way to deploy a Galaxy instance
(including tools, the NGS suite in particular) onto a local VM or physical host?
The easiest way to set up everything is probably BioLinux (which you
mention below). I am not sure if it results in the easiest maintenance
however.
There is not a turnkey solution yet, though with things like data
managers and the tool shed the Galaxy team is constantly trying to
make these things more automated. When I was working on the Galaxy-P
project I put together
getgalaxyp.org/install.html which is a fairly
automated way to configure a Galaxy-P environment on Ubuntu and CentOS -
dozens of OS packages, custom tools, etc... It was built on
CloudBioLinux. This is a superset of Galaxy, so it could (and should)
be trimmed back for a genomics focused setup. This install wouldn't
include the data, nginx server, or Galaxy itself but all of this could
be added to the procedure - the recipes are available in CloudBioLinux
because CloudMan uses them. I just need to find a free day sometime to
put all the pieces together and document them.
What Linux distribution (and version) is most compatible with galaxy?
My recommended stack would be Ubuntu / postgres / nginx. CloudMan runs
this stack, so you can always pop a cloud image and see what a
working configuration looks like; the community will also likely have
the most answers for this configuration. Even if you
don't use the automated configuration scripts in CloudBioLinux, you
can always look at them to see what to do.
There is also this:
http://bioteam.net/slipstream/galaxy-edition/.
Hope this helps,
-John
Regards,
Alistair
-----Original Message-----
From: jmchilton(a)gmail.com [mailto:jmchilton@gmail.com] On Behalf Of John Chilton
Sent: Thursday, 14 November 2013 5:10 AM
To: Alistair Chilcott
Cc: Galaxy Dev
Subject: Re: [galaxy-dev] Galaxy environment on local resources
On Tue, Nov 12, 2013 at 1:41 AM, Alistair Chilcott <Alistair.Chilcott(a)utas.edu.au>
wrote:
> Hello all,
>
> Feel free to point it out if I have missed something obvious, I have done a fair bit
of investigation and haven't quite found the solution yet.
>
> We have some hardware that has been around for a while for the purpose of processing
Genetic data and other related tasks.
> To this end Galaxy fits the bill nicely in that it enables researchers to analyse
data without being Linux geeks.
>
> The problem I have is that while the hand built galaxy server (running on SUSE for
historical reasons) we have works to a point, it is difficult to maintain, and installing
new tools and reference genomes is fiddly at best, given that our server doesn't
conform to the way the instructions for other systems expect it to work.
>
> We have had success using Cloudman on AWS to run training on how to use galaxy, and I
would like to know more about how to customise an instance to contain all of the tools
(mostly the NGS tools) we need by default.
>
> Ultimately we won't be able to use AWS to process much of the "real" data we
have, because of a need to keep the data we are processing in-house due to ethics
agreements. Fortunately we do have access to a modest pool of hardware (which is about to
get bigger) to implement some kind of private cloud solution.
>
> How would I go about setting up a "private cloud" version of the cloudman
style "galaxy instance on demand" system where researchers can start an
instance, have it connect to a shared storage volume and process some data then terminate
the instance? And is this even the best way to go?
>
> I have found it should be possible to use the scripts to install and configure the
galaxy instances but I have not found any information on how to setup the environment that
is required to make this work as a private cloud.
>
> Conversely, I have found information about some private cloud scenarios such as
Eucalyptus and OpenStack, but have not been able to join the dots to determine whether and
how I can make the cloudman/galaxy usecase work on them.
I think the way to do this would be to set up OpenStack and then install and configure
CloudMan on your OpenStack cloud. I am not aware of something like CloudMan that deploys
Galaxy without an existing cloud infrastructure. I have done this to some extent; I have
put together some (dated) scripts for bootstrapping CloudMan on OpenStack, still used by my
old employer (MSI):
https://github.com/jmchilton/cloudman_openstack_bootstrap
A more modern starting point would be to extend the CloudBioLinux deployer instructions
and scripts targeting Amazon for OpenStack (this will require some (small?) development
effort though):
https://github.com/chapmanb/cloudbiolinux/blob/master/deploy/cloudman.md
Either path will take a lot of sys-admin-y work though.
I hope I don't get in trouble for saying this, but my personal opinion is you should
not do this :). If you consider managing Galaxy fiddly (a fair criticism), then
redoing everything in a cloudy manner is going to be several orders of magnitude more
fiddly and more work.
I have deployed OpenStack and worked with others doing another deployment; it will take a
good system administrator months of effort to do this well and then it will take some
amount of her or his time each week ongoing to maintain that layer of the infrastructure.
Then instead of installing and maintaining Galaxy, you will have to do that still but with
the added complexity of CloudMan.
CloudMan works well on Amazon because you have a corporation full of amazing engineers
operating the infrastructure, you have Dannon and Enis doing a heroic job configuring
Galaxy and CloudMan, and dozens of users finding and reporting problems. Trying to replicate
that all yourself is quite difficult...
>
> I should mention that I'm primarily a Windows Sys Admin (who dabbles in Linux)
who is looking at this due to a lack of a dedicated Linux admin.
This is going to require more than dabbling...
>
> At the end of the day I need to be able to setup this system and make it as low
maintenance as possible whilst being useful and accessible to the researchers who
aren't Linux admins.
Ekkk....
>
> Any advice gratefully accepted.
It is all going to be on Amazon someday, work to make that happen sooner?
If you are more comfortable with Windows, you could buy some sort of VMware server
solution and try to create a simple gold standard VMware image of something that just
contains Galaxy and just spin up single instances on a per project basis. Ultimately
though the easiest number of Galaxy instances to manage is 1 :).
Sorry this e-mail is kind of pessimistic, hopefully more well adjusted people will
respond with happier advice.
-John
>
> Regards,
>
> Alistair