Using Mesos to Enable distributed computing under Galaxy?

26 Oct 2013

      I think one of the aspects where Galaxy is a bit soft is the ability to do
distributed tasks. The current system of split/replicate/merge tasks based
on file type is a bit limited and hard for tool developers to expand upon.
Distributed computing is a non-trival thing to implement and I think it
would be a better use of our time to use an already existing framework. And
it would also mean one less API for tool writers to have to develop for.
I was wondering if anybody has looked at Mesos (
http://mesos.apache.org/). You can see an overview of the Mesos
architecture at
https://github.com/apache/mesos/blob/master/docs/Mesos-Architecture.md
The important thing about Mesos is that it provides an API for C/C++,
Java/Scala and Python to write distributed frameworks. There are already
implementations of frameworks for common parallel programming systems such
as:
 - Hadoop (https://github.com/mesos/hadoop)
 - MPI (
https://github.com/apache/mesos/blob/master/docs/Running-torque-or-mpi-on-me...
)
 - Spark (http://spark-project.org)
And you can find example Python framework at
https://github.com/apache/mesos/tree/master/src/examples/python

Integration with Galaxy would have three parts:
1) Add a system config variable to Galaxy called 'MESOS_URL' that is then
passed to tool wrappers and allows them to contact the local mesos
infrastructure (assuming the system has been configured) or pass a null if
the system isn't available.
2) Write a tool runner that works as a mesos framework to executes single
cpu jobs on the distributed system.
3) For instances where mesos is not available at a system wide level (say
they only have access to an SGE based cluster), but the user wants to run
distributed jobs, write a wrapper that can create a mesos cluster using the
existing queueing system. For example, right now I run a Mesos system under
the SGE queue system.

I'm curious to see what other people think.

Kyle

Kyle Ellrott

John Chilton

Ravi K Madduri

Kyle Ellrott

Ketan Maheshwari

Kyle Ellrott

Ketan Maheshwari

John Chilton

Kyle Ellrott

tags

participants (5)