Hey Kyle, all,
If anyone wants to play with running Galaxy jobs within an Apache
Mesos environment I have added a prototype of this feature to the LWR.
This work distributes jobs across a Mesos cluster and injects a
MESOS_URL environment variable into the job runtime environment in
case the jobs themselves want to take advantage of Mesos.
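For anyone wondering what a job would do with that variable, here is a minimal sketch, assuming only that MESOS_URL shows up in the job's environment as described above. The helper names (resolve_mesos_url, choose_mode) are made up for illustration and are not part of the LWR or Galaxy.

```python
import os


def resolve_mesos_url(environ=None):
    """Return the Mesos URL found in the job environment, or None if absent.

    Hypothetical helper: only the MESOS_URL variable name comes from the
    message above; everything else here is an illustrative assumption.
    """
    environ = os.environ if environ is None else environ
    url = environ.get("MESOS_URL", "").strip()
    return url or None


def choose_mode(environ=None):
    """Pick 'mesos' when a URL was injected, otherwise fall back to 'local'."""
    return "mesos" if resolve_mesos_url(environ) else "local"


if __name__ == "__main__":
    # A tool wrapper could branch on this before deciding how to run.
    print(choose_mode())
```

A job that doesn't care about Mesos can simply ignore the variable; one that does can fall back to running locally when it is unset or empty.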
The advantage of the LWR versus a traditional Galaxy runner is that
the job can be staged to remote resources without shared disk. Prior
to this I was imagining the LWR to be useful in cases where Galaxy and
the remote cluster don't share a common disk but where there is in fact a
shared scratch directory or something across the remote cluster as
well as a resource manager. The LWR Mesos framework however has the
actual compute servers themselves stage the job up and down - so you
could imagine distributing Galaxy across large clusters without any
shared disk whatsoever - that could be very cool and help scale say
Downsides of an LWR-based approach versus a Galaxy approach are that it
is less mature and there is more stuff to configure - need to
configure a Galaxy job_conf plugin and destination, need to configure
the LWR itself, need to configure a message queue (for this variant of
LWR operation anyway - it should be possible to drive this via the LWR
in web server mode but I haven't added it yet). I would be more than
happy to continue to see progress toward Mesos support in Galaxy
It is strictly a prototype so far - a sort of playground if anyone
wants to play with these ideas and build something cool. It really is
a "framework" right - not so much a job scheduler so I am not sure it
is very immediately useful - but I imagine one could build cool stuff
on top of it.
Next, I think I would like to add Apache Aurora
(http://aurora.incubator.apache.org/) support - because it seems like
a much more traditional resource manager but built on top of Mesos so
it would be more practical for traditional Galaxy-style jobs. Doesn't
buy you anything in terms of parallelization but it would "fit better"
On Sat, Oct 26, 2013 at 2:43 PM, Kyle Ellrott <firstname.lastname@example.org> wrote:
> I think one of the aspects where Galaxy is a bit soft is the ability to do
> distributed tasks. The current system of split/replicate/merge tasks based
> on file type is a bit limited and hard for tool developers to expand upon.
> Distributed computing is a non-trivial thing to implement and I think it
> would be a better use of our time to use an already existing framework. And
> it would also mean one less API for tool writers to have to develop for.
> I was wondering if anybody has looked at Mesos ( http://mesos.apache.org/ ).
> You can see an overview of the Mesos architecture at
> The important thing about Mesos is that it provides an API for C/C++,
> Java/Scala and Python to write distributed frameworks. There are already
> implementations of frameworks for common parallel programming systems such
> as:
> - Hadoop (https://github.com/mesos/hadoop)
> - MPI
> - Spark (http://spark-project.org)
> And you can find example Python framework at
> Integration with Galaxy would have three parts:
> 1) Add a system config variable to Galaxy called 'MESOS_URL' that is then
> passed to tool wrappers and allows them to contact the local mesos
> infrastructure (assuming the system has been configured) or pass a null if
> the system isn't available.
> 2) Write a tool runner that works as a mesos framework to execute
> single-CPU jobs on the distributed system.
> 3) For instances where mesos is not available at a system wide level (say
> they only have access to an SGE based cluster), but the user wants to run
> distributed jobs, write a wrapper that can create a mesos cluster using the
> existing queueing system. For example, right now I run a Mesos system under
> the SGE queue system.
> I'm curious to see what other people think.