Glad to see someone else is playing around with Mesos.
I have a Mesos branch that is getting a little long in the tooth. I'd like
to get a straight job runner (non-LWR, with a shared file system) running
under Mesos for Galaxy before I submit that work for a pull request.
The hackathon is only 12 days away! Hopefully we'll be able to make some
progress on these sorts of projects.
On Sun, Jun 15, 2014 at 4:06 PM, John Chilton <jmchilton(a)gmail.com> wrote:
Hey Kyle, all,
If anyone wants to play with running Galaxy jobs within an Apache
Mesos environment I have added a prototype of this feature to the LWR.
This work distributes jobs across a Mesos cluster and injects a
MESOS_URL environment variable into the job runtime environment in
case the jobs themselves want to take advantage of Mesos.
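For illustration, a job could pick up the injected variable roughly as follows. This is a minimal sketch, not LWR code: the helper name get_mesos_url is hypothetical, and the only detail taken from the message is the MESOS_URL variable name. It assumes an unset or blank value means Mesos is not available.

```python
import os


def get_mesos_url(environ=None):
    """Return the Mesos URL from the job environment, or None if absent.

    `environ` defaults to os.environ; passing a dict makes this testable.
    """
    environ = os.environ if environ is None else environ
    url = environ.get("MESOS_URL", "").strip()
    # Treat an empty or whitespace-only value the same as "not set".
    return url or None


if __name__ == "__main__":
    url = get_mesos_url()
    if url is None:
        print("MESOS_URL not set; running as an ordinary job")
    else:
        print("Mesos available at %s" % url)
```

A job that does not care about Mesos simply never reads the variable, so injecting it should be harmless for existing tools.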
The advantage of the LWR versus a traditional Galaxy runner is that
the job can be staged to remote resources without shared disk. Prior
to this I was imagining the LWR to be useful in cases where Galaxy and
the remote cluster don't share common disk but where there is in fact a
shared scratch directory or something across the remote cluster as
well as a resource manager. The LWR Mesos framework however has the
actual compute servers themselves stage the job up and down - so you
could imagine distributing Galaxy across large clusters without any
shared disk whatsoever - that could be very cool and help scale say
The downsides of an LWR-based approach versus a Galaxy approach are that it
is less mature and there is more stuff to configure - need to
configure a Galaxy job_conf plugin and destination, need to configure
the LWR itself, need to configure a message queue (for this variant of
LWR operation anyway - it should be possible to drive this via the LWR
in web server mode but I haven't added it yet). I would be more than
happy to continue to see progress toward Mesos support in Galaxy
It is strictly a prototype so far - a sort of playground if anyone
wants to play with these ideas and build something cool. It really is
a "framework" right - not so much a job scheduler so I am not sure it
is very immediately useful - but I imagine one could build cool stuff
on top of it.
Next, I think I would like to add Apache Aurora support - because it seems like
a much more traditional resource manager but built on top of Mesos so
it would be more practical for traditional Galaxy-style jobs. Doesn't
buy you anything in terms of parallelization but it would "fit better"
On Sat, Oct 26, 2013 at 2:43 PM, Kyle Ellrott <kellrott(a)soe.ucsc.edu> wrote:
> I think one of the aspects where Galaxy is a bit soft is the ability to
> run distributed tasks. The current system of split/replicate/merge tasks
> based on file type is a bit limited and hard for tool developers to expand.
> Distributed computing is a non-trivial thing to implement and I think it
> would be a better use of our time to use an already existing framework.
> It would also mean one less API for tool writers to have to develop for.
> I was wondering if anybody has looked at Mesos.
> You can see an overview of the Mesos architecture at
> The important thing about Mesos is that it provides an API for C/C++,
> Java/Scala and Python to write distributed frameworks. There are already
> implementations of frameworks for common parallel programming systems
> - Hadoop (https://github.com/mesos/hadoop)
> - MPI
> - Spark (http://spark-project.org)
> And you can find an example Python framework at
> Integration with Galaxy would have three parts:
> 1) Add a system config variable to Galaxy called 'MESOS_URL' that is then
> passed to tool wrappers and allows them to contact the local Mesos
> infrastructure (assuming the system has been configured) or pass a null if
> the system isn't available.
> 2) Write a tool runner that works as a Mesos framework to execute single
> CPU jobs on the distributed system.
> 3) For instances where Mesos is not available at a system wide level (say
> they only have access to an SGE based cluster), but the user wants to run
> distributed jobs, write a wrapper that can create a Mesos cluster using the
> existing queueing system. For example, right now I run a Mesos system under
> the SGE queue system.
> I'm curious to see what other people think.
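Part 1 of the plan above (passing MESOS_URL to tool wrappers, or a null when Mesos is not configured) could look something like the sketch below. It is an illustration only and not Galaxy code: build_job_env and run_tool are hypothetical names. Because an environment variable cannot hold a null, this sketch assumes "null" is represented by leaving MESOS_URL unset.

```python
import os
import subprocess
import sys


def build_job_env(mesos_url=None, base_env=None):
    """Copy the base environment and set or clear MESOS_URL for the job."""
    env = dict(os.environ if base_env is None else base_env)
    if mesos_url:
        env["MESOS_URL"] = mesos_url
    else:
        # No Mesos configured: make sure a stale value is not inherited.
        env.pop("MESOS_URL", None)
    return env


def run_tool(command, mesos_url=None):
    """Run a tool command with the prepared environment; return its stdout."""
    result = subprocess.run(
        command,
        env=build_job_env(mesos_url),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # The child process stands in for a tool wrapper reading the variable.
    child = [sys.executable, "-c",
             "import os; print(os.environ.get('MESOS_URL'))"]
    print(run_tool(child, mesos_url="master.example.org:5050"))
```

The host name and port in the usage example are placeholders, not a real deployment.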