Ry4an Brase wrote:
As use of our Galaxy installation is picking up, we're getting a lot of requests for greater fairness and transparency in the Galaxy job runner area.
As I understand things, the primary tool Galaxy gives us to affect processing order and wait times with our Torque-based setup is the ability to map specific tools to different queues or to keep them on the local runner.
On one end of the spectrum I could see a simple division: small/fast/light jobs on the local runner and big/heavy/slow jobs on a single cluster queue. On the other extreme, one could set up a queue per tool and use sophisticated queue management on the Torque side to balance capacity across tools, users, expected processing time, etc.
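Concretely, the first arrangement would be a handful of lines in universe_wsgi.ini. The tool IDs below are real examples (upload1 is the upload tool, bowtie_wrapper the Bowtie mapper), but the "long" queue name is invented and the exact runner-URL syntax for naming a Torque queue is a guess that may differ between Galaxy versions:

    [galaxy:tool_runners]
    # small/fast jobs stay on the Galaxy host itself
    upload1 = local:///
    # heavy jobs go to a dedicated Torque queue; "long" is a
    # placeholder queue name and the URL form is a best guess
    bowtie_wrapper = pbs:///long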
How are other sites handling this?
Hi Ry4an,

I'd prefer to keep most of the scheduling in the DRM (Torque, SGE, etc.), since that's what it's designed to do. That said, we want to make it as easy as possible to do this, and Galaxy currently only sort of has the ability: you can set DRM parameters per-tool in the config file.

There are a couple of pieces that need to exist. For environments like our public site, where Galaxy users can't map one-to-one with system users, Galaxy itself needs to be able to limit the number of jobs a user can run on a particular cluster. Work on this component is under way.

In environments where Galaxy users *are* system users, Galaxy needs to do things that interact with the system as the real user, such as reading files from disk for upload, exporting files for download, and submitting cluster jobs. Writing this is near-ish to the top of my list.

There's a final piece which we've discussed here quite a few times but are not very close to implementing: a config language that would allow Galaxy to make decisions about which DRM parameters to set based on variables like input size or sequence count, the parameters selected, and so forth. A good example of where this is needed is in the mappers, which currently have a hardcoded multiprocessor setting of 4 that is almost certainly not appropriate for all environments. Ideally, Galaxy would be able to decide where to run the job and, based on that, know how many threads/processes to start given the resources the job is allotted. I'd love to see this also be able to make assumptions about runtime so that DRM backfill could be properly employed, but that may not be possible, since most job runtimes are probably not a calculable function of input size and selected parameters.

--nate
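To make that last piece concrete, here is a rough sketch of the kind of decision layer being described. Nothing like this exists in Galaxy today; the function name, thresholds, queue names, and resource strings are all invented for illustration:

    # Hypothetical sketch only -- not an existing Galaxy hook.
    import os

    def choose_torque_params(input_paths, max_threads=4):
        """Pick a Torque queue, thread count, and resource list from
        total input size. Every threshold and name here is made up."""
        total = sum(os.path.getsize(p) for p in input_paths)

        if total < 100 * 1024 ** 2:      # < ~100 MB: quick, single-core
            queue, threads, mem = "short", 1, "2gb"
        elif total < 10 * 1024 ** 3:     # < ~10 GB: normal batch work
            queue, threads, mem = "batch", max_threads, "8gb"
        else:                            # big job: big-memory queue
            queue, threads, mem = "bigmem", max_threads, "32gb"

        # The chosen thread count would also be handed to the tool
        # (e.g. a mapper's thread option) instead of a hardcoded 4.
        native = "-q %s -l nodes=1:ppn=%d,mem=%s" % (queue, threads, mem)
        return native, threads

A scheme along these lines would let the same decision drive both the qsub resource request and the tool's own thread count, which is what would make the hardcoded "4" go away.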
--
Ry4an Brase  612-626-6575
Software Developer, Application Development
University of Minnesota Supercomputing Institute
http://www.msi.umn.edu