Glen Beane wrote:
I'd prefer to keep most of the scheduling in the DRM (Torque, SGE, etc.) since that's what it's designed to do. That said, we want to make it as easy as possible to do this, and Galaxy currently only sort of has the ability to do it. By currently I mean that you can set DRM parameters per-tool in the config file.
There are a couple of pieces that need to exist. For environments like our public site where Galaxy users can't map one-to-one with system users, Galaxy itself needs to be able to limit the number of jobs a user can run on a particular cluster. Work on this component is under way.
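As a rough illustration of the per-user limit described above, here is a minimal sketch in Python. It is not Galaxy's actual implementation: the class name `PerUserJobLimiter`, its methods, and the single flat limit per (user, cluster) pair are all assumptions made for the example.

```python
from collections import defaultdict


class PerUserJobLimiter:
    """Hypothetical helper: caps concurrent jobs per (Galaxy user, cluster)."""

    def __init__(self, max_jobs_per_user):
        self.max_jobs_per_user = max_jobs_per_user
        # Maps (user, cluster) -> number of jobs currently counted as running.
        self._running = defaultdict(int)

    def try_start(self, user, cluster):
        """Return True and count the job if the user is under the limit."""
        key = (user, cluster)
        if self._running[key] >= self.max_jobs_per_user:
            return False  # caller would leave the job queued inside Galaxy
        self._running[key] += 1
        return True

    def finish(self, user, cluster):
        """Release one slot when a job completes (never drops below zero)."""
        key = (user, cluster)
        if self._running[key] > 0:
            self._running[key] -= 1


limiter = PerUserJobLimiter(max_jobs_per_user=2)
first = limiter.try_start("alice@example.org", "torque-main")
second = limiter.try_start("alice@example.org", "torque-main")
third = limiter.try_start("alice@example.org", "torque-main")  # over the limit
```

The point of the sketch is only that the count is kept by Galaxy user rather than by system user, so it works even when every job is submitted to the DRM under one shared account.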
In environments where Galaxy users *are* system users, Galaxy needs to do things that interact with the system, such as reading files from disk for upload, exporting files for download, and submitting cluster jobs as the real user. Writing this is near-ish to the top of my list.
There's a final piece which we've discussed here quite a few times but are not very close to implementing. That would be a config language to allow Galaxy to decide which DRM parameters to set based on variables like input size or sequence count, parameters selected, and so forth. A good example of where this is needed is in the mappers, which currently have a hardcoded multiprocessor setting of 4 that is almost certainly not appropriate for all environments. Ideally Galaxy would be able to decide where to run the job and then, from the resources the job is given, know how many threads/processes to start. I'd love to see this also be able to make assumptions about runtime so that DRM backfill could be properly employed, but this may not be possible since most job runtimes are probably not a calculable function of the size of the input data and the selected parameters.
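To make the idea of choosing thread counts from input size concrete, here is a small Python sketch. Nothing in it reflects an existing Galaxy feature: the rule table `RULES`, the function `choose_threads`, and the size thresholds are hypothetical and chosen only for illustration.

```python
# Hypothetical rule table: (largest input size in bytes, threads wanted).
# A max size of None means "anything larger than the previous rules".
RULES = [
    (100 * 1024**2, 1),   # up to 100 MB: single thread
    (1024**3, 4),         # up to 1 GB: four threads
    (None, 8),            # larger inputs: eight threads
]


def choose_threads(input_bytes, slots_granted, rules=RULES):
    """Pick a thread count from the rules, capped by what the DRM granted."""
    for max_bytes, wanted in rules:
        if max_bytes is None or input_bytes <= max_bytes:
            # Never start more threads than slots, and never fewer than one.
            return max(1, min(wanted, slots_granted))
    return 1  # no rule matched; fall back to a single thread


small = choose_threads(50 * 1024**2, slots_granted=8)
capped = choose_threads(500 * 1024**2, slots_granted=2)
large = choose_threads(5 * 1024**3, slots_granted=16)
```

The cap on `slots_granted` is the part that addresses the hardcoded setting of 4: the tool asks for what the rules suggest but never starts more threads than the job was actually given.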
--nate
Are there issues open for these Galaxy changes? I would like to follow the development.
Yes: https://bitbucket.org/galaxy/galaxy-central/issue/106/run-cluster-jobs-as-th... --nate
-- Glen L. Beane Senior Software Engineer The Jackson Laboratory (207) 288-6153