More flexible use of Sun Grid Engine by Galaxy
Hi there,

At the moment Galaxy's choice of Sun Grid Engine settings is made on a per-tool basis - that is, you can define different queues and projects for each tool. However, I've got two use cases that currently aren't supported.

1) Per-user settings. The SGEJobRunner potentially has access to user information (via the job.session_id), and while all jobs get run as user galaxy, per-user settings could be simulated by mapping Galaxy users to SGE projects. Then the SGE admin can do load-balancing, etc. on a per-project basis. For a site with a relatively small and static set of users this could work. Alternately you need to map Galaxy users to SGE users - this is much trickier, requiring elevated privileges and thus probably a separate job runner daemon.

2) Queue selection. Again this could be done on a per-tool basis, but usage patterns don't always support that. For example, our site has two queues - a long running one and a short running one. Certain resources are reserved for the short running queue only. How to select a different queue for jobs raises a bit of a design problem for me - effectively you're telling Galaxy "I want to run this workflow, but with this extra parameter". It's almost like having a "meta parameter", since clearly you don't want to have queue selection as an input for any particular tool.

So I'd be interested in the Galaxy team's input as to how best to address these two use cases.

Thanks,
Peter
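As a rough illustration of use case 1 (and of the "meta parameter" in use case 2), the mapping from Galaxy users to SGE projects could look something like the sketch below. This is not Galaxy code: the user table, the fallback project name and both function names are hypothetical, and the only thing taken from Grid Engine itself is that qsub accepts -P for a project and -q for a queue.

```python
# Hypothetical site configuration: Galaxy user email -> SGE project.
USER_PROJECTS = {
    "alice@example.org": "genomics",
    "bob@example.org": "proteomics",
}
DEFAULT_PROJECT = "galaxy"  # assumed fallback project for unmapped users


def project_for_user(email):
    """Return the SGE project for a Galaxy user, or the fallback."""
    return USER_PROJECTS.get(email.lower(), DEFAULT_PROJECT)


def qsub_options(email, queue=None):
    """Build qsub options: -P selects the project, -q the queue."""
    options = ["-P", project_for_user(email)]
    if queue:  # the per-job "meta parameter" from use case 2
        options += ["-q", queue]
    return " ".join(options)


print(qsub_options("alice@example.org"))                   # -P genomics
print(qsub_options("carol@example.org", queue="short.q"))  # -P galaxy -q short.q
```

All jobs would still be submitted as the galaxy user; only the project (and optionally the queue) would change per job, so the SGE admin could apply policies per project.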
Hi Peter,

I had similar issues and had success setting up Galaxy this way:

(1) Per-group settings: I wanted per-group settings because different groups in my organization own their own SGE clusters or fair-share allocations (each group either submits to the same cluster but to a different queue as a different user, or has its own cluster entirely). To do this, you can set up multiple Galaxy sites (i.e. hg clone). I have each group configured as a subdir on my Galaxy domain (e.g. mygalaxy.gov/main, mygalaxy.gov/public, mygalaxy.gov/group1, etc.). For each site:

(a) different environment variables point to its desired cluster (e.g. $SGE_CELL). I actually added these to the run.sh script, just to make sure I didn't get confused.
(b) universe_wsgi.ini has different job runner definition lines for the tools (specifying queue, resource, project).

Each group logs in to its own site, and I use Apache group authentication to enforce this.

A few comments: for these per-group sites, I'm not using load balancing with separate job and web runners, and I am not saving job data in the database, so there is no recovery on restart. Since the group sites have a relatively small number of users, this doesn't seem to be a problem. For load balancing I use a hardware-based solution (a load-balancing router), but it hardly seems necessary for the workloads I've seen so far. It is necessary for redundancy in case of server hardware failure, which may not be an issue for you, depending on your SLA.

I should also mention that all group sites use the same PostgreSQL db, so workflows/histories/datasets can be shared across the organization. However, this requires that updates from galaxy-central be done in concert in case there are db schema changes (I have sudo su access to all other groups' galaxy-xx users). I allow groups to push/pull other changes anytime (groups are not allowed to change Galaxy internals or the db schema, just add tools and datatypes).
(2) Queue selection: this functionality has already been there for a few months. The fields in the job runner defline/URL are:

0 : sge or pbs
1 : ??
2 : cell (only one allowed per Galaxy instance, which is why I have multiple group sites)
3 : queue
4 : project
5 : params (e.g. resources)

So my default job runner looks like this:

sge:///galaxy.q//-b y -V -l medium.c/

For shorter or longer running jobs (or large RAM requirements), I use different -l options (these must be specified on a per-tool basis). This is a minor inconvenience: when a group adds a tool, their configuration won't automatically appear in the universe file, because that file is not tracked (it cannot be, since the sites have different ports and names).

Hope this helps,
Ed

On Mon, Jun 21, 2010 at 6:34 AM, Peter van Heusden <pvh@sanbi.ac.za> wrote:
_______________________________________________ galaxy-dev mailing list galaxy-dev@lists.bx.psu.edu http://lists.bx.psu.edu/listinfo/galaxy-dev
Peter van Heusden wrote:
Hi there
At the moment Galaxy's choice of Sun Grid Engine settings is made on a per-tool basis - that is, you can define different queues and projects for each tool. However, I've got two use cases that currently aren't supported.
1) Per-user settings. The SGEJobRunner potentially has access to user information (via the job.session_id), and while all jobs get run as user galaxy, per-user settings could be simulated by mapping Galaxy users to SGE projects. Then the SGE admin can do load-balancing, etc. on a per-project basis. For a site with a relatively small and static set of users this could work. Alternately you need to map Galaxy users to SGE users - this is much trickier, requiring elevated privileges and thus probably a separate job runner daemon.
Hi Peter,

There's an issue in our tracker to implement functionality allowing jobs to run on the cluster as real users instead of the Galaxy user:

http://bitbucket.org/galaxy/galaxy-central/issue/106

Once implemented, you could then define which resources are used by which users directly in Grid Engine. I think this would be the cleanest way to do #1.
2) Queue selection. Again this could be done on a per-tool basis, but usage patterns don't always support that. For example, our site has two queues - a long running one and a short running one. Certain resources are reserved for the short running queue only. How to select a different queue for jobs raises a bit of a design problem for me - effectively you're telling Galaxy "I want to run this workflow, but with this extra parameter". It's almost like having a "meta parameter", since clearly you don't want to have queue selection as an input for any particular tool.
So I'd be interested in the Galaxy team's input as to how best to address these two use cases.
This has been a pretty difficult one to define, since it's almost impossible to look at a job and decide how long it's going to run before you run it. Your solution of giving the users the choice is interesting, although it would be a pretty site-specific implementation, since that choice may be a queue, a project, a different cell, or just different qsub parameters.

--nate
Thanks, Peter
participants (3): Edward Kirton, Nate Coraor, Peter van Heusden