how to use projects for fair-share on compute-cluster
Galaxy sites usually do all work on a compute cluster, with all jobs submitted as a "galaxy" unix user, so there isn't any "fair-share" accounting between users. Other sysops have created a solution to run jobs as the actual unix user, which may be feasible for an intranet site but is undesirable, for security reasons, for a site accessible via the internet. A simpler and more secure way to enable fair-share is to use projects.

Here's a simple scenario and a straightforward solution: multiple groups in an organization use the same galaxy site and it is desirable to enable fair-share accounting between the groups. All users in a group consume the same fair-share, which is generally acceptable.

1) Configure the scheduler with a project for each group, configure each user to use their group's project by default, and grant the galaxy user access to submit jobs to any project; all users should be associated with a project. There's a good chance your grid is already configured this way.

2) Create a database which maps galaxy user id to a project; I use a cron job to create a standalone sqlite3 db. Since this is site-specific, code is not provided, but hints are given below. Rather than having a separate database, the proj could have been added to the galaxy db, but I sought to minimize my changes.

3) Add a snippet of code to drmaa.py's queue_job method to look up the proj from job_wrapper.user_id and append it to jt.nativeSpecification; see below.

Here are the changes required. They're small enough that I didn't do this as a clone/patch.

(1) lib/galaxy/jobs/runners/drmaa.py:

    import sqlite3
    ...
    native_spec = self.get_native_spec( runner_url )

    # BEGIN ADD USER'S PROJ
    if self.app.config.user_proj_map_db is not None:
        try:
            conn = sqlite3.connect(self.app.config.user_proj_map_db)
            c = conn.cursor()
            c.execute('SELECT PROJ FROM USER_PROJ WHERE GID=?', [job_wrapper.user_id])
            row = c.fetchone()
            c.close()
            conn.close()
            native_spec += ' -P ' + row[0]
        except Exception:
            # also reached when the user has no row in USER_PROJ (row is None)
            log.debug("Cannot look up proj of user %s" % job_wrapper.user_id)
    # END ADD USER'S PROJ

(2) lib/galaxy/config.py: add support for the user_proj_map_db variable

    self.user_proj_map_db = resolve_path( kwargs.get( "user_proj_map_db", None ), self.root )

(3) universe_wsgi.ini:

    user_proj_map_db = /some/path/to/user_proj_map_db.sqlite

(4) Here are some suggestions to help get you started on a script to make the sqlite3 db.

a) parse the ldap tree (to get uid:email), for example:

    ldapsearch -LLL -x -b 'ou=aliases,dc=jgi,dc=gov'

b) parse the scheduler config (to get uid:proj):

    qconf -suserl | /usr/bin/xargs -I '{}' qconf -suser '{}' | egrep 'name|default_project'

c) query the galaxy db (to get gid:email):

    select id, email from galaxy_user;

The limitation of this method is that all jobs submitted by a user will always be charged to the same project (which may be okay, depending on how your organization uses projects). However, a user may have access to several projects and may wish to associate some jobs with a particular project. This could be accomplished by adding an option to the user preferences: a user would choose a project from their available projects, and any jobs submitted would have to record the currently chosen project. Alternatively, histories could be associated with a particular project. This solution would require significant changes to galaxy, so I haven't implemented it (and the simple solution works well enough for me).

Edward Kirton
US DOE JGI
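The hints in (4) could be combined roughly as follows. This is only a sketch of the join step, not the script used at JGI: the function name `build_user_proj_db` and the three input dictionaries are hypothetical, and it assumes the ldap, qconf and galaxy outputs have already been parsed into uid:email, uid:proj and gid:email mappings. The table and column names (USER_PROJ, GID, PROJ) are the ones queried by the drmaa.py snippet.

```python
import sqlite3


def build_user_proj_db(db_path, uid_to_email, uid_to_proj, gid_to_email):
    """Write a USER_PROJ(GID, PROJ) table by joining the three sources.

    All three arguments are plain dicts (hypothetical inputs):
      uid_to_email : unix uid -> email        (e.g. from ldap)
      uid_to_proj  : unix uid -> project      (e.g. from the scheduler)
      gid_to_email : galaxy user id -> email  (e.g. from galaxy_user)
    Returns the number of rows written.
    """
    # Invert uid -> email so a galaxy account's email leads back to a unix uid.
    email_to_uid = {email.lower(): uid for uid, email in uid_to_email.items()}

    rows = []
    for gid, email in gid_to_email.items():
        uid = email_to_uid.get(email.lower())
        proj = uid_to_proj.get(uid)
        if proj:  # skip galaxy users with no known project
            rows.append((gid, proj))

    conn = sqlite3.connect(db_path)
    try:
        conn.execute('DROP TABLE IF EXISTS USER_PROJ')
        conn.execute('CREATE TABLE USER_PROJ (GID INTEGER PRIMARY KEY, PROJ TEXT NOT NULL)')
        conn.executemany('INSERT INTO USER_PROJ (GID, PROJ) VALUES (?, ?)', rows)
        conn.commit()
    finally:
        conn.close()
    return len(rows)
```

Matching on lower-cased email is an assumption; sites where the galaxy login differs from the ldap alias would need their own matching rule.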
correction: I didn't adequately test what happens if user_proj_map_db is not defined in the universe file; here are the changes:

    # BEGIN ADD USER'S PROJ
    try:
        conn = sqlite3.connect(self.app.config.user_proj_map_db)
        c = conn.cursor()
        c.execute('SELECT PROJ FROM USER_PROJ WHERE GID=?', [job_wrapper.user_id])
        row = c.fetchone()
        c.close()
        conn.close()
        native_spec += ' -P ' + row[0]
    except Exception:
        # also reached when the user has no row in USER_PROJ (row is None)
        log.debug("Cannot look up proj of user %s" % job_wrapper.user_id)
    # /END ADD USER PROJ

Also, in the config, define a default instead of using None:

    self.user_proj_map_db = resolve_path( kwargs.get( "user_proj_map_db", "database/user_proj_map.sqlite" ), self.root )

One last note: no error seems to be displayed to the user if the job cannot be scheduled because the galaxy user doesn't have permission to use the user's project (although there is a log entry), and the job will never be scheduled. So be sure the galaxy user has permission to submit to all possible projects.
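The lookup in the patch can also be pulled out into a small function that returns None rather than relying on an exception when a user has no row. This is a sketch, not part of the patch above; the name `lookup_project` is hypothetical, and only the table and column names come from the original query.

```python
import sqlite3


def lookup_project(db_path, galaxy_user_id):
    """Return the project mapped to galaxy_user_id, or None if there is no match."""
    try:
        conn = sqlite3.connect(db_path)
        try:
            row = conn.execute(
                'SELECT PROJ FROM USER_PROJ WHERE GID=?', (galaxy_user_id,)
            ).fetchone()
        finally:
            conn.close()
    except sqlite3.Error:
        # Unreadable path, missing table, locked database, and so on.
        return None
    return row[0] if row else None
```

Inside queue_job this would be used as `proj = lookup_project(self.app.config.user_proj_map_db, job_wrapper.user_id)`, appending `' -P ' + proj` to native_spec only when proj is not None. Note that sqlite3.connect creates an empty file if the path does not exist but its directory does; the query then fails with "no such table" and the function returns None.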
Hey Ed,
This is a neat approach. You could possibly also do this in the Galaxy database by associating users and groups with roles that match project names. A select list or history default that allowed users to select their "current" project/role would remove the single-project-per-user limitation.
--nate
On Jan 13, 2012, at 3:17 PM, Edward Kirton wrote:
correction: i didn't adequately test what happens if the user_proj_map_db was not defined in the universe file [...]
___________________________________________________________
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at:
Great idea, Nate (hint! hint!).
On Thu, Jan 19, 2012 at 10:27 AM, Nate Coraor <nate@bx.psu.edu> wrote:
Hey Ed,
This is a neat approach. You could possibly also do this in the Galaxy database by associating users and groups with roles that match project names. A select list or history default that allowed users to select their "current" project/role would remove the single-project-per-user limitation.
--nate
I really want this for Torque/Moab, where the native spec flag is -A.
On Jan 20, 2012, at 7:34 PM, "Edward Kirton" <eskirton@lbl.gov> wrote:
Great idea, Nate (hint! hint!).
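For a Torque/Moab site, the only part of the patch that would have to change is the flag placed in front of the project name. A minimal sketch, assuming the flag is chosen by a site-level setting: the dictionary `PROJECT_FLAGS`, its keys and the function `add_project` are hypothetical names, and the two flags are the ones mentioned in this thread.

```python
# Flags as given in this thread: "-P" in the original patch and "-A" for
# Torque/Moab. The dictionary keys are hypothetical labels, not Galaxy settings.
PROJECT_FLAGS = {'sge': '-P', 'torque': '-A'}


def add_project(native_spec, project, scheduler='sge'):
    """Append the scheduler's project/account flag to a native specification."""
    if not project:
        return native_spec  # nothing to add; leave the spec untouched
    flag = PROJECT_FLAGS[scheduler]
    parts = [native_spec.strip()] if native_spec and native_spec.strip() else []
    parts.append('%s %s' % (flag, project))
    return ' '.join(parts)
```

An unknown scheduler label raises KeyError here, so a misconfigured site fails loudly rather than silently submitting jobs without a project.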
Sure, it's on the "todo" list. Ping me again in a year. ;)
--nate
On Jan 20, 2012, at 7:33 PM, Edward Kirton wrote:
Great idea, Nate (hint! hint!).
On Jan 23, 2012, at 11:09 AM, Nate Coraor wrote:
Sure, it's on the "todo" list. Ping me again in a year. ;)
Now in a ticket: https://bitbucket.org/galaxy/galaxy-central/issue/709/add-more-control-over-...
--nate
participants (3)
- Edward Kirton
- Glen Beane
- Nate Coraor