Hi John,
Does your plan involve having DRMAA handle the multiuser capability itself? I was under the impression that DRMAA cannot do it:
Indeed, I couldn't find a way to do it inside DRMAA. My patch uses sudo+setuid.
I've likewise started coding this myself. We run LSF, but I'm hoping there won't be anything cluster-specific about it -- so far I'm just doing (e)uid/(e)gid switching plus plain DRMAA.
My plan is exactly that. Have the regular galaxy SGE-runner do everything up to building the DRMAA JobTemplate; then, instead of calling "drmaa.runJob", write the JobTemplate to a file (JSON/Pickle/whatever) and sudo a tiny python script that will setuid() to the right user and submit the job based on the JobTemplate's data. This tiny script will return the SGE job ID (or an error), and from there on the SGE-runner module will go on as before. Hopefully this will require only minimal changes to the galaxy code (basically, just calling a different function at /lib/galaxy/jobs/runners/sge.py:217). The external script + 'sudo' is required because I don't want the galaxy python process itself to run as root (and without root I can't change the UID back and forth between users).
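To make that concrete, here is a rough sketch of what such a helper script might look like. It is only an illustration under assumptions: it uses the drmaa Python bindings, assumes the template was written as a JSON object of JobTemplate attributes, and the function names and argument order (uid, then template path) are made up for this example rather than taken from the actual patch.

```python
"""Hypothetical submit helper, intended to be started through sudo (as root)."""
import json
import os
import pwd
import sys

# JobTemplate attributes this sketch knows how to copy from the JSON file.
TEMPLATE_FIELDS = ("remoteCommand", "args", "jobName", "workingDirectory",
                   "outputPath", "errorPath", "nativeSpecification")


def load_template(path):
    """Read the template data written by the runner, dropping unknown keys."""
    with open(path) as handle:
        data = json.load(handle)
    return {key: value for key, value in data.items() if key in TEMPLATE_FIELDS}


def become_user(uid):
    """Permanently drop root privileges to the given user."""
    entry = pwd.getpwuid(uid)
    os.setgroups(os.getgrouplist(entry.pw_name, entry.pw_gid))
    os.setgid(entry.pw_gid)
    os.setuid(uid)  # last, because after this we can no longer change groups


def submit(fields):
    """Submit one job through DRMAA and return the job ID as a string."""
    import drmaa  # imported late so the file parsing can be tested without it

    session = drmaa.Session()
    session.initialize()
    try:
        template = session.createJobTemplate()
        for name, value in fields.items():
            setattr(template, name, value)
        job_id = session.runJob(template)
        session.deleteJobTemplate(template)
        return job_id
    finally:
        session.exit()


def main(argv):
    """Usage (assumed): helper.py <uid> <template.json>"""
    if len(argv) != 3:
        sys.stderr.write("usage: %s <uid> <template.json>\n" % argv[0])
        return 2
    fields = load_template(argv[2])  # read while still root
    become_user(int(argv[1]))
    print(submit(fields))            # the job ID is the only thing on stdout
    return 0
```

A real script would end with `sys.exit(main(sys.argv))`. The template is read before dropping privileges so that the file can stay private to the galaxy user.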
It's not done yet, though -- jobs are submitted as the appropriate user but galaxy is losing track of them.
My plan is to have the script return the job ID, so galaxy will just pick it up and use it as if it had submitted the job directly.
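On the galaxy side, that could look roughly like the sketch below. Everything here is an assumption for illustration: the helper path, the convention that the helper prints the job ID as the last line of stdout, and the function names are not from the real sge.py.

```python
import json
import os
import subprocess
import tempfile


def write_template(fields):
    """Dump the job template data to a JSON file and return its path."""
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as handle:
        json.dump(fields, handle)
        return handle.name


def build_command(helper, uid, template_path):
    """sudo -n fails instead of prompting if no password-less rule exists."""
    return ["sudo", "-n", helper, str(uid), template_path]


def parse_job_id(stdout):
    """Take the last non-empty line of the helper's output as the job ID."""
    lines = [line.strip() for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("helper printed no job ID")
    return lines[-1]


def submit_as_user(helper, uid, fields, run=subprocess.run):
    """Submit through the external helper and return the job ID it reports."""
    path = write_template(fields)
    try:
        result = run(build_command(helper, uid, path),
                     capture_output=True, text=True)
    finally:
        os.remove(path)  # the helper has already read the file by now
    if result.returncode != 0:
        raise RuntimeError("external submit failed: %s" % result.stderr.strip())
    return parse_job_id(result.stdout)
```

The runner would then keep the returned ID and monitor the job the same way it does for a job it submitted itself. The `run` parameter exists only so the function can be exercised without sudo.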
It would of course be even better if DRMAA did this itself, but I expect there will also be many changes to galaxy required, especially regarding dataset management where the actual files are owned by different cluster users.
For starters, I will require all users to be part of the 'galaxy' group, and all files will be 'g+rw'.

Regards,
 -gordon