Hello all,

QUESTION: When submitting jobs to the cluster as the real user, how should sudo scripts/drmaa_external_runner.py be told which Python to use, and how would it activate the venv if needed for the DRMAA dependency?

BACKGROUND: We're currently trying Galaxy out on a new CentOS 6 VM, with a matching CentOS 6 cluster, where jobs are submitted to SGE via DRMAA and run as the real Linux user rather than as a generic Galaxy Linux user account. This is documented on the wiki here:

https://wiki.galaxyproject.org/Admin/Config/Performance/Cluster

This all seemed to be working under Galaxy v15.10 (using eggs), but we're now targeting the recently released Galaxy v16.01 (using wheels) instead and have run into problems:

https://github.com/galaxyproject/galaxy/issues/1596

Because Galaxy is deprecating support for Python 2.6 (the default bundled with CentOS 6), we're now using a local copy of Python 2.7 (compiled from source) on a shared mount. This mismatch seems to be the root cause of the problem I will now describe.

During job submission to SGE, Galaxy will attempt to run a command like this:

$ sudo scripts/drmaa_external_runner.py 1005 /mnt/shared/galaxy/galaxy-dist/database/sge/132.jt_json
In the terminal output from ./run.sh we'd see:
RuntimeError: External_runjob failed (exit code 1)
Child process reported error:
Traceback (most recent call last):
  File "/mnt/shared/galaxy/galaxy-dist/scripts/drmaa_external_runner.py", line 15, in <module>
    import drmaa
ImportError: No module named drmaa

Although a drmaa wheel was installed within the Python 2.7 virtual environment under ~/galaxy-dist/.venv, Galaxy makes no attempt to activate that venv for sudo scripts/drmaa_external_runner.py.

We then installed DRMAA under our local copy of Python 2.7, and realised sudo scripts/drmaa_external_runner.py was not even using this copy of Python. Changing the hash-bang line was a crude way to solve that (see below). This in turn led to finding that $DRMAA_LIBRARY_PATH and $SGE_ROOT were not set in the sudo environment. Again, you can hack around this by modifying scripts/drmaa_external_runner.py (see below).

In our case, I suspect the least invasive change would be to install the DRMAA libraries under the system-provided Python 2.6, and let sudo scripts/drmaa_external_runner.py execute that way. We still need to work out why sudo scripts/drmaa_external_runner.py does not see $DRMAA_LIBRARY_PATH and $SGE_ROOT, but we have some clues to follow up on that:

http://stackoverflow.com/questions/257616/sudo-changes-path-why

Peter

P.S.
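For what it's worth, a virtualenv never actually needs to be "activated" to be used: invoking its bin/python directly has the same effect, so something like sudo /mnt/shared/galaxy/galaxy-dist/.venv/bin/python scripts/drmaa_external_runner.py ... ought to pick up the drmaa wheel (that .venv path is assumed from the traceback above, and whether Galaxy can be told to do this is exactly my question). A minimal sketch demonstrating the point with a throwaway venv (using Python 3's venv module here purely for brevity; virtualenv on Python 2.7 behaves the same way):

```python
# Sketch: a venv's interpreter can be run directly, with no
# "source bin/activate" step -- its sys.prefix (and hence its
# site-packages) is the venv itself, not the system Python.
import os
import subprocess
import tempfile
import venv

tmp = tempfile.mkdtemp()
venv.EnvBuilder(with_pip=False).create(tmp)  # build a throwaway venv

py = os.path.join(tmp, "bin", "python")
out = subprocess.check_output([py, "-c", "import sys; print(sys.prefix)"])
prefix = out.decode().strip()

# The directly-invoked interpreter reports the venv as its prefix.
print(os.path.realpath(prefix) == os.path.realpath(tmp))  # True
```

The same trick is why hard-coding a hash-bang works at all: the interpreter path, not any activation script, decides which site-packages get searched.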
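As for the vanishing $DRMAA_LIBRARY_PATH and $SGE_ROOT: the stackoverflow link suggests sudo scrubs the caller's environment by default (env_reset, plus secure_path for $PATH), so exports made in the Galaxy user's shell never reach the child process; sudoers env_keep entries or sudo -E would be the sudo-side fix rather than hard-coding paths in the script. The effect can be reproduced without sudo by starting a child with an emptied environment, as in this sketch (the libdrmaa path is just our site's):

```python
# Simulate sudo's env_reset: a child process started with a scrubbed
# environment does not inherit variables exported by the parent.
import os
import subprocess
import sys

os.environ["DRMAA_LIBRARY_PATH"] = "/mnt/sge/lib/lx-amd64/libdrmaa.so"

child = "import os; print(os.environ.get('DRMAA_LIBRARY_PATH'))"

# Normal inheritance: the exported variable reaches the child.
inherited = subprocess.check_output([sys.executable, "-c", child])
print(inherited.decode().strip())  # /mnt/sge/lib/lx-amd64/libdrmaa.so

# env={} mimics the scrubbed environment sudo hands to the script.
scrubbed = subprocess.check_output([sys.executable, "-c", child], env={})
print(scrubbed.decode().strip())  # None
```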
See also https://twitter.com/pjacock/status/704335582651162624

Here's our workaround diff: lots of hard-coded strings, not portable at all, but it worked for testing/debugging.

$ git diff scripts/drmaa_external_runner.py
diff --git a/scripts/drmaa_external_runner.py b/scripts/drmaa_external_runner.py
index a1474fe..61d2383 100755
--- a/scripts/drmaa_external_runner.py
+++ b/scripts/drmaa_external_runner.py
@@ -1,5 +1,6 @@
+#!/mnt/shared/galaxy/apps/python/2.7.11/bin/python
+#Was #!/usr/bin/env python
-
 """
 Submit a DRMAA job given a user id and a job template file (in JSON format)
 defining any or all of the following: args, remoteCommand, outputPath,
@@ -12,8 +13,15 @@
 import os
 import pwd
 import sys
+# Hack
+# print "$DRMAA_LIBRARY_PATH is %s" % os.environ.get('DRMAA_LIBRARY_PATH')
+# print "$SGE_ROOT is %s" % os.environ.get('SGE_ROOT')
+os.environ['DRMAA_LIBRARY_PATH'] = '/mnt/sge/lib/lx-amd64/libdrmaa.so'
+os.environ['SGE_ROOT'] = '/mnt/sge'
+
 import drmaa

 DRMAA_jobTemplate_attributes = [ 'args', 'remoteCommand', 'outputPath',
     'errorPath', 'nativeSpecification', 'workingDirectory', 'jobName', 'email',
     'project' ]
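A slightly less brittle version of that hack would be os.environ.setdefault(), which only fills in the site-specific fallbacks when sudo has stripped the variables, and leaves a properly configured environment alone (the paths are just our site's, as above):

```python
import os

# Only fall back to the hard-coded site paths when the sudo environment
# has been scrubbed; any value already exported by the caller wins.
os.environ.setdefault("DRMAA_LIBRARY_PATH", "/mnt/sge/lib/lx-amd64/libdrmaa.so")
os.environ.setdefault("SGE_ROOT", "/mnt/sge")

print(os.environ["DRMAA_LIBRARY_PATH"])
print(os.environ["SGE_ROOT"])
```

That way the same edited script keeps working if the sudoers env_keep route is ever sorted out.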