Hello all,
QUESTION: When submitting jobs to the cluster as the real user, how should sudo scripts/drmaa_external_runner.py be told which Python to use, and how would it activate the venv if needed for the DRMAA dependency?
BACKGROUND:
We're currently trying Galaxy out on a new CentOS 6 VM, with a matching CentOS 6 cluster, where jobs are submitted to SGE via DRMAA and run as the real Linux user rather than a generic Galaxy Linux user account.
This is documented on the wiki here:
https://wiki.galaxyproject.org/Admin/Config/Performance/Cluster
This all seemed to be working under Galaxy v15.10 (using eggs), but we're now targeting the recently released Galaxy v16.01 (using wheels) instead and have run into problems.
https://github.com/galaxyproject/galaxy/issues/1596
Because Galaxy is deprecating support for Python 2.6 (the default bundled with CentOS 6), we're now using a local copy of Python 2.7 (compiled from source) on a shared mount. This mismatch between the system Python and the Python that Galaxy runs under seems to be the root cause of the problem I will now describe.
During job submission to SGE, Galaxy will attempt to run a command like this:
$ sudo scripts/drmaa_external_runner.py 1005 /mnt/shared/galaxy/galaxy-dist/database/sge/132.jt_json
In the terminal output from ./run.sh we'd see:
RuntimeError: External_runjob failed (exit code 1)
Child process reported error:
Traceback (most recent call last):
  File "/mnt/shared/galaxy/galaxy-dist/scripts/drmaa_external_runner.py", line 15, in <module>
    import drmaa
ImportError: No module named drmaa
Although a drmaa wheel was installed within the Python 2.7 virtual environment under ~/galaxy-dist/.venv, Galaxy makes no attempt to activate the venv for scripts/drmaa_external_runner.py.
We then installed DRMAA under our local copy of Python 2.7, and realised that sudo scripts/drmaa_external_runner.py was not even using this copy of Python. Changing the shebang (#!) line was a crude way to solve that (see below).
This in turn led to finding that $DRMAA_LIBRARY_PATH and $SGE_ROOT were not set in the sudo environment. Again, you can hack around this by modifying scripts/drmaa_external_runner.py (see below).
In our case, I suspect the least invasive change would be to install the DRMAA libraries under the system provided Python 2.6, and let sudo scripts/drmaa_external_runner.py execute that way.
We still need to work out why sudo scripts/drmaa_external_runner.py does not see $DRMAA_LIBRARY_PATH and $SGE_ROOT, but we have some clues to follow up on:
http://stackoverflow.com/questions/257616/sudo-changes-path-why
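To see exactly what the runner gets when started through sudo, a throwaway diagnostic can be run the same way (for example sudo ./check_env.py, where check_env.py is a hypothetical name). This is a minimal sketch, not part of Galaxy: it reports the interpreter path, version and prefix, whether VIRTUAL_ENV is set, whether import drmaa succeeds (and if not, why), and the values of $DRMAA_LIBRARY_PATH and $SGE_ROOT. It sticks to Python 2.6-compatible syntax so the CentOS 6 system Python can run it.

```python
#!/usr/bin/env python
# check_env.py (hypothetical): report what a sudo-launched Python process sees.
import os
import sys

# The two variables this thread found missing under sudo.
WATCHED_VARS = ("DRMAA_LIBRARY_PATH", "SGE_ROOT")


def collect_report():
    """Return a dict describing the interpreter, the drmaa import and the environment."""
    report = {
        "executable": sys.executable,
        "version": "%d.%d.%d" % tuple(sys.version_info[:3]),
        "prefix": sys.prefix,
        # Set by a venv's bin/activate; absent when nothing activated one.
        "VIRTUAL_ENV": os.environ.get("VIRTUAL_ENV"),
    }
    try:
        import drmaa  # noqa: F401  (only checking that the import works)
        report["drmaa_import"] = "ok"
    except Exception as exc:
        # ImportError if the module is missing; the drmaa package raises
        # RuntimeError when it cannot locate libdrmaa.
        report["drmaa_import"] = "%s: %s" % (type(exc).__name__, exc)
    for name in WATCHED_VARS:
        report[name] = os.environ.get(name)
    return report


if __name__ == "__main__":
    report = collect_report()
    for key in sorted(report):
        print("%-20s %s" % (key, report[key]))
```

Comparing its output when run directly and when run via sudo shows which interpreter and which variables differ between the two.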
Peter
P.S. See also https://twitter.com/pjacock/status/704335582651162624
--
Here's our workaround diff. It has lots of hard-coded strings and is not portable at all, but it worked for testing/debugging:
$ git diff scripts/drmaa_external_runner.py
diff --git a/scripts/drmaa_external_runner.py b/scripts/drmaa_external_runner.py
index a1474fe..61d2383 100755
--- a/scripts/drmaa_external_runner.py
+++ b/scripts/drmaa_external_runner.py
@@ -1,5 +1,6 @@
+#!/mnt/shared/galaxy/apps/python/2.7.11/bin/python
+#Was
 #!/usr/bin/env python
-
 """
 Submit a DRMAA job given a user id and a job template file (in JSON format)
 defining any or all of the following: args, remoteCommand, outputPath,
@@ -12,8 +13,15 @@
 import os
 import pwd
 import sys
+# Hack
+# print "$DRMAA_LIBRARY_PATH is %s" % os.environ.get('DRMAA_LIBRARY_PATH')
+# print "$SGE_ROOT is %s" % os.environ.get('SGE_ROOT')
+os.environ['DRMAA_LIBRARY_PATH'] = '/mnt/sge/lib/lx-amd64/libdrmaa.so'
+os.environ['SGE_ROOT'] = '/mnt/sge'
+
 import drmaa
 DRMAA_jobTemplate_attributes = [ 'args', 'remoteCommand', 'outputPath', 'errorPath', 'nativeSpecification', 'workingDirectory', 'jobName', 'email', 'project' ]
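A slightly less brittle variant of the same hack, sketched under the assumption that SGE keeps the library at $SGE_ROOT/lib/<arch>/libdrmaa.so, as in the paths in the diff: only fill in the two variables when sudo has stripped them, and derive DRMAA_LIBRARY_PATH from SGE_ROOT instead of hard-coding both. The helper name fill_drmaa_environment and the fallback constants are hypothetical; the default values are this site's.

```python
import os

# Fallbacks copied from the workaround diff; site-specific assumptions.
DEFAULT_SGE_ROOT = "/mnt/sge"
DEFAULT_SGE_ARCH = "lx-amd64"


def fill_drmaa_environment(environ=None):
    """Fill in SGE_ROOT and DRMAA_LIBRARY_PATH only where they are missing.

    Values that did survive sudo are left alone. Returns the mapping used.
    """
    if environ is None:
        environ = os.environ
    sge_root = environ.setdefault("SGE_ROOT", DEFAULT_SGE_ROOT)
    # Assumed layout: $SGE_ROOT/lib/<arch>/libdrmaa.so
    library = os.path.join(sge_root, "lib", DEFAULT_SGE_ARCH, "libdrmaa.so")
    environ.setdefault("DRMAA_LIBRARY_PATH", library)
    return environ
```

In drmaa_external_runner.py the call fill_drmaa_environment() would go immediately before import drmaa, since the drmaa package looks for the library when it is imported.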
On Mon, Feb 29, 2016 at 12:10 PM, Peter Cock p.j.a.cock@googlemail.com wrote:
Hello all,
QUESTION: When submitting jobs to the cluster as the real user, how should sudo scripts/drmaa_external_runner.py be told which Python to use, and how would it activate the venv if needed for the DRMAA dependency?
Hi Peter,
I think the easiest solution is probably to write a wrapper script that sets up the environment for drmaa_external_runner.py, and to call this wrapper (with the same args) instead; the wrapper can then pass the args on to drmaa_external_runner.py.
--nate
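A minimal sketch of such a wrapper, assuming it is written in Python 2.6-compatible syntax so that whichever python sudo finds can start it. It adds the two SGE variables and then replaces itself (os.execve) with the venv's interpreter running the unmodified runner, passing Galaxy's arguments through untouched. Calling the venv's bin/python directly is enough for it to use the venv's site-packages, so no activate step is needed, and the runner's own #! line is ignored. The file name and all paths are assumptions based on the locations quoted earlier in this thread.

```python
#!/usr/bin/env python
# drmaa_runner_wrapper.py (hypothetical name): point drmaa_external_runjob_script
# at this file. sudo starts it with the arguments Galaxy would otherwise pass to
# scripts/drmaa_external_runner.py (e.g. "1005 .../132.jt_json").
import os
import sys

# Site-specific assumptions; adjust to your Galaxy checkout and SGE install.
VENV_PYTHON = "/mnt/shared/galaxy/galaxy-dist/.venv/bin/python"
REAL_RUNNER = "/mnt/shared/galaxy/galaxy-dist/scripts/drmaa_external_runner.py"
EXTRA_ENV = {
    "SGE_ROOT": "/mnt/sge",
    "DRMAA_LIBRARY_PATH": "/mnt/sge/lib/lx-amd64/libdrmaa.so",
}


def build_exec(argv, environ):
    """Return (executable, args, env) for handing over to the real runner."""
    env = dict(environ)
    env.update(EXTRA_ENV)
    # Invoking the venv's own interpreter makes its site-packages (and so the
    # drmaa wheel) importable without sourcing bin/activate. The runner's #!
    # line is ignored because the script is passed as an argument.
    args = [VENV_PYTHON, REAL_RUNNER] + list(argv[1:])
    return VENV_PYTHON, args, env


# Argument checking is left to the real runner; with no arguments at all this
# file does nothing, which keeps it safe to run for a quick syntax check.
if __name__ == "__main__" and len(sys.argv) > 1:
    # On success os.execve never returns: the runner's output and exit status
    # go straight back to Galaxy.
    os.execve(*build_exec(sys.argv, os.environ))
```

Since sudo would then run the wrapper rather than the original script, the sudoers rule that allows scripts/drmaa_external_runner.py presumably needs to name the wrapper instead, and because the wrapper runs as root it should be owned by root and not writable by the Galaxy user.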
___________________________________________________________
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: https://lists.galaxyproject.org/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
On Wed, Mar 2, 2016 at 4:59 PM, Nate Coraor nate@bx.psu.edu wrote:
Hi Peter,
I think the easiest solution to this is probably to write a wrapper script that sets up the environment for drmaa_external_runner.py and call this wrapper (with the same args), which can then pass the args to drmaa_external_runner.py.
--nate
Thanks Nate,
So we'd write a shell script called drmaa_external_runner.py which sets up the Python environment and then calls a copy of the original Python script drmaa_external_runner.py? Has anyone else tried this?
Clearly using Python 2.7 on a system with a Python 2.6 default (CentOS 6) is part of the problem. Maybe we can just install the Python DRMAA bindings under the system Python 2.6 instead?
Thanks,
Peter
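To help decide between the system Python 2.6 and the local 2.7 build, a quick check of which candidate interpreters can import the bindings may be useful. This sketch runs each one with -c "import drmaa" in a child process; the default paths are taken from this thread plus /usr/bin/python (assumed to be the CentOS 6 system Python) and are placeholders. A failure can mean either that the module is missing or that the drmaa package could not find libdrmaa, and the last line of stderr shows which. Run it under sudo to reproduce the runner's environment.

```python
import os
import subprocess
import sys


def can_import(python, module="drmaa", env=None):
    """Run python -c "import <module>"; return (succeeded, stderr_text)."""
    if not os.path.exists(python):
        return False, "interpreter not found: %s" % python
    proc = subprocess.Popen(
        [python, "-c", "import %s" % module],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        env=env,  # None means the child inherits this process's environment
    )
    _, err = proc.communicate()
    return proc.returncode == 0, err.decode("utf-8", "replace").strip()


if __name__ == "__main__":
    candidates = sys.argv[1:] or [
        "/usr/bin/python",  # assumed CentOS 6 system Python 2.6
        "/mnt/shared/galaxy/apps/python/2.7.11/bin/python",
        "/mnt/shared/galaxy/galaxy-dist/.venv/bin/python",
    ]
    for python in candidates:
        ok, err = can_import(python)
        detail = err.splitlines()[-1] if err else ""
        print("%s: %s %s" % (python, "OK" if ok else "FAIL", detail))
```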
On Wed, Mar 2, 2016 at 12:07 PM, Peter Cock p.j.a.cock@googlemail.com wrote:
Thanks Nate,
So we'd write a shell script called drmaa_external_runner.py which sets up the Python environment and then calls a copy of the original Python script drmaa_external_runner.py? Has anyone else tried this?
The name of the run job script to call is controlled by the drmaa_external_runjob_script variable in galaxy.ini:
https://github.com/galaxyproject/galaxy/blob/dev/config/galaxy.ini.sample#L1...
You can change this to point at your wrapper.
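Since sudo runs the configured script directly, as in the sudo scripts/drmaa_external_runner.py command quoted at the start of the thread, the interpreter is picked by the script's #! line and the file must be executable. On CentOS 6, sudo's default secure_path most likely does not include the shared Python 2.7's bin directory, so a #!/usr/bin/env python line would resolve to the system Python 2.6. A small helper, hypothetical and not part of Galaxy, for checking a wrapper before setting drmaa_external_runjob_script to its path:

```python
import os
import sys


def describe_script(path):
    """Report whether path is executable and which interpreter its #! line names."""
    with open(path, "rb") as handle:
        first_line = handle.readline().decode("utf-8", "replace").rstrip("\r\n")
    interpreter = None
    if first_line.startswith("#!"):
        interpreter = first_line[2:].strip()
    return {
        "path": path,
        # The file needs an execute bit for sudo to run it directly.
        "executable": os.access(path, os.X_OK),
        "shebang": interpreter,
        # "/usr/bin/env python" leaves the choice to the PATH sudo provides.
        "uses_env_lookup": bool(interpreter) and interpreter.split()[0].endswith("/env"),
    }


if __name__ == "__main__":
    for script in sys.argv[1:]:
        print(describe_script(script))
```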
Clearly using Python 2.7 on a system with a Python 2.6 default (CentOS 6) is part of the problem. Maybe we can just install the Python DRMAA under the system Python 2.6 instead?
This part would work, but wouldn't you still need the environment variables to be set?
--nate
On Wed, Mar 2, 2016 at 5:27 PM, Nate Coraor nate@bx.psu.edu wrote:
The name of the run job script to call is controlled by the drmaa_external_runjob_script variable in galaxy.ini:
https://github.com/galaxyproject/galaxy/blob/dev/config/galaxy.ini.sample#L1...
Here you can change this to your wrapper.
Thanks Nate - I should have realised this was a configuration option.
Clearly using Python 2.7 on a system with a Python 2.6 default (CentOS 6) is part of the problem. Maybe we can just install the Python DRMAA under the system Python 2.6 instead?
This part would work, but wouldn't you still need the environment variables to be set?
--nate
Good point. The custom script option with hard-coded environment variables sounds more straightforward than fighting the Linux OS settings.
Peter
galaxy-dev@lists.galaxyproject.org