external_chown_script.py and the local job runner
On Mon, Jan 25, 2016 at 11:33 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
Hello all,
We're currently looking at changing our Galaxy setup to link user accounts with Linux user accounts for better cluster integration (running jobs as the actual user on SGE). As part of this, we've tried setting up a fresh installation on a new VM which has thrown up some issues.
[snip]
We eventually got Galaxy to talk to SGE and submit jobs successfully as the individual user's Linux account, with external_chown_script.py being called to handle ownership of the files.
However, we would like to have some jobs (like the upload tool) configured to run on the Galaxy server itself - and the easiest way to do that is via the local job runner.
We tried both the "upload1" and "Convert characters1" tools. These jobs would start and seem to run, but then fail with a file permission error (I don't have a stack trace to hand). From watching the Galaxy terminal output from run.sh we could see external_chown_script.py being called.
As a test, when we disabled the external_chown_script setting in config/galaxy.ini then the local jobs would work.
When using the local job runner, does Galaxy run the child process as the Galaxy user (my guess - no chown needed), or as the job owner's Linux account (calling chown would be needed)?
Thanks,
Peter
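One way to investigate a permission error like the one described above is to compare the owner of a job's files with the account the Galaxy process runs as. The following is a minimal diagnostic sketch, not part of Galaxy: the function name `owner_mismatches` and the idea of pointing it at a job's working directory are assumptions made for illustration. It uses only the standard `os` module and assumes a POSIX system.

```python
import os


def owner_mismatches(directory, expected_uid=None):
    """Return the paths under `directory` that are not owned by `expected_uid`.

    `expected_uid` defaults to the effective UID of the current process,
    i.e. the account that a locally run job would execute as.
    """
    if expected_uid is None:
        expected_uid = os.geteuid()
    mismatches = []
    for root, dirs, files in os.walk(directory):
        for name in dirs + files:
            path = os.path.join(root, name)
            # lstat reports a symlink's own ownership, not its target's.
            if os.lstat(path).st_uid != expected_uid:
                mismatches.append(path)
    # Check the top-level directory itself as well.
    if os.lstat(directory).st_uid != expected_uid:
        mismatches.append(directory)
    return sorted(mismatches)
```

Run as the Galaxy user against a directory whose files have been handed over to a real user's account, a check like this would list those files, which is consistent with a local job failing on a permission error.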
On Mon, Jan 25, 2016 at 3:33 PM, John Chilton <jmchilton@gmail.com> wrote:
On Mon, Jan 25, 2016 at 3:26 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
> When using the local job runner, does Galaxy run the child process as the Galaxy user (my guess - no chown needed), or as the job owner's Linux account (calling chown would be needed)?
Yes, only the drmaa and pulsar runners really support running jobs as a different user. The local job runner pretty explicitly will only ever run as the Galaxy user.
On Mon, Jan 25, 2016 at 3:38 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
On Mon, Jan 25, 2016 at 3:33 PM, John Chilton <jmchilton@gmail.com> wrote:
> Yes, only the drmaa and pulsar runners really support running jobs as a different user. The local job runner pretty explicitly will only ever run as the Galaxy user.
Thanks John,
In that case is there a bug that external_chown_script.py gets called unconditionally, when it should only happen for the drmaa and pulsar runners?
Peter
On Mon, Jan 25, 2016 at 4:44 PM, John Chilton <jmchilton@gmail.com> wrote:
I think the problem is more that it is configured globally and not per-destination. The real user stuff should all be per-destination and not globally configured - since it should be possible to have like a dedicated cluster for Galaxy jobs that just run jobs normally and a general purpose cluster that submits jobs as the real user for accounting purposes.
I have created a WIP pull request to move the configuration of these options in that direction:
https://github.com/galaxyproject/galaxy/pull/1573
I haven't tested any of the changes in the PR yet - I just wanted to open something to ensure I'd come back and finish things this cycle.
It is something that I have wanted to do for a while now - see https://trello.com/c/6w8bples. In general there are a bunch of Galaxy options for jobs that can only be configured in galaxy.ini but that you may wish to have different values for depending on the job destination.
You are going to want a smaller hack that can be backported just to run the local job runner when that option is configured huh?
-John
On Mon, Jan 25, 2016 at 3:38 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
> In that case is there a bug that external_chown_script.py gets called unconditionally, when it should only happen for the drmaa and pulsar runners?
On Mon, Jan 25, 2016 at 4:59 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
On Mon, Jan 25, 2016 at 4:44 PM, John Chilton <jmchilton@gmail.com> wrote:
> I think the problem is more that it is configured globally and not per-destination. The real user stuff should all be per-destination and not globally configured - since it should be possible to have like a dedicated cluster for Galaxy jobs that just run jobs normally and a general purpose cluster that submits jobs as the real user for accounting purposes.
Excellent point.
> I have created a WIP pull request to move the configuration of these options in that direction:
> https://github.com/galaxyproject/galaxy/pull/1573
> I haven't tested any of the changes in the PR yet - I just wanted to open something to ensure I'd come back and finish things this cycle.
> It is something that I have wanted to do for a while now - see https://trello.com/c/6w8bples. In general there are a bunch of Galaxy options for jobs that can only be configured in galaxy.ini but that you may wish to have different values for depending on the job destination.
That makes a lot of sense :)
> You are going to want a smaller hack that can be backported just to run the local job runner when that option is configured huh?
> -John
i.e. Only attempt to call external_chown_script.py for the drmaa and pulsar runners?
The full fix looks more involved and likely to take a while to gestate. If you think it sensible to get a quick hack into the main development branch now (and the next stable release), that would be handy.
Has anyone else hit this problem, and how did you solve it?
We're considering trying an alternative of using SGE with a separate queue consisting of just the Galaxy server, as a substitute for the local worker - but we won't have a chance to try that till next week.
Peter
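The quick hack discussed above, calling the chown script only for runners that can run jobs as another user, could look roughly like the guard below. This is a sketch of the idea only, not Galaxy's code: `RUNNERS_SUPPORTING_REAL_USER`, `should_chown_to_real_user`, and the way the runner name and the setting are passed in are all hypothetical.

```python
# Runners that, per John's reply, support running jobs as a different user.
RUNNERS_SUPPORTING_REAL_USER = frozenset(["drmaa", "pulsar"])


def should_chown_to_real_user(runner_name, external_chown_script):
    """Decide whether job files should be handed over to the real user.

    runner_name: name of the job runner handling the job, e.g. "local".
    external_chown_script: the globally configured script command,
        or None when the setting is disabled.
    """
    if not external_chown_script:
        # Setting disabled: never chown, whatever the runner.
        return False
    # Setting enabled: only chown for runners that run jobs as the real user.
    return runner_name in RUNNERS_SUPPORTING_REAL_USER
```

With a guard of this shape, a job sent to the local runner would keep its files owned by the Galaxy account even while the global setting is enabled for cluster jobs.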
Peter Cock <p.j.a.cock@googlemail.com> later wrote:
Recap: Running jobs on a cluster as the real user can break the local job runner under Galaxy v16.01. This was a Galaxy bug due to the external_chown_script setting being used on all job runners (even when the file owner should have stayed as the Galaxy Linux account).
John had a pull request to improve this:
https://github.com/galaxyproject/galaxy/pull/1573
This was superseded by the following, which has been merged:
https://github.com/galaxyproject/galaxy/pull/1688
I think this means that per-job-runner config will be possible in the next release, so we can set external_chown_script only for DRMAA jobs sent to the cluster.
On Mon, Jan 25, 2016 at 4:59 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
> We're considering trying an alternative of using SGE with a separate queue consisting of just the Galaxy server, as a substitute for the local worker - but we won't have a chance to try that till next week.
We are currently doing this on our test Galaxy v16.01 server, but it does make the cluster integration even more complicated (the Galaxy server must be both an SGE submit node and also an SGE worker node, and you need an extra queue just for sending upload jobs etc. to run on the Galaxy server).
Peter
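The per-destination configuration mentioned in the recap can be pictured as a lookup in which a destination's own value overrides the global default. The sketch below is illustrative only: the dictionaries, the destination names `cluster_real_user` and `local_upload`, and the function `resolve_option` are hypothetical and do not reflect how the merged pull request names or stores these options.

```python
# Hypothetical global settings, standing in for values read from galaxy.ini.
GLOBAL_CONFIG = {"external_chown_script": None}

# Hypothetical per-destination settings; only the cluster destination that
# submits jobs as the real user enables the chown script.
DESTINATIONS = {
    "cluster_real_user": {
        "runner": "drmaa",
        "external_chown_script": "/path/to/external_chown_script.py",
    },
    "local_upload": {"runner": "local"},
}


def resolve_option(destination_id, option):
    """Return a destination's value for `option`, else the global default."""
    destination = DESTINATIONS.get(destination_id, {})
    if option in destination:
        return destination[option]
    return GLOBAL_CONFIG.get(option)
```

Under this model, upload jobs routed to a local destination would never trigger a chown, which would remove the need for the extra SGE queue workaround.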