Python on cluster for setting metadata after jobs complete
On Mon, Jan 25, 2016 at 11:33 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
Hello all,
We're currently looking at changing our Galaxy setup to link user accounts with Linux user accounts for better cluster integration (running jobs as the actual user on SGE). As part of this, we've tried setting up a fresh installation on a new VM which has thrown up some issues.
[snip]
The next problem on my list was jobs successfully submitting to SGE and running, then failing with a Python exception:
galaxy.eggs.EggNotFetchable
These emails from Donald Shrum and Jingchao Zhang (BCC'd) look very similar, although I don't see that they ever had a reply/resolution:
https://lists.galaxyproject.org/pipermail/galaxy-dev/2014-October/020719.htm... https://lists.galaxyproject.org/pipermail/galaxy-dev/2014-January/018034.htm...
We eventually realised that the new Galaxy VM server was running Python 2.7 (default under CentOS 7), but the cluster nodes were running Python 2.6 (default under CentOS 6).
(Our current Galaxy server is also still running CentOS 6, which is likely why I had never hit this before.)
After replacing the VM with CentOS 6 we were able to get this to work - but was that necessary? How exactly do the cluster jobs invoke Python (especially when run as the associated Linux user account rather than under the Galaxy Linux account)?
Sadly the cluster documentation does not mention Python at all:
https://wiki.galaxyproject.org/Admin/Config/Performance/Cluster/
This seems to be an oversight.
Regards,
Peter
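A plausible mechanism behind the EggNotFetchable error, offered as an assumption rather than a diagnosis of Galaxy's own lookup code: setuptools names eggs Name-Version-pyX.Y[-platform].egg, so eggs fetched on a Python 2.7 server carry a py2.7 tag that a Python 2.6 interpreter on a node will not accept. The sketch below only illustrates that naming convention; the egg filenames are made-up examples, the helpers egg_python_version and usable_eggs are hypothetical, and galaxy.eggs may apply further rules (such as UCS2/UCS4 platform suffixes) that are not modelled here.

```python
import re
import sys

# setuptools egg filenames: Name-Version-pyX.Y[-platform].egg
# The pyX.Y part records which interpreter the egg was built for.
EGG_RE = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-]+)-py(?P<pyver>\d+\.\d+)"
    r"(?:-(?P<platform>.+))?\.egg$"
)


def egg_python_version(filename):
    """Return the (major, minor) Python tag of an egg filename, or None."""
    match = EGG_RE.match(filename)
    if match is None:
        return None
    major, minor = match.group("pyver").split(".")
    return int(major), int(minor)


def usable_eggs(filenames, interpreter=None):
    """Keep only eggs whose pyX.Y tag matches the (major, minor) interpreter."""
    if interpreter is None:
        interpreter = sys.version_info[:2]
    return [f for f in filenames if egg_python_version(f) == tuple(interpreter)]


if __name__ == "__main__":
    # Illustrative names only: eggs fetched by a Python 2.7 server.
    fetched = ["six-1.9.0-py2.7.egg", "pysam-0.8.3-py2.7-linux-x86_64.egg"]
    print(usable_eggs(fetched, (2, 6)))  # nothing usable for a 2.6 node
```

The point of the example is that nothing in such a filename is wrong on the server; the same file is simply not a candidate for an interpreter with a different major.minor version.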
The script generated to call Galaxy is here:
https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/datatypes/metada...
The job template stuff that sets up the environment the job runs in is here:
https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/jobs/runners/uti...
This second file changes in a large way with 16.01, which ditches eggs for virtual environments and wheels.
We don't explicitly support running different versions of Python on the worker and handler, it seems - but I have seen it work before. You could probably hack up DEFAULT_JOB_FILE_TEMPLATE.sh to point at a different instance of Galaxy for your version. I'd hope there was something easier though.
On Mon, Jan 25, 2016 at 3:05 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
[snip]
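Until the versions can be aligned, one low-tech safeguard (an assumption, not something Galaxy provides) is to have the job script check the node's interpreter before Galaxy's metadata step runs, so that a mismatch fails with a clear message instead of an EggNotFetchable traceback. The script name check_python_version.py and its single MAJOR.MINOR argument are hypothetical. It deliberately avoids syntax newer than Python 2.6 so that it also runs on the older nodes.

```python
import sys


def parse_version(text):
    """Turn "2.7" or "2.7.5" into a (major, minor) tuple of ints."""
    parts = text.strip().split(".")
    return int(parts[0]), int(parts[1])


def check_python(expected, actual=None):
    """Return an error message if the interpreter doesn't match, else None."""
    if actual is None:
        actual = sys.version_info[:2]
    want = parse_version(expected)
    if tuple(actual) != want:
        return ("Python %d.%d on this node, but Galaxy was set up with %d.%d"
                % (actual[0], actual[1], want[0], want[1]))
    return None


# Hypothetical usage from a job script: python check_python_version.py 2.7
if __name__ == "__main__" and len(sys.argv) == 2:
    problem = check_python(sys.argv[1])
    if problem:
        sys.stderr.write(problem + "\n")
        sys.exit(1)
```

A hacked-up DEFAULT_JOB_FILE_TEMPLATE.sh could, for instance, run this with the server's version and abort the job on a non-zero exit status; exactly where such a line would belong in the template is not covered by this thread.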
Thanks John,
On Mon, Jan 25, 2016 at 3:29 PM, John Chilton <jmchilton@gmail.com> wrote:
The script generated to call Galaxy is here:
https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/datatypes/metada...
The job template stuff that sets up the environment the job runs in is here:
https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/jobs/runners/uti...
This second file changes in a large way with 16.01 which ditches eggs for virtual environments and wheels.
We were trying with v15.10 (I think), but since it's late January 2016, can we expect a v16.01 release shortly? That might be quite timely as we're not going to be working on our new VM server and its cluster integration till next week...
We don't explicitly support running different versions of Python on the worker and handler it seems - but I have seen it work before. You could probably hack up DEFAULT_JOB_FILE_TEMPLATE.sh to point at a different instance of Galaxy for your version. I'd hope there was something easier though.
So there should probably be a note on the cluster wiki page about recommending having the same version of Python on the cluster nodes and Galaxy server?
https://wiki.galaxyproject.org/Admin/Config/Performance/Cluster
Peter
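Before deciding whether the server or the nodes should change, it helps to know which nodes actually differ. A minimal audit sketch, assuming the version strings have already been collected from each host (for example by running python -c "import platform; print(platform.python_version())" on every node via the scheduler or ssh). The host names and the find_mismatches helper are hypothetical; only major.minor is compared, since that is what the egg tags encode.

```python
import platform


def local_version():
    """Python version of the interpreter running this code, e.g. "2.6.6"."""
    return platform.python_version()


def major_minor(version):
    """Reduce "2.7.5" to ("2", "7") so patch releases are ignored."""
    return tuple(version.split(".")[:2])


def find_mismatches(server_version, node_versions):
    """Return sorted (host, version) pairs whose major.minor differs from the server's."""
    wanted = major_minor(server_version)
    return sorted((host, version) for host, version in node_versions.items()
                  if major_minor(version) != wanted)


if __name__ == "__main__":
    # Hypothetical data: in practice these strings would come from each node.
    nodes = {"node01": "2.6.6", "node02": "2.7.5", "node03": "2.6.6"}
    for host, version in find_mismatches(local_version(), nodes):
        print("%s runs Python %s" % (host, version))
```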
On Mon, Jan 25, 2016 at 3:44 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
[snip]
We were trying with v15.10 (I think), but since it's late January 2016, can we expect a v16.01 release shortly? That might be quite timely as we're not going to be working on our new VM server and its cluster integration till next week...
If I was doing something new and doing configuration testing - I would target 16.01 - the release will probably happen this week - it has been running on main for some time now. More than other recent releases I would target this one a little ahead of time because it makes some biggish changes to how Galaxy is configured - uwsgi needs to be tweaked, we are swapping from eggs to wheels, there are some changes to job script generation to isolate tool dependency evaluation. More than other recent upgrade processes (15.04 -> 15.07 and 15.07 -> 15.10) I think there is the potential for little hiccups.
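As far as wheels go, swapping from eggs to wheels would not by itself remove the version coupling: PEP 427 wheel filenames carry Python, ABI and platform tags (for example cp27 for CPython 2.7). The rough sketch below checks only the Python tag; real installers such as pip do full PEP 425 tag matching, including ABI and platform. The wheel names are illustrative and the helpers parse_wheel_name and python_tag_allows are hypothetical.

```python
def parse_wheel_name(filename):
    """Split a PEP 427 wheel filename into its components.

    Layout: name-version[-build]-pythontag-abitag-platformtag.whl, where each
    tag may be a '.'-separated set such as "py2.py3".
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 6:
        name, version, build, python, abi, plat = parts
    elif len(parts) == 5:
        name, version, python, abi, plat = parts
        build = None
    else:
        raise ValueError("unexpected wheel name: %r" % filename)
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": python.split("."),
        "abi": abi.split("."),
        "platform": plat.split("."),
    }


def python_tag_allows(filename, major, minor):
    """Rough check of the Python tag only; ABI and platform tags are ignored."""
    accepted = set(["cp%d%d" % (major, minor), "py%d%d" % (major, minor), "py%d" % major])
    return any(tag in accepted for tag in parse_wheel_name(filename)["python"])


if __name__ == "__main__":
    # Illustrative names: a compiled CPython 2.7 wheel and a universal one.
    for wheel in ("numpy-1.9.2-cp27-cp27mu-linux_x86_64.whl",
                  "six-1.10.0-py2.py3-none-any.whl"):
        print(wheel, python_tag_allows(wheel, 2, 6), python_tag_allows(wheel, 2, 7))
```

A compiled wheel tagged cp27 is accepted only by CPython 2.7, while a universal py2.py3 wheel is accepted by both interpreters, which is why compiled dependencies are the ones most likely to trip over a server/node mismatch.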
We don't explicitly support running different versions of Python on the worker and handler it seems - but I have seen it work before. You could probably hack up DEFAULT_JOB_FILE_TEMPLATE.sh to point at a different instance of Galaxy for your version. I'd hope there was something easier though.
So there should probably be a note on the cluster wiki page about recommending having the same version of Python on the cluster nodes and Galaxy server?
https://wiki.galaxyproject.org/Admin/Config/Performance/Cluster
If you can find a clear place to add that note, I'd go for it ;). In general though it is a broader problem - all tool shed installations happen on the web servers, not on the cluster workers. I know many institutions run different OSes across those machines - but if you think about it, the tool shed doesn't really support that - it just sort of happens that there usually aren't problems.
Thanks Peter,
-John
On Mon, Jan 25, 2016 at 5:50 PM, John Chilton <jmchilton@gmail.com> wrote:
[snip]
If I was doing something new and doing configuration testing - I would target 16.01 - the release will probably happen this week - it has been running on main for some time now. More than other recent releases I would target this one a little ahead of time because it makes some biggish changes to how Galaxy is configured - uwsgi needs to be tweaked, we are swapping from eggs to wheels, there are some changes to job script generation to isolate tool dependency evaluation. More than other recent upgrade processes (15.04 -> 15.07 and 15.07 -> 15.10) I think there is the potential for little hiccups.
Useful food for thought - thank you. We'll give that a go in Feb 2016, hopefully early next week :)
We don't explicitly support running different versions of Python on the worker and handler it seems - but I have seen it work before. You could probably hack up DEFAULT_JOB_FILE_TEMPLATE.sh to point at a different instance of Galaxy for your version. I'd hope there was something easier though.
So there should probably be a note on the cluster wiki page about recommending having the same version of Python on the cluster nodes and Galaxy server?
https://wiki.galaxyproject.org/Admin/Config/Performance/Cluster
If you can find a clear place to add that note, I'd go for it ;).
Done: https://wiki.galaxyproject.org/Admin/Config/Performance/Cluster?action=diff&rev1=52&rev2=53
In general though it is a broader problem - all tool shed installations happen on the web servers, not on the cluster workers. I know many institutions run different OSes across those machines - but if you think about it, the tool shed doesn't really support that - it just sort of happens that there usually aren't problems.
Good point - that's quite a minefield, with consistent versions of Python on the Galaxy server and cluster just the tip of the iceberg. That does strongly suggest sticking with the same version of Linux on both if possible... perhaps another note on the wiki?
Thanks,
Peter
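If the aim is to keep the Galaxy server and the cluster nodes on the same Linux release, comparing a small fingerprint from each machine makes drift easy to spot. A minimal sketch using only the standard platform module. Which fields matter is an assumption (Python version, CPU architecture and glibc version are common sources of breakage for compiled tool dependencies, and CentOS 6 and 7 ship different glibc releases), and the fingerprint and differences helpers are hypothetical. It avoids dict comprehensions so it also runs on Python 2.6.

```python
import platform
import sys


def fingerprint():
    """Collect a few properties that commonly differ between server and nodes."""
    libc_name, libc_version = platform.libc_ver()
    return {
        "python": "%d.%d" % sys.version_info[:2],
        "system": platform.system(),
        "machine": platform.machine(),
        "libc": ("%s %s" % (libc_name, libc_version)).strip(),
    }


def differences(server, node):
    """Return {field: (server_value, node_value)} for every field that differs."""
    fields = set(server) | set(node)
    return dict((f, (server.get(f), node.get(f)))
                for f in sorted(fields)
                if server.get(f) != node.get(f))


if __name__ == "__main__":
    # Prints this machine's fingerprint; run it on the server and on a node
    # and feed both results to differences().
    print(fingerprint())
```

One way to use it: run fingerprint() on the server and, as a cluster job, on a node, save both dictionaries (for example with json) and pass them to differences(). An empty result only means these particular fields agree, not that the machines are otherwise identical.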