pbs_submit failed, PBS error 15024: Max hop count exceeded
Dear all,
I'm not sure if it's a Galaxy issue, but perhaps someone could give me some advice on where to look or which configuration to set.
I am trying to set up a local installation of Galaxy for a multi-user environment. I've installed it on a compute cluster running CentOS 5.5, Python 2.4, and a TORQUE scheduler with Maui.
Server name: b110-sc-hdn
The compute cluster has three nodes, called headnode, computenode1, and computenode2.
The default queue on the cluster is called "galaxy_queue". I've set jobs to run on this queue in the universe_wsgi.ini file.
However, when I try to run a job, I get the following error:
"galaxy.jobs.runners.pbs DEBUG 2010-11-14 16:48:56,410 (59) pbs_submit failed, PBS error 15024: Max hop count exceeded"
From the research I have done, this seemingly occurs when there are issues resolving the domain name, but I don't know how to check or fix this.
I've run the attached Python test script, which imports the pbs Python package, and it works fine. If I change the server name in the script to "b110-sc-hdn", I can recreate the error. Is there something obvious I am not getting?
Any help would be very much appreciated, and thanks a lot for an exceptional tool!
Grainne.
Hi Grainne,
In your test script, could you see what value pbs.pbs_default() is returning? Also, check that one of the following is true:
'b110-sc-hdn' is set in /etc/hosts and maps to the same IP address that your DNS server resolves it to,
OR
'b110-sc-hdn' does not appear in /etc/hosts, and resolves via DNS to the IP address of your server.
--nate
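The two conditions above can be checked with a short script. What follows is only a minimal sketch, written for a current Python 3 rather than the Python 2.4 mentioned in the thread; the function names are hypothetical and not part of Galaxy or pbs_python. One caveat: socket.getaddrinfo() goes through the system resolver, which on most Linux systems consults /etc/hosts before DNS, so a DNS-only answer would need a tool such as `host` or `dig`.

```python
import socket


def hosts_file_addresses(hosts_text, hostname):
    """Return the IP addresses that hosts-file text maps to hostname."""
    addresses = []
    for line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # First field is the address; the rest are names and aliases.
        if hostname in fields[1:]:
            addresses.append(fields[0])
    return addresses


def resolver_addresses(hostname):
    """Return the addresses the system resolver reports for hostname."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []  # the name does not resolve at all
    return sorted(set(info[4][0] for info in infos))


def report(hostname, hosts_path="/etc/hosts"):
    """Print both views so that a mismatch is easy to spot."""
    with open(hosts_path) as handle:
        print("hosts file:", hosts_file_addresses(handle.read(), hostname))
    print("resolver:  ", resolver_addresses(hostname))
```

Calling report("b110-sc-hdn") on the head node would print both lists; an empty resolver list, or addresses that differ between the two, would point at a name-resolution problem.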
_______________________________________________ galaxy-dev mailing list galaxy-dev@lists.bx.psu.edu http://lists.bx.psu.edu/listinfo/galaxy-dev
Hi Nate,
Thanks a lot for your answer. Embarrassingly, it turns out that it was something obvious I was not getting. I have no excuse, as it is pointed out clearly in the wiki: "run Galaxy as a non-root user". Apparently TORQUE does not like jobs to be submitted as root.
I've now set up a galaxy user, installed Galaxy into the galaxy_user home directory, and configured Apache's httpd.conf file (all as indicated in the wiki), and it works!
Thanks a lot,
Grainne.
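A quick guard against repeating this mistake is to check the effective user ID before submitting anything. This is a hedged sketch only: refuse_root is a hypothetical helper, not something Galaxy or pbs_python provides, and it assumes a Unix-like system where root has UID 0.

```python
import os


def refuse_root(euid=None):
    """Return an error message if the effective user is root, else None."""
    if euid is None:
        euid = os.geteuid()  # effective UID of the current process
    if euid == 0:
        return "running as root; start Galaxy as a dedicated non-root user"
    return None
```

A start-up or test script could call refuse_root() and exit with the returned message when it is not None.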
participants (2)
- Grainne Kerr
- Nate Coraor