Hi, Galaxy Community,
Greetings from The University of Chicago; I hope all who attended the
Galaxy conference enjoyed it as much as I did. I have searched the
mailing list archives as well as Google to resolve a problem I am
seeing, but I am somewhat at a loss as to what I should try next to
bring this issue to a close. I am hoping
that one of the bright minds on this mailing list could help me shed
some light on the solution to my problem, or at least help me identify
a root cause. I have configured Galaxy to integrate with a TORQUE
server (version 4.0.2), and successfully built the PBS Python egg as
specified in the Galaxy documentation. I am using Python version 2.6
and the latest build of Galaxy. Whenever I launch a job from the
Galaxy UI, I get the following error messages in the PBS server log:
07/30/2012 15:42:48;0080;PBS_Server;Req;dis_request_read;conflicting
version numbers, 1 detected, 2 expected
07/30/2012 15:42:48;0080;PBS_Server;Req;req_reject;Reject reply
code=15058(Bad DIS based Request Protocol MSG=cannot decode message),
aux=0, type=AlternateUserAuthentication, from galaxy@
07/30/2012 15:42:48;0080;PBS_Server;Req;dis_request_read;conflicting
version numbers, 1 detected, 2 expected
07/30/2012 15:42:48;0080;PBS_Server;Req;req_reject;Reject reply
code=15058(Bad DIS based Request Protocol MSG=cannot decode message),
aux=0, type=QueueJob, from galaxy@
07/30/2012 15:42:48;0080;PBS_Server;Req;dis_request_read;conflicting
version numbers, 1 detected, 2 expected
07/30/2012 15:42:48;0080;PBS_Server;Req;req_reject;Reject reply
code=15058(Bad DIS based Request Protocol MSG=cannot decode message),
aux=0, type=Disconnect, from galaxy@
07/30/2012 15:43:01;0002;PBS_Server;Svr;PBS_Server;Torque Server
Version = 4.0.2, loglevel = 1
One thing I did notice that suggests there might be a problem is
that there is no hostname after the galaxy@ in these entries; most of
the other messages in this log file have a hostname appended to the
log entry, e.g.:
07/30/2012 15:43:38;0100;PBS_Server;Req;;Type StatusJob request
received from root@sc01, sock=10
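
To see how many requests arrive without a hostname, something like the
following could be run over the server log. It is only a rough sketch:
the function names (unwrap, requests_without_host) are made up for
illustration, and the regular expressions assume nothing more than the
entry layout visible in the excerpts above (a MM/DD/YYYY timestamp at
the start of each entry and a trailing "from user@host").

```python
import re

# A new log entry starts with a MM/DD/YYYY timestamp (assumed from the excerpts).
ENTRY_START = re.compile(r"\d{2}/\d{2}/\d{4} ")
# "from user@host"; the host part may be empty, as in "from galaxy@".
FROM_RE = re.compile(r"from (?P<user>[\w.-]+)@(?P<host>[\w.-]*)")
# Request type appears either as "type=QueueJob" or "Type StatusJob request".
TYPE_RE = re.compile(r"type=(?P<a>\w+)|Type (?P<b>\w+) request")


def unwrap(lines):
    """Rejoin entries that were wrapped across several lines."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if ENTRY_START.match(line):
            entries.append(line)
        elif entries:
            # Continuation of the previous entry.
            entries[-1] += " " + line
    return entries


def requests_without_host(lines):
    """Return (user, request_type) for entries whose 'from user@' has no host."""
    found = []
    for entry in unwrap(lines):
        sender = FROM_RE.search(entry)
        if sender is None or sender.group("host"):
            continue
        req = TYPE_RE.search(entry)
        req_type = (req.group("a") or req.group("b")) if req else None
        found.append((sender.group("user"), req_type))
    return found
```

Passing the lines of the server log (for example, the result of
open(path).read().splitlines()) to requests_without_host() lists the
user and request type of every entry with an empty host, which should
make it easy to tell whether only the Galaxy requests are affected.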
I have run tcpdump on the scheduler node, and I can definitely
see bi-directional traffic between the Galaxy server and the scheduler
node on TCP port 15001. In addition to this, I have installed the
TORQUE client tools on the Galaxy server, and can spawn an
interactive job with qsub -I, as well as check the status of queued
jobs using qstat (from the Galaxy server). This suggests to me that
there is a potential problem with the PBS egg, although I am not certain.
Has anybody seen something like this before, or could somebody point
me in the right direction? We do have a support contract with
Adaptive Computing, and I am opening a ticket with them as well, but
I wanted to reach out to the Galaxy community to cover all of
my bases. Thank you so much for taking the time to read my email.
Dan Sullivan