Sonali Amonkar wrote:
I am currently facing another issue. When I run my workflow, I see the following error in the server log. The error is not consistent; it occurs intermittently.
galaxy.jobs INFO 2011-02-03 05:17:03,522 job 151 dispatched
galaxy.jobs.runners.pbs DEBUG 2011-02-03 05:17:09,755 (150/69156.<primaryserver>) PBS job has left queue
galaxy.jobs.runners.pbs DEBUG 2011-02-03 05:17:09,879 (151) submitting file galaxy-dist/database/pbs/151.sh
galaxy.jobs.runners.pbs DEBUG 2011-02-03 05:17:09,880 (151) command is: java -cp galaxy-dist/tools/my_tools/jars/PreRef1.jar RefFilterModule galaxy-dist/database/files/000/dataset_192.dat galaxy-dist/database/files/000/dataset_194.dat galaxy-dist/database/files/000/dataset_195.dat
galaxy.jobs.runners.pbs DEBUG 2011-02-03 05:17:09,880 (151) pbs_submit failed, PBS error 15031: Protocol (ASN.1) error
galaxy.jobs DEBUG 2011-02-03 05:17:13,363 job 150 ended
galaxy.jobs ERROR 2011-02-03 05:17:15,816 Unable to cleanup job 152
Hi Sonali,

I am fairly sure this problem is specific to your TORQUE setup. I see you also posted it to the torquedev list but unfortunately received no response. I am not sure what the cause is, but you may want to try adjusting the 'tcp_timeout' server setting:

qmgr -c 'set server tcp_timeout = X'

--nate
Any help or pointers on this issue would be appreciated. Thank you very much for your time, Nate.
Regards, Sonali
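
Since the failure is intermittent, it may help to tally how often pbs_submit fails, and with which PBS error code, before and after changing tcp_timeout. The following is only a rough sketch in Python: it assumes the log lines have the same layout as the excerpt quoted above, and the names find_submit_failures, count_by_error_code and SAMPLE_LOG are made up for illustration; they are not part of Galaxy or TORQUE.

```python
import re
from collections import Counter

# Assumed line layout, taken from the log excerpt in this thread:
# galaxy.jobs.runners.pbs DEBUG <date> <time>,<ms> (<job id>) pbs_submit failed, PBS error <code>: <message>
FAILURE_RE = re.compile(
    r"galaxy\.jobs\.runners\.pbs \w+ "
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ "
    r"\((?P<job_id>\d+)\) pbs_submit failed, "
    r"PBS error (?P<code>\d+): (?P<message>.*)"
)


def find_submit_failures(lines):
    """Return one dict per 'pbs_submit failed' line found in the given lines."""
    failures = []
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            failures.append(match.groupdict())
    return failures


def count_by_error_code(failures):
    """Count failures per PBS error code (codes are kept as strings)."""
    return Counter(failure["code"] for failure in failures)


# Hypothetical sample: three lines copied from the excerpt above.
SAMPLE_LOG = [
    "galaxy.jobs INFO 2011-02-03 05:17:03,522 job 151 dispatched",
    "galaxy.jobs.runners.pbs DEBUG 2011-02-03 05:17:09,880 (151) "
    "pbs_submit failed, PBS error 15031: Protocol (ASN.1) error",
    "galaxy.jobs DEBUG 2011-02-03 05:17:13,363 job 150 ended",
]

if __name__ == "__main__":
    for failure in find_submit_failures(SAMPLE_LOG):
        print(failure["timestamp"], failure["job_id"], failure["code"], failure["message"])
```

To run it against a real log, pass an open file object instead of SAMPLE_LOG, for example find_submit_failures(open(path_to_log)), where path_to_log is wherever your Galaxy server log is written.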