Nate Coraor wrote:
Sonali Amonkar wrote:
I am currently facing another issue. When I run my Workflow, I see the following error in the server log. The error is intermittent: it does not occur on every run.
galaxy.jobs INFO 2011-02-03 05:17:03,522 job 151 dispatched
galaxy.jobs.runners.pbs DEBUG 2011-02-03 05:17:09,755 (150/69156.<primaryserver>) PBS job has left queue
galaxy.jobs.runners.pbs DEBUG 2011-02-03 05:17:09,879 (151) submitting file galaxy-dist/database/pbs/151.sh
galaxy.jobs.runners.pbs DEBUG 2011-02-03 05:17:09,880 (151) command is: java -cp galaxy-dist/tools/my_tools/jars/PreRef1.jar RefFilterModule galaxy-dist/database/files/000/dataset_192.dat galaxy-dist/database/files/000/dataset_194.dat galaxy-dist/database/files/000/dataset_195.dat
galaxy.jobs.runners.pbs DEBUG 2011-02-03 05:17:09,880 (151) pbs_submit failed, PBS error 15031: Protocol (ASN.1) error
galaxy.jobs DEBUG 2011-02-03 05:17:13,363 job 150 ended
galaxy.jobs ERROR 2011-02-03 05:17:15,816 Unable to cleanup job 152
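Because the failure is intermittent, it can help to pull every failed submission out of the server log and see whether the same tool, time of day, or error code keeps recurring. The sketch below is a minimal illustration that assumes the line format in the excerpt above ("(JOB) pbs_submit failed, PBS error CODE: MESSAGE"); the function name find_submit_failures is hypothetical and not part of Galaxy.

```python
import re

# Assumed format, taken from the log excerpt in this thread:
#   "(151) pbs_submit failed, PBS error 15031: Protocol (ASN.1) error"
# The lookahead also stops the message at the next "galaxy." logger name,
# in case several log records ended up on one physical line.
SUBMIT_FAILED = re.compile(
    r"\((?P<job>\d+)\) pbs_submit failed, PBS error (?P<code>\d+): "
    r"(?P<msg>.+?)(?= galaxy\.|$)"
)


def find_submit_failures(log_text):
    """Return (galaxy_job_id, pbs_error_code, message) for each failed submit."""
    failures = []
    for line in log_text.splitlines():
        for m in SUBMIT_FAILED.finditer(line):
            failures.append(
                (int(m.group("job")), int(m.group("code")), m.group("msg").strip())
            )
    return failures


if __name__ == "__main__":
    sample = (
        "galaxy.jobs.runners.pbs DEBUG 2011-02-03 05:17:09,880 (151) "
        "pbs_submit failed, PBS error 15031: Protocol (ASN.1) error\n"
        "galaxy.jobs DEBUG 2011-02-03 05:17:13,363 job 150 ended\n"
    )
    print(find_submit_failures(sample))
```

Running this over the full server log after several workflow runs would show whether error 15031 is tied to one tool or appears at random.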
Hi Sonali,
I am pretty sure this problem is somehow specific to the TORQUE setup, and I see you also posted this to the torquedev list, but unfortunately received no response.
I am not sure what is up here, but you may want to try adjusting the 'tcp_timeout' server setting. (qmgr -c 'set server tcp_timeout = X')
Also, you may want to see if PBS is making an attempt to queue this job on a particular node, and if so, check the mom_logs for that node.
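Checking the mom_logs on several nodes by hand is tedious, so a small search helper can be useful. This is a minimal sketch that assumes the MOM logs have been copied (or are mounted) into one directory per node; TORQUE typically writes them under $PBS_HOME/mom_logs with one file per day, but check the path on your installation. The function name grep_mom_logs is hypothetical.

```python
from pathlib import Path


def grep_mom_logs(log_dir, needle):
    """Yield (file_name, line) for every log line that contains needle.

    log_dir is assumed to be a copy of one node's mom_logs directory.
    needle is a plain substring (for example a PBS job ID), so a short
    numeric ID can also match longer IDs; pass the full ID when possible.
    """
    for path in sorted(Path(log_dir).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories or other non-log entries
        # errors="replace" keeps the scan going if a log has odd bytes
        with path.open(errors="replace") as fh:
            for line in fh:
                if needle in line:
                    yield path.name, line.rstrip("\n")


if __name__ == "__main__":
    import sys

    for name, line in grep_mom_logs(sys.argv[1], sys.argv[2]):
        print(f"{name}: {line}")
```

Note that a job whose pbs_submit call failed may never have received a PBS job ID, in which case searching for the time of the failure is likely more useful than searching for an ID.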
--nate
Any help or pointers on this issue would be appreciated. Thank you very much for your time, Nate.
Regards, Sonali