Hi List,
I'm having the same issue: if I dispatch a job to the dynamic job runner and it then hands the job to pbs, Galaxy flips out when I restart the server. In my case, though, there is no error in the job state; the job is sitting in the Torque queue where it should be. As far as I can tell, Galaxy successfully reloads the job back into the history, but it falters on recovering the destination for pbs.
If I delete the queued jobs, it quiets down, but this is not a sustainable solution. I need to be able to reboot the server without all of my users' cluster jobs causing faults every second.
Log:
galaxy.jobs.runners.pbs DEBUG 2013-10-04 15:50:32,454 Set default PBS server to m1.mason.indiana.edu
galaxy.jobs.runners DEBUG 2013-10-04 15:50:32,455 Starting 4 PBSRunner workers
galaxy.jobs DEBUG 2013-10-04 15:50:32,459 Loaded job runner 'galaxy.jobs.runners.pbs:PBSJobRunner' as 'pbs'
galaxy.jobs.handler DEBUG 2013-10-04 15:50:32,459 Loaded job runners plugins: local:pbs
galaxy.jobs.handler INFO 2013-10-04 15:50:32,460 job handler stop queue started
galaxy.jobs DEBUG 2013-10-04 15:50:32,478 (514) Working directory for job is: /N/dc/projects/galaxy/job_working_directory/000/514
galaxy.jobs.handler DEBUG 2013-10-04 15:50:32,478 recovering job 514 in pbs runner
galaxy.jobs WARNING 2013-10-04 15:50:32,478 (514) Job runner URLs are deprecated, use destinations instead.
galaxy.jobs.runners.pbs DEBUG 2013-10-04 15:50:32,479 (514/176938.m1.mason) is still in PBS queued state, adding to the PBS queue
galaxy.jobs.handler INFO 2013-10-04 15:50:32,485 job handler queue started
...
galaxy.jobs.runners ERROR 2013-10-04 15:50:33,456 Unhandled exception checking active jobs
Traceback (most recent call last):
  File "/N/hd03/galaxy/Mason/galaxy-ncgas/lib/galaxy/jobs/runners/__init__.py", line 362, in monitor
    self.check_watched_items()
  File "/N/hd03/galaxy/Mason/galaxy-ncgas/lib/galaxy/jobs/runners/pbs.py", line 385, in check_watched_items
    ( failures, statuses ) = self.check_all_jobs()
  File "/N/hd03/galaxy/Mason/galaxy-ncgas/lib/galaxy/jobs/runners/pbs.py", line 466, in check_all_jobs
    pbs_server_name = self.__get_pbs_server(pbs_job_state.job_destination.params)
  File "/N/hd03/galaxy/Mason/galaxy-ncgas/lib/galaxy/jobs/runners/pbs.py", line 222, in __get_pbs_server
    return job_destination_params['destination'].split('@')[-1]
KeyError: 'destination'
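
The last frame indexes job_destination_params['destination'] directly, so a recovered job whose params lack that key raises the KeyError every time the monitor loop runs. Below is a minimal standalone sketch of a guarded lookup that would avoid the exception. It is not Galaxy's code and not a tested fix: the function name get_pbs_server, the default_server parameter, and the 'batch' queue name are hypothetical, and falling back to the default PBS server is an assumption about the desired behaviour.

```python
def get_pbs_server(job_destination_params, default_server=None):
    """Return the PBS server part of a 'queue@server' destination string.

    Hypothetical stand-in for the lookup shown in the traceback. If the
    params are missing or carry no 'destination' key (as appears to happen
    for the recovered job), return default_server instead of raising
    KeyError.
    """
    if not job_destination_params:
        return default_server
    destination = job_destination_params.get('destination')
    if destination is None:
        return default_server
    # Same expression as the failing line: keep the text after the last '@'.
    return destination.split('@')[-1]


# Recovered job with no 'destination' key: falls back to the default server.
print(get_pbs_server({}, default_server='m1.mason.indiana.edu'))

# Job with an explicit destination ('batch' is a made-up queue name).
print(get_pbs_server({'destination': 'batch@m1.mason.indiana.edu'}))
```

Whether the default server is the right fallback, or whether the destination should instead be rebuilt from the job's stored runner URL during recovery, is a question for whoever knows the recovery code.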
Thanks so much for any wisdom on this.
Sincerely,
Carrie Ganote