We have updated our Galaxy instance to the 20 Dec 2012 release. Recently we found that some submitted jobs never start (they stay gray forever).
We traced this to the job manager sending jobs to a handler (handler0) whose Python process had crashed and died. The handler log shows:
galaxy.jobs.handler DEBUG 2013-01-18 15:00:34,481 Stopping job 3032:
galaxy.jobs.handler DEBUG 2013-01-18 15:00:34,481 stopping job 3032 in drmaa runner
We restarted Galaxy; handler0 stayed up for a few seconds and then died again with the same error messages, except that the job number had moved on to the next one.
We observed that the jobs it was trying to stop are all earlier jobs whose status is either "deleted" or "deleted_new".
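For anyone who wants to check their own logs for the same pattern, here is a minimal sketch of how the job ids can be pulled out of handler log lines like the ones above. The function name and the sample list are only for illustration and are not part of Galaxy; the resulting ids can then be compared by hand against the job states in the database.

```python
import re

# Matches handler log lines such as:
#   galaxy.jobs.handler DEBUG 2013-01-18 15:00:34,481 Stopping job 3032:
# The pattern is an assumption based only on the two lines quoted above.
STOP_RE = re.compile(
    r"galaxy\.jobs\.handler DEBUG \S+ \S+ [Ss]topping job (\d+)"
)


def stopped_job_ids(lines):
    """Return the job ids from 'Stopping job N' lines, in order, without repeats."""
    seen = []
    for line in lines:
        match = STOP_RE.search(line)
        if match:
            job_id = int(match.group(1))
            if job_id not in seen:
                seen.append(job_id)
    return seen


sample = [
    "galaxy.jobs.handler DEBUG 2013-01-18 15:00:34,481 Stopping job 3032:",
    "galaxy.jobs.handler DEBUG 2013-01-18 15:00:34,481 stopping job 3032 in drmaa runner",
]
print(stopped_job_ids(sample))
```

Running it over the log from each restart should give the sequence of jobs the handler tried to stop before it died.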
We have never seen this before, so we are wondering whether there is a bug in the new release.
Cheers,
Derrick