Re: [galaxy-dev] Job handler keeps crashing

21 Jan 2013

I had a close look at the code in

galaxy-dist/lib/galaxy/jobs/handler.py
galaxy-dist/lib/galaxy/jobs/runners/drmaa.py

and found that stopping jobs in the "deleted" and "deleted_new" states
seems to be a normal routine for the job handler. I could not find any
exception that would have caused the shutdown.

I did notice a commit in galaxy-dist on Bitbucket with the comment
"Fix shutdown on python >= 2.6.2 by calling setDaemon when creating
threads (these are still...", which seems relevant.
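For reference, here is a general illustration of what setDaemon changes.
This is not Galaxy's code; the worker function, the queue, and the thread
name are all hypothetical. A non-daemon thread keeps the Python interpreter
alive at exit, while a daemon thread does not, and the flag has to be set
before start():

```python
import queue
import threading


def worker(work_queue, processed):
    # Hypothetical worker loop: take job ids off the queue until a
    # None sentinel arrives.
    while True:
        job_id = work_queue.get()
        if job_id is None:
            break
        processed.append(job_id)


def start_worker(work_queue, processed):
    t = threading.Thread(target=worker, args=(work_queue, processed),
                         name="hypothetical-stop-queue")
    # Must be set before start(). A daemon thread does not keep the
    # interpreter alive at exit; older code spells this t.setDaemon(True).
    t.daemon = True
    t.start()
    return t


if __name__ == "__main__":
    q = queue.Queue()
    done = []
    t = start_worker(q, done)
    q.put(3032)
    q.put(None)  # sentinel: tell the worker to return
    t.join()
    print(done)  # [3032]
```

Whether this commit has anything to do with our handler dying remains to be
seen once we are on the newer release.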

I will update to the 11 Jan release and see whether it fixes the issue.

D

On Fri, Jan 18, 2013 at 4:03 PM, Derrick Lin <klin938@gmail.com> wrote:
...
Hi guys,
We have updated our Galaxy to the 20 Dec 2012 release. Recently we found
that some submitted jobs never start (they stay gray forever).
We found that this was caused by the job manager sending jobs to a handler
(handler0) whose Python process had crashed and died.
From the handler log we found the last messages right before the crash:
galaxy.jobs.handler DEBUG 2013-01-18 15:00:34,481 Stopping job 3032:
galaxy.jobs.handler DEBUG 2013-01-18 15:00:34,481 stopping job 3032 in drmaa runner
We restarted Galaxy; handler0 stayed up for a few seconds and then died
again with the same log messages, except that the job number had moved on
to the next one.
We observed that the jobs it was trying to stop were all earlier jobs
whose state is either "deleted" or "deleted_new".
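To check which jobs the handler was stopping right before each crash, a
small script along these lines could pull the job ids out of the handler
log. It is only a sketch that assumes the log line format shown in the
excerpt above; the regex and the stopped_jobs helper are hypothetical and
not part of Galaxy:

```python
import re
import sys

# Assumed format, taken from the handler log excerpt above:
# galaxy.jobs.handler DEBUG 2013-01-18 15:00:34,481 Stopping job 3032:
STOP_RE = re.compile(
    r"^galaxy\.jobs\.handler DEBUG (\S+ \S+) Stopping job (\d+):")


def stopped_jobs(lines):
    """Return (timestamp, job_id) for every 'Stopping job N:' line."""
    hits = []
    for line in lines:
        m = STOP_RE.match(line)
        if m:
            hits.append((m.group(1), int(m.group(2))))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python stopped_jobs.py handler0.log
    with open(sys.argv[1]) as log:
        for timestamp, job_id in stopped_jobs(log):
            print(timestamp, job_id)
```

The job ids it prints can then be checked against their states in Galaxy
to confirm that every one of them is "deleted" or "deleted_new".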
We have never seen this before, so we are wondering whether there is a bug
in the new release.
Cheers,
Derrick