Re: [galaxy-dev] Galaxy Hang after DrmCommunicationException

23 Jan 2012

      On Jan 20, 2012, at 7:31 PM, Edward Kirton wrote:
...
Yes, Nate, but that fails the job when it is, in fact, still running, and
the error should be ignored:
except Exception, e:
    # so we don't kill the monitor thread
    log.exception("(%s/%s) Unable to check job status" % ( galaxy_job_id, job_id ) )
    log.warning("(%s/%s) job will now be errored" % ( galaxy_job_id, job_id ) )
    drm_job_state.fail_message = "Cluster could not complete job"
    self.work_queue.put( ( 'fail', drm_job_state ) )
    continue
I was curious why Ann's DrmCommunicationException appeared to be uncaught. I see now that I misread: it was caught and then logged via log.exception().

Okay, I applied your catch in 6578:84ee6eeedb41.  Thanks!
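For anyone following along, here is a minimal sketch of the idea (not the code in that changeset): catch the communication error before the generic handler, keep watching the job, and only fail jobs on other errors. Python tries except clauses in order, so the specific clause has to come first or the generic one swallows it. The exception class below is a local stand-in for the drmaa binding's DrmCommunicationException, and check_watched_items(), get_status, the (galaxy_job_id, job_id) tuples and the "running" state string are all hypothetical simplifications.

```python
import logging

log = logging.getLogger(__name__)


class DrmCommunicationException(Exception):
    """Hypothetical stand-in for drmaa's DrmCommunicationException."""


def check_watched_items(watched, get_status):
    """Poll each watched job once; return (still_watched, failed).

    `watched` is a list of (galaxy_job_id, job_id) tuples; `get_status` is a
    hypothetical callable wrapping the DRM's job-status query.
    """
    still_watched = []
    failed = []
    for galaxy_job_id, job_id in watched:
        try:
            state = get_status(job_id)
        except DrmCommunicationException:
            # Transient: the DRM could not be reached, but the job may still
            # be running. Keep watching it and retry on the next pass.
            log.warning("(%s/%s) unable to communicate with DRM, will retry",
                        galaxy_job_id, job_id)
            still_watched.append((galaxy_job_id, job_id))
            continue
        except Exception:
            # Anything else: log it (so we don't kill the monitor loop)
            # and mark the job as failed.
            log.exception("(%s/%s) Unable to check job status",
                          galaxy_job_id, job_id)
            failed.append((galaxy_job_id, job_id))
            continue
        if state == "running":
            # Still running: keep it in the watch list.
            still_watched.append((galaxy_job_id, job_id))
        # Any other state drops the job from the watch list in this sketch.
    return still_watched, failed


def fake_status(job_id):
    # Hypothetical DRM: job "1" times out, job "2" hits a real error.
    if job_id == "1":
        raise DrmCommunicationException("timeout talking to DRM")
    if job_id == "2":
        raise RuntimeError("unexpected error")
    return "running"


if __name__ == "__main__":
    watched, failed = check_watched_items(
        [(10, "1"), (11, "2"), (12, "3")], fake_status)
    print(watched, failed)
```

With the fake status function, job "1" stays in the watch list despite the communication error, job "2" is failed, and job "3" keeps running.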

--nate
...
On Fri, Jan 20, 2012 at 9:40 AM, Nate Coraor <nate@bx.psu.edu> wrote:
...
Hi Ann,
The cause of the exception aside, this should be caught by the except block below it in drmaa.py (in check_watched_items()):
except Exception, e:
    # so we don't kill the monitor thread
    log.exception("(%s/%s) Unable to check job status" % ( galaxy_job_id, job_id ) )
What changeset are you running?
--nate