Galaxy is failing due to a segfault in libdrmaa:
[9116874.391434] python[5211]: segfault at 0 ip 00007fcb9fd8ae62 sp 00007fcb9affe490 error 4 in libdrmaa.so.1.0[7fcb9fc29000+1b9000]
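
For reference, the kernel line above can be decoded mechanically. The sketch below is only an illustration: the function name and the dictionary keys are made up here, and it assumes the usual x86 meaning of the page-fault error code bits (bit 0 = page present, bit 1 = write, bit 2 = user mode). It extracts the faulting address and the offset of the instruction pointer inside the libdrmaa mapping.

```python
import re

# The kernel message quoted above, copied verbatim.
LINE = ("[9116874.391434] python[5211]: segfault at 0 ip 00007fcb9fd8ae62 "
        "sp 00007fcb9affe490 error 4 in libdrmaa.so.1.0[7fcb9fc29000+1b9000]")

PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Hypothetical helper: split a kernel segfault line into its fields."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    err = int(m.group("err"))
    return {
        "library": m.group("lib"),
        "fault_address": int(m.group("addr"), 16),
        # Instruction pointer relative to the start of the mapping.
        "offset_in_library": ip - base,
        # x86 page-fault error code bits (assumed layout, see lead-in).
        "user_mode": bool(err & 4),
        "write_access": bool(err & 2),
        "page_present": bool(err & 1),
    }


info = decode_segfault(LINE)
```

Read this way, the line describes a user-mode read of address 0 (a NULL pointer dereference) at an offset inside libdrmaa.so.1.0. If debug symbols for the library are available, that offset may be usable with a tool such as addr2line to find the failing function.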

I first observed this within the last few weeks. After the first occurrence I pulled in this changeset:

  Handle invalid job ids in the drmaa runner.

but I'm still seeing the segfault.

I think the following log output, from before the patch, is related:

Error - <type 'exceptions.UnboundLocalError'>: local variable 'job' referenced before assignment
URL: http://galaxy.neb.com/datasets/c3d98ec09a23e847/show_params
File '/mnt/ngswork/galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/exceptions/errormiddleware.py', line 143 in __call__
  app_iter = self.application(environ, start_response)
File '/mnt/ngswork/galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/recursive.py', line 80 in __call__
  return self.application(environ, start_response)
File '/mnt/ngswork/galaxy/galaxy-dist/lib/galaxy/web/framework/middleware/remoteuser.py', line 91 in __call__
  return self.app( environ, start_response )
File '/mnt/ngswork/galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpexceptions.py', line 632 in __call__
  return self.application(environ, start_response)
File '/mnt/ngswork/galaxy/galaxy-dist/lib/galaxy/web/framework/base.py', line 160 in __call__
  body = method( trans, **kwargs )
File '/mnt/ngswork/galaxy/galaxy-dist/lib/galaxy/webapps/galaxy/controllers/dataset.py', line 1025 in show_params
  return trans.fill_template( "show_params.mako", inherit_chain=inherit_chain, history=trans.get_history(), hda=hda, job=job, tool=tool, params_objects=params_objects )
UnboundLocalError: local variable 'job' referenced before assignment

After applying 4a95ae9, I see this:


galaxy.jobs.handler DEBUG 2012-12-05 10:34:20,968 Stopping job 25519:
galaxy.jobs.handler DEBUG 2012-12-05 10:34:20,971 stopping job 25519 in drmaa runner
galaxy.jobs.runners.drmaa DEBUG 2012-12-05 10:34:20,983 (25519/22378) User killed running job, but it was already dead
172.17.121.186 - - [05/Dec/2012:10:34:19 -0400] "GET /datasets/414fa4e8d28bb2be/delete_async HTTP/1.1" 200 - "http://galaxy.neb.com/history?status=done&show_deleted=False&filename=None&dataset_id=6152b5966ba797a7" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; InfoPath.2; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"
galaxy.jobs.handler INFO 2012-12-05 10:34:21,073 (25520) Job unable to run: one or more inputs deleted
galaxy.jobs.handler DEBUG 2012-12-05 10:34:22,251 Stopping job 25520:
galaxy.jobs.handler DEBUG 2012-12-05 10:34:22,253 stopping job 25520 in drmaa runner
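
In the log above, job 25519 carries an external id (22378), while job 25520 was reported as unable to run before the stop request reached the drmaa runner. One unconfirmed possibility is that a stop request for a job with no external id passes an empty value down to the C library. The sketch below shows the kind of guard that would rule this out. It is not Galaxy's code: FakeSession and stop_job_safely are hypothetical names, and the fake session only records calls so the example runs without libdrmaa.

```python
class FakeSession(object):
    """Hypothetical stand-in for a DRMAA session; records control() calls."""

    def __init__(self):
        self.calls = []

    def control(self, job_id, action):
        self.calls.append((job_id, action))


def stop_job_safely(session, external_id, action="terminate"):
    """Send a control request only when the job has a usable external id.

    Returns True if the request was sent, False if it was skipped.
    """
    # Assumption: a job that never reached the cluster has no external id.
    if external_id is None or str(external_id).strip() == "":
        return False
    session.control(str(external_id), action)
    return True


session = FakeSession()
sent_running = stop_job_safely(session, "22378")  # job that was submitted
sent_never_ran = stop_job_safely(session, None)   # job that never ran
```

With a guard like this, only the first call reaches the session; the second is skipped before anything is handed to the library.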


Any ideas?



Brad


--
Brad Langhorst