Galaxy is failing due to a segfault in libdrmaa
[9116874.391434] python[5211]: segfault at 0 ip 00007fcb9fd8ae62 sp 00007fcb9affe490 error 4 in libdrmaa.so.1.0[7fcb9fc29000+1b9000]
I started seeing this in the last few weeks. After the first occurrence, I pulled in this changeset:
    Handle invalid job ids in the drmaa runner.
However, I'm still seeing the segfault.
I believe this is related log output from before the patch:
Error - <type 'exceptions.UnboundLocalError'>: local variable 'job' referenced before assignment
File '/mnt/ngswork/galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/exceptions/errormiddleware.py', line 143 in __call__
app_iter = self.application(environ, start_response)
File '/mnt/ngswork/galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/recursive.py', line 80 in __call__
return self.application(environ, start_response)
File '/mnt/ngswork/galaxy/galaxy-dist/lib/galaxy/web/framework/middleware/remoteuser.py', line 91 in __call__
return self.app( environ, start_response )
File '/mnt/ngswork/galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpexceptions.py', line 632 in __call__
return self.application(environ, start_response)
File '/mnt/ngswork/galaxy/galaxy-dist/lib/galaxy/web/framework/base.py', line 160 in __call__
body = method( trans, **kwargs )
File '/mnt/ngswork/galaxy/galaxy-dist/lib/galaxy/webapps/galaxy/controllers/dataset.py', line 1025 in show_params
return trans.fill_template( "show_params.mako", inherit_chain=inherit_chain, history=trans.get_history(), hda=hda, job=job, tool=tool, params_objects=params_objects )
UnboundLocalError: local variable 'job' referenced before assignment
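The UnboundLocalError above is a plain Python scoping issue: a local name that is only assigned on some code paths (for example, inside a loop or an `if`) is read on a path where no assignment happened. The sketch below is hypothetical and is not Galaxy's actual `show_params` code. `find_job_buggy`, `find_job_fixed`, and the `(hda_id, job)` tuples are illustrative assumptions that reproduce the pattern and show the usual defensive fix of binding the name up front.

```python
def find_job_buggy(hda_id, jobs):
    """Hypothetical lookup: 'job' is assigned only when a tuple matches."""
    for job_hda_id, candidate in jobs:
        if job_hda_id == hda_id:
            job = candidate
    # If nothing matched, 'job' was never bound in this scope, so reading it
    # raises UnboundLocalError (the same error as in the traceback above).
    return job


def find_job_fixed(hda_id, jobs):
    """Same lookup, but 'job' is bound on every path before it is read."""
    job = None
    for job_hda_id, candidate in jobs:
        if job_hda_id == hda_id:
            job = candidate
    return job


if __name__ == "__main__":
    jobs = [(1, "job-a"), (2, "job-b")]
    print(find_job_fixed(3, jobs))  # no match, so this prints None
    try:
        find_job_buggy(3, jobs)
    except UnboundLocalError as exc:
        print("UnboundLocalError: %s" % exc)
```

With the fixed version, a caller such as a template renderer receives `None` and can display a "job not found" message instead of failing with a 500 error.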
After applying 4a95ae9, I see this:
galaxy.jobs.handler DEBUG 2012-12-05 10:34:20,968 Stopping job 25519:
galaxy.jobs.handler DEBUG 2012-12-05 10:34:20,971 stopping job 25519 in drmaa runner
galaxy.jobs.runners.drmaa DEBUG 2012-12-05 10:34:20,983 (25519/22378) User killed running job, but it was already dead
galaxy.jobs.handler INFO 2012-12-05 10:34:21,073 (25520) Job unable to run: one or more inputs deleted
galaxy.jobs.handler DEBUG 2012-12-05 10:34:22,251 Stopping job 25520:
galaxy.jobs.handler DEBUG 2012-12-05 10:34:22,253 stopping job 25520 in drmaa runner
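The "User killed running job, but it was already dead" message suggests that the patched runner treats a stop request for an id the DRM no longer recognises as a finished job, rather than as a fatal error. The following is a minimal, self-contained sketch of that idea under stated assumptions. `FakeSession`, its `terminate` method, and the local `InvalidJobException` are hypothetical stand-ins, not the real drmaa-python API or Galaxy's runner code. Note that a Python `try`/`except` like this can only handle errors that libdrmaa reports back to Python. A segfault inside `libdrmaa.so` kills the Python process outright and cannot be caught this way.

```python
import logging

log = logging.getLogger("drmaa_stop_sketch")


class InvalidJobException(Exception):
    """Hypothetical stand-in for a DRMAA 'invalid job id' error."""


class FakeSession(object):
    """Hypothetical DRM session that only knows about currently live job ids."""

    def __init__(self, live_ids):
        self.live_ids = set(live_ids)

    def terminate(self, ext_id):
        if ext_id not in self.live_ids:
            raise InvalidJobException(ext_id)
        self.live_ids.discard(ext_id)


def stop_job(session, job_id, ext_id):
    """Try to stop a job; return a short status string describing the outcome."""
    if ext_id in (None, "None"):
        # Job never reached the DRM (e.g. inputs deleted), so there is nothing to kill
        log.debug("(%s) No external id, nothing to stop", job_id)
        return "no-external-id"
    try:
        session.terminate(ext_id)
    except InvalidJobException:
        # DRM no longer recognises the id: treat the job as already finished
        log.debug("(%s/%s) User killed running job, but it was already dead", job_id, ext_id)
        return "already-dead"
    log.debug("(%s/%s) Terminated at user's request", job_id, ext_id)
    return "terminated"
```

Calling `stop_job` twice with the same external id returns `"terminated"` the first time and `"already-dead"` the second, which mirrors the 25519/22378 log line above.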
Any ideas?
Brad