segfault in libdrmaa -> galaxy front end failure
Galaxy is failing due to a segfault in libdrmaa:

[9116874.391434] python[5211]: segfault at 0 ip 00007fcb9fd8ae62 sp 00007fcb9affe490 error 4 in libdrmaa.so.1.0[7fcb9fc29000+1b9000]

I first started observing this in the last few weeks.

After the first event I pulled in this changeset:

4a95ae9<https://bitbucket.org/galaxy/galaxy-central/commits/4a95ae9a26d96f0dc9a0fe3b083a2c7b99b0466b>
Handle invalid job ids in the drmaa runner.

but I'm still seeing the segfault.

I think this is some correlated log information from before the patch:

Error - <type 'exceptions.UnboundLocalError'>: local variable 'job' referenced before assignment
URL: http://galaxy.neb.com/datasets/c3d98ec09a23e847/show_params
File '/mnt/ngswork/galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/exceptions/errormiddleware.py', line 143 in __call__
  app_iter = self.application(environ, start_response)
File '/mnt/ngswork/galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/recursive.py', line 80 in __call__
  return self.application(environ, start_response)
File '/mnt/ngswork/galaxy/galaxy-dist/lib/galaxy/web/framework/middleware/remoteuser.py', line 91 in __call__
  return self.app( environ, start_response )
File '/mnt/ngswork/galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpexceptions.py', line 632 in __call__
  return self.application(environ, start_response)
File '/mnt/ngswork/galaxy/galaxy-dist/lib/galaxy/web/framework/base.py', line 160 in __call__
  body = method( trans, **kwargs )
File '/mnt/ngswork/galaxy/galaxy-dist/lib/galaxy/webapps/galaxy/controllers/dataset.py', line 1025 in show_params
  return trans.fill_template( "show_params.mako", inherit_chain=inherit_chain, history=trans.get_history(), hda=hda, job=job, tool=tool, params_objects=params_objects )
UnboundLocalError: local variable 'job' referenced before assignment

After applying 4a95ae9 I see this:

galaxy.jobs.handler DEBUG 2012-12-05 10:34:20,968 Stopping job 25519:
galaxy.jobs.handler DEBUG 2012-12-05 10:34:20,971 stopping job 25519 in drmaa runner
galaxy.jobs.runners.drmaa DEBUG 2012-12-05 10:34:20,983 (25519/22378) User killed running job, but it was already dead
172.17.121.186 - - [05/Dec/2012:10:34:19 -0400] "GET /datasets/414fa4e8d28bb2be/delete_async HTTP/1.1" 200 - "http://galaxy.neb.com/history?status=done&show_deleted=False&filename=None&dataset_id=6152b5966ba797a7" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; InfoPath.2; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"
galaxy.jobs.handler INFO 2012-12-05 10:34:21,073 (25520) Job unable to run: one or more inputs deleted
galaxy.jobs.handler DEBUG 2012-12-05 10:34:22,251 Stopping job 25520:
galaxy.jobs.handler DEBUG 2012-12-05 10:34:22,253 stopping job 25520 in drmaa runner

Any ideas?

Brad

--
Brad Langhorst
langhorst@neb.com