Re: [galaxy-dev] AttributeError: type object 'InvalidJobException' has no attribute 'name'

10 Oct 2013

On Tue, Oct 8, 2013 at 5:03 PM, Adhemar <azneto@gmail.com> wrote:
...
Hi,
After the last update I'm getting the following error.
The job is submitted to SGE and executed, but Galaxy doesn't get the result
and keeps showing the job as executing (yellow box).
Any clues?
Thanks,
Adhemar
galaxy.jobs.runners ERROR 2013-10-08 13:01:18,488 Unhandled exception
checking active jobs
Traceback (most recent call last):
  File
"/opt/bioinformatics/share/galaxy20130410/lib/galaxy/jobs/runners/__init__.py",
line 362, in monitor
    self.check_watched_items()
  File
"/opt/bioinformatics/share/galaxy20130410/lib/galaxy/jobs/runners/drmaa.py",
line 217, in check_watched_items
    log.warning( "(%s/%s) job check resulted in %s: %s", galaxy_id_tag,
external_job_id, e.__class__.name, e )
AttributeError: type object 'InvalidJobException' has no attribute 'name'

Same here, running galaxy-central with an SGE cluster (actually UGE,
but with the same DRMAA wrapper etc.) when cancelling several jobs via
qdel at the command line:

galaxy.jobs.runners ERROR 2013-10-10 15:16:35,731 Unhandled exception
checking active jobs
Traceback (most recent call last):
  File "/mnt/galaxy/galaxy-central/lib/galaxy/jobs/runners/__init__.py",
line 362, in monitor
    self.check_watched_items()
  File "/mnt/galaxy/galaxy-central/lib/galaxy/jobs/runners/drmaa.py",
line 217, in check_watched_items
    log.warning( "(%s/%s) job check resulted in %s: %s",
galaxy_id_tag, external_job_id, e.__class__.name, e )
AttributeError: type object 'InvalidJobException' has no attribute 'name'

$ hg branch
default
[galaxy@ppserver galaxy-central]$ hg heads | more
changeset:   11871:c8b55344e779
tag:         tip
user:        Ross Lazarus <ross.lazarus@gmail.com>
date:        Tue Oct 08 16:30:54 2013 +1100
summary:     Proper removal of rgenetics deprecated tool wrappers

changeset:   11818:1f0e7ae9e324
branch:      stable
parent:      11761:a477486bf18e
user:        Daniel Blankenberg <dan@bx.psu.edu>
date:        Sun Sep 29 16:04:31 2013 +1000
summary:     Add additional check and slice to _sniffnfix_pg9_hex().
Fixes issue seen when attempting to view saved visualizations. Further
investigation may be needed.
...

Killing Galaxy and restarting didn't fix this, the errors persist.
I tried this fix to solve the attribute error in the logging call:

$ hg diff /mnt/galaxy/galaxy-central/lib/galaxy/jobs/runners/drmaa.py
diff -r c8b55344e779 lib/galaxy/jobs/runners/drmaa.py
--- a/lib/galaxy/jobs/runners/drmaa.py    Tue Oct 08 16:30:54 2013 +1100
+++ b/lib/galaxy/jobs/runners/drmaa.py    Thu Oct 10 15:21:56 2013 +0100
@@ -214,7 +214,10 @@
                 state = self.ds.jobStatus( external_job_id )
             # TODO: probably need to keep track of InvalidJobException count and remove after it exceeds some configurable
             except ( drmaa.DrmCommunicationException, drmaa.InternalException, drmaa.InvalidJobException ), e:
-                log.warning( "(%s/%s) job check resulted in %s: %s", galaxy_id_tag, external_job_id, e.__class__.name, e )
+                if hasattr(e.__class__, "name"):
+                    log.warning( "(%s/%s) job check resulted in %s: %s", galaxy_id_tag, external_job_id, e.__class__.name, e )
+                else:
+                    log.warning( "(%s/%s) job check resulted in: %s", galaxy_id_tag, external_job_id, e )
                 new_watched.append( ajs )
                 continue
             except Exception, e:
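
For what it's worth, I suspect the original logging call meant
e.__class__.__name__ rather than e.__class__.name, since Python classes
expose their name via __name__. Here is a minimal sketch of the
difference. The InvalidJobException class below is only a stand-in for
drmaa.InvalidJobException, describe() is a made-up helper, and the
snippet uses current Python syntax ("except ... as e") rather than the
Python 2 form in drmaa.py:

```python
class InvalidJobException(Exception):
    """Stand-in for drmaa.InvalidJobException (hypothetical, for illustration only)."""


def describe(exc):
    # A class has a __name__ attribute; it has no plain `name` attribute
    # unless the class author adds one explicitly.
    return "%s: %s" % (type(exc).__name__, exc)


try:
    raise InvalidJobException("code 18: The job specified by the 'jobid' does not exist.")
except InvalidJobException as e:
    message = describe(e)
    # This is the test used in the hasattr() workaround above.
    has_plain_name = hasattr(e.__class__, "name")

print(message)
print(has_plain_name)
```

If that is right, the hasattr() test in my patch will always take the
else branch, which would explain why the class name never appears in
the warnings.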


Now I get lots of these lines instead:

galaxy.jobs.runners.drmaa WARNING 2013-10-10 15:22:16,489 (251/11372)
job check resulted in: code 18: The job specified by the 'jobid' does
not exist.
galaxy.jobs.runners.drmaa WARNING 2013-10-10 15:22:16,533 (252/11373)
job check resulted in: code 18: The job specified by the 'jobid' does
not exist.
galaxy.jobs.runners.drmaa WARNING 2013-10-10 15:22:17,580 (253/11374)
job check resulted in: code 18: The job specified by the 'jobid' does
not exist.
galaxy.jobs.runners.drmaa WARNING 2013-10-10 15:22:17,624 (254/11375)
job check resulted in: code 18: The job specified by the 'jobid' does
not exist.
galaxy.jobs.runners.drmaa WARNING 2013-10-10 15:22:17,668 (255/11376)
job check resulted in: code 18: The job specified by the 'jobid' does
not exist.
galaxy.jobs.runners.drmaa WARNING 2013-10-10 15:22:17,712 (256/11377)
job check resulted in: code 18: The job specified by the 'jobid' does
not exist.
(this seems to repeat, endlessly)
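
My reading of why it repeats: the except branch in the diff above
appends the job back onto new_watched and continues, so a job the
scheduler no longer knows about is re-checked on every monitor pass.
A toy simulation of that behaviour, assuming that reading is correct
(job_status, check_watched_items and the exception class here are
simplified stand-ins, not the real Galaxy or drmaa code):

```python
class InvalidJobException(Exception):
    """Stand-in for drmaa.InvalidJobException (hypothetical)."""


def job_status(external_job_id):
    # Pretend the scheduler has forgotten this job, e.g. after qdel.
    raise InvalidJobException("code 18: The job specified by the 'jobid' does not exist.")


def check_watched_items(watched, warnings):
    """One monitor pass, loosely modelled on the except branch shown above."""
    new_watched = []
    for external_job_id in watched:
        try:
            job_status(external_job_id)
        except InvalidJobException as e:
            warnings.append("(%s) job check resulted in: %s" % (external_job_id, e))
            new_watched.append(external_job_id)  # the job stays on the watch list
            continue
    return new_watched


watched = ["11372", "11373"]
warnings = []
for _ in range(3):  # three monitor passes
    watched = check_watched_items(watched, warnings)

print(len(watched), len(warnings))
```

Both jobs are still being watched after every pass, and each pass adds
one warning per job, which matches the pattern in the log.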

I manually killed the jobs from the Galaxy history, and restarted
Galaxy again. That seemed to fix this.

If the DRMAA layer says the job was invalid (which is what I am
assuming InvalidJobException means) then surely it failed?
Perhaps something like this (untested)?

$ hg diff /mnt/galaxy/galaxy-central/lib/galaxy/jobs/runners/drmaa.py
diff -r c8b55344e779 lib/galaxy/jobs/runners/drmaa.py
--- a/lib/galaxy/jobs/runners/drmaa.py    Tue Oct 08 16:30:54 2013 +1100
+++ b/lib/galaxy/jobs/runners/drmaa.py    Thu Oct 10 15:27:28 2013 +0100
@@ -213,10 +213,15 @@
                 assert external_job_id not in ( None, 'None' ), '(%s/%s) Invalid job id' % ( galaxy_id_tag, external_job_id )
                 state = self.ds.jobStatus( external_job_id )
             # TODO: probably need to keep track of InvalidJobException count and remove after it exceeds some configurable
-            except ( drmaa.DrmCommunicationException, drmaa.InternalException, drmaa.InvalidJobException ), e:
+            except ( drmaa.DrmCommunicationException, drmaa.InternalException ), e:
                 log.warning( "(%s/%s) job check resulted in %s: %s", galaxy_id_tag, external_job_id, e.__class__.name, e )
                 new_watched.append( ajs )
                 continue
+            except drmaa.InvalidJobException, e:
+                log.warning( "(%s/%s) job check resulted in: %s", galaxy_id_tag, external_job_id, e )
+                ajs.fail_message = str(e)
+                self.work_queue.put( ( self.fail_job, ajs ) )
+                continue
             except Exception, e:
                 # so we don't kill the monitor thread
                 log.exception( "(%s/%s) Unable to check job status: %s" % ( galaxy_id_tag, external_job_id, str( e ) ) )
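
A gentler alternative, along the lines of the TODO comment in that
hunk, would be to count InvalidJobException results per job and only
give up once a limit is reached. A rough, untested-against-Galaxy
sketch of the idea follows. MAX_INVALID_CHECKS, the invalid_counts
dict, the failed list and the simplified check_watched_items signature
are all my own inventions for illustration, not existing Galaxy
settings or APIs:

```python
from collections import defaultdict


class InvalidJobException(Exception):
    """Stand-in for drmaa.InvalidJobException (hypothetical)."""


MAX_INVALID_CHECKS = 3  # hypothetical limit; would presumably be configurable


def check_watched_items(watched, job_status, invalid_counts, failed,
                        max_invalid=MAX_INVALID_CHECKS):
    """One monitor pass that tolerates a few invalid-job results before failing."""
    new_watched = []
    for job_id in watched:
        try:
            job_status(job_id)
        except InvalidJobException as e:
            invalid_counts[job_id] += 1
            if invalid_counts[job_id] >= max_invalid:
                failed.append((job_id, str(e)))  # stop watching, report as failed
            else:
                new_watched.append(job_id)  # could be transient, check again later
            continue
        invalid_counts.pop(job_id, None)  # a successful check resets the count
        new_watched.append(job_id)
    return new_watched


def job_status(job_id):
    # Toy scheduler: job 11372 has vanished, everything else is running.
    if job_id == "11372":
        raise InvalidJobException("code 18: The job specified by the 'jobid' does not exist.")
    return "running"


watched = ["11372", "11373"]
invalid_counts = defaultdict(int)
failed = []
for _ in range(5):  # five monitor passes
    watched = check_watched_items(watched, job_status, invalid_counts, failed)

print(watched)
print(failed)
```

In this toy run the vanished job is dropped and recorded as failed on
the third pass, while the other job stays on the watch list. Whether a
count is really needed depends on whether the DRM can report a job as
invalid transiently, which I don't know.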

Peter

    


Peter Cock