drmaa job status

15 Jun 2017

      Dear list,

I have two question for all DRMAA users. Here is the first one.

I was checking how our queuing system (univa GridEngine) and Galaxy 
react if jobs are submitted that exceed run time or memory limits.

I found out that the python drmaa library cannot query the job status 
after the job is finished (for both successful and unsuccessful jobs).

In lib/galaxy/jobs/runners/drmaa.py the call gives an exception
     self.ds.job_status( external_job_id )

Is this always the case? Or might this be a problem with our GridEngine?

I have attached some code for testing. Here the first call to 
s.jobStatus(jobid) works, but the second after s.wait(...) doesn't.
But I get "drmaa.errors.InvalidJobException: code 18: The job specified 
by the 'jobid' does not exist."

The same error pops up in the Galaxy logs. The consequence is that jobs 
that reached the limits are shown as completed successfully in Galaxy.

Interestingly, quite a bit of information can be obtained from the 
return value of s.wait. I was wondering if this can be used to 
differentiate successful from failed jobs. In particular hasExited, 
hasSignal, and terminateSignal are different in the two cases.

Cheers,
Matthias

Matthias Bernt

Nate Coraor

Matthias Bernt

Hendrickson, Curtis (Campus)

tags

participants (3)