Re: [galaxy-dev] Not-So-Running Jobs

22 Apr 2009

Thank you for the explanation.

Continuing my limbo-jobs problems:
...
...
The status is "running", but the command line is empty, and no program 
was executed for this job (I checked with "ps ax -H" and looked for 
python's child-processes).
Two config options could be the cause here:
  local_job_queue_workers (default: 5)
The local workers are the threads available for actually running jobs. 
To facilitate the ability for job tracking to occur in the database, 
jobs are moved to the 'running' state before execution.  At that point, 
if there are not enough threads available, they may sit in the local job 
runner's queue until a thread becomes available.
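
To check my understanding of the explanation above, here is a minimal sketch of how jobs can show up as "running" with no command line. It is not Galaxy's actual code: the `Job` class, the `simulate` function, and the fake command line are hypothetical names made up for illustration. It assumes only that jobs are marked 'running' before being queued and that a fixed pool of worker threads picks them up.

```python
import queue
import threading


class Job:
    """Hypothetical stand-in for a job record in the database."""

    def __init__(self, job_id):
        self.job_id = job_id
        self.state = "new"
        self.command_line = None  # only set once a worker thread picks the job up


def simulate(num_workers, num_jobs):
    """Queue num_jobs jobs on num_workers threads and return a snapshot of
    (job_id, state, command_line) taken while every worker is busy."""
    job_queue = queue.Queue()
    release = threading.Event()        # holds workers "busy" until the snapshot is taken
    started = threading.Semaphore(0)   # counts jobs that a worker has picked up

    def worker():
        while True:
            job = job_queue.get()
            if job is None:            # sentinel: shut this worker down
                return
            job.command_line = "tool --job %d" % job.job_id  # made-up command
            started.release()
            release.wait()             # pretend the tool is still running
            job.state = "ok"

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    jobs = [Job(i) for i in range(num_jobs)]
    for job in jobs:
        job.state = "running"          # marked 'running' BEFORE a thread is free
        job_queue.put(job)

    # Wait until every worker holds one job, then look at all jobs.
    for _ in range(min(num_workers, num_jobs)):
        started.acquire()
    snapshot = [(j.job_id, j.state, j.command_line) for j in jobs]

    # Let everything finish and stop the workers.
    release.set()
    for _ in threads:
        job_queue.put(None)
    for t in threads:
        t.join()
    return snapshot
```

With 5 workers and 8 jobs, every job in the snapshot is in the "running" state, but only the 5 held by a worker have a command line; the other 3 are the "limbo-running" ones waiting in the queue.
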
My Galaxy uses the local scheduler with the Round Robin policy (and 5 local 
job queue workers).

The "Unfinished jobs" report page shows 5 running jobs (really running 
with command line) and several other "limbo-running" jobs (and tons of 
"new" jobs).

The problem is that the Galaxy python process has only 4 child processes 
(instead of the expected 5).

I double-checked by grepping for the command line that the "unfinished 
jobs" page shows - it doesn't exist in the process list ($ ps ax -H).
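
For anyone who wants to repeat the check, this is roughly how it can be scripted. It is only a sketch: `child_commands` and `ps_snapshot` are hypothetical helper names, and it assumes a `ps` that accepts `-eo pid,ppid,args` (as procps on Linux does) and Python 3.7 or later.

```python
import subprocess


def child_commands(ps_output, parent_pid):
    """Return (pid, command line) for each direct child of parent_pid,
    given text in the 'PID PPID COMMAND' layout of `ps -eo pid,ppid,args`."""
    children = []
    for line in ps_output.strip().splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # skip the header line and anything malformed
        pid, ppid, args = fields
        if ppid.isdigit() and int(ppid) == parent_pid:
            children.append((int(pid), args))
    return children


def ps_snapshot():
    """Capture the current process table as text (assumes a procps-style ps)."""
    return subprocess.check_output(["ps", "-eo", "pid,ppid,args"], text=True)
```

Calling `child_commands(ps_snapshot(), galaxy_pid)`, where `galaxy_pid` is the PID of the Galaxy python process, lists the child processes, which can then be compared with the command lines on the "Unfinished jobs" page.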

So it appears Galaxy missed the termination of the job, and one queue 
worker will be lost forever.
The only hint I have regarding this is that it was a long running job, 
and the user canceled it before it was completed (I actually can't tell 
if it was executed or just limbo-running).

Is there a way to release the queue worker (besides restarting Galaxy)?

Thanks,
    Gordon.


Assaf Gordon