Assaf Gordon wrote:
My Galaxy uses the local scheduler with a Round Robin policy (and 5 local
The "Unfinished jobs" report page shows 5 running jobs (really running
with command line) and several other "limbo-running" jobs (and tons of
New jobs are dependent on other, running jobs and some backup can be
expected (especially if waiting on early steps in a workflow).
The problem is that the galaxy python process has only 4
(instead of the expected 5).
I double-checked by grepping for the command line that the "unfinished
jobs" page shows; it doesn't exist in the process list ($ ps ax -H).
So it appears galaxy missed the termination of the job, and one queue
worker will be forever lost.
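
For anyone who wants to repeat that check without grepping by hand, here is a minimal sketch in Python. It only mirrors the manual "ps ax" check described above; it does not use or touch Galaxy's own job tracking. The function names find_processes and job_is_running are hypothetical, picked for illustration, and the sample process listing in the usage note is made up.

```python
import subprocess


def find_processes(ps_output, needle):
    """Return the lines of `ps` output whose text contains `needle`."""
    return [line for line in ps_output.splitlines() if needle in line]


def job_is_running(needle):
    """Run `ps ax` and report whether any command line contains `needle`.

    Caveat: if `needle` is passed on this script's own command line,
    the script will match itself, just as a bare `grep` would.
    """
    ps_output = subprocess.run(
        ["ps", "ax"], capture_output=True, text=True, check=True
    ).stdout
    return bool(find_processes(ps_output, needle))
```

Usage would be something like job_is_running("sort big.txt"), where the argument is the command line copied from the "unfinished jobs" page. If it returns False while the report page still lists the job as running, that matches the situation described above.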
The only hint I have regarding this is that it was a long running job,
and the user canceled it before it was completed (I actually can't tell
if it was executed or just limbo-running).
It's possible that the job is still running its finish method (perhaps a
new 'finishing' state is also in order). This can be a lengthy process
for large datasets where setting metadata is complex.
Is there a way to release the queue worker (besides restarting