Thank you for the explanation. Continuing with my limbo-job problems:
The status is "running", but the command line is empty, and no program was executed for this job (I checked with "ps ax -H" and looked for the child processes of Galaxy's Python process).
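The manual check (running "ps ax -H" and looking for children of the Galaxy Python process) can be scripted. This is a minimal sketch, assuming a Linux system with procps `ps`. The helper names `list_children` and `galaxy_children` are hypothetical, and only direct children of the given PID are matched.

```python
import subprocess


def list_children(ps_output, parent_pid):
    """Return (pid, args) pairs for lines whose PPID equals parent_pid.

    Expects lines of the form "PID PPID ARGS", as printed by
    `ps -eo pid=,ppid=,args=` (the trailing '=' suppresses the header).
    """
    children = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 2 or not (parts[0].isdigit() and parts[1].isdigit()):
            continue  # skip blank or malformed lines
        pid, ppid = int(parts[0]), int(parts[1])
        args = parts[2] if len(parts) == 3 else ""
        if ppid == parent_pid:
            children.append((pid, args))
    return children


def galaxy_children(galaxy_pid):
    # Assumes procps `ps`; lists only direct children of galaxy_pid.
    out = subprocess.run(
        ["ps", "-eo", "pid=,ppid=,args="],
        capture_output=True, text=True, check=True,
    ).stdout
    return list_children(out, galaxy_pid)
```

Comparing `len(galaxy_children(pid))` with the number of jobs the "Unfinished jobs" page lists as running, or searching each `args` string for a job's command line, repeats the check without reading the process tree by eye.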
Two config options could be the cause here: local_job_queue_workers (default: 5)
The local workers are the threads available for actually running jobs. So that jobs can be tracked in the database, they are moved to the 'running' state before execution. If not enough threads are available at that point, the jobs may wait in the local job runner's queue until a thread becomes free.
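The behaviour described above can be illustrated with a small Python model. This is a toy sketch, not Galaxy's actual local runner: `JobRecord`, `LocalRunnerModel`, and the `started`/`release` events are hypothetical names. A fixed pool of worker threads pulls from one queue, and the job state is set to 'running' when the job is queued. As a result, a job can report 'running' while no process exists for it.

```python
import queue
import threading


class JobRecord:
    """Hypothetical stand-in for a job row in the database."""

    def __init__(self, job_id):
        self.job_id = job_id
        self.state = "new"
        self.started = threading.Event()  # set when a worker actually begins the job
        self.release = threading.Event()  # set to let the simulated job finish


class LocalRunnerModel:
    """Toy model: a fixed pool of worker threads pulling from one FIFO queue."""

    def __init__(self, num_workers):
        self.work_queue = queue.Queue()
        for _ in range(num_workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def put(self, job):
        # The state becomes 'running' at enqueue time, before any thread
        # is free, so a queued job already looks like a running one.
        job.state = "running"
        self.work_queue.put(job)

    def _worker(self):
        while True:
            job = self.work_queue.get()
            job.started.set()
            job.release.wait()  # stands in for waiting on the job's external process
            job.state = "ok"
            self.work_queue.task_done()


if __name__ == "__main__":
    runner = LocalRunnerModel(num_workers=2)
    jobs = [JobRecord(i) for i in range(3)]
    for job in jobs:
        runner.put(job)
    jobs[0].started.wait(timeout=2)
    jobs[1].started.wait(timeout=2)
    print([j.state for j in jobs])             # ['running', 'running', 'running']
    print([j.started.is_set() for j in jobs])  # [True, True, False]
    for job in jobs:
        job.release.set()
    runner.work_queue.join()
    print([j.state for j in jobs])             # ['ok', 'ok', 'ok']
```

With 2 workers and 3 jobs, all three report 'running' but only two are executing. The model also shows how a worker can be lost: if a job's `release` event never fires (for example, because the thread is waiting on something that will never complete), that thread never returns to the queue, and the pool permanently shrinks by one.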
My Galaxy uses the local scheduler with the Round Robin policy (and 5 local job queue workers). The "Unfinished jobs" report page shows 5 running jobs (really running, with a command line), several other "limbo-running" jobs, and tons of "new" jobs. The problem is that the Galaxy Python process has only 4 child processes instead of the expected 5. I double-checked by grepping the process list ($ ps ax -H) for the command line that the "Unfinished jobs" page shows: it doesn't exist. So it appears that Galaxy missed the termination of that job, and one queue worker will be lost forever.

The only hint I have is that it was a long-running job, and the user canceled it before it completed (I can't actually tell whether it was executed or was just limbo-running).

Is there a way to release the queue worker (besides restarting Galaxy)?

Thanks,
Gordon.