Re: [galaxy-dev] Stalled upload jobs under "Admin", "Manage jobs"

16 Mar 2012

On Mon, Feb 13, 2012 at 5:02 PM, Nate Coraor <nate@bx.psu.edu> wrote:
...
On Feb 10, 2012, at 6:47 AM, Peter Cock wrote:
...
Hello all,

I've noticed we have about a dozen stalled upload jobs on our server
from several users. e.g.

Job ID  User  Last Update   Tool     State   Command Line  Job Runner  PID/Cluster ID
2352    xxxx  21 hours ago  upload1  upload  None          None        None
...
2339    yyyy  19 hours ago  upload1  upload  None          None        None

The job numbers are consecutive (2339 to 2352) and reflect a problem
for a couple of hours yesterday morning. I believe this was due to the
underlying file system being unmounted (without restarting Galaxy),
and at the time restarting Galaxy fixed uploading files. Test jobs
since then have completed normally - but these zombie jobs remain.

Using the "Stop jobs" option does not clear these dead upload jobs.
Restarting the Galaxy server does not clear them either.

This is our production server and was running galaxy-dist, changeset
5743:720455407d1c - which I have now updated to the current release,
6621:26920e20157f - which makes no difference to these stalled jobs.

Does anyone have any insight into what might be wrong, and how to get
rid of these zombie tasks?

Hi Peter,

Are you using the nginx upload module?

There's no way to fix these from within Galaxy, unfortunately.
You'll have to update them in the database.

--nate

Hi Nate,

Sorry for the delay - I must have missed your reply.

No, we're not using nginx here.

What should I edit in the database? Presumably rather than deleting
these jobs I should set the state to finished with error?

(Is there any documentation about the Galaxy database schema,
and the values of fields in it - or is that all considered to be an
internal detail?)

Thanks,

Peter
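
For illustration only, here is a minimal sketch of the kind of database
update discussed above. It assumes (this is not confirmed anywhere in this
thread) that jobs live in a table named `job` with an integer `id` column
and a text `state` column, and that `error` is a suitable state value. It
runs against a throwaway in-memory SQLite database, not a real Galaxy
database, and the function and helper names are hypothetical. Back up the
database and stop Galaxy before trying anything similar on a production
server.

```python
import sqlite3


def mark_stalled_uploads_as_error(conn, first_id, last_id):
    """Set state to 'error' for jobs in [first_id, last_id] still in 'upload'.

    ASSUMPTION: the table name `job` and the columns `id` and `state` are
    guesses for illustration; check the real schema before adapting this.
    Returns the number of rows that were changed.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE job SET state = 'error' "
            "WHERE state = 'upload' AND id BETWEEN ? AND ?",
            (first_id, last_id),
        )
    return cur.rowcount


def make_demo_db():
    """Build a hypothetical in-memory table to exercise the update safely."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, state TEXT)")
    conn.executemany(
        "INSERT INTO job (id, state) VALUES (?, ?)",
        [(2338, "ok"), (2339, "upload"), (2352, "upload"), (2353, "ok")],
    )
    conn.commit()
    return conn
```

Restricting the update to both the ID range and the `upload` state means
jobs that finished normally are left alone, and running it a second time
changes nothing. Any datasets attached to these jobs may carry their own
state and could need a similar fix; that is also an assumption to verify.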