Thanks for looking into this, John. I'm pretty sure there are multiple reasons for stuck jobs. (In fact, I'm running into another that I'll discuss separately, likely in a Trello ticket.) In my most recent run-in with this issue, though, the problem turned out to be caused by datasets that were marked as deleted in the dataset table but not marked as deleted in the history dataset association table, and were therefore used as inputs to jobs. The following query fixed the stuck jobs:

update dataset
set deleted = 'f', purgable = 'f'
where id in (
    select distinct(d.id)
    from dataset d
    join history_dataset_association hda on d.id = hda.dataset_id
    join job_to_input_dataset jtid on hda.id = jtid.dataset_id
    join job j on jtid.job_id = j.id
    where d.deleted = 't' and hda.deleted = 'f' and j.state = 'new'
);

However, there seem to be many other instances where a dataset is marked as deleted but its history dataset association is not. I'm wary of updating them all without knowing how they got into this state (or whether it is sometimes an appropriate state). Is this ever a valid state for a dataset?

BTW: there is also a discussion about this at https://biostar.usegalaxy.org/p/9608/.

Lance
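For anyone wary of running the update blindly, a read-only version of the same query (the joins above with a SELECT instead of an UPDATE) lists exactly which datasets, associations, and jobs would be affected. This is only a sketch built from the tables used in the update above; column names may differ slightly between Galaxy releases.

select d.id as dataset_id, hda.id as hda_id, j.id as job_id,
       d.deleted as dataset_deleted, hda.deleted as hda_deleted, j.state as job_state
from dataset d
join history_dataset_association hda on d.id = hda.dataset_id
join job_to_input_dataset jtid on hda.id = jtid.dataset_id
join job j on jtid.job_id = j.id
where d.deleted = 't' and hda.deleted = 'f' and j.state = 'new'
order by j.id;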
John Chilton wrote:
Hello Lance,
I cannot think of a good way to rescue these jobs. If you are curious about the code where jobs are selected for execution, I would check out the job handler (lib/galaxy/jobs/handler.py); see __monitor_step, for instance.
To prevent this from happening in the future, it seems we should only allow copying datasets from libraries into histories if the library dataset is in an 'OK' state (https://trello.com/c/0vxbP4El).
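As a rough, untested sketch (it assumes the dataset table's state column and the same join tables used in the update query above), something along these lines would list 'new' jobs whose inputs never reached 'ok' or are still flagged deleted:

select j.id as job_id, d.id as dataset_id, d.state as dataset_state, d.deleted as dataset_deleted
from job j
join job_to_input_dataset jtid on j.id = jtid.job_id
join history_dataset_association hda on jtid.dataset_id = hda.id
join dataset d on hda.dataset_id = d.id
where j.state = 'new' and (d.state != 'ok' or d.deleted = 't');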
-John
On Thu, Nov 6, 2014 at 11:13 AM, Lance Parsons <lparsons@princeton.edu> wrote:
I've run into this same issue again (just with some other Data Library datasets). This time, there are a few users involved and quite a few "stuck" jobs. Does anyone have any advice on pushing these jobs through? Maybe even a pointer to the relevant code? I'm running latest_2014.08.11. Thanks in advance.
Lance
Lance Parsons wrote:
Thanks, that was the first thing I checked. However, restarting the handler didn't help. Downloading the offending data, re-uploading it as a new dataset, and then rerunning with the new dataset as input did work. Also, all other jobs continued to run fine.
Lance
Kandalaft, Iyad wrote:
I’ve had jobs get stuck in the new state when one of the handler servers crashes. If you have dedicated handlers, check to make sure they are still running.
Restart the handler to see if the jobs get resumed automatically.
Iyad Kandalaft
From: galaxy-dev-bounces@lists.bx.psu.edu [mailto:galaxy-dev-bounces@lists.bx.psu.edu] On Behalf Of Aaron Petkau
Sent: Wednesday, October 01, 2014 5:32 PM
To: Lance Parsons
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] Jobs stuck in "new" state - Data Library datasets to blame?
Are you attempting to upload datasets to a Data Library and then copy them to a history and run jobs on them right away? I've run into issues before where, if I run a job on a library dataset before it has finished uploading and processing, the job gets stuck in a queued state and never executes.
Aaron
On Wed, Oct 1, 2014 at 2:51 PM, Lance Parsons <lparsons@princeton.edu> wrote:
Recently, I updated our Galaxy instance to use two processes (one for web, the other as a job handler). This has been working well, except in a few cases. I've noticed that a number of jobs get stuck in the "new" status.
In a number of cases, I've resolved the issue by downloading and uploading one of the input files and rerunning the job using the newly uploaded file. In at least one of these cases, the offending input file was one that was copied from a Data Library.
Can anyone point me to something to look for in the database, etc., that would cause a job to think a dataset was not ready for use as a job input? I'd very much like to fix these datasets, since having to re-upload data libraries would be very tedious.
Thanks in advance.
--
Lance Parsons - Scientific Programmer
134 Carl C. Icahn Laboratory
Lewis-Sigler Institute for Integrative Genomics
Princeton University
___________________________________________________________
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
--
Lance Parsons - Scientific Programmer
134 Carl C. Icahn Laboratory
Lewis-Sigler Institute for Integrative Genomics
Princeton University