Hi all,

I work with Jelle and want to add to the issue. Help would be *greatly* appreciated, as this is a *major* blocker on our production server right now.

In the database, the 'workflow_invocation' table has a 'state' column with values like 'scheduled' or 'failed'. Before December 18, I only see the values 'scheduled' or 'failed'. After that date a new state appeared: 'new'. It is always associated with handler 1 (we have 2 job handlers, i.e. '0' and '1'). As time goes on, we see a mix of 'new' and 'scheduled' states, with more and more 'new', and from Jan 4 onward it is only 'new' (and only for handler '1').

This suggests that all workflows assigned to handler 1 never reach the 'scheduled' state, so their jobs are never created. I have 269 entries in the 'workflow_invocation' table in the 'new' state, and restarting the job handlers no longer has any effect (it still worked a few days ago).

How can I fix this?

Thanks for your help,
Charles
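For anyone wanting to reproduce the per-handler breakdown of invocation states described above, here is a minimal sketch. It assumes, as the message indicates, that the `workflow_invocation` table has `handler` and `state` columns. The function name `count_invocations` and the demo rows are hypothetical, and an in-memory SQLite database stands in for the real one (production Galaxy servers usually run PostgreSQL, where the same `SELECT ... GROUP BY` can be run directly in `psql`).

```python
import sqlite3
from collections import Counter


def count_invocations(conn):
    """Return a Counter keyed by (handler, state) for rows in workflow_invocation."""
    rows = conn.execute(
        "SELECT handler, state, COUNT(*) FROM workflow_invocation "
        "GROUP BY handler, state"
    )
    # Each row is (handler, state, count); fold them into a Counter so that
    # missing combinations simply read as 0.
    return Counter({(handler, state): n for handler, state, n in rows})


if __name__ == "__main__":
    # Hypothetical stand-in table with only the columns used above; the real
    # Galaxy table has more columns and lives in the configured database.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE workflow_invocation "
        "(id INTEGER PRIMARY KEY, handler TEXT, state TEXT)"
    )
    conn.executemany(
        "INSERT INTO workflow_invocation (handler, state) VALUES (?, ?)",
        [("0", "scheduled"), ("1", "new"), ("1", "new"), ("0", "failed")],
    )
    for (handler, state), n in sorted(count_invocations(conn).items()):
        print(handler, state, n)
```

A steadily growing `('1', 'new')` count with no matching `('1', 'scheduled')` entries would match the pattern Charles reports for handler 1.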
On 5 Jan 2016, at 11:29, Jelle Scholtalbers <j.scholtalbers@gmail.com> wrote:
Hi all,
On our installation (v15.07) we suddenly see that one of our two job handlers gets stuck with a high CPU load, without any new log messages appearing (the last message is generally `cleaning up external metadata files`). In addition, when running workflows in batch (more than 6 at a time), only a few of them (~3) get their workflow steps/jobs scheduled (LSF-DRMAA). For the remaining 3, the new histories are created but stay empty (according to the GUI). Only after restarting the two job handlers are the remaining workflow steps scheduled and shown in the history.
First question: how do we resolve this issue? Second: how does this actually work? How are the workflow steps stored in the database, i.e. why are they not shown in the web interface until a handler has processed them?
Possibly relevant config settings:

[server:handler0]
use_threadpool = true
threadpool_workers = 5

[server:handler1]
use_threadpool = true
threadpool_workers = 5

[app:main]
force_beta_workflow_scheduled_min_steps = 1
force_beta_workflow_scheduled_for_collections = True
track_jobs_in_database = True
enable_job_recovery = True
retry_metadata_internally = False
cache_user_job_count = True  # only a limit set for the very few local tools like upload
Cheers,
Jelle

___________________________________________________________
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: https://lists.galaxyproject.org/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/