On 8/8/12 4:06 PM, "Nate Coraor" <nate(a)bx.psu.edu> wrote:
On Aug 8, 2012, at 2:30 PM, Karger, Amir wrote:
> Meanwhile, we're able to restart, and get happy log messages from the
> job runner and two web "servers" (two servers running on different ports
> of a Tomcat host). And I can do an upload, which runs locally. But when
> I try to do a BLAST, which is supposed to submit to the cluster (and ran
> just fine on our old install), it hangs and never starts. I would think
> the database is working OK, since it shows me new history items when I
> upload and such. The web Galaxy log shows that I went to the tool page,
> and then has a ton of loads to root/history_item_updates, but nothing
> else. The job handler Galaxy log has nothing since the PID messages from
> when the server last started up.
>
> A quick search of the archives didn't find anything obvious. (I don't
> have any obvious words to search for.) Any thoughts about where I should
> start looking to track this down?
If you aren't setting job_manager and job_handlers in your config, each
server will consider itself the manager and handler. If not configured
to run jobs, this may result in jobs failing to run. I'd suggest
explicitly defining a manager and handlers.
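For reference, here is a minimal sketch of the kind of setup Nate describes, in the universe_wsgi.ini style of that era. The server names (web0, web1, handler0) and ports are illustrative, not taken from the original thread; they must match the --server-name each Galaxy process is started with:

```ini
; Hypothetical server section names -- adjust to match the
; --server-name each paster process is launched with.
[server:web0]
use = egg:Paste#http
port = 8080

[server:web1]
use = egg:Paste#http
port = 8081

[server:handler0]
use = egg:Paste#http
port = 8090

[app:main]
; Route all job management and handling to the dedicated handler
; process, so the web servers never consider themselves handlers.
job_manager = handler0
job_handlers = handler0
```

With this in place, jobs submitted through either web server should be picked up only by the handler0 process.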
Sigh. We have both job_manager and job_handlers set to the same server.
It seems like our runner app may be getting into some kind of sleeping
state. I was unable to upload a file, which had worked before. However,
when I restarted the runner, it picked up the upload job and successfully
uploaded it, AND it picked up the previously queued tab2fasta job and, I
believe, completed it successfully too. (There's an error due to a missing
filetype, which I guess makes stderr non-empty and makes Galaxy think the
job was unsuccessful, but I can confirm that it did in fact run on our
cluster.) Running paster.py ... --status claims that the process is still
running. So what would make the runner go to "sleep" like that, and how
do I stop it from happening?