On 8/8/12 4:06 PM, "Nate Coraor" <nate@bx.psu.edu> wrote:
On Aug 8, 2012, at 2:30 PM, Karger, Amir wrote:
Meanwhile, we're able to restart, and we get happy log messages from the job runner and the two web "servers" (two servers running on different ports of a Tomcat host). And I can do an upload, which runs locally. But when I try to run a blast, which is supposed to submit to the cluster (and ran just fine on our old install), it hangs and never starts. I would think the database is working OK, since it shows me new history items when I upload and so on. The web Galaxy log shows that I went to the tool page, and then has many requests to root/history_item_updates, but nothing else. The job handler Galaxy log has nothing since the PID messages from when the server last started up.
A quick search of the archives didn't find anything obvious. (I don't have any obvious words to search for.) Any thoughts about where I should start looking to track this down?
Hi Amir,
If you aren't setting job_manager and job_handlers in your config, each server will consider itself both the manager and a handler. If a server isn't actually configured to run jobs, this can result in jobs never starting. I'd suggest explicitly defining a manager and handlers.
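For reference, a minimal sketch of what that configuration might look like (the server names manager0, handler0, and handler1 are placeholders; they would need to match the [server:...] sections you actually run, and the exact option names should be checked against your Galaxy version's sample config):

```ini
# Hypothetical example -- server names must match your [server:...] sections
job_manager = manager0
job_handlers = handler0,handler1
```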
--nate
Sigh. We have both job_manager and job_handlers set to the same server.

It seems like our runner app may be getting into some kind of sleeping state. I was unable to upload a file, which had worked before. However, when I restarted the runner, it picked up the upload job and successfully uploaded it, AND it picked up the previously queued tab2fasta job, which I believe it completed successfully too. (There's an error due to a missing file type, which I guess makes stderr non-empty and makes Galaxy think the job was unsuccessful. But I can confirm that the job did in fact run on our cluster.) Running paster.py ... --status claims that the process is still running.

So what would make the runner go to "sleep" like that, and how do I stop it from happening?

Thanks,
-Amir
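Not an answer to Amir's question, but one way a process can look alive to --status yet never pick up work: if a handler loop blocks forever on an in-process wakeup signal, it sleeps past jobs that arrive by another route (e.g. written straight to the database by a different server). The sketch below is purely illustrative, not Galaxy's actual code; it shows how waiting with a timeout and re-polling avoids the permanent sleep.

```python
import queue
import threading
import time

# Illustrative sketch (assumed behavior, not Galaxy's real job handler):
# jobs_in_db stands in for a shared jobs table; wakeup is an in-memory
# nudge that other *processes* cannot see.
jobs_in_db = []
wakeup = queue.Queue()

def handler_loop(stop_event, dispatched):
    while not stop_event.is_set():
        try:
            # Wake on an in-process nudge OR after a short timeout.
            # A plain blocking get() here would hang forever whenever a
            # job arrives via the database instead of this queue.
            wakeup.get(timeout=0.1)
        except queue.Empty:
            pass  # timeout expired: fall through and poll anyway
        while jobs_in_db:
            dispatched.append(jobs_in_db.pop(0))

stop = threading.Event()
dispatched = []
t = threading.Thread(target=handler_loop, args=(stop, dispatched))
t.start()

jobs_in_db.append("tab2fasta")  # enqueued "externally": no nudge is sent
time.sleep(0.5)                 # give the loop a few poll cycles
stop.set()
t.join()
print(dispatched)               # the timed poll still found the job
```

With a plain blocking wakeup.get() the thread would stay asleep indefinitely after the last in-process nudge, which is roughly the symptom described: the process is running, but nothing dispatches until a restart forces a fresh poll.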