Shell script to start Galaxy in multi-server environment

Karger, Amir

2 Aug 2012

11:56 p.m.

We're upgrading to a late June Galaxy from a last-year Galaxy. We noticed that the docs say you no longer need 2 different .ini files. Great! Unfortunately, the multiprocess.sh in contrib/ still assumes you have multiple .ini files.

So the question is, assuming we correctly set up the different web servers, job managers, and job handlers servers in universe_wsgi.ini, what's the command line we should be giving to run Galaxy on each type of server? The wiki Admin/Config pages for Performance and Scaling and Cluster and that sort of thing had some info on editing the .ini, but I didn't see what my .sh should look like there. Pointers to websites, emails, or existing .sh files appreciated.

Thanks,

-Amir Karger
Senior Research Computing Consultant
Harvard Medical School Research Computing
amir_karger@hms.harvard.edu


Fields, Christopher J

3 Aug

12:40 a.m.


The docs about that are here:

http://wiki.g2.bx.psu.edu/Admin/Config/Performance/Web%20Application%20Scali...

I am working on combining our separate ini files myself, but ran into some odd problems with cluster job submission that weren't present with the original split files (i.e. the combined one didn't work, but the split ones did). I haven't managed to debug that just yet, and won't be able to until I get back from vacation in a week.

chris

On Aug 2, 2012, at 1:56 PM, "Karger, Amir" <Amir_Karger@hms.harvard.edu> wrote:


We're upgrading to a late June Galaxy from a last-year Galaxy. We noticed that the docs say you no longer need 2 different .ini files. Great! Unfortunately, the multiprocess.sh in contrib/ still assumes you have multiple .ini files.

So the question is, assuming we correctly set up the different web servers, job managers, and job handlers servers in universe_wsgi.ini, what's the command line we should be giving to run Galaxy on each type of server? The wiki Admin/Config pages for Performance and Scaling and Cluster and that sort of thing had some info on editing the .ini, but I didn't see what my .sh should look like there. Pointers to websites, emails, or existing .sh files appreciated.

Thanks,

-Amir Karger Senior Research Computing Consultant Harvard Medical School Research Computing amir_karger@hms.harvard.edu

___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at:

http://lists.bx.psu.edu/

Nate Coraor

8 Aug

8:15 p.m.

On Aug 2, 2012, at 2:56 PM, Karger, Amir wrote:


We're upgrading to a late June Galaxy from a last-year Galaxy. We noticed that the docs say you no longer need 2 different .ini files. Great! Unfortunately, the multiprocess.sh in contrib/ still assumes you have multiple .ini files.

So the question is, assuming we correctly set up the different web servers, job managers, and job handlers servers in universe_wsgi.ini, what's the command line we should be giving to run Galaxy on each type of server? The wiki Admin/Config pages for Performance and Scaling and Cluster and that sort of thing had some info on editing the .ini, but I didn't see what my .sh should look like there. Pointers to websites, emails, or existing .sh files appreciated.

Hi Amir,

multiprocess.sh is out of date, so I've removed it from galaxy-central. run.sh can start and stop all of your processes now, as described at:

http://wiki.g2.bx.psu.edu/Admin/Config/Performance/Web%20Application%20Scali...

--nate
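To make the "one merged .ini, many processes" idea concrete, here is a minimal sketch of how per-process command lines could be derived from a single config file. It assumes each process is defined as a [server:NAME] section and is launched with paster's serve command, selecting the section via --server-name. The section names, ports, and the ./scripts/paster.py path are illustrative assumptions, not taken from this thread; run.sh in your own checkout is the authority on the actual invocation.

```python
import configparser

# Hypothetical excerpt of a merged universe_wsgi.ini. The server names
# (web0, web1, manager, handler0) and ports are invented for illustration.
EXAMPLE_INI = """
[server:web0]
use = egg:Paste#http
port = 8080

[server:web1]
use = egg:Paste#http
port = 8081

[server:manager]
use = egg:Paste#http
port = 8079

[server:handler0]
use = egg:Paste#http
port = 8090
"""


def server_names(ini_text):
    """Return the NAME part of every [server:NAME] section, in file order."""
    parser = configparser.RawConfigParser()
    parser.read_string(ini_text)
    prefix = "server:"
    return [s[len(prefix):] for s in parser.sections() if s.startswith(prefix)]


def start_commands(ini_text, ini_path="universe_wsgi.ini"):
    """Build one 'paster serve' command line per server section.

    Assumption: the launcher lives at ./scripts/paster.py. The options used
    (--server-name, --pid-file, --log-file, --daemon) are paster serve options.
    """
    template = ("python ./scripts/paster.py serve %s --server-name=%s "
                "--pid-file=%s.pid --log-file=%s.log --daemon")
    return [template % (ini_path, name, name, name)
            for name in server_names(ini_text)]


if __name__ == "__main__":
    for command in start_commands(EXAMPLE_INI):
        print(command)
```

On a multi-host setup you would run only the commands for the servers that belong on each host, and swap --daemon for --stop-daemon or --status to stop or query a process.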


Karger, Amir

11:30 p.m.


From: Nate Coraor [mailto:nate@bx.psu.edu]

On Aug 2, 2012, at 2:56 PM, Karger, Amir wrote:

We're upgrading to a late June Galaxy from a last-year Galaxy. We noticed that the docs say you no longer need 2 different .ini files. Great! Unfortunately, the multiprocess.sh in contrib/ still assumes you have multiple .ini files.

multiprocess.sh is out of date, so I've removed it from galaxy-central. run.sh can start and stop all of your processes now, as described at:

http://wiki.g2.bx.psu.edu/Admin/Config/Performance/Web%20Application%20Scaling

Thanks. Of course, reading some other people's posts and the wiki, it looks like it's not *required* to merge, just recommended. Which means our existing system of running the different scripts on different hosts should continue to work. We figure we can put off the merge thing for a bit.

Meanwhile, we're able to restart, and get happy log messages from the jobrunner and two web "servers" (two servers running on different ports of a Tomcat host). And I can do an upload, which runs locally. But when I try to do a blast, which is supposed to submit to the cluster (and ran just fine on our old install), it hangs and never starts. I would think the database is working OK, since it shows me new history items when I upload and stuff. The web Galaxy log shows that I went to the tool page, and then has a ton of loads to root/history_item_updates, but nothing else. The job handler Galaxy log has nothing since the PID messages when the server started up most recently.

A quick search of the archives didn't find anything obvious. (I don't have any obvious words to search for.) Any thoughts about where I should start looking to track this down?

Thanks,

-Amir

Nate Coraor

9 Aug

1:06 a.m.

On Aug 8, 2012, at 2:30 PM, Karger, Amir wrote:


Thanks. Of course, reading some other people's posts and the wiki, it looks like it's not *required* to merge, just recommended. Which means our existing system of running the different scripts on different hosts should continue to work. We figure we can put off the merge thing for a bit.

Meanwhile, we're able to restart, and get happy log messages from the jobrunner and two web "servers" (two servers running on different ports of a Tomcat host). And I can do an upload, which runs locally. But when I try to do a blast, which is supposed to submit to the cluster (and ran just fine on our old install), it hangs and never starts. I would think the database is working OK, since it shows me new history items when I upload and stuff. The web Galaxy log shows that I went to the tool page, and then has a ton of loads to root/history_item_updates, but nothing else. The job handler Galaxy log has nothing since the PID messages when the server started up most recently.

A quick search of the archives didn't find anything obvious. (I don't have any obvious words to search for.) Any thoughts about where I should start looking to track this down?

Hi Amir,

If you aren't setting job_manager and job_handlers in your config, each server will consider itself the manager and handler. If not configured to run jobs, this may result in jobs failing to run. I'd suggest explicitly defining a manager and handlers.

--nate
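As a quick sanity check on the advice above, the following sketch parses a config and reports whether job_manager and job_handlers are set and whether they name servers that are actually defined. It assumes the two options live in an [app:main] section and that servers are declared as [server:NAME] sections; the names web0 and runner0 are hypothetical, and this is not a tool that ships with Galaxy.

```python
import configparser

# Hypothetical config: server names and the [app:main] placement of the
# job_manager / job_handlers options are assumptions made for illustration.
EXAMPLE_INI = """
[server:web0]
port = 8080

[server:runner0]
port = 8079

[app:main]
job_manager = runner0
job_handlers = runner0
"""


def check_job_servers(ini_text):
    """Return a list of problems found in the job_manager/job_handlers settings."""
    parser = configparser.RawConfigParser()
    parser.read_string(ini_text)
    # Names of all servers declared as [server:NAME]
    defined = {s.split(":", 1)[1] for s in parser.sections()
               if s.startswith("server:")}
    problems = []
    for option in ("job_manager", "job_handlers"):
        if not parser.has_option("app:main", option):
            problems.append("%s is not set; each server would treat itself as one"
                            % option)
            continue
        # job_handlers may be a comma-separated list of server names
        for name in parser.get("app:main", option).split(","):
            name = name.strip()
            if name not in defined:
                problems.append("%s names undefined server %r" % (option, name))
    return problems


if __name__ == "__main__":
    for problem in check_job_servers(EXAMPLE_INI) or ["no problems found"]:
        print(problem)
```

An empty result only means the names are consistent with each other; it says nothing about whether the named processes are actually running.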


Karger, Amir

17 Aug

1:18 a.m.


On 8/8/12 4:06 PM, "Nate Coraor" <nate@bx.psu.edu> wrote:


Hi Amir,

If you aren't setting job_manager and job_handlers in your config, each server will consider itself the manager and handler. If not configured to run jobs, this may result in jobs failing to run. I'd suggest explicitly defining a manager and handlers.

--nate

Sigh. We have both job_manager and job_handlers set to the same server.

It seems like our runner app may be getting into some kind of sleeping state. I was unable to upload a file, which had worked before. However, when I restarted the runner, it picked up the upload job and successfully uploaded it AND picked up the previously queued tab2fasta job, and I believe completed it successfully too. (There's an error due to a missing filetype, which I guess makes stderr non-empty and makes Galaxy think it was unsuccessful. But I can confirm that the job was in fact run on our cluster.)

Running paster.py ... --status claims that the process is still running. So what would make the runner go to "sleep" like that and how do I stop it from happening?

Thanks,

-Amir

Karger, Amir

7:54 p.m.

New subject: Track jobs in database should be True? Re: Shell script to start Galaxy in multi-server environment

On 8/16/12 4:18 PM, "Karger, Amir" <Amir_Karger@hms.harvard.edu> wrote:


Sigh. We have both job_manager and job_handlers set to the same server.

It seems like our runner app may be getting into some kind of sleeping state. I was unable to upload a file, which had worked before. However, when I restarted the runner, it picked up the upload job and successfully uploaded it AND picked up the previously queued tab2fasta job, and I believe completed it successfully too.

Replying to myself.

The reason the runner was in a "sleep" state is the logic in lib/galaxy/web/config.py says:

    if ( len( self.job_handlers ) == 1 ) and ( self.job_handlers[0] == self.server_name ) and ( self.job_manager == self.server_name ):
        self.track_jobs_in_database = False

For our dev instance, we have a single server acting as the job manager and the job handler, and we have two web servers also running on the dev box. So Galaxy apparently decides not to track the jobs in the database. However, this means it never finds any jobs to run. When we explicitly set self.track_jobs_in_database to be true in config.py, Galaxy correctly finds and runs jobs.

I guess the webapps think that Galaxy *is* tracking jobs in the database, so they put jobs in there that never get pulled out? Or should it actually work when track_jobs_in_database is false, as long as the job manager and job handler (and webapps?) are on the same server? In that case, do we know why it didn't work? I'm happy to be running track_jobs_in_database=True, because our prod server is going to have separate machines doing web vs. job handling/managing.

Thanks,

-Amir
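The condition quoted in this message can be exercised in isolation to see what each process would decide in the topology described (two web processes plus one combined manager/handler). This is a standalone sketch, not Galaxy's code: the server names are hypothetical, and it assumes the flag starts out True and is only switched off by this check.

```python
def tracks_jobs_in_database(server_name, job_manager, job_handlers):
    """Mirror the quoted condition for a single process.

    Assumption: track_jobs_in_database defaults to True and is only
    turned off when this process is the sole handler and the manager.
    """
    track = True
    if (len(job_handlers) == 1
            and job_handlers[0] == server_name
            and job_manager == server_name):
        track = False
    return track


# Hypothetical topology: two web processes, one combined manager/handler
JOB_MANAGER = "runner0"
JOB_HANDLERS = ["runner0"]

if __name__ == "__main__":
    for name in ("web0", "web1", "runner0"):
        print(name, tracks_jobs_in_database(name, JOB_MANAGER, JOB_HANDLERS))
```

Under these assumptions the two web processes come out True while the runner comes out False. That would be consistent with the guess above: the web side hands jobs off through the database, but the runner never looks there.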

Nate Coraor

11 Sep

11:59 p.m.

New subject: Track jobs in database should be True? Re: Shell script to start Galaxy in multi-server environment

On Aug 17, 2012, at 10:54 AM, Karger, Amir wrote:


Replying to myself.

The reason the runner was in a "sleep" state is the logic in lib/galaxy/web/config.py says:

    if ( len( self.job_handlers ) == 1 ) and ( self.job_handlers[0] == self.server_name ) and ( self.job_manager == self.server_name ):
        self.track_jobs_in_database = False

Yeah, this logic doesn't correctly handle the situation (below) where you have a single manager/handler but separate web processes.


For our dev instance, we have a single server acting as the job manager and the job handler, and we have two web servers also running on the dev box. So Galaxy apparently decides not to track the jobs in the database. However, this means it never finds any jobs to run. When we explicitly set self.track_jobs_in_database to be true in config.py, Galaxy correctly finds and runs jobs.

I guess the webapps think that Galaxy *is* tracking jobs in the database, so they put jobs in there that never get pulled out? Or should it actually work when track_jobs_in_database is false, as long as the job manager and job handler(and webapps?) are on the same server. In that case, do we know why it didn't work? I'm happy to be running track_jobs_in_database=True, because our prod server is going to have separate machines doing web vs. job handling/managing.

I've made it possible to manually set track_jobs_in_database in universe_wsgi.ini in 244b4cb100d1.

--nate
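One way to picture the effect of a manual setting is "an explicit value wins, otherwise fall back to the automatic check". The sketch below illustrates that precedence only; it is not the code from changeset 244b4cb100d1. It assumes the option sits in [app:main], that an unset job_manager or job_handlers defaults to the current server's own name, and that the usual true/yes/on/1 spellings count as true. Server names are hypothetical.

```python
import configparser

TRUE_VALUES = ("true", "yes", "on", "1")

# Hypothetical configs: identical except for the explicit override.
WITHOUT_OVERRIDE = """
[app:main]
job_manager = runner0
job_handlers = runner0
"""

WITH_OVERRIDE = WITHOUT_OVERRIDE + "track_jobs_in_database = True\n"


def effective_tracking(ini_text, server_name):
    """Return the track_jobs_in_database value a given process would use."""
    parser = configparser.RawConfigParser()
    parser.read_string(ini_text)
    app = dict(parser.items("app:main")) if parser.has_section("app:main") else {}
    # An explicit setting takes precedence over the automatic check
    if "track_jobs_in_database" in app:
        return app["track_jobs_in_database"].strip().lower() in TRUE_VALUES
    # Assumption: unset options default to this server's own name
    manager = app.get("job_manager", server_name).strip()
    handlers = [h.strip() for h in app.get("job_handlers", server_name).split(",")]
    sole_job_server = (len(handlers) == 1 and handlers[0] == server_name
                       and manager == server_name)
    return not sole_job_server


if __name__ == "__main__":
    print("runner0 without override:", effective_tracking(WITHOUT_OVERRIDE, "runner0"))
    print("runner0 with override:", effective_tracking(WITH_OVERRIDE, "runner0"))
```

With the override present, the combined manager/handler process tracks jobs in the database just as the web processes do, which is the configuration Amir reported as working.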

