Odd (possible latency/race?) problems post-upgrade
I'm seeing a possible latency issue or race condition when starting Galaxy after the latest hg upgrade (July 20) from galaxy-dist; the prior upgrade doesn't have this problem. We have a small setup with one job manager/runner and two web front-ends for testing load balancing:

…from universe_wsgi.ini:
------------------------------
[server:web0]
use = egg:Paste#http
port = 8080
host = 127.0.0.1
use_threadpool = true
threadpool_workers = 7

[server:web1]
use = egg:Paste#http
port = 8081
host = 127.0.0.1
use_threadpool = true
threadpool_workers = 7

[server:manager]
use = egg:Paste#http
port = 8079
host = 127.0.0.1
use_threadpool = true
threadpool_workers = 5
------------------------------

If I run:

GALAXY_RUN_ALL=1 sh run.sh --daemon

I will intermittently see the following in the paster log for any of the above services (the example below is web1, but I have seen this for manager and web0 as well). The traceback and error are the same in all cases ('File exists: /home/a-m/galaxy/dist-database/tmp/work_tmp'):

------------------------------
galaxy.tool_shed.tool_shed_registry DEBUG 2012-07-30 11:40:10,194 Loading references to tool sheds from tool_sheds_conf.xml
galaxy.tool_shed.tool_shed_registry DEBUG 2012-07-30 11:40:10,194 Loaded reference to tool shed: Galaxy main tool shed
galaxy.tool_shed.tool_shed_registry DEBUG 2012-07-30 11:40:10,194 Loaded reference to tool shed: Galaxy test tool shed
galaxy.model.migrate.check DEBUG 2012-07-30 11:40:10,650 psycopg2 egg successfully loaded for postgres dialect
galaxy.model.migrate.check INFO 2012-07-30 11:40:10,845 At database version 103
galaxy.tool_shed.migrate.check DEBUG 2012-07-30 11:40:10,940 psycopg2 egg successfully loaded for postgres dialect
galaxy.tool_shed.migrate.check INFO 2012-07-30 11:40:10,986 At migrate_tools version 3
galaxy.model.custom_types DEBUG 2012-07-30 11:40:10,994 psycopg2 egg successfully loaded for postgres dialect
Traceback (most recent call last):
  File "/home/a-m/galaxy/galaxy-dist/lib/galaxy/web/buildapp.py", line 82, in app_factory
    app = UniverseApplication( global_conf = global_conf, **kwargs )
  File "/home/a-m/galaxy/galaxy-dist/lib/galaxy/app.py", line 66, in __init__
    self.installed_repository_manager.load_proprietary_datatypes()
  File "/home/a-m/galaxy/galaxy-dist/lib/galaxy/tool_shed/__init__.py", line 47, in load_proprietary_datatypes
    installed_repository_dict = galaxy.util.shed_util.load_installed_datatypes( self.app, tool_shed_repository, relative_install_dir )
  File "/home/a-m/galaxy/galaxy-dist/lib/galaxy/util/shed_util.py", line 1269, in load_installed_datatypes
    work_dir = make_tmp_directory()
  File "/home/a-m/galaxy/galaxy-dist/lib/galaxy/util/shed_util.py", line 1305, in make_tmp_directory
    os.makedirs( work_dir )
  File "/usr/lib64/python2.6/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 17] File exists: '/home/a-m/galaxy/dist-database/tmp/work_tmp'
Removing PID file web1.pid
------------------------------

I was also seeing this using separate runner/webapp ini files and 'run_multiple_processes.sh --daemon', but we decided to go ahead and migrate over to a unified universe_wsgi.ini file.

Anyway, we found a workaround by rerunning 'GALAXY_RUN_ALL=1 sh run.sh --daemon', which skips any running services, but I'm curious whether anyone else has seen this and whether there is a fix (or maybe an added config setting we are missing?).

chris
Hi,

I'm experiencing the same problem after I configured the "web application scaling", although I have to say I still have the June 3 version from central. Has this been resolved in newer updates?

Cheers,
Jelle

(In reply to Fields, Christopher J <cjfields@illinois.edu>, Mon, Jul 30, 2012, 7:03 PM)
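For readers following the traceback in the original report: it ends in os.makedirs() raising Errno 17, which is what happens when the target path already exists at the time of the call, for example because another process created the same directory a moment earlier. The sketch below is not Galaxy's code and is not claimed to be what any fix does; `ensure_dir` is a hypothetical helper showing one common way to make directory creation tolerate that collision.

```python
import errno
import os
import tempfile


def ensure_dir(path):
    """Create path if needed; tolerate another process creating it first.

    Hypothetical helper for illustration, not part of Galaxy.
    """
    try:
        os.makedirs(path)
    except OSError as exc:
        # EEXIST is acceptable only when the existing entry is a directory;
        # anything else (permissions, a plain file in the way) is re-raised.
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise
    return path


if __name__ == "__main__":
    base = tempfile.mkdtemp()  # scratch area standing in for a shared tmp dir
    work_dir = os.path.join(base, "tmp", "work_tmp")
    ensure_dir(work_dir)  # the first caller creates the directory
    ensure_dir(work_dir)  # a later caller finds it and does not fail
    print(os.path.isdir(work_dir))
```

On Python 3.2 and later, os.makedirs(path, exist_ok=True) covers the same case; the traceback shows Python 2.6, where that argument is not available.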
I mentioned this on IRC, and Nate indicated he was looking into it. My short-term workaround was adding a 'sleep 9' in the process initiation loops in 'run.sh' and 'run_multiple_processes.sh' (just after the python call) to allow time between the start of each process.

chris

(In reply to Jelle Scholtalbers <j.scholtalbers@gmail.com>, Aug 2, 2012, 7:46 AM)
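The 'sleep 9' workaround described above amounts to staggering process starts so that each server finishes its startup work before the next one begins. As a rough sketch of that idea only (the actual change was made in the shell loops of run.sh, which are not shown in this thread), the hypothetical `start_staggered` function below pauses between consecutive launches; the commands in the demo are placeholders, not the real paster invocations.

```python
import subprocess
import sys
import time


def start_staggered(commands, delay_seconds,
                    launch=subprocess.Popen, sleep=time.sleep):
    """Launch each command in turn, pausing between consecutive launches.

    Hypothetical helper; launch and sleep are injectable to ease testing.
    """
    started = []
    for index, command in enumerate(commands):
        if index > 0:
            # Give the previous process time to get through startup.
            sleep(delay_seconds)
        started.append(launch(command))
    return started


if __name__ == "__main__":
    # Placeholder commands standing in for the three server processes.
    names = ["manager", "web0", "web1"]
    commands = [[sys.executable, "-c", "print(%r)" % name] for name in names]
    for process in start_staggered(commands, delay_seconds=1):
        process.wait()
```

A delay like this only narrows the window in which two processes can collide; it does not remove the underlying race, which is why it was described as a short-term workaround.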
Hi,

Greg has just fixed this in 9f790bc90769. Thanks for letting us know!

--nate

(In reply to Fields, Christopher J, Aug 2, 2012, 3:34 PM)
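The thread does not say what changeset 9f790bc90769 changed, so the following is not a description of that fix. As a general illustration only, one way to avoid several processes competing for a single fixed path such as work_tmp is to give each caller its own uniquely named directory. The sketch assumes the parent directory already exists, and `make_private_work_dir` is a hypothetical name.

```python
import os
import shutil
import tempfile


def make_private_work_dir(parent):
    """Return a new, uniquely named directory created inside parent.

    Hypothetical helper for illustration, not part of Galaxy.
    """
    # mkdtemp picks an unused name and creates the directory itself,
    # so two callers never receive the same path.
    return tempfile.mkdtemp(prefix="work_tmp_", dir=parent)


if __name__ == "__main__":
    parent = tempfile.mkdtemp()  # stand-in for a shared tmp directory
    first = make_private_work_dir(parent)
    second = make_private_work_dir(parent)
    print(first != second)
    shutil.rmtree(parent)  # remove the demo directories
```

The caller is responsible for removing each directory when it is done with it, as the demo does with shutil.rmtree.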
___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at:
participants (3)
- Fields, Christopher J
- Jelle Scholtalbers
- Nate Coraor