I'm seeing a possible latency issue or race condition when starting Galaxy after the latest hg upgrade (July 20) from galaxy-dist; the prior upgrade doesn't have this problem. We have a small setup with one job manager/runner and two web front-ends for testing load balancing:

…from universe_wsgi.ini:
------------------------------
[server:web0]
use = egg:Paste#http
port = 8080
host = 127.0.0.1
use_threadpool = true
threadpool_workers = 7

[server:web1]
use = egg:Paste#http
port = 8081
host = 127.0.0.1
use_threadpool = true
threadpool_workers = 7

[server:manager]
use = egg:Paste#http
port = 8079
host = 127.0.0.1
use_threadpool = true
threadpool_workers = 5
------------------------------

If I run:

GALAXY_RUN_ALL=1 sh run.sh --daemon

I intermittently see the following in the paster log for any of the above services (the example below is from web1, but I have seen this for manager and web0 as well). The traceback and error are the same in all cases ('File exists: /home/a-m/galaxy/dist-database/tmp/work_tmp'):

------------------------------
galaxy.tool_shed.tool_shed_registry DEBUG 2012-07-30 11:40:10,194 Loading references to tool sheds from tool_sheds_conf.xml
galaxy.tool_shed.tool_shed_registry DEBUG 2012-07-30 11:40:10,194 Loaded reference to tool shed: Galaxy main tool shed
galaxy.tool_shed.tool_shed_registry DEBUG 2012-07-30 11:40:10,194 Loaded reference to tool shed: Galaxy test tool shed
galaxy.model.migrate.check DEBUG 2012-07-30 11:40:10,650 psycopg2 egg successfully loaded for postgres dialect
galaxy.model.migrate.check INFO 2012-07-30 11:40:10,845 At database version 103
galaxy.tool_shed.migrate.check DEBUG 2012-07-30 11:40:10,940 psycopg2 egg successfully loaded for postgres dialect
galaxy.tool_shed.migrate.check INFO 2012-07-30 11:40:10,986 At migrate_tools version 3
galaxy.model.custom_types DEBUG 2012-07-30 11:40:10,994 psycopg2 egg successfully loaded for postgres dialect
Traceback (most recent call last):
  File "/home/a-m/galaxy/galaxy-dist/lib/galaxy/web/buildapp.py", line 82, in app_factory
    app = UniverseApplication( global_conf = global_conf, **kwargs )
  File "/home/a-m/galaxy/galaxy-dist/lib/galaxy/app.py", line 66, in __init__
    self.installed_repository_manager.load_proprietary_datatypes()
  File "/home/a-m/galaxy/galaxy-dist/lib/galaxy/tool_shed/__init__.py", line 47, in load_proprietary_datatypes
    installed_repository_dict = galaxy.util.shed_util.load_installed_datatypes( self.app, tool_shed_repository, relative_install_dir )
  File "/home/a-m/galaxy/galaxy-dist/lib/galaxy/util/shed_util.py", line 1269, in load_installed_datatypes
    work_dir = make_tmp_directory()
  File "/home/a-m/galaxy/galaxy-dist/lib/galaxy/util/shed_util.py", line 1305, in make_tmp_directory
    os.makedirs( work_dir )
  File "/usr/lib64/python2.6/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 17] File exists: '/home/a-m/galaxy/dist-database/tmp/work_tmp'
Removing PID file web1.pid
------------------------------

I was also seeing this with separate runner/webapp ini files and 'run_multiple_processes.sh --daemon', but we decided to go ahead and migrate to a unified universe_wsgi.ini file.

As a workaround, we rerun 'GALAXY_RUN_ALL=1 sh run.sh --daemon', which skips any services that are already running. I'm curious whether anyone else has seen this and whether there is a fix (or perhaps a new config setting we are missing?).

chris