Ulf,

Thanks for all the information; hopefully it helps me replicate this issue so I can figure out a better fix.

I can work on some better documentation for this, but there aren't any Galaxy-specific instructions that I know of.  The basic configuration for setting up the RabbitMQ server (www.rabbitmq.com) should work fine; after that you just need to fill out the 'amqp_internal_connection' option in your galaxy.ini (formerly universe_wsgi.ini) with something like the line below (more documentation here:  http://kombu.readthedocs.org/en/latest/userguide/connections.html):

amqp_internal_connection = amqp://galaxy:galaxy@localhost:5672//


Galaxy should then do the rest of the work for you automatically -- configuring the exchange, queues, and such.
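
If it helps to see what each piece of that connection string means, here is a rough sketch that takes it apart using only the Python standard library. This is not Galaxy or kombu code; the function name describe_amqp_url is made up for illustration, and the vhost handling (dropping one leading slash, so a trailing '//' means the default vhost '/') reflects my understanding of kombu's URL convention rather than anything Galaxy-specific.

```python
from urllib.parse import urlparse, unquote


def describe_amqp_url(url):
    """Split an AMQP connection URL into the parts the broker cares about.

    Hypothetical helper for illustration only; it is not part of Galaxy
    or kombu.
    """
    parts = urlparse(url)
    path = parts.path
    # Assumption: as in kombu, one leading slash is a separator and the
    # remainder is the virtual host, so a path of '//' means vhost '/'.
    vhost = unquote(path[1:]) if path.startswith("/") else path
    return {
        "transport": parts.scheme,
        "userid": parts.username,
        "password": parts.password,
        "hostname": parts.hostname,
        "port": parts.port,
        "virtual_host": vhost,
    }


if __name__ == "__main__":
    info = describe_amqp_url("amqp://galaxy:galaxy@localhost:5672//")
    for key, value in info.items():
        print(key, "=", value)
```

So the example line above assumes a RabbitMQ user 'galaxy' with password 'galaxy' that has access to the default vhost on the local machine; adjust those to whatever you create on your broker.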


On Wed, Oct 8, 2014 at 6:58 AM, Ulf Schaefer <Ulf.Schaefer@phe.gov.uk> wrote:
Hi Dannon

Yes. The database is running on a different server from the server
running Galaxy. They are both VMs running Centos (6.5 on the Galaxy
server, 6.2 on the database server). The postgres version is 8.4.9 and
the database size is 712,161,040. I suspect that is not very large
compared to some others. There are a number of other databases running
on the same server, the one most frequently used is for our test Galaxy
server which runs on yet a different VM. This one is much smaller
(25,319,184). Both servers are on the same subnet. The problem is with
our production Galaxy (of course).

Are there any instructions around on how to set up RabbitMQ for my
Galaxy?

Thanks for looking into this.
Ulf

On 08/10/14 11:26, Dannon Baker wrote:
> Hi again Ulf,
>
> Thanks for the info. A few questions to help me track this down:
>
> Does the postgres database reside on a remote box from galaxy?  And is it
> very large?
>
> Running the latest galaxy may not change anything related to this
> particular issue, but you could always try it.
>
> Sqlalchemy is fixed at the latest version we can currently support without
> reworking how migration scripts function (which we will do, moving to
> Alembic, in the future), and I do suspect that this is actually a bug in
> sqlalchemy mapper initialization, but we should be able to come up with an
> interim work around.
>
> Finally, if this is a blocker for you, a workaround (though not a trivial
> one) is to set up an AMQP (RabbitMQ) server and configure your Galaxy
> instances to communicate through it. I am still going to fix this bug.
> On Oct 8, 2014 10:45 AM, "Ulf Schaefer" <Ulf.Schaefer@phe.gov.uk> wrote:
>
>> Hi all again
>>
>> Seems I am not so fortunate that this would just go away.
>>
>> It appears to be happening sometimes at start-up time for one of the
>> handler processes. The first thing that appears to go wrong is this just
>> after starting the job handler queue:
>>
>> ---
>>
>> galaxy.jobs.handler INFO 2014-10-06 14:37:51,220 job handler queue started
>> galaxy.sample_tracking.external_service_types DEBUG 2014-10-06
>> 14:37:51,246 Loaded external_service_type: Simple unknown sequencer 1.0.0
>> galaxy.sample_tracking.external_service_types DEBUG 2014-10-06
>> 14:37:51,253 Loaded external_service_type: Applied Biosystems SOLiD 1.0.0
>> galaxy.queue_worker INFO 2014-10-06 14:37:51,254 Initalizing Galaxy
>> Queue Worker on
>> sqlalchemy+postgres://galaxy:xxx@158.119.147.86:5432/galaxyprod
>> galaxy.jobs DEBUG 2014-10-06 14:37:51,416 (78355) Working directory for
>> job is:
>>
>> /phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/database/job_working_directory/078/78355
>> galaxy.web.framework.base DEBUG 2014-10-06 14:37:51,454 Enabling
>> 'data_admin' controller, class: DataAdmin
>> galaxy.jobs.handler ERROR 2014-10-06 14:37:51,464 failure running job 78355
>> Traceback (most recent call last):
>>     File
>>
>> "/phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/lib/galaxy/jobs/handler.py",
>> line 243, in __monitor_step
>>       job_state = self.__check_if_ready_to_run( job )
>>     File
>>
>> "/phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/lib/galaxy/jobs/handler.py",
>> line 333, in __check_if_ready_to_run
>>       state = self.__check_user_jobs( job, self.job_wrappers[job.id] )
>>     File
>>
>> "/phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/lib/galaxy/jobs/handler.py",
>> line 417, in __check_user_jobs
>>       if job.user:
>>     File
>>
>> "/phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/attributes.py",
>> line 168, in __get__
>>       return self.impl.get(instance_state(instance),dict_)
>>     File
>>
>> "/phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/attributes.py",
>> line 453, in get
>>       value = self.callable_(state, passive)
>>     File
>>
>> "/phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/strategies.py",
>> line 508, in _load_for_state
>>       return self._emit_lazyload(session, state, ident_key)
>>     File
>>
>> "/phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/strategies.py",
>> line 552, in _emit_lazyload
>>       return q._load_on_ident(ident_key)
>>     File
>>
>> "/phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/query.py",
>> line 2512, in _load_on_ident
>>       return q.one()
>>     File
>>
>> "/phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/query.py",
>> line 2184, in one
>>       ret = list(self)
>>     File
>>
>> "/phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/query.py",
>> line 2227, in __iter__
>>       return self._execute_and_instances(context)
>>     File
>>
>> "/phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/query.py",
>> line 2240, in _execute_and_instances
>>       close_with_result=True)
>>     File
>>
>> "/phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/query.py",
>> line 2231, in _connection_from_session
>>       **kw)
>>     File
>>
>> "/phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/session.py",
>> line 774, in connection
>>       bind = self.get_bind(mapper, clause=clause, **kw)
>>     File
>>
>> "/phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/session.py",
>> line 1052, in get_bind
>>       c_mapper = mapper is not None and _class_to_mapper(mapper) or None
>>     File
>>
>> "/phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/util.py",
>> line 680, in _class_to_mapper
>>       mapperlib.configure_mappers()
>>     File
>>
>> "/phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/mapper.py",
>> line 2263, in configure_mappers
>>       mapper._post_configure_properties()
>>     File
>>
>> "/phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/mapper.py",
>> line 1172, in _post_configure_properties
>>       prop.init()
>>     File
>>
>> "/phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/interfaces.py",
>> line 128, in init
>>       self.do_init()
>>     File
>>
>> "/phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/properties.py",
>> line 910, in do_init
>>       self._process_dependent_arguments()
>>     File
>>
>> "/phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/properties.py",
>> line 998, in _process_dependent_arguments
>>       self.target = self.mapper.mapped_table
>>     File
>>
>> "/phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/util/langhelpers.py",
>> line 494, in __get__
>>       obj.__dict__[self.__name__] = result = self.fget(obj)
>>     File
>>
>> "/phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/properties.py",
>> line 891, in mapper
>>       mapper_ = mapper.class_mapper(self.argument(),
>>     File
>>
>> "/phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/ext/declarative.py",
>> line 1428, in return_cls
>>       (prop.parent, arg, n.args[0], cls)
>> InvalidRequestError: When initializing mapper Mapper|Queue|kombu_queue,
>> expression 'Message' failed to locate a name ("name 'Message' is not
>> defined"). If this is a class name, consider adding this relationship()
>> to the <class 'kombu.transport.sqlalchemy.Queue'> class after both
>> dependent classes have been defined.
>>
>> ---
>>
>> After that it starts throwing the exception in monitor_step that I
>> previously posted. Has anyone seen a potentially related issue? Would an
>> update to the latest galaxy code help? I see there are newer versions of
>> SQLAlchemy available. Are they part of a newer code base?
>>
>> Thanks a lot for your help
>> Ulf
>>
>> On 07/10/14 12:20, Ulf Schaefer wrote:
>>> Update:
>>>
>>> The usual switching it off and on again (server reboot) has resolved the
>>> problem (for now), albeit in a rather unsatisfactory manner.
>>>
>>> If there are any insights what caused this behaviour and how it can be
>>> avoided in the future I'd be more than happy to hear them.
>>>
>>> Cheers
>>> Ulf
>>>
>>> On 07/10/14 11:04, Dannon Baker wrote:
>>>> One per second?  Can you tell me more about your configuration?   This
>> is
>>>> an odd bug with multiple mapper initialization that I haven't been able
>> to
>>>> reproduce yet, so any information will help.  Database configuration,
>>>> number of processes, etc.
>>>> On Oct 7, 2014 11:46 AM, "Ulf Schaefer" <Ulf.Schaefer@phe.gov.uk>
>> wrote:
>>>>
>>>>> Dear all
>>>>>
>>>>> Maybe one of you can shed some light on this error message that I see
>> in
>>>>> the log file for one of my handler processes. I get about one of them
>>>>> per second. The effect is that most of the jobs remain in the "waiting
>>>>> to run" stage.
>>>>>
>>>>> The postgres database is running on a separate server and appears to be
>>>>> doing just fine.
>>>>>
>>>>> Any help is greatly appreciated.
>>>>>
>>>>> Thanks
>>>>> Ulf
>>>>>
>>>>> ---
>>>>>
>>>>> galaxy.jobs.handler ERROR 2014-10-07 10:32:24,676 Exception in
>> monitor_step
>>>>> Traceback (most recent call last):
>>>>>       File
>>>>>
>>>>>
>> "/phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/lib/galaxy/jobs/handler.py",
>>>>> line 161, in __monitor
>>>>>         self.__monitor_step()
>>>>>       File
>>>>>
>>>>>
>> "/phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/lib/galaxy/jobs/handler.py",
>>>>> line 184, in __monitor_step
>>>>>         hda_not_ready =
>>>>> self.sa_session.query(model.Job.id).enable_eagerloads(False) \
>>>>>       File
>>>>>
>>>>>
>> "/phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/scoping.py",
>>>>> line 114, in do
>>>>>         return getattr(self.registry(), name)(*args, **kwargs)
>>>>>       File
>>>>>
>>>>>
>> "/phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/session.py",
>>>>> line 1088, in query
>>>>>         return self._query_cls(entities, self, **kwargs)
>>>>>       File
>>>>>
>>>>>
>> "/phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/query.py",
>>>>> line 108, in __init__
>>>>>         self._set_entities(entities)
>>>>>       File
>>>>>
>>>>>
>> "/phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/query.py",
>>>>> line 117, in _set_entities
>>>>>         self._setup_aliasizers(self._entities)
>>>>>       File
>>>>>
>>>>>
>> "/phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/query.py",
>>>>> line 132, in _setup_aliasizers
>>>>>         _entity_info(entity)
>>>>>       File
>>>>>
>>>>>
>> "/phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/util.py",
>>>>> line 578, in _entity_info
>>>>>         mapperlib.configure_mappers()
>>>>>       File
>>>>>
>>>>>
>> "/phengs/hpc_storage/home/galaxy_hpc/galaxy-dist/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/orm/mapper.py",
>>>>> line 2260, in configure_mappers
>>>>>         raise e
>>>>> InvalidRequestError: One or more mappers failed to initialize - can't
>>>>> proceed with initialization of other mappers.  Original exception was:
>>>>> When initializing mapper Mapper|Queue|kombu_queue, expression 'Message'
>>>>> failed to locate a name ("name 'Message' is not defined"). If this is a
>>>>> class name, consider adding this relationship() to the <class
>>>>> 'kombu.transport.sqlalchemy.Queue'> class after both dependent classes
>>>>> have been defined.
>>>>>
>>>>> ---
>>>>>
>>>>>
>> **************************************************************************
>>>>> The information contained in the EMail and any attachments is
>> confidential
>>>>> and intended solely and for the attention and use of the named
>>>>> addressee(s). It may not be disclosed to any other person without the
>>>>> express authority of Public Health England, or the intended recipient,
>> or
>>>>> both. If you are not the intended recipient, you must not disclose,
>> copy,
>>>>> distribute or retain this message or any part of it. This footnote also
>>>>> confirms that this EMail has been swept for computer viruses by
>>>>> Symantec.Cloud, but please re-sweep any attachments before opening or
>>>>> saving. http://www.gov.uk/PHE
>>>>>
>> **************************************************************************
>>>>>
>>>>> ___________________________________________________________
>>>>> Please keep all replies on the list by using "reply all"
>>>>> in your mail client.  To manage your subscriptions to this
>>>>> and other Galaxy lists, please use the interface at:
>>>>>      http://lists.bx.psu.edu/
>>>>>
>>>>> To search Galaxy mailing lists use the unified search at:
>>>>>      http://galaxyproject.org/search/mailinglists/
>>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
