UnicodeDecodeError - suggested note regarding UTF8 encoding in postgres databases - galaxy-dev

18 Sep 2013

I've finally joined the happy throng suffering from Unicode errors. In my case, running the deseq tool from the main Tool Shed brought down my Galaxy server, and it wouldn't restart. The traceback in paster.log contained this:

Traceback (most recent call last):
  [snip]
  File "/home/galaxy-dev/galaxy/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/engine/base.py", line 2903, in __init__
    self._init_metadata()
  File "/home/galaxy-dev/galaxy/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/engine/base.py", line 3306, in _init_metadata
    self.__buffer_rows()
  File "/home/galaxy-dev/galaxy/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/engine/base.py", line 3326, in __buffer_rows
    self.__rowbuffer = collections.deque(self.cursor.fetchmany(size))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 162: ordinal not in range(128)

The same error recurred every time I tried to restart the Galaxy server, until I restored the Galaxy database from the previous night's postgres backup.

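The problem was that my Galaxy database in postgres was using SQL_ASCII encoding, which turns out to be a bad idea if you're ever going to store non-ASCII characters in it. With SQL_ASCII, the server stores any byte above 0x7F as-is, without validating or converting it, so clients have no reliable way to know how to decode that text. (I'm quite disappointed that this is the default encoding for postgres databases on CentOS 6.)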
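To see why byte 0xe2 upsets the 'ascii' codec, here is a minimal Python 3 sketch of the decoding step on its own. It assumes (hypothetically; the actual text the tool wrote is unknown) that the stored value was UTF-8 containing a typographic apostrophe, U+2019, whose UTF-8 encoding starts with 0xe2. The real failure happened inside the database driver while fetching rows; this only reproduces the codec behaviour.

```python
def try_decode(data: bytes, codec: str) -> str:
    """Decode bytes with the given codec, returning the error text on failure."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError: {exc}"

# Hypothetical stored value: UTF-8 bytes for "don't" with a curly apostrophe.
raw = "don\u2019t".encode("utf-8")  # b'don\xe2\x80\x99t'

print(try_decode(raw, "ascii"))  # fails: 0xe2 is outside the 7-bit ASCII range
print(try_decode(raw, "utf-8"))  # succeeds: the three bytes form one character
```

A UTF8 database lets the server and the client agree on that second interpretation instead of falling back to ASCII.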

The fix was to dump the Galaxy database, recreate it with UTF8 encoding, and reload it. Now everything works, and that one tool no longer crashes my Galaxy server.
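The dump / recreate / reload fix can be sketched as a sequence of standard PostgreSQL client commands. This is a sketch under assumptions, not the exact commands I ran: the database name "galaxy" and dump file "galaxy.sql" are hypothetical, Galaxy is assumed to be stopped, and you should keep a separate backup before dropping anything. By default it only prints the commands.

```python
import shlex
import subprocess
from typing import List

def migration_commands(dbname: str, dump_file: str) -> List[List[str]]:
    """Build the dump / recreate / reload steps as argv lists (nothing is run here)."""
    return [
        # 1. Dump the existing SQL_ASCII database to a plain-SQL file.
        ["pg_dump", "--file", dump_file, dbname],
        # 2. Drop the old database (stop Galaxy first; keep a separate backup).
        ["dropdb", dbname],
        # 3. Recreate it as UTF8. template0 is used because PostgreSQL generally
        #    refuses to give a copy of template1 a different encoding.
        ["createdb", "--encoding=UTF8", "--template=template0", dbname],
        # 4. Reload the dump; ON_ERROR_STOP makes psql exit on the first failed statement.
        ["psql", "--set", "ON_ERROR_STOP=1", "--dbname", dbname, "--file", dump_file],
    ]

def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; only execute it when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)

# "galaxy" and "galaxy.sql" are hypothetical names; substitute your own.
run(migration_commands("galaxy", "galaxy.sql"))
```

If the old database holds bytes that are not valid UTF-8, the reload may stop with an error rather than succeed, which is why ON_ERROR_STOP is set. Depending on the cluster's locale, createdb may also need a UTF-8-compatible `--locale`.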

I thought it would be worth adding a note to this page saying that postgres databases should be created with UTF8 encoding:
http://wiki.galaxyproject.org/Admin/Config/Performance/ProductionServer#Swit...

I think the right thing to do is to set the default database encoding for postgres when you run initdb. You only get one chance to get this right. If you missed it, a way to change the default database encoding to UTF8 afterwards is given here:
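Setting the encoding at initdb time looks roughly like the following sketch. The data directory path and the en_US.UTF-8 locale are assumptions; initdb must be run as the postgres OS user against an empty directory, and the locale you choose has to be compatible with UTF8.

```python
import shlex
from typing import List

def initdb_command(data_dir: str, locale: str = "en_US.UTF-8") -> List[str]:
    """Build an initdb invocation whose default database encoding is UTF8."""
    return [
        "initdb",
        "--pgdata", data_dir,  # must be empty; run as the postgres OS user
        "--encoding=UTF8",     # encoding of the template databases, hence the default for new ones
        f"--locale={locale}",  # assumed locale; must be compatible with UTF8
    ]

# /var/lib/pgsql/data is a hypothetical path; use your cluster's data directory.
print(shlex.join(initdb_command("/var/lib/pgsql/data")))
```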
https://gist.github.com/ffmike/877447

Scary stuff, but it worked well enough for me.

Alternatively, if you didn't think about encodings when you ran initdb, you can specify the encoding explicitly every time you run createdb.
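Either way, it's worth checking what you actually have. A plain createdb copies template1, so if template1 is SQL_ASCII every new database will be too unless you pass an encoding. The sketch below parses the output of `psql -At` (unaligned, tuples only, "|" as the field separator) for a query on the pg_database catalog and lists databases that are not UTF8. The sample output is made up for illustration.

```python
import subprocess
from typing import Dict, List

QUERY = "SELECT datname, pg_encoding_to_char(encoding) FROM pg_database ORDER BY datname"

def parse_encodings(psql_output: str) -> Dict[str, str]:
    """Parse `psql -At` output ("name|encoding" per line) into {name: encoding}."""
    encodings = {}
    for line in psql_output.splitlines():
        if not line.strip():
            continue
        # Split on the last "|" so a database name containing "|" still parses.
        name, _, encoding = line.rpartition("|")
        encodings[name] = encoding
    return encodings

def non_utf8(encodings: Dict[str, str]) -> List[str]:
    """Return the names of databases whose encoding is not UTF8, sorted."""
    return sorted(name for name, enc in encodings.items() if enc != "UTF8")

def fetch_encodings() -> Dict[str, str]:
    """Run the query via psql (needs a reachable server and suitable credentials)."""
    result = subprocess.run(["psql", "-At", "-c", QUERY],
                            check=True, capture_output=True, text=True)
    return parse_encodings(result.stdout)

# Hypothetical output from a cluster initialised with SQL_ASCII,
# after the galaxy database has been recreated as UTF8.
sample = "galaxy|UTF8\npostgres|SQL_ASCII\ntemplate0|SQL_ASCII\ntemplate1|SQL_ASCII\n"
print(non_utf8(parse_encodings(sample)))
```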

Hope this helps someone.

cheers,
Simon

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

Guest, Simon
