UnicodeDecodeError - suggested note regarding UTF8 encoding in postgres databases
I've finally joined the happy throng suffering from Unicode errors. In my particular case, running the deseq tool from the main toolshed brought down my Galaxy server, and it wouldn't restart. The traceback in paster.log contained this:

    Traceback (most recent call last):
    [snip]
      File "/home/galaxy-dev/galaxy/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/engine/base.py", line 2903, in __init__
        self._init_metadata()
      File "/home/galaxy-dev/galaxy/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/engine/base.py", line 3306, in _init_metadata
        self.__buffer_rows()
      File "/home/galaxy-dev/galaxy/eggs/SQLAlchemy-0.7.9-py2.6-linux-x86_64-ucs4.egg/sqlalchemy/engine/base.py", line 3326, in __buffer_rows
        self.__rowbuffer = collections.deque(self.cursor.fetchmany(size))
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 162: ordinal not in range(128)

The same error occurred each time I tried to restart the galaxy server until I restored the galaxy database from last night's postgres backup.

The problem was my galaxy database in postgres was using SQL_ASCII encoding, which turns out to be a bad idea if you're ever going to write Unicode characters there. (I'm quite disappointed that this is the default encoding for postgres databases on CentOS 6.)

The fix was to dump the galaxy database, recreate it with UTF8 encoding, and reload it. Now everything works, and that one tool doesn't crash my galaxy server.

I thought it would be worth noting that postgres databases should be created with UTF8 encoding on this page:

http://wiki.galaxyproject.org/Admin/Config/Performance/ProductionServer#Swit...

I think the right thing to do is to ensure you set the default database encoding for postgres when you run initdb. You only get one chance to get this right. After that, a way to change the default database encoding to UTF8 is given here:

https://gist.github.com/ffmike/877447

Scary stuff, but it worked well enough for me.
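For anyone wanting to see the dump / recreate / reload sequence spelled out, here is a minimal sketch that only prints the commands rather than running them. The database name "galaxy" and the dump file name "galaxy_dump.sql" are placeholders, not the exact values used above, and the sequence is an illustration of the general approach, not a record of the commands actually run. Take a separate backup first, and be aware that the reload may fail or need cleanup if existing rows contain bytes that are not valid UTF-8.

```python
import shlex


def utf8_migration_commands(db="galaxy", dump_file="galaxy_dump.sql"):
    """Return the shell commands for moving a database to UTF8 encoding.

    The db and dump_file defaults are hypothetical placeholders.
    """
    return [
        # 1. Dump the existing (SQL_ASCII) database to a plain SQL file.
        ["pg_dump", "--file", dump_file, db],
        # 2. Drop the old database (only after verifying the dump!).
        ["dropdb", db],
        # 3. Recreate it with UTF8 encoding. template0 is used because
        #    template1 may itself have been created as SQL_ASCII.
        ["createdb", "--encoding=UTF8", "--template=template0", db],
        # 4. Reload the dump into the new database.
        ["psql", "--dbname", db, "--file", dump_file],
    ]


def as_shell(cmd):
    """Quote a command list so it can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in cmd)


commands = utf8_migration_commands()
for cmd in commands:
    print(as_shell(cmd))
```

The commands are printed instead of executed so they can be reviewed, and run as the appropriate postgres user with Galaxy stopped.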
Alternatively, if you didn't think about encodings when you ran initdb, you can think about it every time you run createdb.

Hope this helps someone.

cheers,
Simon
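A note on the 0xe2 in the traceback, as a small illustration: 0xe2 is the lead byte of many three-byte UTF-8 sequences (curly quotes, dashes and so on), and the ascii codec rejects any byte above 0x7f. The sketch below reproduces the same kind of error in Python. The specific character (U+2019, a right single quotation mark) and the 162 bytes of padding are assumptions chosen to mirror the message; the actual text stored in the database is not known.

```python
# Hypothetical row content: 162 ASCII bytes followed by U+2019 encoded
# as UTF-8 (bytes e2 80 99). The real database content is unknown.
raw = b"x" * 162 + "\u2019".encode("utf-8")

try:
    raw.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    # The ascii codec stops at the first byte >= 0x80.
    error = exc
    print(exc)

# Decoding the same bytes as UTF-8 succeeds.
text = raw.decode("utf-8")
print(repr(text[-1]))
```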
participants (1): Guest, Simon