Startup error, Galaxy DB got corrupted?
Dear list,

Yesterday, I noticed that the user quota setting on our Galaxy had a problem: the list of user email addresses is missing from the manage quota page. So I thought I would give a Galaxy codebase update a try, after backing up the Galaxy dir and the database.

Running hg head gives me:

==
postgres@ubuntu:/mnt/galaxyTools/galaxy-2013-05-13$ hg head
changeset:   10003:b4a373d86c51
tag:         tip
parent:      10001:471484ff8be6
user:        greg
date:        Wed Jun 12 11:48:09 2013 -0400
summary:     Add targets to Repository Actions menu items.
==

I updated the DB schema to support the code update, but the restart of Galaxy failed with an error in paster.log:

==
Traceback (most recent call last):
  File "/mnt/galaxyTools/galaxy-2013-05-13/lib/galaxy/webapps/galaxy/buildapp.py", line 35, in app_factory
    app = UniverseApplication( global_conf = global_conf, **kwargs )
  File "/mnt/galaxyTools/galaxy-2013-05-13/lib/galaxy/app.py", line 164, in __init__
    self.job_manager = manager.JobManager( self )
  File "/mnt/galaxyTools/galaxy-2013-05-13/lib/galaxy/jobs/manager.py", line 36, in __init__
    self.job_handler.start()
  File "/mnt/galaxyTools/galaxy-2013-05-13/lib/galaxy/jobs/handler.py", line 34, in start
    self.job_queue.start()
  File "/mnt/galaxyTools/galaxy-2013-05-13/lib/galaxy/jobs/handler.py", line 77, in start
    self.__check_jobs_at_startup()
  File "/mnt/galaxyTools/galaxy-2013-05-13/lib/galaxy/jobs/handler.py", line 125, in __check_jobs_at_startup
    self.dispatcher.recover( job, job_wrapper )
  File "/mnt/galaxyTools/galaxy-2013-05-13/lib/galaxy/jobs/handler.py", line 620, in recover
    self.job_runners[runner_name].recover( job, job_wrapper )
  File "/mnt/galaxyTools/galaxy-2013-05-13/lib/galaxy/jobs/runners/local.py", line 128, in recover
    job_wrapper.change_state( model.Job.states.ERROR, info = "This job was killed when Galaxy was restarted. Please retry the job." )
  File "/mnt/galaxyTools/galaxy-2013-05-13/lib/galaxy/jobs/__init__.py", line 824, in change_state
    dataset.state = state
  File "/mnt/galaxyTools/galaxy-2013-05-13/lib/galaxy/model/__init__.py", line 1163, in set_dataset_state
    object_session( self ).flush() #flush here, because hda.flush() won't flush the Dataset object
  File "/mnt/galaxyTools/galaxy-2013-05-13/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs4.egg/sqlalchemy/orm/session.py", line 1718, in flush
    self._flush(objects)
  File "/mnt/galaxyTools/galaxy-2013-05-13/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs4.egg/sqlalchemy/orm/session.py", line 1789, in _flush
    flush_context.execute()
  File "/mnt/galaxyTools/galaxy-2013-05-13/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs4.egg/sqlalchemy/orm/unitofwork.py", line 331, in execute
    rec.execute(self)
  File "/mnt/galaxyTools/galaxy-2013-05-13/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs4.egg/sqlalchemy/orm/unitofwork.py", line 475, in execute
    uow
  File "/mnt/galaxyTools/galaxy-2013-05-13/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs4.egg/sqlalchemy/orm/persistence.py", line 59, in save_obj
    mapper, table, update)
  File "/mnt/galaxyTools/galaxy-2013-05-13/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs4.egg/sqlalchemy/orm/persistence.py", line 485, in _emit_update_statements
    execute(statement, params)
  File "/mnt/galaxyTools/galaxy-2013-05-13/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs4.egg/sqlalchemy/engine/base.py", line 1449, in execute
    params)
  File "/mnt/galaxyTools/galaxy-2013-05-13/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs4.egg/sqlalchemy/engine/base.py", line 1584, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/mnt/galaxyTools/galaxy-2013-05-13/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs4.egg/sqlalchemy/engine/base.py", line 1698, in _execute_context
    context)
  File "/mnt/galaxyTools/galaxy-2013-05-13/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs4.egg/sqlalchemy/engine/base.py", line 1691, in _execute_context
    context)
  File "/mnt/galaxyTools/galaxy-2013-05-13/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs4.egg/sqlalchemy/engine/default.py", line 331, in do_execute
    cursor.execute(statement, parameters)
InternalError: (InternalError) could not read block 326 in file "base/257536/259291": read only 0 of 8192 bytes
 'UPDATE dataset SET update_time=%(update_time)s, state=%(state)s WHERE dataset.id = %(dataset_id)s'
 {'update_time': datetime.datetime(2013, 6, 17, 12, 42, 38, 145908), 'state': 'error', 'dataset_id': 68792}
==

I went on to check the Galaxy database (in postgres), which confirmed the error seen in paster.log:

==
galaxy=# select * from dataset where id = 68792;
  id   |        create_time         |        update_time         | state | deleted | purged | purgable | external_filename | _extra_files_path | file_size | total_size | object_store_id | uuid
-------+----------------------------+----------------------------+-------+---------+--------+----------+-------------------+-------------------+-----------+------------+-----------------+------
 68792 | 2013-06-12 06:55:17.608725 | 2013-06-12 06:55:19.396557 | error | f       | f      | t        |                   |                   |           |            |                 |
(1 row)

galaxy=# UPDATE dataset SET update_time='2013-06-17 12:42:38.145908', state='error' WHERE dataset.id = 68792;
ERROR:  could not read block 326 in file "base/257536/259291": read only 0 of 8192 bytes

galaxy=# UPDATE dataset SET state='error' WHERE dataset.id = 68792;
UPDATE 1
==

When I tried to restore the database from the dumped copy I made yesterday, I got the following errors:

==
ERROR:  insert or update on table "galaxy_session_to_history" violates foreign key constraint "galaxy_session_to_history_session_id_fkey"
DETAIL:  Key (session_id)=(956338) is not present in table "galaxy_session".
ERROR:  insert or update on table "job" violates foreign key constraint "job_session_id_fkey"
DETAIL:  Key (session_id)=(852060) is not present in table "galaxy_session".
ERROR:  insert or update on table "workflow_step_connection" violates foreign key constraint "workflow_step_connection_input_step_id_fkey"
DETAIL:  Key (input_step_id)=(18422) is not present in table "workflow_step".
ERROR:  insert or update on table "workflow_step_connection" violates foreign key constraint "workflow_step_connection_output_step_id_fkey"
DETAIL:  Key (output_step_id)=(18422) is not present in table "workflow_step".
==

It seems our postgres DB got corrupted. Any suggestions on a fix?

Thanks a lot!
Leon

--
Hailiang (Leon) Mei
Netherlands Bioinformatics Center
BioAssist NGS Taskforce - http://ngs.nbic.nl <https://wiki.nbic.nl/index.php/Next_Generation_Sequencing>
Skype: leon_mei
Mobile: +31 6 41709231
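The path in the block-read error encodes which on-disk file PostgreSQL could not read: 257536 is the database OID and 259291 is the relation's filenode. A minimal psql sketch for mapping that path back to a concrete table or index (the catalog queries are a suggestion, not something from the original report):

==
-- Confirm which database the path "base/257536/259291" belongs to.
SELECT oid, datname FROM pg_database WHERE oid = 257536;

-- Find the table, index or TOAST table that owns file node 259291.
SELECT relname, relkind FROM pg_class WHERE relfilenode = 259291;
==

Whether the damaged file is the dataset table itself or only one of its indexes matters: a corrupt index can usually be rebuilt with REINDEX, while a corrupt heap page generally means falling back to a backup.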
The error "ERROR: could not read block 326 in file "base/257536/259291": read only 0 of 8192 bytes" line makes me wonder if your database may have been corrupt prior to the upgrade/dump. Some brief googling suggests that this could be hardware failure induced among other things. Can you check the postgres logs for any likely explanations? How exactly did you perform your database backup and attempted recreation? -Dannon On Mon, Jun 17, 2013 at 5:12 PM, Leon Mei <hailiang.mei@nbic.nl> wrote:
Dear Dannon,

Thanks for your suggestion! We gave it a try with a backup copy from 10 days back, and that worked. We are keeping the postgres log and the failed postgres backup (made with pg_dump in both custom and plain-text format) for now and will look into what exactly happened during those 10 days.

After this update, it also seems we have solved the problem of the missing user email addresses on the add quota page. We need to test a bit further to see whether it all goes well.

Cheers,
Leon
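Since the failure was only noticed when the dump was actually needed, routinely verifying dumps can catch this earlier. A sketch of one way to do that (database and file names are only examples):

==
# Take the nightly dump in custom format.
pg_dump --format=custom --file=galaxy_$(date +%F).pgdump galaxy

# Restore it into a throwaway database with --exit-on-error, so missing
# rows or broken foreign keys abort the restore instead of scrolling past.
createdb galaxy_restore_test
pg_restore --exit-on-error --dbname=galaxy_restore_test galaxy_$(date +%F).pgdump
dropdb galaxy_restore_test
==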
The error "ERROR: could not read block 326 in file "base/257536/259291": read only 0 of 8192 bytes" line makes me wonder if your database may have been corrupt prior to the upgrade/dump. Some brief googling suggests that this could be hardware failure induced among other things. Can you check the postgres logs for any likely explanations?
How exactly did you perform your database backup and attempted recreation?
-Dannon
On Mon, Jun 17, 2013 at 5:12 PM, Leon Mei <hailiang.mei@nbic.nl> wrote:
Dear list,
Yesterday, I noticed that the user quota setting on our Galaxy had a problem - the list of user email address is missing from the manage quota page. So I thought to give a Galaxy codebase update a try after backing up the Galaxy dir and the database.
Running hg head gives me: == postgres@ubuntu:/mnt/galaxyTools/galaxy-2013-05-13$ hg head changeset: 10003:b4a373d86c51 tag: tip parent: 10001:471484ff8be6 user: greg date: Wed Jun 12 11:48:09 2013 -0400 summary: Add targets to Repository Actions menu items. ==
I updated the DB schema to support the code update. But the restart of Galaxy failed with an error in paster.log: == Traceback (most recent call last): File "/mnt/galaxyTools/galaxy-2013-05-13/lib/galaxy/webapps/galaxy/buildapp.py", line 35, in app_factory app = UniverseApplication( global_conf = global_conf, **kwargs ) File "/mnt/galaxyTools/galaxy-2013-05-13/lib/galaxy/app.py", line 164, in __init__ self.job_manager = manager.JobManager( self ) File "/mnt/galaxyTools/galaxy-2013-05-13/lib/galaxy/jobs/manager.py", line 36, in __init__ self.job_handler.start() File "/mnt/galaxyTools/galaxy-2013-05-13/lib/galaxy/jobs/handler.py", line 34, in start self.job_queue.start() File "/mnt/galaxyTools/galaxy-2013-05-13/lib/galaxy/jobs/handler.py", line 77, in start self.__check_jobs_at_startup() File "/mnt/galaxyTools/galaxy-2013-05-13/lib/galaxy/jobs/handler.py", line 125, in __check_jobs_at_startup self.dispatcher.recover( job, job_wrapper ) File "/mnt/galaxyTools/galaxy-2013-05-13/lib/galaxy/jobs/handler.py", line 620, in recover self.job_runners[runner_name].recover( job, job_wrapper ) File "/mnt/galaxyTools/galaxy-2013-05-13/lib/galaxy/jobs/runners/local.py", line 128, in recover job_wrapper.change_state( model.Job.states.ERROR, info = "This job was killed when Galaxy was restarted. Please retry the job." ) File "/mnt/galaxyTools/galaxy-2013-05-13/lib/galaxy/jobs/__init__.py", line 824, in change_state dataset.state = state File "/mnt/galaxyTools/galaxy-2013-05-13/lib/galaxy/model/__init__.py", line 1163, in set_dataset_state object_session( self ).flush() #flush here, because hda.flush() won't flush the Dataset object File "/mnt/galaxyTools/galaxy-2013-05-13/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs4.egg/sqlalchemy/orm/session.py", line 1718, in flush self._flush(objects) File "/mnt/galaxyTools/galaxy-2013-05-13/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs4.egg/sqlalchemy/orm/session.py", line 1789, in _flush flush_context.execute() File "/mnt/galaxyTools/galaxy-2013-05-13/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs4.egg/sqlalchemy/orm/unitofwork.py", line 331, in execute rec.execute(self) File "/mnt/galaxyTools/galaxy-2013-05-13/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs4.egg/sqlalchemy/orm/unitofwork.py", line 475, in execute uow File "/mnt/galaxyTools/galaxy-2013-05-13/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs4.egg/sqlalchemy/orm/persistence.py", line 59, in save_obj mapper, table, update) File "/mnt/galaxyTools/galaxy-2013-05-13/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs4.egg/sqlalchemy/orm/persistence.py", line 485, in _emit_update_statements execute(statement, params) File "/mnt/galaxyTools/galaxy-2013-05-13/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs4.egg/sqlalchemy/engine/base.py", line 1449, in execute params) File "/mnt/galaxyTools/galaxy-2013-05-13/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs4.egg/sqlalchemy/engine/base.py", line 1584, in _execute_clauseelement compiled_sql, distilled_params File "/mnt/galaxyTools/galaxy-2013-05-13/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs4.egg/sqlalchemy/engine/base.py", line 1698, in _execute_context context) File "/mnt/galaxyTools/galaxy-2013-05-13/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs4.egg/sqlalchemy/engine/base.py", line 1691, in _execute_context context) File "/mnt/galaxyTools/galaxy-2013-05-13/eggs/SQLAlchemy-0.7.9-py2.7-linux-x86_64-ucs4.egg/sqlalchemy/engine/default.py", line 331, in do_execute cursor.execute(statement, parameters) InternalError: (InternalError) could not read block 326 in file "base/257536/259291": 
read only 0 of 8192 bytes 'UPDATE dataset SET update_time=%(update_time)s, state=%(state)s WHERE dataset.id = %(dataset_id)s' {'update_time': datetime.datetime(2013, 6, 17, 12, 42, 38, 145908), 'state': 'error', 'dataset_id': 68792} ==
I went on to check the Galaxy database (in postgres) which confirmed the error in the paster.log. == galaxy=# select * from dataset where id = 68792; id | create_time | update_time | state | deleted | purged | purgable | external_filename | _extra_files_path | file_size | total_size | object_store_id | uuid
-------+----------------------------+----------------------------+-------+---------+--------+----------+-------------------+-------------------+-----------+------------+-----------------+------ 68792 | 2013-06-12 06:55:17.608725 | 2013-06-12 06:55:19.396557 | error | f | f | t | | | | | | (1 row)
galaxy=# UPDATE dataset SET update_time='2013-06-17 12:42:38.145908', state='error' WHERE dataset.id = 68792; ERROR: could not read block 326 in file "base/257536/259291": read only 0 of 8192 bytes galaxy=# UPDATE dataset SET state='error' WHERE dataset.id = 68792; UPDATE 1 ==
When I tried to restore the database from the dumped copy I made yesterday, I got the following error: == ERROR: insert or update on table "galaxy_session_to_history" violates foreign key constraint "galaxy_session_to_history_ session_id_fkey" DETAIL: Key (session_id)=(956338) is not present in table "galaxy_session". ERROR: insert or update on table "job" violates foreign key constraint "job_session_id_fkey" DETAIL: Key (session_id)=(852060) is not present in table "galaxy_session". ERROR: insert or update on table "workflow_step_connection" violates foreign key constraint "workflow_step_connection_input_step_id_fkey" DETAIL: Key (input_step_id)=(18422) is not present in table "workflow_step". ERROR: insert or update on table "workflow_step_connection" violates foreign key constraint "workflow_step_connection_output_step_id_fkey" DETAIL: Key (output_step_id)=(18422) is not present in table "workflow_step". ==
It seems our postgres DB got corrupted? Any suggestion on a fix?
Thanks a lot!
Leon
-- Hailiang (Leon) Mei Netherlands Bioinformatics Center BioAssist NGS Taskforce - http://ngs.nbic.nl<https://wiki.nbic.nl/index.php/Next_Generation_Sequencing> Skype: leon_mei Mobile: +31 6 41709231
___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
--
Hailiang (Leon) Mei
Netherlands Bioinformatics Center
BioAssist NGS Taskforce - http://ngs.nbic.nl <https://wiki.nbic.nl/index.php/Next_Generation_Sequencing>
Skype: leon_mei
Mobile: +31 6 41709231
participants (2)
- Dannon Baker
- Leon Mei