Hi all,
We are facing some puzzling problems with our Galaxy instance (recently upgraded to 16.07, using SGE and a PostgreSQL database).
For the past two weeks, it has been giving different kinds of error messages at random: sometimes "failure preparing job", sometimes "The cluster DRM system terminated this job", and sometimes the job finishes without error, even when we relaunch the same wrapper with the same input datasets.
In parallel, we have a dev instance that does not show these problems. The config files are substantially the same, except for the database connection, which is obviously different.
We suspected a problem with the PostgreSQL database, so we ran some tests: after we pointed the connection at an empty PostgreSQL database, the problems seemed to disappear.
Are there any scripts to check the integrity of the database? Any recommendations for dealing with this kind of problem? It seems that there are inconsistencies in the database that make the system crash.
Thanks a lot for your help.
Regards,
Alexis
galaxy-dev@lists.galaxyproject.org