Hello All, after more research, I found that the Galaxy server crash was caused by stopping jobs. We are running our own SGE cluster. It's odd, because we can kill jobs from the history or the administration panel without any problem. In paster.log, we only get this message just before the server crashes:
galaxy.jobs.handler DEBUG 2013-01-08 16:52:39,877 Stopping job 3065:
galaxy.jobs.handler DEBUG 2013-01-08 16:52:39,877 stopping job 3065 in drmaa runner
I think this problem occurs when there are many jobs in the "running", "new" and "queued" states.

Cheers,
Cyril

On 01/08/2013 04:11 PM, MONJEAUD wrote:
Hello All,
I'm trying to deploy my Galaxy instance in production. Some tests we've run show that when many users are connected at the same time (more than 20), the server shuts itself down.
Sometimes, I have this error in the paster.log:
Exception happened during processing of request from ('127.0.0.1', 60575)
Traceback (most recent call last):
  File "/opt/galaxy-dist/eggs/Paste-1.6-py2.7.egg/paste/httpserver.py", line 1053, in process_request_in_thread
    self.finish_request(request, client_address)
  File "/local/python/2.7-bis/lib/python2.7/SocketServer.py", line 323, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/local/python/2.7-bis/lib/python2.7/SocketServer.py", line 641, in __init__
    self.finish()
  File "/local/python/2.7-bis/lib/python2.7/SocketServer.py", line 694, in finish
    self.wfile.flush()
  File "/local/python/2.7-bis/lib/python2.7/socket.py", line 301, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
error: [Errno 32] Broken pipe
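For reference, Errno 32 (EPIPE) is what Python reports when a process writes to a socket whose peer has already closed the connection. The sketch below is not Galaxy or Paste code; it only reproduces that condition in isolation, under a few assumptions: Python 3 rather than the 2.7 shown in the traceback, a POSIX system, and a local socket pair standing in for the browser connection. The function name write_after_peer_close is hypothetical.

```python
import errno
import socket


def write_after_peer_close():
    """Return the errno raised when writing to a socket whose peer has closed."""
    # A connected pair of local sockets stands in for server <-> browser.
    server, client = socket.socketpair()
    client.close()  # the "browser" goes away before the response is sent
    try:
        server.sendall(b"HTTP/1.0 200 OK\r\n\r\n")
    except OSError as exc:
        # Python ignores SIGPIPE by default, so the failure surfaces as an exception.
        return exc.errno
    finally:
        server.close()
    return None  # the write unexpectedly succeeded


if __name__ == "__main__":
    code = write_after_peer_close()
    print(code, errno.errorcode.get(code))
```

If this is the mechanism at work here, the traceback would only mean that a client dropped its connection before the response was flushed, which may be a symptom of slow responses under load rather than the cause of the server stopping.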
Do you have any ideas about this and how to resolve it?
Cheers!! Cyril
--
Cyril Monjeaud
Equipe Symbiose / Plate-forme GenOuest
Bureau D156
IRISA-INRIA, Campus de Beaulieu
35042 Rennes cedex, France
Tél: +33 (0) 2 99 84 74 17