Hi Nate,

Many thanks for these ideas - our HPC guys are going to try a few things. Hopefully we'll nail the problem and be able to report back in case someone else has the same issues.

Best Wishes,
David.
__________________________________
Dr David A. Matthews
Senior Lecturer in Virology
Room E49
Department of Cellular and Molecular Medicine,
School of Medical Sciences
University Walk,
University of Bristol
Bristol. BS8 1TD U.K.
Tel. +44 117 3312058
Fax. +44 117 3312091
D.A.Matthews@bristol.ac.uk

On 19 Dec 2011, at 15:56, Nate Coraor wrote:
On Dec 14, 2011, at 6:13 PM, David Matthews wrote:
Hi Guys,
Sorry to be a pain but this seems to be getting worse for us. Here are the latest tracebacks - any suggestions would be gratefully received!!
Hi David,
As the MemoryError indicates, the Galaxy process is running out of memory. debug = False is preferable, actually. I asked because having debug = True could easily result in the behavior you're seeing.
The pbs code definitely has a memory leak, I believe within libtorque or pbs_python. Because of this, I restart my job runner process when it reaches a certain amount of memory usage. However, this may not be the cause of your errors. To figure it out, we'll need to know exactly which thread is consuming the memory. You may want to enable the heartbeat log and look there to see which threads are active.
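As a rough illustration of the two ideas above (restarting once memory passes a threshold, and seeing which threads are active), here is a minimal Python sketch. It is not Galaxy's heartbeat code or my actual restart script; the function names `rss_bytes`, `over_limit` and `dump_threads` and the 2 GB limit are hypothetical, and reading /proc assumes a Linux host.

```python
import sys
import threading
import traceback


def rss_bytes(pid="self"):
    """Return the resident set size in bytes from /proc/<pid>/status.

    Linux only (assumption); returns None if the value cannot be read.
    """
    try:
        with open("/proc/%s/status" % pid) as handle:
            for line in handle:
                if line.startswith("VmRSS:"):
                    # The kernel reports this field in kB.
                    return int(line.split()[1]) * 1024
    except (IOError, OSError):
        pass
    return None


def over_limit(current_bytes, limit_bytes):
    """True when a known memory reading has reached the limit."""
    return current_bytes is not None and current_bytes >= limit_bytes


def dump_threads():
    """Return the current stack of every live thread as one string."""
    names = dict((t.ident, t.name) for t in threading.enumerate())
    chunks = []
    for ident, frame in sys._current_frames().items():
        stack = "".join(traceback.format_stack(frame))
        chunks.append("Thread %s (%s):\n%s" % (ident, names.get(ident, "unknown"), stack))
    return "\n".join(chunks)


if __name__ == "__main__":
    limit = 2 * 1024 ** 3  # hypothetical 2 GB threshold
    if over_limit(rss_bytes(), limit):
        print("memory limit reached; restart the job runner process")
    print(dump_threads())
```

Something like `over_limit(rss_bytes(pid), limit)` could be run periodically from cron or a wrapper script against the runner's pid, with the restart itself left to whatever init script you already use. `dump_threads()` only shows what each thread is doing at that instant, so it needs to be called repeatedly to spot a thread that is stuck or looping.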
The question about the path was in reference to whether these errors occur immediately upon running a tophat job, without any interaction, or if they occur when you try to click to view the job's output, or on some other part of the Galaxy interface.
Thanks, --nate
Cheers David
galaxy.jobs.runners.pbs ERROR 2011-12-13 19:57:57,689 Uncaught exception checking jobs
Traceback (most recent call last):
  File "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/lib/galaxy/jobs/runners/pbs.py", line 338, in monitor
    self.check_watched_items()
  File "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/lib/galaxy/jobs/runners/pbs.py", line 351, in check_watched_items
    ( failures, statuses ) = self.check_all_jobs()
  File "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/lib/galaxy/jobs/runners/pbs.py", line 462, in check_all_jobs
    statuses.update( self.convert_statjob_to_bunches( jobs ) )
  File "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/lib/galaxy/jobs/runners/pbs.py", line 476, in convert_statjob_to_bunches
    statuses[ job.name ] = Bunch( **status )
MemoryError
Unhandled exception in thread started by
Traceback (most recent call last):
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 504, in __bootstrap
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 580, in __bootstrap_inner
MemoryError
Unhandled exception in thread started by <bound method Thread.__bootstrap of <Thread(Thread-11, stopped 1111390528)>>
Traceback (most recent call last):
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 504, in __bootstrap
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 545, in __bootstrap_inner
MemoryError
Unexpected exception in worker <function <lambda> at 0x883acf8>
Traceback (most recent call last):
  File "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 863, in worker_thread_callback
  File "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 1037, in <lambda>
  File "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 1056, in process_request_in_thread
  File "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 1044, in handle_error
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/SocketServer.py", line 334, in handle_error
MemoryError
Unhandled exception in thread started by <bound method Thread.__bootstrap of <Thread(Thread-10, stopped 1109289280)>>
Traceback (most recent call last):
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 504, in __bootstrap
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 545, in __bootstrap_inner
MemoryError
----------------------------------------
Exception happened during processing of request from ('xxx.xxx.xxx.xxx', 44389)
Traceback (most recent call last):
  File "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 1053, in process_request_in_thread
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/SocketServer.py", line 322, in finish_request
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/SocketServer.py", line 616, in __init__
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/SocketServer.py", line 657, in setup
MemoryError
----------------------------------------
----------------------------------------
Exception happened during processing of request from ('xxx.xxx.xx.xx', 60069)
Unexpected exception in worker <function <lambda> at 0x883a2a8>
Traceback (most recent call last):
  File "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 1053, in process_request_in_thread
Unhandled exception in thread started by <bound method Thread.__bootstrap of <Thread(worker 9, stopped 1130301760)>>
Traceback (most recent call last):
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 504, in __bootstrap
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 545, in __bootstrap_inner
MemoryError
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/SocketServer.py", line 322, in finish_request
Unexpected exception in worker <function <lambda> at 0x8721410>
Traceback (most recent call last):
  File "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 863, in worker_thread_callback
Unhandled exception in thread started by <bound method Thread.__bootstrap of <Thread(worker 0, stopped 1086265664)>>
Traceback (most recent call last):
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 504, in __bootstrap
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 545, in __bootstrap_inner
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 242, in format_exc
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 142, in format_exception
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 76, in format_tb
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 101, in extract_tb
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/linecache.py", line 14, in getline
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/linecache.py", line 40, in getlines
MemoryError
----------------------------------------
Exception happened during processing of request from ('xxx.xxx.xx.xx', 60071)
Traceback (most recent call last):
  File "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 1053, in process_request_in_thread
Unexpected exception in worker <function <lambda> at 0x8721410>
Traceback (most recent call last):
  File "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 863, in worker_thread_callback
Unhandled exception in thread started by <bound method Thread.__bootstrap of <Thread(worker 6, stopped 1123998016)>>
Traceback (most recent call last):
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 504, in __bootstrap
    self.__bootstrap_inner()
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 545, in __bootstrap_inner
    (self.name, _format_exc()))
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 242, in format_exc
    return ''.join(format_exception(etype, value, tb, limit))
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 142, in format_exception
    list = list + format_tb(tb, limit)
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 76, in format_tb
    return format_list(extract_tb(tb, limit))
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 101, in extract_tb
    line = linecache.getline(filename, lineno, f.f_globals)
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/linecache.py", line 14, in getline
    lines = getlines(filename, module_globals)
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/linecache.py", line 40, in getlines
    return updatecache(filename, module_globals)
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/linecache.py", line 131, in updatecache
    lines = fp.readlines()
MemoryError
----------------------------------------
Exception happened during processing of request from ('xxx.xxx.xxx.xxx', 44416)
Traceback (most recent call last):
  File "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 1053, in process_request_in_thread
Unexpected exception in worker <function <lambda> at 0x8721410>
Traceback (most recent call last):
  File "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 863, in worker_thread_callback
Unhandled exception in thread started by <bound method Thread.__bootstrap of <Thread(worker 7, stopped 1126099264)>>
Traceback (most recent call last):
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 504, in __bootstrap
    self.__bootstrap_inner()
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 545, in __bootstrap_inner
    (self.name, _format_exc()))
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 242, in format_exc
    return ''.join(format_exception(etype, value, tb, limit))
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 142, in format_exception
    list = list + format_tb(tb, limit)
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 76, in format_tb
    return format_list(extract_tb(tb, limit))
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 101, in extract_tb
    line = linecache.getline(filename, lineno, f.f_globals)
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/linecache.py", line 14, in getline
    lines = getlines(filename, module_globals)
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/linecache.py", line 40, in getlines
    return updatecache(filename, module_globals)
  File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/linecache.py", line 131, in updatecache
    lines = fp.readlines()
MemoryError
----------------------------------------
--
-----------------------------------------------------------
Callum Wright
HPC Systems Administrator
High Performance Computing
University of Bristol

Phone: 0117 331 4429
email: c.wright@bristol.ac.uk
web: www.acrc.bristol.ac.uk
-----------------------------------------------------------