On Dec 14, 2011, at 6:13 PM, David Matthews wrote:
Hi Guys,
Sorry to be a pain but this seems to be getting worse for us. Here are the latest tracebacks - any suggestions would be gratefully received!!
Hi David,
As the MemoryError indicates, the Galaxy process is running out of memory. debug = False is preferable, actually. I asked because having debug = True could easily result in the behavior you're seeing.
The pbs code definitely has a memory leak; I believe it is within libtorque or pbs_python. Because of this, I restart my job runner process when it reaches a certain amount of memory usage. However, this may not be the cause of your errors. To figure it out, we'll need to know exactly which thread is consuming the memory. You may want to enable the heartbeat log and look there to see which threads are active.
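To illustrate both ideas above (a memory threshold for restarting, and seeing which threads are active), here is a minimal Python sketch. It is not Galaxy's heartbeat code or an actual restart script; the function names are made up for illustration, and it assumes a Linux host where /proc/<pid>/status contains a VmRSS line.

```python
import sys
import threading
import traceback


def parse_vm_rss_kb(status_text):
    """Return the VmRSS value (in kB) from the text of /proc/<pid>/status.

    Returns None if no VmRSS line is present. (Hypothetical helper.)
    """
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like: "VmRSS:     123456 kB"
            return int(line.split()[1])
    return None


def over_limit(rss_kb, limit_kb):
    """True when resident memory is known and has reached the limit."""
    return rss_kb is not None and rss_kb >= limit_kb


def dump_thread_stacks():
    """Return one string holding the current stack of every Python thread
    in this process, labelled with the thread name where it is known."""
    names = dict((t.ident, t.name) for t in threading.enumerate())
    chunks = []
    for ident, frame in sys._current_frames().items():
        chunks.append("Thread %s (%s):\n" % (names.get(ident, "unknown"), ident))
        chunks.extend(traceback.format_stack(frame))
    return "".join(chunks)
```

On Linux, the contents of /proc/<pid>/status for the runner process could be passed to parse_vm_rss_kb from cron or a small loop, restarting the process once over_limit returns True; the limit is something to pick for your own machine. dump_thread_stacks only sees the threads of the process it is called in, and it shows what each thread is doing rather than how much memory each one holds, so it narrows the search but does not settle it.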
The question about the path was in reference to whether these errors occur immediately upon running a tophat job, without any interaction, or if they occur when you try to click to view the job's output, or on some other part of the Galaxy interface.
Thanks,
--nate
Cheers
David
galaxy.jobs.runners.pbs ERROR 2011-12-13 19:57:57,689 Uncaught exception checking jobs
Traceback (most recent call last):
File "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/lib/galaxy/jobs/runners/pbs.py", line 338, in monitor
self.check_watched_items()
File "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/lib/galaxy/jobs/runners/pbs.py", line 351, in check_watched_items
( failures, statuses ) = self.check_all_jobs()
File "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/lib/galaxy/jobs/runners/pbs.py", line 462, in check_all_jobs
statuses.update( self.convert_statjob_to_bunches( jobs ) )
File "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/lib/galaxy/jobs/runners/pbs.py", line 476, in convert_statjob_to_bunches
statuses[ job.name ] = Bunch( **status )
MemoryError
Unhandled exception in thread started by
Traceback (most recent call last):
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 504, in __bootstrap
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 580, in __bootstrap_inner
MemoryError
Unhandled exception in thread started by <bound method Thread.__bootstrap of <Thread(Thread-11, stopped 1111390528)>>
Traceback (most recent call last):
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 504, in __bootstrap
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 545, in __bootstrap_inner
MemoryError
Unexpected exception in worker <function <lambda> at 0x883acf8>
Traceback (most recent call last):
File "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 863, in worker_thread_callback
File "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 1037, in <lambda>
File "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 1056, in process_request_in_thread
File "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 1044, in handle_error
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/SocketServer.py", line 334, in handle_error
MemoryError
Unhandled exception in thread started by <bound method Thread.__bootstrap of <Thread(Thread-10, stopped 1109289280)>>
Traceback (most recent call last):
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 504, in __bootstrap
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 545, in __bootstrap_inner
MemoryError
----------------------------------------
Exception happened during processing of request from ('xxx.xxx.xxx.xxx', 44389)
Traceback (most recent call last):
File "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 1053, in process_request_in_thread
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/SocketServer.py", line 322, in finish_request
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/SocketServer.py", line 616, in __init__
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/SocketServer.py", line 657, in setup
MemoryError
----------------------------------------
----------------------------------------
Exception happened during processing of request from ('xxx.xxx.xx.xx', 60069)
Unexpected exception in worker <function <lambda> at 0x883a2a8>
Traceback (most recent call last):
File "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 1053, in process_request_in_thread
Unhandled exception in thread started by <bound method Thread.__bootstrap of <Thread(worker 9, stopped 1130301760)>>
Traceback (most recent call last):
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 504, in __bootstrap
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 545, in __bootstrap_inner
MemoryError
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/SocketServer.py", line 322, in finish_request
Unexpected exception in worker <function <lambda> at 0x8721410>
Traceback (most recent call last):
File "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 863, in worker_thread_callback
Unhandled exception in thread started by <bound method Thread.__bootstrap of <Thread(worker 0, stopped 1086265664)>>
Traceback (most recent call last):
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 504, in __bootstrap
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 545, in __bootstrap_inner
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 242, in format_exc
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 142, in format_exception
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 76, in format_tb
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 101, in extract_tb
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/linecache.py", line 14, in getline
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/linecache.py", line 40, in getlines
MemoryError
----------------------------------------
Exception happened during processing of request from ('xxx.xxx.xx.xx', 60071)
Traceback (most recent call last):
File "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 1053, in process_request_in_thread
Unexpected exception in worker <function <lambda> at 0x8721410>
Traceback (most recent call last):
File "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 863, in worker_thread_callback
Unhandled exception in thread started by <bound method Thread.__bootstrap of <Thread(worker 6, stopped 1123998016)>>
Traceback (most recent call last):
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 504, in __bootstrap
self.__bootstrap_inner()
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 545, in __bootstrap_inner
(self.name, _format_exc()))
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 242, in format_exc
return ''.join(format_exception(etype, value, tb, limit))
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 142, in format_exception
list = list + format_tb(tb, limit)
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 76, in format_tb
return format_list(extract_tb(tb, limit))
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 101, in extract_tb
line = linecache.getline(filename, lineno, f.f_globals)
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/linecache.py", line 14, in getline
lines = getlines(filename, module_globals)
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/linecache.py", line 40, in getlines
return updatecache(filename, module_globals)
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/linecache.py", line 131, in updatecache
lines = fp.readlines()
MemoryError
----------------------------------------
Exception happened during processing of request from ('xxx.xxx.xxx.xxx', 44416)
Traceback (most recent call last):
File "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 1053, in process_request_in_thread
Unexpected exception in worker <function <lambda> at 0x8721410>
Traceback (most recent call last):
File "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 863, in worker_thread_callback
Unhandled exception in thread started by <bound method Thread.__bootstrap of <Thread(worker 7, stopped 1126099264)>>
Traceback (most recent call last):
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 504, in __bootstrap
self.__bootstrap_inner()
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 545, in __bootstrap_inner
(self.name, _format_exc()))
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 242, in format_exc
return ''.join(format_exception(etype, value, tb, limit))
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 142, in format_exception
list = list + format_tb(tb, limit)
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 76, in format_tb
return format_list(extract_tb(tb, limit))
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 101, in extract_tb
line = linecache.getline(filename, lineno, f.f_globals)
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/linecache.py", line 14, in getline
lines = getlines(filename, module_globals)
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/linecache.py", line 40, in getlines
return updatecache(filename, module_globals)
File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/linecache.py", line 131, in updatecache
lines = fp.readlines()
MemoryError
----------------------------------------
--
-----------------------------------------------------------
Callum Wright
HPC Systems Administrator
High Performance Computing
University of Bristol
Phone: 0117 331 4429
email: c.wright@bristol.ac.uk
web: www.acrc.bristol.ac.uk
-----------------------------------------------------------