Pulsar - running as real DRMAA user problems
Hey John, So I’ve been happily using Pulsar to send all my Galaxy server jobs to our cluster here at UCL for several months now (I love it!). I am now exploring the ‘run-as-real-user’ option for DRMAA submissions and have run into a problem. The files are correctly staged, correctly chowned, successfully submitted to the queue and the job runs. However, at job end (collection?) fails with the following error message in Pulsar: Exception happened during processing of request from (‘*.*.*.*', 54321) Traceback (most recent call last): File "/opt/rocks/lib/python2.6/site-packages/Paste-2.0.1-py2.6.egg/paste/httpserver.py", line 1072, in process_request_in_thread self.finish_request(request, client_address) File "/opt/rocks/lib/python2.6/SocketServer.py", line 322, in finish_request self.RequestHandlerClass(request, client_address, self) File "/opt/rocks/lib/python2.6/SocketServer.py", line 617, in __init__ self.handle() File "/opt/rocks/lib/python2.6/site-packages/Paste-2.0.1-py2.6.egg/paste/httpserver.py", line 446, in handle BaseHTTPRequestHandler.handle(self) File "/opt/rocks/lib/python2.6/BaseHTTPServer.py", line 329, in handle self.handle_one_request() File "/opt/rocks/lib/python2.6/site-packages/Paste-2.0.1-py2.6.egg/paste/httpserver.py", line 441, in handle_one_request self.wsgi_execute() File "/opt/rocks/lib/python2.6/site-packages/Paste-2.0.1-py2.6.egg/paste/httpserver.py", line 291, in wsgi_execute self.wsgi_start_response) File "/cluster/galaxy/pulsar/pulsar/web/framework.py", line 39, in __call__ return controller(environ, start_response, **request_args) File "/cluster/galaxy/pulsar/pulsar/web/framework.py", line 144, in controller_replacement result = self.__execute_request(func, args, req, environ) File "/cluster/galaxy/pulsar/pulsar/web/framework.py", line 124, in __execute_request result = func(**args) File "/cluster/galaxy/pulsar/pulsar/web/routes.py", line 82, in status return status_dict(manager, job_id) File "/cluster/galaxy/pulsar/pulsar/manager_endpoint_util.py", line 12, in status_dict job_status = manager.get_status(job_id) File "/cluster/galaxy/pulsar/pulsar/managers/stateful.py", line 95, in get_status proxy_status, state_change = self.__proxy_status(job_directory, job_id) File "/cluster/galaxy/pulsar/pulsar/managers/stateful.py", line 115, in __proxy_status proxy_status = self._proxied_manager.get_status(job_id) File "/cluster/galaxy/pulsar/pulsar/managers/queued_external_drmaa_original.py", line 62, in get_status external_status = super(ExternalDrmaaQueueManager, self)._get_status_external(external_id) File "/cluster/galaxy/pulsar/pulsar/managers/base/base_drmaa.py", line 31, in _get_status_external drmaa_state = self.drmaa_session.job_status(external_id) File "/cluster/galaxy/pulsar/pulsar/managers/util/drmaa/__init__.py", line 50, in job_status return self.session.jobStatus(str(external_job_id)) File "build/bdist.linux-x86_64/egg/drmaa/session.py", line 518, in jobStatus c(drmaa_job_ps, jobId, byref(status)) File "build/bdist.linux-x86_64/egg/drmaa/helpers.py", line 299, in c return f(*(args + (error_buffer, sizeof(error_buffer)))) File "build/bdist.linux-x86_64/egg/drmaa/errors.py", line 151, in error_check raise _ERRORS[code - 1](error_string) InvalidJobException: code 18: The job specified by the 'jobid' does not exist. 
With this corresponding error from my Galaxy server:

galaxy.tools.actions INFO 2016-10-13 18:47:51,851 Handled output (279.421 ms)
galaxy.tools.actions INFO 2016-10-13 18:47:52,093 Verified access to datasets (5.271 ms)
galaxy.tools.execute DEBUG 2016-10-13 18:47:52,118 Tool [toolshed.g2.bx.psu.edu/repos/devteam/sam_to_bam/sam_to_bam/1.1.4] created job [25008] (560.404 ms)
galaxy.jobs DEBUG 2016-10-13 18:47:52,579 (25008) Working directory for job is: /Users/galaxy/galaxy-dist/database/job_working_directory/025/25008
galaxy.jobs.handler DEBUG 2016-10-13 18:47:52,591 (25008) Dispatching to pulsar runner
galaxy.jobs DEBUG 2016-10-13 18:47:52,677 (25008) Persisting job destination (destination id: hpc_low)
galaxy.jobs.runners DEBUG 2016-10-13 18:47:52,681 Job [25008] queued (90.231 ms)
galaxy.jobs.handler INFO 2016-10-13 18:47:52,699 (25008) Job dispatched
galaxy.tools.deps DEBUG 2016-10-13 18:47:53,138 Building dependency shell command for dependency 'samtools'
galaxy.jobs.runners.pulsar INFO 2016-10-13 18:47:53,233 Pulsar job submitted with job_id 25008
galaxy.jobs DEBUG 2016-10-13 18:47:53,257 (25008) Persisting job destination (destination id: hpc_low)
galaxy.datatypes.metadata DEBUG 2016-10-13 18:51:03,922 Cleaning up external metadata files
galaxy.jobs.runners.pulsar ERROR 2016-10-13 18:51:03,945 failure finishing job 25008
Traceback (most recent call last):
  File "/Users/galaxy/galaxy-dist/lib/galaxy/jobs/runners/pulsar.py", line 386, in finish_job
    run_results = client.full_status()
  File "/Users/galaxy/galaxy-dist/lib/pulsar/client/client.py", line 132, in full_status
    return self.raw_check_complete()
  File "/Users/galaxy/galaxy-dist/lib/pulsar/client/decorators.py", line 28, in replacement
    return func(*args, **kwargs)
  File "/Users/galaxy/galaxy-dist/lib/pulsar/client/decorators.py", line 13, in replacement
    response = func(*args, **kwargs)
  File "/Users/galaxy/galaxy-dist/lib/pulsar/client/client.py", line 146, in raw_check_complete
    check_complete_response = self._raw_execute("status", {"job_id": self.job_id})
  File "/Users/galaxy/galaxy-dist/lib/pulsar/client/client.py", line 215, in _raw_execute
    return self.job_manager_interface.execute(command, args, data, input_path, output_path)
  File "/Users/galaxy/galaxy-dist/lib/pulsar/client/interface.py", line 96, in execute
    response = self.transport.execute(url, method=method, data=data, input_path=input_path, output_path=output_path)
  File "/Users/galaxy/galaxy-dist/lib/pulsar/client/transport/standard.py", line 34, in execute
    response = self._url_open(request, data)
  File "/Users/galaxy/galaxy-dist/lib/pulsar/client/transport/standard.py", line 20, in _url_open
    return urlopen(request, data)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 437, in open
    response = meth(req, response)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 550, in http_response
    'http', request, response, code, msg, hdrs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 475, in error
    return self._call_chain(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 558, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 500: Internal Server Error

I am running Galaxy 15.10 and Python 2.7.10 on my iMac for the server, and the cluster submission node is running Pulsar 0.5.0 and Python 2.7.12.

For these tests I run Pulsar in an interactive window, so I have not set the sudoers file up, but rather enter the sudo password when requested by Pulsar (at the first step of chowning the staging directory). I also have rewrites set up in Galaxy's pulsar_actions.yml, and I am using remote_scp for the file transfers rather than http - although I have also tried switching back to http (as I noticed caching, which I am also testing, does not work with scp transfers) and get an identical set of error messages.

As I say, I have no troubles using a regular queued_drmaa manager in Pulsar. Any ideas what the problem may be?

Cheers,
Rich

Richard J Poole PhD
Wellcome Trust Fellow
Department of Cell and Developmental Biology
University College London
518 Rockefeller, 21 University Street, London WC1E 6DE
Office (518 Rockefeller): +44 20 7679 6577 (int. 46577)
Lab (529 Rockefeller): +44 20 7679 6133 (int. 46133)
https://www.ucl.ac.uk/cdb/academics/poole
Glad this almost worked - I'm not sure what the problem is. I'd open the file /cluster/galaxy/pulsar/pulsar/managers/util/drmaa/__init__.py and add some logging right before this line (return self.session.jobStatus(str(external_job_id))):

    log.info("Fetching job status for %s" % external_job_id)

or something like that. See if the ID matches something that was in your queuing software. It might have some extra prefix or something that we can strip off.

It would also be interesting to try Pulsar 0.7.3 against Galaxy 16.07 - this may be caused by a problem that has been fixed.

-John
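[In context, John's suggestion amounts to a one-line addition to the job_status method that appears in the Pulsar traceback above. A minimal sketch, with the surrounding class abbreviated - only the log call is new:]

    import logging

    log = logging.getLogger(__name__)

    class DrmaaSessionSketch(object):
        """Abbreviated stand-in for the session wrapper in
        pulsar/managers/util/drmaa/__init__.py."""

        def __init__(self, session):
            # session is the underlying python-drmaa Session object.
            self.session = session

        def job_status(self, external_job_id):
            # The added line: record the exact ID handed to DRMAA so it can
            # be compared with what the queuing software (qstat etc.) shows.
            log.info("Fetching job status for %s" % external_job_id)
            return self.session.jobStatus(str(external_job_id))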
Hi John,

Thanks for the reply. I will add the logging as soon as I can and let you know what I find. I have tried updating Pulsar to the latest release, 0.8.0, and get the same error - so let's see what the logging suggests. It does strike me as potentially some sort of Python issue - so you're right, an upgrade of Galaxy itself to v16 may be in order. I am just hesitating as my Galaxy server is quite 'personalised' and updating to v16 may require a lot of other changes.

Cheers, and I'll get back to you on this one again asap,
Rich
Hi John and all,

We are running an old version of Galaxy, v15.10, and use Pulsar to stage certain jobs to our HPC cluster. This works great and we love it. However, we have noticed that for certain tools which require metadata to be staged, although the metadata file is actually staged, the Galaxy Pulsar runner fails to rewrite the file path for the metadata correctly. Here is an example command from the latest Freebayes wrappers:

    ln -s -f '/cluster/galaxy/pulsar/files/staging/32728/inputs/dataset_54494.dat' 'b_0.bam' &&
    ln -s -f '/Volumes/ngs/database/files/_metadata_files/002/metadata_2754.dat' 'b_0.bam.bai' &&
    ln -s -f '/cluster/galaxy/pulsar/files/staging/32728/inputs/dataset_54496.dat' 'b_1.bam' &&
    ln -s -f '/Volumes/ngs/database/files/_metadata_files/002/metadata_2755.dat' 'b_1.bam.bai' &&
    samtools view -H b_0.bam | grep "^@SQ" | cut -f 2- | awk '{ gsub("^SN:","",$1); gsub("^LN:","",$2); print $1"\t0\t"$2; }' >> regions_all.bed &&
    samtools view -H b_1.bam | grep "^@SQ" | cut -f 2- | awk '{ gsub("^SN:","",$1); gsub("^LN:","",$2); print $1"\t0\t"$2; }' >> regions_all.bed &&
    sort -u regions_all.bed > regions_uniq.bed &&
    mkdir vcf_output && mkdir failed_alleles && mkdir trace &&
    for i in `cat regions_uniq.bed | awk '{print $1":"$2".."$3}'`; do echo " freebayes --region '$i' --bam 'b_0.bam' --bam 'b_1.bam' --fasta-reference '/cluster/galaxy/indexes/danRer7/sam_index/danRer7.fa' --vcf './vcf_output/part_$i.vcf' --standard-filters --min-coverage '3' "; done > freebayes_commands.sh &&
    cat freebayes_commands.sh | parallel --no-notice -j ${GALAXY_SLOTS:-1} &&
    grep "^#" "./vcf_output/part_$i.vcf" > header.txt &&
    for i in `cat regions_uniq.bed | awk '{print $1":"$2".."$3}'`; do cat "./vcf_output/part_$i.vcf" | grep -v "^#" || true ; done | sort -k1,1 -k2,2n -k5,5 -u | cat header.txt - > '/cluster/galaxy/pulsar/files/staging/32728/outputs/dataset_54897.dat'

As you can see, the file path for the BAM file is correctly rewritten, but not the file path for the bam.bai index, which still has a file path local to Galaxy. Has anyone come across this problem before, and is there a fix? We are very hesitant to upgrade our Galaxy server as everything else is working perfectly for us right now.

Thanks,
Richard
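[To illustrate the failure mode: if path rewriting is driven by a table of known local prefixes, any path matching no entry passes through to the remote command unchanged. This is an invented sketch, not Pulsar's actual implementation - the real client drives rewriting from configurable path actions, and all names below are hypothetical.]

    # Hypothetical prefix table: local Galaxy path -> Pulsar staging path.
    PATH_MAP = {
        "/local/galaxy/datasets": "/remote/pulsar/staging/inputs",
    }

    def rewrite_paths(command, path_map):
        # Replace each known local prefix with its staged counterpart. A path
        # whose prefix appears in no entry - like the _metadata_files path in
        # the Freebayes command above - is left untouched.
        for local_prefix, staged_prefix in path_map.items():
            command = command.replace(local_prefix, staged_prefix)
        return command

    cmd = ("ln -s -f '/local/galaxy/datasets/dataset_1.dat' 'b.bam' && "
           "ln -s -f '/local/galaxy/_metadata_files/metadata_1.dat' 'b.bam.bai'")
    print(rewrite_paths(cmd, PATH_MAP))
    # Only the dataset path is rewritten; the metadata path survives as-is.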
Apologies - forgot to change the old subject line - will remail now…