Hello,
We have almost succeeded in using drmaa-0.4b3 with Galaxy and PBS Pro: the job launched from Galaxy runs on our cluster, but when the job status changes to finished, there is an error in the drmaa Python egg.
Here is the server log:
galaxy.jobs.runners.drmaa ERROR 2010-11-26 17:11:10,857 (21/516559.service0.ice.ifremer.fr) Unable to check job status
Traceback (most recent call last):
  File "/home12/caparmor/bioinfo/galaxy_dist/lib/galaxy/jobs/runners/drmaa.py", line 252, in check_watched_items
    state = self.ds.jobStatus( job_id )
  File "/usr/lib/python2.5/site-packages/drmaa-0.4b3-py2.5.egg/drmaa/__init__.py", line 522, in jobStatus
  File "/usr/lib/python2.5/site-packages/drmaa-0.4b3-py2.5.egg/drmaa/helpers.py", line 213, in c
    return f(*(args + (error_buffer, sizeof(error_buffer))))
  File "/usr/lib/python2.5/site-packages/drmaa-0.4b3-py2.5.egg/drmaa/errors.py", line 90, in error_check
    raise _ERRORS[code-1]("code %s: %s" % (code, error_buffer.value))
InternalException: code 1: pbs_statjob: Job %s has finished
galaxy.jobs.runners.drmaa WARNING 2010-11-26 17:11:10,861 (21/516559.service0.ice.ifremer.fr) job will now be errored
galaxy.jobs.runners.drmaa DEBUG 2010-11-26 17:11:10,986 (21/516559.service0.ice.ifremer.fr) User killed running job, but error encountered removing from DRM queue: code 1: pbs_deljob: Job %s has finished
Any ideas?
Thanks a lot
Laure
The job has successfully completed, but it's being treated as an error by the drmaa library.
I can't really test this, so there's no way for me to know whether any other calls to drmaa.Session().jobStatus() can also raise an InternalException. The following change will let job completion succeed, but it could cause some failed jobs to not be marked as failed.
In lib/galaxy/jobs/runners/drmaa.py, locate:
except drmaa.InvalidJobException:
and change it to:
except ( drmaa.InvalidJobException, drmaa.InternalException ):
If you wanted to keep an eye on the value of the error in the log, you could do the following instead:
except ( drmaa.InvalidJobException, drmaa.InternalException ), e:
    log.debug( "(%s/%s) job left DRM queue with following message: %s" % ( galaxy_job_id, job_id, e ) )
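For context, here is a rough standalone sketch of the idea, not the actual check_watched_items() code from the runner; the wait_for_job() helper, its arguments, and the logging setup are just for illustration:

import logging
import time

import drmaa

log = logging.getLogger( __name__ )

def wait_for_job( session, job_id, poll_interval=5 ):
    # Poll job_id until it finishes, fails, or leaves the DRM queue.
    while True:
        try:
            state = session.jobStatus( job_id )
        except ( drmaa.InvalidJobException, drmaa.InternalException ), e:
            # On PBS Pro the DRMAA library raises InternalException
            # ("pbs_statjob: Job %s has finished") once the job has left
            # the queue, so treat it like a job that is no longer known.
            log.debug( "(%s) job left DRM queue with following message: %s" % ( job_id, e ) )
            return
        if state in ( drmaa.JobState.DONE, drmaa.JobState.FAILED ):
            return
        time.sleep( poll_interval )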
Please do let us know if you get it working. There are quite a few people hoping to get Galaxy working on PBS Pro.
--nate
Hello,
Changing the exception handling was the right thing to do! Since the job status on the cluster is Finished without error, even though drmaa.py cannot retrieve the finished status (I think because of a limitation of the drmaa library), Galaxy now considers the job finished rather than failed as it did before, so I can get my result inside the Galaxy web interface.
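For anyone who wants to see the limitation in isolation, here is a minimal sketch, assuming a working PBS Pro DRMAA setup; the job id is just a placeholder for a job that has already finished:

import drmaa

s = drmaa.Session()
s.initialize()
try:
    # placeholder id for a job that has already left the PBS queue
    s.jobStatus( "516559.service0.ice.ifremer.fr" )
except drmaa.InternalException, e:
    # PBS Pro reports "code 1: pbs_statjob: Job %s has finished" here
    # instead of returning drmaa.JobState.DONE
    print "status lookup failed after completion: %s" % e
s.exit()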
Maybe this should be integrated into Galaxy for the next release?
Thanks
Laure
Hi Laure,
I'll make the change, but please let me know if you find that any errors or failures are being treated as completed jobs.
Thanks, --nate
galaxy-dev mailing list
galaxy-dev@lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-dev