Hi Galaxy-Dev-Team!

First of all, I would like to thank you for providing the community with such a great framework!

I've downloaded and set up the latest packaged release locally. When I try to use Galaxy with the Sun Grid Engine, I run into problems. It looks like Galaxy isn't able to fully control the Grid Engine, and I think it might be a version issue between DRMAA and our current SGE. Jobs get submitted, appear in the queue and are executed, but their status is never returned to Galaxy, so the job's status on the website always remains "Job is waiting to run".

Here are the detailed error messages:

---------------------------------------------------------
[environment]
CentOS release 5.4 (Final)
SGE 6.2u5
Python 2.5.2
Jun 30th source tar.gz file http://bitbucket.org/galaxy/galaxy-dist/get/tip.tar.gz
---------------------------------------------------------
[config]
$ cat universe_wsgi.ini | grep sge
# currently available are 'pbs' and 'sge'.
start_job_runners = sge
#default_cluster_job_runner = sge://default/mjobs.q/   (can't launch service)
default_cluster_job_runner = sge:///mjobs.q/

$ grep -5 SGE_ROOT eggs.ini
[general]
repository = http://eggs.g2.bx.psu.edu/new
; these eggs must be scrambled for your local environment
no_auto = pbs_python DRMAA_python
SGE_ROOT = /home/geadmin/N1GE python scripts/scramble.py DRMAA_python

[eggs:platform]
bx_python = 0.5.0
Cheetah = 2.2.2
DRMAA_python = 0.2
---------------------------------------------------------
[Error Messages]
galaxy.jobs DEBUG 2010-07-16 17:30:24,794 job 10 dispatched
galaxy.jobs.runners.sge DEBUG 2010-07-16 17:30:27,480 (10) submitting file /share1/home/icgc/galaxy-dist/database/pbs/galaxy_10.sh
galaxy.jobs.runners.sge DEBUG 2010-07-16 17:30:27,480 (10) command is: perl /share1/home/icgc/galaxy-dist/tools/myTools/toolExample.pl /share1/home/icgc/galaxy-dist/database/files/000/dataset_3.dat /share1/home/icgc/galaxy-dist/database/files/000/dataset_10.dat
galaxy.jobs.runners.sge DEBUG 2010-07-16 17:30:27,522 (10) queued in mjobs.q queue as 9440764
202.175.149.92 - - [16/Jul/2010:17:30:26 +0900] "POST /root/history_item_updates HTTP/1.1" 200 - "http://gw01.hgc.jp:8090/history" "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.6) Gecko/20100628 Ubuntu/10.04 (lucid) Firefox/3.6.6"
galaxy.jobs.runners.sge ERROR 2010-07-16 17:30:29,524 (10/9440764) Unable to check job status
Traceback (most recent call last):
  File "/share1/home/icgc/galaxy-dist/lib/galaxy/jobs/runners/sge.py", line 277, in check_watched_items
    state = self.ds.getJobProgramStatus( job_id )
  File "/usr/local/lib/python2.5/site-packages/DRMAA.py", line 395, in getJobProgramStatus
    return ps
TypeError: writelines() argument must be a sequence of strings
galaxy.jobs.runners.sge WARNING 2010-07-16 17:30:29,525 (10/9440764) job will now be errored
galaxy.jobs.runners.sge ERROR 2010-07-16 17:30:29,526 Uncaught exception failing job
Traceback (most recent call last):
  File "/share1/home/icgc/galaxy-dist/lib/galaxy/jobs/runners/sge.py", line 138, in run_next
    self.fail_job( obj )
  File "/share1/home/icgc/galaxy-dist/lib/galaxy/jobs/runners/sge.py", line 338, in fail_job
    self.stop_job( self.sa_session.query( self.app.model.Job ).get( sge_job_state.job_wrapper.job_id ) )
AttributeError: 'SGEJobRunner' object has no attribute 'sa_session'

[when I use sample code]

#!/usr/bin/env python
import DRMAA
import time
import os

def main():
    """Submit a job, and check its progress.
    Note: needs a file called sleeper.sh in the current working directory.
    An example:
        echo "Hello World $1"
        sleep 30s
    """
    s = DRMAA.Session()
    s.init()

    print 'Creating job template'
    jt = s.createJobTemplate()
    jt.remoteCommand = os.getcwd() + '/sleeper.sh'
    jt.args = ['42', 'Simon says:']
    jt.joinFiles = True
    jt.outputPath = ":" + DRMAA.JobTemplate.HOME_DIRECTORY + '/tmp/JOB_OUT'

    jobid = s.runJob(jt)
    print 'Your job has been submitted with id ' + jobid

    # Who needs a case statement when you have dictionaries?
    decodestatus = {
        DRMAA.Session.UNDETERMINED: 'process status cannot be determined',
        DRMAA.Session.QUEUED_ACTIVE: 'job is queued and active',
        DRMAA.Session.SYSTEM_ON_HOLD: 'job is queued and in system hold',
        DRMAA.Session.USER_ON_HOLD: 'job is queued and in user hold',
        DRMAA.Session.USER_SYSTEM_ON_HOLD: 'job is queued and in user and system hold',
        DRMAA.Session.RUNNING: 'job is running',
        DRMAA.Session.SYSTEM_SUSPENDED: 'job is system suspended',
        DRMAA.Session.USER_SUSPENDED: 'job is user suspended',
        DRMAA.Session.DONE: 'job finished normally',
        DRMAA.Session.FAILED: 'job finished, but failed',
    }

    for ix in range(10):
        print ix
        print 'Checking ' + str(ix) + ' of 10 times'
        status = s.getJobProgramStatus(jobid)
        print decodestatus.get(status)
        time.sleep(5)

    print 'Cleaning up'
    s.deleteJobTemplate(jt)
    s.exit()

if __name__ == '__main__':
    main()

$ ./test
Creating job template
Your job has been submitted with id 9440771
0
Checking 0 of 10 times
Traceback (most recent call last):
  File "./test", line 52, in <module>
    main()
  File "./test", line 43, in main
    status = s.getJobProgramStatus(jobid)
  File "/usr/local/lib/python2.5/site-packages/DRMAA.py", line 393, in getJobProgramStatus
    eno, ps, estr = cDRMAA.drmaa_job_ps(jobName)
ValueError: need more than 2 values to unpack

[debug]
print cDRMAA.drmaa_job_ps(jobName)
eno, ps, estr = cDRMAA.drmaa_job_ps(jobName)
=> ([0, 16], '')

Thank you for taking the time! Hopefully, you can give me some hints on how to resolve this issue.

Cheers,
George Chalkidis
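The [debug] output hints at what went wrong. Assuming the numeric codes follow drmaa.h from the DRMAA 1.0 C binding, errno 0 is DRMAA_ERRNO_SUCCESS and state 16 (0x10) is DRMAA_PS_QUEUED_ACTIVE, so drmaa_job_ps apparently succeeded and reported the job as queued and active. It was DRMAA.py 0.2 that failed, because it unpacks a flat 3-tuple while the compiled extension returned ([errno, state], error_string). The sketch below is a hypothetical compatibility helper (unpack_job_ps and describe_job_ps are illustrative names, not part of DRMAA_python) that accepts either shape and decodes the state. It is written to run on both Python 2 and 3.

```python
# Numeric codes assumed from the DRMAA 1.0 C binding (drmaa.h).
DRMAA_ERRNO_SUCCESS = 0
DRMAA_PS_NAMES = {
    0x00: 'undetermined',
    0x10: 'queued_active',
    0x11: 'system_on_hold',
    0x12: 'user_on_hold',
    0x13: 'user_system_on_hold',
    0x20: 'running',
    0x21: 'system_suspended',
    0x22: 'user_suspended',
    0x23: 'user_system_suspended',
    0x30: 'done',
    0x40: 'failed',
}


def unpack_job_ps(result):
    """Return (errno, state, error_string) from a drmaa_job_ps() result.

    Hypothetical helper: accepts the flat 3-tuple that DRMAA.py 0.2 expects
    as well as the ([errno, state], error_string) shape seen in [debug].
    """
    if len(result) == 3:
        eno, ps, estr = result
    elif (len(result) == 2 and isinstance(result[0], (list, tuple))
          and len(result[0]) == 2):
        (eno, ps), estr = result
    else:
        raise ValueError('unexpected drmaa_job_ps result: %r' % (result,))
    return eno, ps, estr


def describe_job_ps(result):
    """Decode the job state name, raising if DRMAA reported an error."""
    eno, ps, estr = unpack_job_ps(result)
    if eno != DRMAA_ERRNO_SUCCESS:
        raise RuntimeError('drmaa_job_ps failed (errno %d): %s' % (eno, estr))
    return DRMAA_PS_NAMES.get(ps, 'unknown (%d)' % ps)


if __name__ == '__main__':
    # The value printed in the [debug] section above.
    print(describe_job_ps(([0, 16], '')))
```

Run as a script, this prints queued_active. Patching DRMAA.py along these lines would only paper over the binding mismatch; the resolution reported later in the thread was to switch to the newer drmaa library (0.4b3).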
Hi George,

I believe this issue should be resolved in changeset 4074:af48a13e46b9. Also note that there is a new 'drmaa' job runner, which is not SGE-specific. The sge runner will soon be deprecated in favor of the drmaa runner. Details on how to configure it can be found here:

http://bitbucket.org/galaxy/galaxy-central/wiki/Config/Cluster

--nate
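For comparison, the polling loop from George's test script might look like this with the newer drmaa-python package. This is a minimal sketch, assuming drmaa-python's Session.jobStatus(jobId), which returns state strings defined in drmaa.JobState; poll_job and FakeSession are hypothetical names. The session is passed in as a parameter so the loop can be exercised without a grid engine, and polling stops as soon as a terminal state is seen instead of always checking ten times.

```python
import time


def poll_job(session, job_id, terminal_states, poll_interval=5.0,
             max_polls=10, sleep=time.sleep):
    """Poll session.jobStatus(job_id) until a terminal state is seen.

    Returns the list of states observed, oldest first. Stops after
    max_polls checks even if the job has not finished.
    """
    observed = []
    for attempt in range(max_polls):
        state = session.jobStatus(job_id)
        observed.append(state)
        if state in terminal_states:
            break
        if attempt + 1 < max_polls:  # no need to sleep after the last check
            sleep(poll_interval)
    return observed


class FakeSession(object):
    """Hypothetical stand-in that replays a fixed sequence of states."""

    def __init__(self, states):
        self._states = list(states)

    def jobStatus(self, job_id):
        # Consume states one by one, then keep repeating the last one.
        if len(self._states) > 1:
            return self._states.pop(0)
        return self._states[0]


if __name__ == '__main__':
    fake = FakeSession(['queued_active', 'running', 'done'])
    print(poll_job(fake, '9440771', ('done', 'failed'), sleep=lambda s: None))

    # Against a real cluster (assumes drmaa-python is installed and can
    # locate libdrmaa; the script path is hypothetical):
    # import drmaa
    # s = drmaa.Session()
    # s.initialize()
    # jt = s.createJobTemplate()
    # jt.remoteCommand = '/path/to/sleeper.sh'
    # job_id = s.runJob(jt)
    # print(poll_job(s, job_id, (drmaa.JobState.DONE, drmaa.JobState.FAILED)))
    # s.deleteJobTemplate(jt)
    # s.exit()
```

Run as a script, the fake session prints ['queued_active', 'running', 'done']. Injecting sleep keeps the helper quick to test without changing its behaviour on a real cluster.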
_______________________________________________
galaxy-dev mailing list
galaxy-dev@lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-dev
Thanks Nate!

In the meantime, I was able to resolve the issue by using the "new" drmaa library (0.4b3) and making some changes to sge.py. I like the Galaxy framework and hope to be able to contribute some features soon!

Cheers,
George

On Thu, Jul 29, 2010 at 10:49 PM, Nate Coraor <nate@bx.psu.edu> wrote:
participants (2)
- George Chalkidis
- Nate Coraor