Thanks Nate!
In the meantime, I was able to resolve the issue by switching to the "new" drmaa library (0.4b3) and making some changes to sge.py.
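For reference, here is a minimal sketch of the status-polling loop from the sample code further down, rewritten against the newer drmaa-python package. It assumes that package's `Session.jobStatus()`, which takes the place of the old `getJobProgramStatus()`. The helper `wait_for_job`, its parameters and the `FakeSession` stand-in are hypothetical. The session is passed in as an argument, so the loop can be exercised without a cluster.

```python
import time


class FakeSession(object):
    """Hypothetical stand-in for a drmaa.Session that replays fixed states."""

    def __init__(self, states):
        self._states = iter(states)

    def jobStatus(self, job_id):
        return next(self._states)


def wait_for_job(session, job_id, terminal_states, max_checks=10,
                 interval=5.0, sleep=time.sleep):
    """Poll session.jobStatus(job_id) until a terminal state is seen.

    Checks at most max_checks times, sleeping `interval` seconds between
    checks, and returns the list of states observed, in order.
    """
    seen = []
    for check in range(max_checks):
        state = session.jobStatus(job_id)
        seen.append(state)
        if state in terminal_states:
            break
        if check < max_checks - 1:  # no pointless sleep after the last check
            sleep(interval)
    return seen


if __name__ == "__main__":
    # With a real cluster (assumes drmaa-python and an SGE DRMAA library):
    #   import drmaa
    #   s = drmaa.Session()
    #   s.initialize()
    #   jobid = s.runJob(jt)   # jt built via s.createJobTemplate()
    #   wait_for_job(s, jobid, {drmaa.JobState.DONE, drmaa.JobState.FAILED})
    #   s.exit()
    # Offline demo; the state labels here are arbitrary placeholders.
    demo = FakeSession(["queued", "running", "done"])
    print(wait_for_job(demo, "9440771", {"done", "failed"},
                       sleep=lambda seconds: None))
```

Injecting `sleep` keeps the demo instant. With a real session you would leave the default `time.sleep`.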
I like the Galaxy framework and hope to be able to contribute some features soon!
Cheers,
George
Hi George,

George Chalkidis wrote:
Hi Galaxy-Dev-Team!
First of all, I would like to thank you for providing the community
with such a great framework!
I've downloaded the latest packaged release and set it up locally. When I
try to use Galaxy with the Sun Grid Engine, I run into problems. It
looks like Galaxy isn't able to fully control the Grid Engine, and I
think it might be a version mismatch between DRMAA and our current SGE. Jobs
get submitted, appear in the queue and are executed, but their status
is never returned to Galaxy, so the job's status on the
website always remains "Job is waiting to run".
Here's a more detailed error message:
------------------------------------------------------
[environment]
CentOS release 5.4 (Final)
SGE 6.2u5
Python 2.5.2
Jun 30th source tar.gz file
http://bitbucket.org/galaxy/galaxy-dist/get/tip.tar.gz
---------------------------------------------------------
[config]
$ cat universe_wsgi.ini |grep sge
# currently available are 'pbs' and 'sge'.
start_job_runners = sge
#default_cluster_job_runner = sge://default/mjobs.q/ can't launch service
default_cluster_job_runner = sge:///mjobs.q/
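The two runner lines differ only in the first path field. The commented-out URL puts `default` there, and the active one leaves it empty. The two lines suggest the layout is `sge://<cell>/<queue>/`. Assuming that is correct, this hypothetical parser makes the split explicit. It is not Galaxy's own code.

```python
def parse_sge_runner_url(url):
    """Split an 'sge://<cell>/<queue>/' runner URL into (cell, queue).

    Hypothetical helper: the field layout is inferred from the two
    default_cluster_job_runner lines, not taken from Galaxy's sge.py.
    """
    scheme, sep, rest = url.partition("://")
    if scheme != "sge" or not sep:
        raise ValueError("not an sge:// runner URL: %r" % url)
    parts = rest.split("/")
    cell = parts[0]                               # empty string: no cell given
    queue = parts[1] if len(parts) > 1 else ""    # empty string: no queue given
    return cell, queue


print(parse_sge_runner_url("sge:///mjobs.q/"))
print(parse_sge_runner_url("sge://default/mjobs.q/"))
```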
$ grep -5 SGE_ROOT eggs.ini
[general]
repository = http://eggs.g2.bx.psu.edu/new
; these eggs must be scrambled for your local environment
no_auto = pbs_python DRMAA_python
SGE_ROOT = /home/geadmin/N1GE python scripts/scramble.py DRMAA_python
[eggs:platform]
bx_python = 0.5.0
Cheetah = 2.2.2
DRMAA_python = 0.2
---------------------------------------------------------
[Error Messages]
galaxy.jobs DEBUG 2010-07-16 17:30:24,794 job 10 dispatched
galaxy.jobs.runners.sge DEBUG 2010-07-16 17:30:27,480 (10) submitting file /share1/home/icgc/galaxy-dist/database/pbs/galaxy_10.sh
galaxy.jobs.runners.sge DEBUG 2010-07-16 17:30:27,480 (10) command is: perl /share1/home/icgc/galaxy-dist/tools/myTools/toolExample.pl /share1/home/icgc/galaxy-dist/database/files/000/dataset_3.dat /share1/home/icgc/galaxy-dist/database/files/000/dataset_10.dat
galaxy.jobs.runners.sge DEBUG 2010-07-16 17:30:27,522 (10) queued in mjobs.q queue as 9440764
202.175.149.92 - - [16/Jul/2010:17:30:26 +0900] "POST /root/history_item_updates HTTP/1.1" 200 - "http://gw01.hgc.jp:8090/history" "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.6) Gecko/20100628 Ubuntu/10.04 (lucid) Firefox/3.6.6"
galaxy.jobs.runners.sge ERROR 2010-07-16 17:30:29,524 (10/9440764) Unable to check job status
Traceback (most recent call last):
  File "/share1/home/icgc/galaxy-dist/lib/galaxy/jobs/runners/sge.py", line 277, in check_watched_items
    state = self.ds.getJobProgramStatus( job_id )
  File "/usr/local/lib/python2.5/site-packages/DRMAA.py", line 395, in getJobProgramStatus
    return ps
TypeError: writelines() argument must be a sequence of strings
galaxy.jobs.runners.sge WARNING 2010-07-16 17:30:29,525 (10/9440764) job will now be errored
galaxy.jobs.runners.sge ERROR 2010-07-16 17:30:29,526 Uncaught exception failing job
Traceback (most recent call last):
  File "/share1/home/icgc/galaxy-dist/lib/galaxy/jobs/runners/sge.py", line 138, in run_next
    self.fail_job( obj )
  File "/share1/home/icgc/galaxy-dist/lib/galaxy/jobs/runners/sge.py", line 338, in fail_job
    self.stop_job( self.sa_session.query( self.app.model.Job ).get( sge_job_state.job_wrapper.job_id ) )
AttributeError: 'SGEJobRunner' object has no attribute 'sa_session'
I believe this issue should be resolved in changeset 4074:af48a13e46b9. Also note that there is a new 'drmaa' job runner which is not SGE-specific. The sge runner will soon be deprecated in favor of the drmaa runner. Details on how to configure it can be found here:
http://bitbucket.org/galaxy/galaxy-central/wiki/Config/Cluster
--nate
[when I use sample code]
#!/usr/bin/env python
import DRMAA
import time
import os

def main():
    """Submit a job, and check its progress.
    Note: needs an executable file called sleeper.sh in the current
    working directory. An example:
    echo "Hello World $1"
    sleep 30s
    """
    s = DRMAA.Session()
    s.init()
    print 'Creating job template'
    jt = s.createJobTemplate()
    # The script is looked up in the current working directory.
    jt.remoteCommand = os.getcwd() + '/sleeper.sh'
    jt.args = ['42', 'Simon says:']
    # Merge stderr into stdout and write it to $HOME/tmp/JOB_OUT.
    jt.joinFiles = True
    jt.outputPath = ":" + DRMAA.JobTemplate.HOME_DIRECTORY + '/tmp/JOB_OUT'
    jobid = s.runJob(jt)
    print 'Your job has been submitted with id ' + jobid
    # Who needs a case statement when you have dictionaries?
    decodestatus = {
        DRMAA.Session.UNDETERMINED: 'process status cannot be determined',
        DRMAA.Session.QUEUED_ACTIVE: 'job is queued and active',
        DRMAA.Session.SYSTEM_ON_HOLD: 'job is queued and in system hold',
        DRMAA.Session.USER_ON_HOLD: 'job is queued and in user hold',
        DRMAA.Session.USER_SYSTEM_ON_HOLD: 'job is queued and in user and system hold',
        DRMAA.Session.RUNNING: 'job is running',
        DRMAA.Session.SYSTEM_SUSPENDED: 'job is system suspended',
        DRMAA.Session.USER_SUSPENDED: 'job is user suspended',
        DRMAA.Session.DONE: 'job finished normally',
        DRMAA.Session.FAILED: 'job finished, but failed',
    }
    # Poll the job status up to 10 times, 5 seconds apart.
    for ix in range(10):
        print ix
        print 'Checking ' + str(ix) + ' of 10 times'
        status = s.getJobProgramStatus(jobid)
        print decodestatus.get(status)
        time.sleep(5)
    print 'Cleaning up'
    s.deleteJobTemplate(jt)
    s.exit()

if __name__ == '__main__':
    main()
$ ./test
Creating job template
Your job has been submitted with id 9440771
0
Checking 0 of 10 times
Traceback (most recent call last):
  File "./test", line 52, in <module>
    main()
  File "./test", line 43, in main
    status = s.getJobProgramStatus(jobid)
  File "/usr/local/lib/python2.5/site-packages/DRMAA.py", line 393, in getJobProgramStatus
    eno, ps, estr = cDRMAA.drmaa_job_ps(jobName)
ValueError: need more than 2 values to unpack
[debug]
print cDRMAA.drmaa_job_ps(jobName)
eno, ps, estr = cDRMAA.drmaa_job_ps(jobName)
=> ([0, 16], '')
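The debug output suggests that the compiled `cDRMAA` module returns a 2-tuple, with the error code and the status bundled in its first element. `DRMAA.py` 0.2, however, unpacks three values. Below is a minimal sketch of a shim that accepts both shapes and decodes the status. Two assumptions apply. First, reading `[0, 16]` as `[errno, status]` is an interpretation. Second, the table uses the program-status values from the DRMAA 1.0 C binding (`drmaa.h`), where 16 (0x10) is `DRMAA_PS_QUEUED_ACTIVE`. That value is plausible for a job checked right after submission. The names `unpack_job_ps` and `PS_NAMES` are hypothetical.

```python
# Program-status codes as defined by the DRMAA 1.0 C binding (drmaa.h),
# e.g. DRMAA_PS_QUEUED_ACTIVE = 0x10, DRMAA_PS_RUNNING = 0x20.
PS_NAMES = {
    0x00: "process status cannot be determined",
    0x10: "job is queued and active",
    0x11: "job is queued and in system hold",
    0x12: "job is queued and in user hold",
    0x13: "job is queued and in user and system hold",
    0x20: "job is running",
    0x21: "job is system suspended",
    0x22: "job is user suspended",
    0x23: "job is user and system suspended",
    0x30: "job finished normally",
    0x40: "job finished, but failed",
}


def unpack_job_ps(result):
    """Return (errno, status, error_string) from a drmaa_job_ps() result.

    Accepts the 3-tuple that DRMAA.py expects and the 2-tuple seen in the
    debug output, ([0, 16], ''), read here as ([errno, status], error_string).
    """
    if len(result) == 3:
        eno, ps, estr = result
    elif len(result) == 2:
        (eno, ps), estr = result
    else:
        raise ValueError("unexpected drmaa_job_ps result: %r" % (result,))
    return eno, ps, estr


eno, ps, estr = unpack_job_ps(([0, 16], ""))
print(eno, PS_NAMES.get(ps, "unknown status %d" % ps))
```

Switching to the newer drmaa package, as George eventually did, avoids the mismatch altogether. A shim like this is mainly useful for confirming the diagnosis.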
Thank you for taking the time! Hopefully you can give me some hints on
how to resolve this issue.
Cheers,
George Chalkidis
galaxy-dev mailing list
galaxy-dev@lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-dev