Thanks Nate!
In the meantime, I was able to resolve the issue by using the "new" drmaa
library (0.4b3) and making some changes to sge.py.
I like the Galaxy framework and hope to be able to contribute some features
soon!
Cheers,
George
On Thu, Jul 29, 2010 at 10:49 PM, Nate Coraor <nate(a)bx.psu.edu> wrote:
George Chalkidis wrote:
> Hi Galaxy-Dev-Team!
> First of all, I would like to thank you for providing the community
> with such a great framework!
>
> I've downloaded the latest packaged release and set it up locally. When I
> try to use Galaxy with the Sun Grid Engine, I run into problems. It
> looks like Galaxy isn't able to fully control the Grid Engine, and I
> think it might be a version mismatch between DRMAA and our current SGE.
> Jobs get submitted, appear in the queue and are executed, but their
> status is not returned to Galaxy, so the job's status on the
> website always remains "Job is waiting to run".
>
> Here's a more detailed error-message:
> ------------------------------------------------------
> [environment]
> CentOS release 5.4 (Final)
> SGE 6.2u5
> Python 2.5.2
> Jun 30th source tar.gz file
> http://bitbucket.org/galaxy/galaxy-dist/get/tip.tar.gz
> ---------------------------------------------------------
>
> [config]
> $ cat universe_wsgi.ini |grep sge
> # currently available are 'pbs' and 'sge'.
> start_job_runners = sge
> #default_cluster_job_runner = sge://default/mjobs.q/ cant lunch service
> default_cluster_job_runner = sge:///mjobs.q/
>
> $ grep -5 SGE_ROOT eggs.ini
>
> [general]
> repository = http://eggs.g2.bx.psu.edu/new
> ; these eggs must be scrambled for your local environment
> no_auto = pbs_python DRMAA_python
> SGE_ROOT = /home/geadmin/N1GE python scripts/scramble.py DRMAA_python
>
> [eggs:platform]
> bx_python = 0.5.0
> Cheetah = 2.2.2
> DRMAA_python = 0.2
>
> ---------------------------------------------------------
>
> [Error Messages]
> galaxy.jobs DEBUG 2010-07-16 17:30:24,794 job 10 dispatched
> galaxy.jobs.runners.sge DEBUG 2010-07-16 17:30:27,480 (10) submitting
> file /share1/home/icgc/galaxy-dist/database/pbs/galaxy_10.sh
> galaxy.jobs.runners.sge DEBUG 2010-07-16 17:30:27,480 (10) command is:
> perl /share1/home/icgc/galaxy-dist/tools/myTools/toolExample.pl
> /share1/home/icgc/galaxy-dist/database/files/000/dataset_3.dat
> /share1/home/icgc/galaxy-dist/database/files/000/dataset_10.dat
> galaxy.jobs.runners.sge DEBUG 2010-07-16 17:30:27,522 (10) queued in
> mjobs.q queue as 9440764
> 202.175.149.92 - - [16/Jul/2010:17:30:26 +0900] "POST
> /root/history_item_updates HTTP/1.1" 200 -
> "http://gw01.hgc.jp:8090/history" "Mozilla/5.0 (X11; U; Linux x86_64;
> en-US; rv:1.9.2.6) Gecko/20100628 Ubuntu/10.04 (lucid) Firefox/3.6.6"
> galaxy.jobs.runners.sge ERROR 2010-07-16 17:30:29,524 (10/9440764)
> Unable to check job status
> Traceback (most recent call last):
> File "/share1/home/icgc/galaxy-dist/lib/galaxy/jobs/runners/sge.py",
> line 277, in check_watched_items
> state = self.ds.getJobProgramStatus( job_id )
> File "/usr/local/lib/python2.5/site-packages/DRMAA.py", line 395, in
> getJobProgramStatus
> return ps
> TypeError: writelines() argument must be a sequence of strings
> galaxy.jobs.runners.sge WARNING 2010-07-16 17:30:29,525 (10/9440764)
> job will now be errored
> galaxy.jobs.runners.sge ERROR 2010-07-16 17:30:29,526 Uncaught
> exception failing job
> Traceback (most recent call last):
> File "/share1/home/icgc/galaxy-dist/lib/galaxy/jobs/runners/sge.py",
> line 138, in run_next
> self.fail_job( obj )
> File "/share1/home/icgc/galaxy-dist/lib/galaxy/jobs/runners/sge.py",
> line 338, in fail_job
> self.stop_job( self.sa_session.query( self.app.model.Job ).get(
> sge_job_state.job_wrapper.job_id ) )
> AttributeError: 'SGEJobRunner' object has no attribute 'sa_session'
>
Hi George,
I believe this issue should be resolved in changeset 4074:af48a13e46b9.
Also note that there is a new 'drmaa' job runner which is not SGE-specific.
The sge runner will soon be deprecated in favor of the drmaa runner. Details
on how to configure it can be found here:
http://bitbucket.org/galaxy/galaxy-central/wiki/Config/Cluster
--nate
> [when I use sample code]
>
> #!/usr/bin/env python
>
> import DRMAA
> import time
> import os
>
> def main():
>     """Submit a job, and check its progress.
>     Note, need file called sleeper.sh in home directory. An example:
>     echo 'Hello World $1'
>     sleep 30s
>     """
>     s=DRMAA.Session()
>     s.init()
>
>     print 'Creating job template'
>     jt = s.createJobTemplate()
>     jt.remoteCommand = os.getcwd() + '/sleeper.sh'
>     jt.args = ['42','Simon says:']
>     jt.joinFiles=True
>     jt.outputPath=":"+DRMAA.JobTemplate.HOME_DIRECTORY+'/tmp/JOB_OUT'
>
>     jobid = s.runJob(jt)
>     print 'Your job has been submitted with id ' + jobid
>
>     # Who needs a case statement when you have dictionaries?
>     decodestatus = {
>         DRMAA.Session.UNDETERMINED: 'process status cannot be determined',
>         DRMAA.Session.QUEUED_ACTIVE: 'job is queued and active',
>         DRMAA.Session.SYSTEM_ON_HOLD: 'job is queued and in system hold',
>         DRMAA.Session.USER_ON_HOLD: 'job is queued and in user hold',
>         DRMAA.Session.USER_SYSTEM_ON_HOLD: 'job is queued and in user and system hold',
>         DRMAA.Session.RUNNING: 'job is running',
>         DRMAA.Session.SYSTEM_SUSPENDED: 'job is system suspended',
>         DRMAA.Session.USER_SUSPENDED: 'job is user suspended',
>         DRMAA.Session.DONE: 'job finished normally',
>         DRMAA.Session.FAILED: 'job finished, but failed',
>     }
>
>     for ix in range(10):
>         print ix
>         print 'Checking ' + str(ix) + ' of 10 times'
>         status = s.getJobProgramStatus(jobid)
>         print decodestatus.get(status)
>         time.sleep(5)
>
>     print 'Cleaning up'
>     s.deleteJobTemplate(jt)
>     s.exit()
>
> if __name__=='__main__':
>     main()
>
>
> $ ./test
> Creating job template
> Your job has been submitted with id 9440771
> 0
> Checking 0 of 10 times
> Traceback (most recent call last):
> File "./test", line 52, in <module>
> main()
> File "./test", line 43, in main
> status = s.getJobProgramStatus(jobid)
> File "/usr/local/lib/python2.5/site-packages/DRMAA.py", line 393, in
> getJobProgramStatus
> eno, ps, estr = cDRMAA.drmaa_job_ps(jobName)
> ValueError: need more than 2 values to unpack
>
> [debug]
> print cDRMAA.drmaa_job_ps(jobName)
> eno, ps, estr = cDRMAA.drmaa_job_ps(jobName)
> => ([0, 16], '')
>
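The debug output above suggests why the unpacking fails: the call returns a two-element tuple whose first element is a list, ([0, 16], ''), while the wrapper expects three separate values (eno, ps, estr). The following is only a minimal sketch of that mismatch and of one tolerant way to unpack either shape. It is not the fix from the changeset mentioned in this thread, nor the changes made to sge.py. The names fake_drmaa_job_ps and unpack_job_ps are hypothetical, and the stand-in function simply returns the value shown in the debug output instead of calling the real cDRMAA module.

```python
def fake_drmaa_job_ps(job_id):
    """Hypothetical stand-in for cDRMAA.drmaa_job_ps().

    Returns the exact shape seen in the [debug] output, so the example
    runs without SGE or the DRMAA bindings installed.
    """
    return ([0, 16], '')


def unpack_job_ps(result):
    """Return (eno, ps, estr) from either result shape.

    Accepts the flat form (eno, ps, estr) that the wrapper expects, and
    the nested form ([eno, ps], estr) seen in the debug output.
    """
    if len(result) == 3:
        eno, ps, estr = result
    elif len(result) == 2:
        (eno, ps), estr = result
    else:
        raise ValueError('unexpected result shape: %r' % (result,))
    return eno, ps, estr


if __name__ == '__main__':
    raw = fake_drmaa_job_ps('9440771')
    try:
        # This is the unpacking that fails in getJobProgramStatus.
        eno, ps, estr = raw
    except ValueError as exc:
        print('naive unpacking failed: %s' % exc)
    print(unpack_job_ps(raw))
```

If the DRMAA C binding constants are read correctly, the status value 16 (0x10) corresponds to "queued and active", which would be consistent with a job that had just been submitted.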
>
> Thank you for taking the time! Hopefully, you can give me some hints on
> how to resolve this issue.
>
> Cheers,
>
> George Chalkidis
> _______________________________________________
> galaxy-dev mailing list
> galaxy-dev(a)lists.bx.psu.edu
> http://lists.bx.psu.edu/listinfo/galaxy-dev
>