This has now been narrowed down to a segfault in the DRMAA libraries immediately after a job is submitted, when the LSF queue is set explicitly with the LSB_DEFAULTQUEUE environment variable.

Marina

On 02/02/2011 16:43, Marina Gourtovaia wrote:
Hello
I've set up Galaxy to use LSF. My first job failed because Galaxy submitted it to the default queue, which was the wrong one in my case. However, Galaxy survived the failure gracefully, and I was able to get the job number from the console output and figure out what went wrong.
Next, I ran Galaxy with the LSB_DEFAULTQUEUE environment variable set like this:
LSB_DEFAULTQUEUE=test DRMAA_LIBRARY_PATH=/usr/local/lsf/7.0/linux2.6-glibc2.3-x86_64/lib/libdrmaa.so.1.0.4 PATH=/usr/bin:/software/solexa/bin:$PATH sh run.sh
The job was submitted to the correct queue, and at that point Galaxy failed with this error:
run.sh: line 46: 6506 Segmentation fault python ./scripts/paster.py serve universe_wsgi.ini $@
The job successfully completes in its own time.
When I try to run Galaxy again I get the following:
galaxy.jobs DEBUG 2011-02-02 16:27:32,565 dispatching job 36 to drmaa runner
galaxy.jobs INFO 2011-02-02 16:27:32,675 job 36 dispatched
galaxy.jobs.runners.drmaa DEBUG 2011-02-02 16:27:33,192 (36) submitting file /nfs/users/nfs_m/mg8/mygalaxy/galaxy-dist/database/pbs/galaxy_36.sh
galaxy.jobs.runners.drmaa DEBUG 2011-02-02 16:27:33,192 (36) command is: java -jar /nfs/users/nfs_m/mg8/mygalaxy/galaxy-dist/tool-data/shared/jars/SamToFastq.jar VALIDATION_STRINGENCY=SILENT QUIET=true INPUT=/lustre/scratch103/sanger/mg8/galaxy/datasets/000/dataset_16.dat FASTQ=/lustre/scratch103/sanger/mg8/galaxy/datasets/000/dataset_51.dat SECOND_END_FASTQ=/lustre/scratch103/sanger/mg8/galaxy/datasets/000/dataset_52.dat
Job <855341> is submitted to queue <test>.
run.sh: line 46: 6506 Segmentation fault python ./scripts/paster.py serve universe_wsgi.ini $@
That is, it looks like Galaxy tries to pick up where it left off and fails again.
I configured my job runners like this:
start_job_runners = drmaa
default_cluster_job_runner = drmaa:///
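
One way to check whether the crash comes from the DRMAA library rather than from Galaxy itself might be to submit a trivial job through the drmaa Python bindings in a separate process, once with LSB_DEFAULTQUEUE unset and once with it set. The sketch below has not been run against LSF. It assumes the `drmaa` Python package is importable and that DRMAA_LIBRARY_PATH is already exported as in the command above. The names `run_isolated` and `SUBMIT_SNIPPET` and the `/bin/sleep` test job are made up for illustration.

```python
import os
import signal
import subprocess
import sys

# Code run in the child process: submit one trivial job via the drmaa bindings.
# Assumption: the `drmaa` package is installed and DRMAA_LIBRARY_PATH is set.
SUBMIT_SNIPPET = """
import drmaa
s = drmaa.Session()
s.initialize()
jt = s.createJobTemplate()
jt.remoteCommand = '/bin/sleep'   # hypothetical test job
jt.args = ['10']
print('submitted job', s.runJob(jt))
s.deleteJobTemplate(jt)
s.exit()
"""


def run_isolated(code, env_overrides=None):
    """Run `code` in a fresh Python process.

    Returns (returncode, segfaulted). A value of None in env_overrides
    removes that variable from the child's environment.
    """
    env = dict(os.environ)
    for name, value in (env_overrides or {}).items():
        if value is None:
            env.pop(name, None)
        else:
            env[name] = value
    proc = subprocess.run([sys.executable, "-c", code], env=env)
    # On POSIX, a negative return code means the child was killed by that signal.
    return proc.returncode, proc.returncode == -signal.SIGSEGV


if __name__ == "__main__":
    for queue in (None, "test"):
        rc, segv = run_isolated(SUBMIT_SNIPPET, {"LSB_DEFAULTQUEUE": queue})
        print("LSB_DEFAULTQUEUE=%r: return code %s, segfault=%s" % (queue, rc, segv))
```

If only the run with LSB_DEFAULTQUEUE set reports a segfault, that would point at the library rather than at Galaxy's job runner. Because the submission happens in a child process, a crash there does not take the calling script down with it.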
Any suggestions?
Regards
Marina
-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.