This is just a guess, which may help you troubleshoot.

It could be that Python is hitting a stack limit: run ulimit -s to see the current value and set it higher if required.
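If it is easier to check from inside Python, here is a minimal sketch that reads the same limit with the standard resource module. The helper names stack_limit and describe are just mine for illustration. Note that the module reports bytes, whereas ulimit -s prints KiB.

```python
import resource


def stack_limit():
    """Return (soft, hard) stack limits in bytes; None means unlimited."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)

    def to_value(v):
        return None if v == resource.RLIM_INFINITY else v

    return to_value(soft), to_value(hard)


def describe(limit):
    """Format a limit in KiB, the unit that ulimit -s uses."""
    return "unlimited" if limit is None else "%d KiB" % (limit // 1024)


if __name__ == "__main__":
    soft, hard = stack_limit()
    print("soft stack limit: " + describe(soft))
    print("hard stack limit: " + describe(hard))
```

Run it with the same user and shell that start ./run.sh, since the limit is inherited from the launching shell.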

I’m completely guessing here, but is it possible that the DRMAA library is missing a linked library on the Red Hat system? You can check by running ldd on libdrmaa.so.
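As a rough sketch of that check, the snippet below runs ldd and lists any dependency reported as "not found". The function names are my own, and the path in the usage comment is simply the one from your job_conf.xml; I have not run this against an LSF install.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the names of libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=>" in line and "not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_library(path):
    """Run ldd on the given shared object and return unresolved dependencies."""
    proc = subprocess.Popen(["ldd", path],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    output = proc.communicate()[0].decode("utf-8", "replace")
    return missing_libraries(output)


# Example usage (path taken from the job_conf.xml in the original message):
# print(check_library(
#     "/opt/gridware/lsf/9.1/linux2.6-glibc2.3-x86_64/lib/libdrmaa.so"))
```

An empty list means ldd resolved everything. Run it in the same environment as Galaxy so that LD_LIBRARY_PATH matches.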

 

Regards,

Iyad Kandalaft

 

Microbial Biodiversity Bioinformatics

Agriculture and Agri-Food Canada | Agriculture et Agroalimentaire Canada
960 Carling Ave.| 960 Ave. Carling

Ottawa, ON| Ottawa (ON) K1A 0C6

E-mail Address / Adresse courriel  Iyad.Kandalaft@agr.gc.ca
Telephone | Téléphone 613-759-1228
Facsimile | Télécopieur 613-759-1701
Teletypewriter | Téléimprimeur 613-773-2600
Government of Canada | Gouvernement du Canada

 

 

 

 

From: galaxy-dev-bounces@lists.bx.psu.edu [mailto:galaxy-dev-bounces@lists.bx.psu.edu] On Behalf Of I Kozin
Sent: Tuesday, June 10, 2014 12:42 PM
To: galaxy-dev@lists.bx.psu.edu
Subject: [galaxy-dev] troubleshooting Galaxy with LSF

 

Hello,

This is largely a repost from the biostar forum following the suggestion there to post here.

 

I'm taking my first steps in setting up a Galaxy server with an LSF job scheduler. LSF recently started supporting DRMAA again, so I decided to give it a go.

 

I have two setups. The one that works is a standalone server (OpenSuse 12.1, python 2.7.2, LSF 9.1.2). By "works" I mean that when I log in to Galaxy using a browser and upload a file, a job gets submitted and runs, and everything seems fine.

The second setup does not work (RH 6.4, python 2.6.6, LSF 9.1.2). It's a server running Galaxy which is meant to submit jobs to an LSF cluster. When I similarly pick and download a file, I get:

Job <72266> is submitted to queue <short>.
./run.sh: line 79: 99087 Segmentation fault      python ./scripts/paster.py serve universe_wsgi.ini $@

For the moment I'm not concerned with the full server setup; I'm just testing whether Galaxy works with LSF, so I run ./run.sh as a user.

The job configuration job_conf.xml is identical in both cases:

<?xml version="1.0"?>
<job_conf>
    <plugins>
        <plugin id="lsf" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner">
            <param id="drmaa_library_path">/opt/gridware/lsf/9.1/linux2.6-glibc2.3-x86_64/lib/libdrmaa.so</param>
        </plugin>
    </plugins>
    <handlers>
        <handler id="main"/>
    </handlers>
    <destinations default="lsf_default">
        <destination id="lsf_default" runner="lsf">
            <param id="nativeSpecification">-W 24:00</param>
        </destination>
    </destinations>
</job_conf>

run.sh is only changed to allow remote access.

Most recently I tried replacing python with 2.7.5, to no avail: I still get the same kind of error. I also updated Galaxy.

Any hints would be much appreciated. Thank you