Hi Carrie,

I've had the same problem. I wanted to get Galaxy to submit to a cluster running Torque 4.x. Torque clients need to be 4.x to work with that version of the server. I spent a bit of time looking into this and determined that the pbs_python used by Galaxy is not compatible with Torque 4.x; a new version would need to be built.

At that stage I investigated using the DRMAA runner to talk to the Torque 4.x server. That did work, provided I built the Torque clients with the server name hard-coded (--with-default-server). What the DRMAA runner didn't do was data staging, which the PBS runner does, so I started working on some code for that. I'm now looking at giving up on data staging by moving the Galaxy instance onto the cluster. Sorry I couldn't be more help.

I would be interested in comments from the Galaxy developers about whether the PBS runner will be supported in the future and, hence, whether Torque 4.x will be supported. I'm also interested in whether the DRMAA runner will support data staging, or whether Galaxy instances really need to share file systems with a cluster.

Regards,

Steve McMahon
Solutions architect & senior systems administrator
ASC Cluster Services
Information Management & Technology (IM&T)
CSIRO
Phone: +61-2-62142968 | Mobile: +61-4-00779318
steve.mcmahon@csiro.au | www.csiro.au
PO Box 225, DICKSON ACT 2602
1 Wilf Crane Crescent, Yarralumla ACT 2600

From: galaxy-dev-bounces@lists.bx.psu.edu [mailto:galaxy-dev-bounces@lists.bx.psu.edu] On Behalf Of Ganote, Carrie L
Sent: Friday, 5 April 2013 4:52 AM
To: galaxy-dev@bx.psu.edu
Subject: [galaxy-dev] PBS_Python Unable to submit jobs

Hi Galaxy dev,

My setup is a bit non-standard, but I'm getting the following error:

galaxy.jobs.runners.pbs WARNING 2013-04-04 13:24:00,590 (75) pbs_submit failed (try 1/5), PBS error 15044: Resources temporarily unavailable

Here is my setup: Torque 3 is installed in /usr/local/bin and I can use it to connect to the (default) server1. Torque 4 is installed in /N/soft/ and I can use it to connect to server2. I'm running trq_authd, so Torque 4 should work. I can submit jobs to both servers from the command line; for server2, I specify the path to qsub and the server name (-q batch@server2).

In Galaxy, I used torquelib_dir=/N/soft to scramble pbs_python. My PATH points at /N/soft first, so 'which qsub' returns the Torque 4 qsub.

If I just use pbs:///, it submits a job to server1 (which shouldn't work, because /N/soft/qsub doesn't work from the command line, since the default server1 is running Torque 3).

If I use pbs://-l vmem=100mb,walltime=00:30:00/, it won't work (the server string in pbs.py becomes "-l vmem=100mb,walltime=00:30:00" instead of "server1").

If I use pbs://server2/, I get the "Resources temporarily unavailable" error above. The server string is server2, and I put the following in pbs.py for debugging:

whichq = os.popen("which qsub").read()
stats = os.popen("qstat @server2").read()

These return the correct values for server2 using the correct Torque version 4. I'm stumped as to why this is not making the connection. It's probably something about the Python implementation I'm overlooking.

Thanks for any advice,

Carrie Ganote
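
For what it's worth, a minimal standalone check along the following lines can show whether the pbs_python that Galaxy built (against torquelib_dir=/N/soft) can reach the Torque 4 server at all, independent of Galaxy. This is only a sketch: it assumes that copy of pbs_python is importable from the current Python environment, and "server2" stands in for the real server name from the messages above; pbs.error() is the same call Galaxy's pbs.py uses when it reports errors such as 15044.

    # Diagnostic sketch: try to open a connection with the pbs_python bindings
    # that Galaxy's PBS runner uses (assumes that egg/module is on sys.path).
    import pbs

    server = "server2"  # placeholder for the actual Torque 4 server name
    conn = pbs.pbs_connect(server)
    if conn <= 0:
        # pbs.error() returns the last PBS error number and message text
        errno, text = pbs.error()
        print "pbs_connect to %s failed: %s (%s)" % (server, errno, text)
    else:
        print "pbs_connect to %s succeeded, handle %s" % (server, conn)
        pbs.pbs_disconnect(conn)

If this fails outside Galaxy with the same error, the problem is in the pbs_python build or the Torque 4 authorization (trq_authd) rather than in Galaxy's runner configuration.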