[galaxy-dev] Help with cluster setup

13 Aug 2013

I am totally lost on what is happening now. I have Galaxy running, but jobs
are not being run.

This is my setup.

Torque:
qmgr -c 'p s'
#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch resources_default.nodes = 1
set queue batch resources_default.walltime = 01:00:00
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = manager
set server managers = root@*
set server managers += jurgens@*
set server operators = galaxy@*
set server operators += jurgens@*
set server operators += root@*
set server default_queue = batch
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 300
set server job_stat_rate = 45
set server poll_jobs = True
set server mom_job_sync = True
set server keep_completed = 300
set server next_job_number = 17
set server moab_array_compatible = True

This is my job_conf.xml

<?xml version="1.0"?>
<!-- A sample job config that explicitly configures job running the way it
is configured by default (if there is no explicit config). -->
<job_conf>
    <plugins>
        <!-- <plugin id="drmaa" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner" workers="4"/> -->
        <plugin id="drmaa" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner"/>
    </plugins>
    <handlers default="batch">
        <handler id="cn01"  tags="batch"/>
        <handler id="cn02"  tags="batch"/>
    </handlers>
    <destinations default="batch">
        <destination id="batch" runner="drmaa" tags="cluster,batch">
            <param id="nativeSpecification">-q batch</param>
        </destination>
    </destinations>
</job_conf>
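As far as I understand, Galaxy looks destination `<param>` values up by their exact id, so a misspelled id (for example `nativeSpecfication` instead of `nativeSpecification`) is not reported as an error; it is simply never applied. A minimal Python 3 sketch for catching that kind of slip, assuming a hypothetical allow-list `KNOWN_PARAMS` that here contains only `nativeSpecification` (extend it with the params your runner actually reads):

```python
import difflib
import xml.etree.ElementTree as ET

# Hypothetical allow-list: only nativeSpecification is assumed to be valid here.
KNOWN_PARAMS = {"nativeSpecification"}


def suspicious_params(job_conf_xml, known=KNOWN_PARAMS):
    """Return (destination id, param id, closest known id or None) for
    every <param> whose id is not in the allow-list."""
    root = ET.fromstring(job_conf_xml)
    problems = []
    for dest in root.iter("destination"):
        for param in dest.findall("param"):
            pid = param.get("id", "")
            if pid in known:
                continue
            # Suggest the nearest known id; a close match usually means a typo
            close = difflib.get_close_matches(pid, sorted(known), n=1)
            problems.append((dest.get("id"), pid, close[0] if close else None))
    return problems


if __name__ == "__main__":
    sample = (
        '<job_conf><destinations default="batch">'
        '<destination id="batch" runner="drmaa">'
        '<param id="nativeSpecfication">-q batch</param>'
        '</destination></destinations></job_conf>'
    )
    for dest_id, pid, hint in suspicious_params(sample):
        print("%s: unknown param %r (did you mean %r?)" % (dest_id, pid, hint))
```

The allow-list approach is deliberately strict: anything not listed is reported, and difflib only adds a hint when a known id is close.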

These are the relevant parts of my universe_wsgi.ini:

# Configuration of the internal HTTP server.

[server:main]

# The internal HTTP server to use.  Currently only Paste is provided.  This
# option is required.
use = egg:Paste#http

# The port on which to listen.
port = 8989

# The address on which to listen.  By default, only listen to localhost (Galaxy
# will not be accessible over the network).  Use '0.0.0.0' to listen on all
# available network interfaces.
#host = 127.0.0.1
host = 0.0.0.0

# Use a threadpool for the web server instead of creating a thread for each
# request.
use_threadpool = True

# Number of threads in the web server thread pool.
#threadpool_workers = 10

# Set the number of seconds a thread can work before you should kill it
# (assuming it will never finish) to 3 hours.
threadpool_kill_thread_limit = 10800

[server:cn01]
use = egg:Paste#http
port = 8090
host = 127.0.0.1
use_threadpool = true
threadpool_workers = 5

[server:cn02]
use = egg:Paste#http
port = 8091
host = 127.0.0.1
use_threadpool = true
threadpool_workers = 5

Here, cn01 and cn02 are the names of cluster nodes.
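A handler id in job_conf.xml names a Galaxy server process, that is, a [server:<id>] section in universe_wsgi.ini (the sample job_conf.xml comment quoted further down says "the id should match the name of a [server:<id>] in universe_wsgi.ini"); it is not the hostname of a compute node. As far as I understand, each of those processes also has to be started, otherwise jobs assigned to that handler are never picked up. The minimal Python 3 sketch below only checks the first half: it lists handler ids that have no matching section. It cannot tell whether the processes are actually running.

```python
import configparser
import xml.etree.ElementTree as ET


def server_sections(ini_text):
    """Return the <id> of every [server:<id>] section in an ini string."""
    # interpolation=None so '%' characters in values are left alone
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    return {name.split(":", 1)[1] for name in parser.sections()
            if name.startswith("server:")}


def handler_ids(job_conf_xml):
    """Return the id of every <handler> element in job_conf.xml."""
    root = ET.fromstring(job_conf_xml)
    return {h.get("id") for h in root.iter("handler")}


def handlers_without_server(ini_text, job_conf_xml):
    """Handler ids that have no [server:<id>] section to run them."""
    return sorted(handler_ids(job_conf_xml) - server_sections(ini_text))


if __name__ == "__main__":
    ini = "[server:main]\nport = 8989\n\n[server:cn01]\nport = 8090\n"
    conf = ('<job_conf><handlers default="batch">'
            '<handler id="cn01" tags="batch"/><handler id="cn02" tags="batch"/>'
            '</handlers></job_conf>')
    print(handlers_without_server(ini, conf))
```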

echo $DRMAA_LIBRARY_PATH
/usr/local/lib/libdrmaa.so
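The echo above shows the variable in an interactive shell, but as I understand it the drmaa runner picks up DRMAA_LIBRARY_PATH from the environment of the Galaxy process acting as the job handler (here the cn01 and cn02 servers), which can differ if Galaxy is started from an init script or as another user. A minimal Python 3 check, meant to be run as the same user and from the same environment as Galaxy, that the library the variable points to can actually be loaded:

```python
import ctypes
import os


def check_drmaa_library(path=None):
    """Try to load the DRMAA shared library; return (ok, message).

    If no path is given, the DRMAA_LIBRARY_PATH environment variable is used.
    """
    path = path or os.environ.get("DRMAA_LIBRARY_PATH")
    if not path:
        return False, "DRMAA_LIBRARY_PATH is not set in this environment"
    if not os.path.isfile(path):
        return False, "no such file: %s" % path
    try:
        # dlopen() the library the same way a DRMAA binding would have to
        ctypes.CDLL(path)
    except OSError as exc:
        return False, "cannot load %s: %s" % (path, exc)
    return True, "loaded %s" % path


if __name__ == "__main__":
    ok, message = check_drmaa_library()
    print("OK" if ok else "PROBLEM", "-", message)
```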

On 8 August 2013 16:58, Nate Coraor <nate@bx.psu.edu> wrote:
...
On Aug 7, 2013, at 9:23 PM, shenwiyn wrote:
...
Yes, and I am also confused about that. When I tried setting
server:<id> in universe_wsgi.ini as follows, my Galaxy didn't
work with the cluster; if I removed server:<id>, it worked.
Hi Shenwiyn,
Are you starting all of the servers that you have defined in
universe_wsgi.ini?  If using run.sh, setting GALAXY_RUN_ALL in the
environment will do this for you:
http://wiki.galaxyproject.org/Admin/Config/Performance/Scaling
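Each [server:<id>] section is a separate Galaxy process, and defining a section does not start it. As I understand run.sh, setting GALAXY_RUN_ALL (e.g. `GALAXY_RUN_ALL=1 sh run.sh --daemon`) makes it start one paster process per [server:<id>] section, each with its own pid and log file. The Python 3 sketch below only builds those command lines so you can see what would be started; the exact paster flags are my assumption about run.sh, so compare against your own copy before relying on them.

```python
import re
import shlex

# Matches "[server:<id>]" header lines and captures <id>
SERVER_RE = re.compile(r"^\[server:([^\]]+)\]", re.MULTILINE)


def server_names(ini_text):
    """Return server section ids in the order they appear in the file."""
    return SERVER_RE.findall(ini_text)


def start_commands(ini_text, ini_path="universe_wsgi.ini", extra=("--daemon",)):
    """Build one paster command per [server:<id>] section (assumed flags)."""
    commands = []
    for name in server_names(ini_text):
        commands.append([
            "python", "./scripts/paster.py", "serve", ini_path,
            "--server-name=%s" % name,
            "--pid-file=%s.pid" % name,
            "--log-file=%s.log" % name,
        ] + list(extra))
    return commands


if __name__ == "__main__":
    sample = ("[server:main]\nport = 8989\n\n"
              "[server:cn01]\nport = 8090\n\n"
              "[server:cn02]\nport = 8091\n")
    for cmd in start_commands(sample):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Starting only the main server leaves the handler sections defined but idle, which matches the "works if I remove server:<id>" symptom described above.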
...
[server:node01]
use = egg:Paste#http
port = 8080
host = 0.0.0.0
use_threadpool = true
threadpool_workers = 5
This is my job_conf.xml:
<?xml version="1.0"?>
<job_conf>
    <plugins workers="4">
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
        <plugin id="pbs" type="runner" load="galaxy.jobs.runners.pbs:PBSJobRunner" workers="8"/>
    </plugins>
    <handlers default="batch">
        <handler id="node01" tags="batch"/>
        <handler id="node02" tags="batch"/>
    </handlers>
    <destinations default="regularjobs">
        <destination id="local" runner="local"/>
        <destination id="regularjobs" runner="pbs" tags="cluster">
            <param id="Resource_List">walltime=24:00:00,nodes=1:ppn=4,mem=10G</param>
            <param id="galaxy_external_runjob_script">scripts/drmaa_external_runner.py</param>
            <param id="galaxy_external_killjob_script">scripts/drmaa_external_killer.py</param>
            <param id="galaxy_external_chown_script">scripts/external_chown_script.py</param>
        </destination>
    </destinations>
</job_conf>
The galaxy_external_* options are only supported with the drmaa plugin,
and for the moment they only belong in universe_wsgi.ini; they have
not been migrated to the new-style job configuration. They should also
only be used if you are attempting to set up "run jobs as the real user"
job running capabilities.
...
Furthermore, when I want to kill my jobs by clicking
[attached image: Catch(08-08-09-12-39).jpg] in the Galaxy web interface, the job keeps on running in the
background. I do not know how to fix this.
Any help on this would be appreciated. Thank you very much.
Job deletion in the pbs runner was recently broken, but a fix for this bug
will be part of the next stable release (on Monday).
--nate
...
shenwiyn
From: Jurgens de Bruin
Date: 2013-08-07 19:55
To: galaxy-dev
Subject: [galaxy-dev] Help with cluster setup
Hi,
This is my first Galaxy installation, so apologies for any stupid
questions. I am setting up Galaxy on a cluster running Torque as the
resource manager. I am working through the documentation, but I am unclear
on some things:
...
Firstly, I am unable to find start_job_runners in
universe_wsgi.ini, and I don't want to just add it anywhere. Any help on
this would be great.
...
Furthermore, this is my job_conf.xml:
<?xml version="1.0"?>
<!-- A sample job config that explicitly configures job running the way
it is configured by default (if there is no explicit config). -->
...
<job_conf>
    <plugins>
        <plugin id="hpc" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner" workers="4"/>
    </plugins>
    <handlers>
        <!-- Additional job handlers - the id should match the name of a
             [server:<id>] in universe_wsgi.ini. -->
        <handler id="cn01"/>
        <handler id="cn02"/>
    </handlers>
    <destinations>
        <destination id="hpc" runner="hpc"/>
    </destinations>
</job_conf>
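Two mistakes that would, as far as I know, stop a job_conf.xml like this one from working: a `<!--` comment that is never closed with `-->`, which makes the whole file invalid XML, and a destination whose runner attribute does not match the id of any `<plugin>` (for example runner="drmaa" when the plugin is declared as id="hpc"). A minimal Python 3 sketch that reports both, plus a default destination that is not defined; it checks references only, not whether Galaxy accepts every attribute:

```python
import xml.etree.ElementTree as ET


def check_references(job_conf_xml):
    """Return human-readable problems found in a job_conf.xml string."""
    try:
        root = ET.fromstring(job_conf_xml)
    except ET.ParseError as exc:
        # An unterminated <!-- comment surfaces here as a parse error
        return ["XML parse error: %s" % exc]
    problems = []
    plugin_ids = {p.get("id") for p in root.iter("plugin")}
    destinations = root.find("destinations")
    if destinations is None:
        return problems
    dest_ids = set()
    for dest in destinations.findall("destination"):
        dest_ids.add(dest.get("id"))
        # runner must name the id of a declared <plugin>
        if dest.get("runner") not in plugin_ids:
            problems.append("destination %r uses unknown runner %r"
                            % (dest.get("id"), dest.get("runner")))
    default = destinations.get("default")
    if default is not None and default not in dest_ids:
        problems.append("default destination %r is not defined" % default)
    return problems


if __name__ == "__main__":
    sample = (
        '<job_conf><plugins>'
        '<plugin id="hpc" type="runner" '
        'load="galaxy.jobs.runners.drmaa:DRMAAJobRunner" workers="4"/>'
        '</plugins><destinations>'
        '<destination id="hpc" runner="drmaa"/>'
        '</destinations></job_conf>'
    )
    print(check_references(sample) or "no problems found")
```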
Does this look meaningful? Furthermore, where do I set the additional
[server:<id>] sections in universe_wsgi.ini?
As background, the cluster has 13 compute nodes and a shared storage
array that can be accessed by all nodes in the cluster.
Thanks again
--
Regards/Groete/Mit freundlichen Grüßen/recuerdos/meilleures salutations/
distinti saluti/siong/duì yú/привет
Jurgens de Bruin
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
 http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at:
 http://galaxyproject.org/search/mailinglists/
-- 
Regards/Groete/Mit freundlichen Grüßen/recuerdos/meilleures salutations/
distinti saluti/siong/duì yú/привет

Jurgens de Bruin
