Hi,

This is my first Galaxy installation setup, so apologies for the stupid questions. I am setting up Galaxy on a cluster running Torque as the resource manager. I am working through the documentation but I am unclear on some things.

Firstly, I am unable to find start_job_runners within universe_wsgi.ini, and I don't want to just add it anywhere - any help on this would be great.

Furthermore, this is my job_conf.xml:

<?xml version="1.0"?>
<!-- A sample job config that explicitly configures job running the way it is
     configured by default (if there is no explicit config). -->
<job_conf>
    <plugins>
        <plugin id="hpc" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner" workers="4"/>
    </plugins>
    <handlers>
        <!-- Additional job handlers - the id should match the name of a
             [server:<id>] in universe_wsgi.ini. -->
        <handler id="cn01"/>
        <handler id="cn02"/>
    </handlers>
    <destinations>
        <destination id="hpc" runner="drmaa"/>
    </destinations>
</job_conf>

Does this look meaningful? Furthermore, where do I set the additional server:<id> sections in universe_wsgi.ini?

As background, the cluster has 13 compute nodes and a shared storage array that can be accessed by all nodes in the cluster.

Thanks again

--
Regards/Groete/Mit freundlichen Grüßen/recuerdos/meilleures salutations/distinti saluti/siong/duì yú/привет

Jurgens de Bruin
Yes, and I have the same confusion about that. Actually, when I set a server:<id> in universe_wsgi.ini as follows as a test, my Galaxy doesn't work with the cluster; if I remove the server:<id> section, it works.

[server:node01]
use = egg:Paste#http
port = 8080
host = 0.0.0.0
use_threadpool = true
threadpool_workers = 5

This is my job_conf.xml:

<?xml version="1.0"?>
<job_conf>
    <plugins workers="4">
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
        <plugin id="pbs" type="runner" load="galaxy.jobs.runners.pbs:PBSJobRunner" workers="8"/>
    </plugins>
    <handlers default="batch">
        <handler id="node01" tags="batch"/>
        <handler id="node02" tags="batch"/>
    </handlers>
    <destinations default="regularjobs">
        <destination id="local" runner="local"/>
        <destination id="regularjobs" runner="pbs" tags="cluster">
            <param id="Resource_List">walltime=24:00:00,nodes=1:ppn=4,mem=10G</param>
            <param id="galaxy_external_runjob_script">scripts/drmaa_external_runner.py</param>
            <param id="galaxy_external_killjob_script">scripts/drmaa_external_killer.py</param>
            <param id="galaxy_external_chown_script">scripts/external_chown_script.py</param>
        </destination>
    </destinations>
</job_conf>

Furthermore, when I try to kill my jobs from the Galaxy web interface, the job keeps on running in the background. I do not know how to fix this. Any help on this would be appreciated. Thank you very much.

shenwiyn
On Aug 7, 2013, at 9:23 PM, shenwiyn wrote:
Yes, and I have the same confusion about that. Actually, when I set a server:<id> in universe_wsgi.ini as follows as a test, my Galaxy doesn't work with the cluster; if I remove the server:<id> section, it works.
Hi Shenwiyn,

Are you starting all of the servers that you have defined in universe_wsgi.ini? If using run.sh, setting GALAXY_RUN_ALL in the environment will do this for you:

http://wiki.galaxyproject.org/Admin/Config/Performance/Scaling
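As a concrete sketch of that (assuming the stock run.sh in the Galaxy root directory, per the wiki page above):

# start one process per [server:*] section defined in universe_wsgi.ini
GALAXY_RUN_ALL=1 sh run.sh --daemon

# stop them all again
GALAXY_RUN_ALL=1 sh run.sh --stop-daemon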
[server:node01]
use = egg:Paste#http
port = 8080
host = 0.0.0.0
use_threadpool = true
threadpool_workers = 5

This is my job_conf.xml:

<?xml version="1.0"?>
<job_conf>
    <plugins workers="4">
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
        <plugin id="pbs" type="runner" load="galaxy.jobs.runners.pbs:PBSJobRunner" workers="8"/>
    </plugins>
    <handlers default="batch">
        <handler id="node01" tags="batch"/>
        <handler id="node02" tags="batch"/>
    </handlers>
    <destinations default="regularjobs">
        <destination id="local" runner="local"/>
        <destination id="regularjobs" runner="pbs" tags="cluster">
            <param id="Resource_List">walltime=24:00:00,nodes=1:ppn=4,mem=10G</param>
            <param id="galaxy_external_runjob_script">scripts/drmaa_external_runner.py</param>
            <param id="galaxy_external_killjob_script">scripts/drmaa_external_killer.py</param>
            <param id="galaxy_external_chown_script">scripts/external_chown_script.py</param>
        </destination>
    </destinations>
</job_conf>
The galaxy_external_* options are only supported with the drmaa plugin, and they actually only belong in universe_wsgi.ini for the moment; they have not been migrated to the new-style job configuration. They should also only be used if you are attempting to set up "run jobs as the real user" job running capabilities.
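For reference, the corresponding universe_wsgi.ini options look roughly like the sketch below (option names as they appear in universe_wsgi.ini.sample of this era - treat them as an assumption and verify against your own sample file):

# run cluster jobs as the real (submitting) user - drmaa runner only
drmaa_external_runjob_script = scripts/drmaa_external_runner.py
drmaa_external_killjob_script = scripts/drmaa_external_killer.py
external_chown_script = scripts/external_chown_script.py
# commonly enabled alongside the above so outputs are staged in the
# job working directory before being chowned back to the Galaxy user
outputs_to_working_directory = True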
Furthermore, when I want to kill my jobs by clicking in the Galaxy web interface (screenshot: Catch(08-08-09-12-39).jpg), the job keeps on running in the background. I do not know how to fix this. Any help on this would be appreciated. Thank you very much.
Job deletion in the pbs runner was recently broken, but a fix for this bug will be part of the next stable release (on Monday).

--nate
I am totally lost on what is happening now. I have Galaxy running, but jobs are not being run. This is my setup:

Torque (qmgr -c 'p s'):

#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch resources_default.nodes = 1
set queue batch resources_default.walltime = 01:00:00
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = manager
set server managers = root@*
set server managers += jurgens@*
set server operators = galaxy@*
set server operators += jurgens@*
set server operators += root@*
set server default_queue = batch
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 300
set server job_stat_rate = 45
set server poll_jobs = True
set server mom_job_sync = True
set server keep_completed = 300
set server next_job_number = 17
set server moab_array_compatible = True

This is my job_conf.xml:

<?xml version="1.0"?>
<!-- A sample job config that explicitly configures job running the way it is
     configured by default (if there is no explicit config). -->
<job_conf>
    <plugins>
        <!-- <plugin id="drmaa" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner" workers="4"/> -->
        <plugin id="drmaa" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner"/>
    </plugins>
    <handlers default="batch">
        <handler id="cn01" tags="batch"/>
        <handler id="cn02" tags="batch"/>
    </handlers>
    <destinations default="batch">
        <destination id="batch" runner="drmaa" tags="cluster,batch">
            <param id="nativeSpecification">-q batch</param>
        </destination>
    </destinations>
</job_conf>

These are the relevant parts of universe_wsgi.ini:

# Configuration of the internal HTTP server.

[server:main]

# The internal HTTP server to use. Currently only Paste is provided. This
# option is required.
use = egg:Paste#http

# The port on which to listen.
port = 8989

# The address on which to listen. By default, only listen to localhost (Galaxy
# will not be accessible over the network). Use '0.0.0.0' to listen on all
# available network interfaces.
#host = 127.0.0.1
host = 0.0.0.0

# Use a threadpool for the web server instead of creating a thread for each
# request.
use_threadpool = True

# Number of threads in the web server thread pool.
#threadpool_workers = 10

# Set the number of seconds a thread can work before you should kill it
# (assuming it will never finish) to 3 hours.
threadpool_kill_thread_limit = 10800

[server:cn01]
use = egg:Paste#http
port = 8090
host = 127.0.0.1
use_threadpool = true
threadpool_workers = 5

[server:cn02]
use = egg:Paste#http
port = 8091
host = 127.0.0.1
use_threadpool = true
threadpool_workers = 5

where cn01 and cn02 are cluster nodes.

echo $DRMAA_LIBRARY_PATH
/usr/local/lib/libdrmaa.so
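A quick sanity check that Torque itself accepts and runs jobs on this queue, independently of Galaxy, is to submit a trivial job from the shell (a sketch - it assumes the submitting user, e.g. the galaxy user, is allowed to qsub to the batch queue):

# submit a one-line test job to the batch queue
echo '/bin/hostname' | qsub -q batch

# watch it get scheduled and finish
qstat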
--
Regards/Groete/Mit freundlichen Grüßen/recuerdos/meilleures salutations/distinti saluti/siong/duì yú/привет

Jurgens de Bruin
Hi,

Just to keep things up to date: I have the cluster up and running and jobs are being submitted. The last problem I am facing is:

21: UCSC Main on Pig: refGene (chr18:1-61220071) - error

An error occurred with this dataset:
The remote data source application may be off line, please try again later.
Error: [Errno socket error] [Errno 101] Network is unreachable
--
Regards/Groete/Mit freundlichen Grüßen/recuerdos/meilleures salutations/distinti saluti/siong/duì yú/привет

Jurgens de Bruin
On Aug 14, 2013, at 7:48 AM, Jurgens de Bruin wrote:
Hi
Just to keep things up to date: I have the cluster up and running and jobs are being submitted. The last problem I am facing is:

21: UCSC Main on Pig: refGene (chr18:1-61220071) - error

An error occurred with this dataset:
The remote data source application may be off line, please try again later.
Error: [Errno socket error] [Errno 101] Network is unreachable
Hi Jurgens,

Your cluster nodes will need the ability to connect to the Internet in order to use external data sources. Many clusters use private IP space and therefore must rely on NAT for a connection to the greater Internet.

--nate
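A quick way to check whether a compute node can reach the outside world is to push a small test through the queue itself (a sketch - genome.ucsc.edu simply stands in for whichever data source failed above):

# run a connectivity check on a compute node via Torque
echo 'curl -sI http://genome.ucsc.edu/ | head -n 1' | qsub -q batch

# once it finishes, the HTTP status line (or the network error) ends up in
# the job's output file, e.g. STDIN.o<jobid> in the submission directory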
Hi Nate,

From your previous email, job deletion in the pbs runner was to be fixed in the next stable Galaxy release. Has this bug been fixed in this version of Galaxy (https://bitbucket.org/galaxy/galaxy-dist/get/3b3365a39194.zip)? Thank you very much for your help.

Regards,
weiyan
On Wed, Apr 30, 2014 at 4:20 AM, 沈维燕 <shenwiyn@gmail.com> wrote:
Hi Nate,

From your previous email, job deletion in the pbs runner was to be fixed in the next stable Galaxy release. Has this bug been fixed in this version of Galaxy (https://bitbucket.org/galaxy/galaxy-dist/get/3b3365a39194.zip)? Thank you very much for your help.

Regards,
weiyan
Hi Weiyan,

Yes, this fix is included in the April 2014 stable release. However, I would strongly encourage you to use `hg clone` rather than downloading a static tarball. There have been a number of patches to the stable branch since its April release. In addition, the tarball linked would pull from the "default" branch of Galaxy, which includes unstable changesets.

--nate
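For example, the usual sequence is (a sketch - adjust the clone location to suit your setup):

hg clone https://bitbucket.org/galaxy/galaxy-dist/
cd galaxy-dist
hg update stable

An existing clone can later be brought up to date with 'hg pull' followed by another 'hg update stable'.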
On Aug 7, 2013, at 7:55 AM, Jurgens de Bruin wrote:
Hi,
This is my first Galaxy installation setup, so apologies for the stupid questions. I am setting up Galaxy on a cluster running Torque as the resource manager. I am working through the documentation but I am unclear on some things:

Firstly, I am unable to find start_job_runners within universe_wsgi.ini, and I don't want to just add it anywhere - any help on this would be great.
Hi Jurgens,

This option was removed from universe_wsgi.ini; the job handler definitions are done via the job_conf.xml file now. The following page explains that syntax:

http://wiki.galaxyproject.org/Admin/Config/Jobs

However, the scaling page was still written for the old style of configuration, so I have just updated it for the new style:

http://wiki.galaxyproject.org/Admin/Config/Performance/Scaling
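If the job configuration file does not live at the default location next to universe_wsgi.ini, the [app:main] section can point Galaxy at it. A sketch (option name as it appears in universe_wsgi.ini.sample; treat it as an assumption and verify against your own copy):

[app:main]
# Galaxy job configuration file (new-style job configuration)
job_config_file = job_conf.xml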
Furthermore, this is my job_conf.xml:
<?xml version="1.0"?>
<!-- A sample job config that explicitly configures job running the way it is
     configured by default (if there is no explicit config). -->
<job_conf>
    <plugins>
        <plugin id="hpc" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner" workers="4"/>
    </plugins>
    <handlers>
        <!-- Additional job handlers - the id should match the name of a
             [server:<id>] in universe_wsgi.ini. -->
        <handler id="cn01"/>
        <handler id="cn02"/>
    </handlers>
    <destinations>
        <destination id="hpc" runner="drmaa"/>
    </destinations>
</job_conf>
You'll need to set a default handler. Try the following config:

<?xml version="1.0"?>
<job_conf>
    <plugins>
        <plugin id="hpc" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner" workers="4"/>
    </plugins>
    <handlers default="handlers">
        <handler id="cn01" tags="handlers"/>
        <handler id="cn02" tags="handlers"/>
    </handlers>
    <destinations>
        <destination id="hpc" runner="drmaa"/>
    </destinations>
</job_conf>
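If jobs should land on a specific Torque queue, the drmaa destination can also carry a native specification. A sketch, using the parameter name the drmaa runner reads from job_conf.xml (verify against job_conf.xml.sample_advanced in your Galaxy checkout):

<destination id="hpc" runner="drmaa">
    <param id="nativeSpecification">-q batch</param>
</destination>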
Does this look meaningful? Furthermore, where do I set the additional server:<id> sections in universe_wsgi.ini?
You can add them anywhere between existing sections, although I'd do it near the top of the file, above the [app:main] section. Assuming you leave [server:main] in, you could add the following definitions just above [app:main]:

[server:cn01]
use = egg:Paste#http
port = 8090
host = 127.0.0.1
use_threadpool = true
threadpool_workers = 5

[server:cn02]
use = egg:Paste#http
port = 8091
host = 127.0.0.1
use_threadpool = true
threadpool_workers = 5

Web requests would still be handled by the server configured with the host:port in [server:main].

--nate
As background, the cluster has 13 compute nodes and a shared storage array that can be accessed by all nodes in the cluster.
Thanks again
-- Regards/Groete/Mit freundlichen Grüßen/recuerdos/meilleures salutations/ distinti saluti/siong/duì yú/привет
Jurgens de Bruin
participants (4)

- Jurgens de Bruin
- Nate Coraor
- shenwiyn
- 沈维燕