On Fri, Apr 8, 2016 at 12:01 PM, Poole, Richard <r.poole@ucl.ac.uk> wrote:
Hi Nate,

Thanks for the speedy reply (hope all is well with you)!

I use Pulsar because I need to stage the cluster jobs: the cluster has no access to the data storage on my local machines, and I cannot submit directly to the cluster from my machine. I will take a careful look at your job_conf.xml as an example and go from there. Thanks.

A few more specific questions:

- Is it possible to set a global GALAXY_SLOTS value somewhere? (For local submissions I use essentially the default Galaxy job_conf settings, which, if I understand correctly, set GALAXY_SLOTS to the number of machine cores?!)

It can only be set directly for the "local" job runner, where its default value is 1. It can be changed with the "local_slots" destination parameter, as shown here:

  https://github.com/galaxyproject/galaxy/blob/dev/config/job_conf.xml.sample_advanced#L146

The "workers" option in the <plugins> section of job_conf.xml also affects local job concurrency: it is the number of jobs the local runner plugin will run concurrently (the workers value means something else entirely for all other runner plugins). Thus, with the local runner plugin and a single job destination that uses it, jobs should use at most `workers * local_slots` cores, provided the tools honor $GALAXY_SLOTS.

By combining "workers" and "local_slots" you can get rudimentary control over the number of local cores allowed for jobs, but it is imperfect. For true control, a proper DRM is needed.
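As a rough illustration of that arithmetic, here is a minimal Python sketch that reads a small, hypothetical job_conf.xml and computes the worst-case local core count as `workers` times the largest `local_slots` of any local destination (each worker runs one job at a time). The destination id `local_multi` and the values are made up for the example; only `workers` and `local_slots` are read, and the bound only holds for tools that respect $GALAXY_SLOTS.

```python
import xml.etree.ElementTree as ET

# Hypothetical job_conf.xml: 2 local workers, one destination with 2 slots.
JOB_CONF = """
<job_conf>
    <plugins>
        <plugin id="local" type="runner"
                load="galaxy.jobs.runners.local:LocalJobRunner" workers="2"/>
    </plugins>
    <destinations default="local_multi">
        <destination id="local_multi" runner="local">
            <param id="local_slots">2</param>
        </destination>
    </destinations>
</job_conf>
"""


def max_local_cores(job_conf_xml: str) -> int:
    """Upper bound on cores used by local jobs: workers * largest local_slots.

    Assumes the "local" plugin exists and sets `workers` explicitly, and that
    every tool honors $GALAXY_SLOTS (tools that ignore it can exceed this).
    """
    root = ET.fromstring(job_conf_xml.strip())
    plugin = root.find("./plugins/plugin[@id='local']")
    # Each worker runs at most one local job at a time.
    workers = int(plugin.attrib["workers"])
    slots = []
    for dest in root.findall("./destinations/destination[@runner='local']"):
        param = dest.find("./param[@id='local_slots']")
        # local_slots defaults to 1 when a destination does not set it.
        slots.append(int(param.text) if param is not None else 1)
    return workers * max(slots, default=0)


print(max_local_cores(JOB_CONF))  # 4
```

On Richard's 4-core machine, for instance, `workers="4"` with the default `local_slots` of 1, or `workers="2"` with `local_slots` of 2, would both keep the bound at 4.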
 
- How, then, would this differ for local vs. Pulsar-staged (cluster-submitted) jobs?

With Pulsar, you can define the DRM options in the job manager configured in Pulsar's app.yml. For example, the Pulsar documentation shows a configuration for an SGE cluster that would result in $GALAXY_SLOTS being set to 8:

  http://pulsar.readthedocs.org/en/latest/job_managers.html#drmaa

You can also configure the native specification as a destination parameter in Galaxy's job_conf.xml, if you prefer.
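In the SGE case the slot count comes from the parallel environment request in the native specification: `-pe <pe_name> <slots>` asks SGE for that many slots, and SGE exports the granted number to the job as $NSLOTS. The Python sketch below just reads the requested count out of a native specification string. The specification is a hypothetical example (not copied from the Pulsar documentation), and the parser deliberately ignores SGE slot ranges such as `4-8`.

```python
import shlex
from typing import Optional


def sge_requested_slots(native_spec: str) -> Optional[int]:
    """Return the slot count from an SGE `-pe <pe_name> <slots>` request.

    Simplified sketch: handles only a single integer slot count, not ranges
    such as "4-8". Returns None if no complete -pe option is present.
    """
    tokens = shlex.split(native_spec)
    for i, token in enumerate(tokens):
        if token == "-pe" and i + 2 < len(tokens):
            return int(tokens[i + 2])
    return None


# Hypothetical native specification requesting 8 slots in a "threads" PE.
spec = "-P bignodes -R y -pe threads 8"
print(sge_requested_slots(spec))  # 8
```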
 
- Is there an explanation of the GALAXY_SLOTS syntax somewhere, e.g. what does the ':-4' mean?

This is Bourne shell parameter substitution syntax:

  http://www.tldp.org/LDP/abs/html/parameter-substitution.html

  ${parameter-default}, ${parameter:-default}
  If parameter not set, use default.

This means "if $GALAXY_SLOTS is unset or empty, substitute the value '4' in its place."
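The two forms differ only in how they treat a variable that is set but empty. The following Python sketch mimics the shell behavior on a plain dictionary standing in for the environment (`os.environ` would work too). The function names are made up for illustration.

```python
from typing import Mapping


def expand_colon_dash(env: Mapping[str, str], name: str, default: str) -> str:
    """Mimic ${name:-default}: use default if name is unset OR empty."""
    value = env.get(name)
    return value if value else default


def expand_dash(env: Mapping[str, str], name: str, default: str) -> str:
    """Mimic ${name-default}: use default only if name is unset."""
    value = env.get(name)
    return default if value is None else value


print(expand_colon_dash({}, "GALAXY_SLOTS", "4"))                      # 4
print(expand_colon_dash({"GALAXY_SLOTS": "12"}, "GALAXY_SLOTS", "4"))  # 12
print(expand_colon_dash({"GALAXY_SLOTS": ""}, "GALAXY_SLOTS", "4"))    # 4
print(repr(expand_dash({"GALAXY_SLOTS": ""}, "GALAXY_SLOTS", "4")))    # ''
```

Like the shell, neither function checks that the result is a number. The substitution only picks a string.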

In case the documentation is unclear, $GALAXY_SLOTS is a variable for use in tool configuration files (and should probably default to "1", not "4"). When configuring a Galaxy server, you should not have to manipulate the $GALAXY_SLOTS variable directly.
 
--nate


Thanks,
Richard


On 8 Apr 2016, at 15:05, Nate Coraor <nate@bx.psu.edu> wrote:

On Fri, Apr 8, 2016 at 9:55 AM, Poole, Richard <r.poole@ucl.ac.uk> wrote:

Could somebody point me to a good explanation of how to set up and use GALAXY_SLOTS correctly on my server?

A basic explanation would be good, but I also use Pulsar to stage some jobs on our cluster here (my machine is 4-core and the cluster I use is 12-core), so I am wondering whether GALAXY_SLOTS can handle this (so that I don't need to specify exact thread counts in, e.g., tool wrappers).


Hi Richard,

Is your cluster running a distributed resource manager (DRM) such as PBS, Grid Engine, etc.? If so, $GALAXY_SLOTS is handled automatically based on the options you submit the job with. If you request that a job be allocated 12 cores on a node, $GALAXY_SLOTS will be set to 12, and any tools that respect $GALAXY_SLOTS (which include all of the multicore devteam and IUC tools) will use 12 cores accordingly.
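To make "handled automatically" more concrete: each DRM exports the granted core count to the job through its own environment variable, and the job script Galaxy generates turns that into $GALAXY_SLOTS. The Python sketch below imitates the idea with a deliberately short, assumed lookup order: SLURM_CPUS_ON_NODE (Slurm), NSLOTS (Grid Engine), LSB_DJOB_NUMPROC (LSF), and then a fallback of 1. Galaxy's real logic is a shell script that checks more variables and may order them differently.

```python
import os
from typing import Mapping

# Assumed, simplified lookup order: (environment variable, DRM that sets it).
SLOT_VARIABLES = (
    ("SLURM_CPUS_ON_NODE", "Slurm"),
    ("NSLOTS", "Grid Engine"),
    ("LSB_DJOB_NUMPROC", "LSF"),
)


def resolve_galaxy_slots(env: Mapping[str, str]) -> int:
    """Return the core count granted by the DRM, falling back to 1."""
    for var, _drm in SLOT_VARIABLES:
        value = env.get(var)
        if value:
            return int(value)
    return 1


print(resolve_galaxy_slots({"NSLOTS": "12"}))  # 12
print(resolve_galaxy_slots({}))                # 1
print(resolve_galaxy_slots(os.environ))        # depends on where this runs
```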

It is important to send only tools that can use multiple cores to a multicore destination; otherwise you may allocate 12 cores to a tool that will only use 1, wasting resources. Here is the job configuration file we use for usegalaxy.org, which shows how to map multicore tools to multicore destinations running the Slurm DRM:


For example, `bowtie2` (line 201) runs on the `slurm_multi` destination (line 126) via the `dynamic_local_stampede_select_dynamic_walltime` dynamic destination (defined in another file, but the details are not relevant here: in your case you can map a tool directly to a multicore destination defined in job_conf.xml).
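In job_conf.xml the direct mapping is a static `<tool id="bowtie2" destination="slurm_multi"/>` entry. If you later want the choice made in code, Galaxy also supports dynamic destinations backed by Python rule functions. The sketch below shows only the routing logic as a plain function and is not wired into Galaxy's rule API. The tool set and the `cluster_single` destination id are hypothetical, and it assumes Tool Shed ids of the form `toolshed.g2.bx.psu.edu/repos/<owner>/<repo>/<tool_id>/<version>`.

```python
# Hypothetical ids: adjust to the tools and destinations in your job_conf.xml.
MULTICORE_TOOLS = frozenset({"bowtie2", "hisat2"})
MULTICORE_DESTINATION = "slurm_multi"
SINGLE_CORE_DESTINATION = "cluster_single"


def short_tool_id(tool_id: str) -> str:
    """Reduce a Tool Shed GUID to its short tool id; leave other ids as-is."""
    parts = tool_id.split("/")
    # e.g. toolshed.g2.bx.psu.edu/repos/devteam/bowtie2/bowtie2/2.2.6.2
    if len(parts) >= 6 and parts[1] == "repos":
        return parts[-2]
    return tool_id


def choose_destination(tool_id: str) -> str:
    """Send known multicore tools to the multicore destination only."""
    if short_tool_id(tool_id) in MULTICORE_TOOLS:
        return MULTICORE_DESTINATION
    return SINGLE_CORE_DESTINATION


print(choose_destination("bowtie2"))  # slurm_multi
print(choose_destination("cat1"))     # cluster_single
```

Keeping the multicore set explicit follows the advice above: a tool only gets a multicore allocation once someone has confirmed it actually uses $GALAXY_SLOTS.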

If your cluster is running a DRM, you most likely do not need to run Pulsar (Galaxy has native support for pretty much all commonly used DRMs) unless you need the ability to stage files to/from the cluster or do not have direct submit access to the cluster from the Galaxy server.

--nate
 

Thanks, 

Richard


___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/