On Wed, Jun 30, 2021 at 11:00 AM Luc Cornet <luc.cornet@uliege.be> wrote:
Thanks for the information.

On our HPC system, Slurm runs in a container pre-installed by the company that installed the system.
slurm-drmaa is not included in the container, and installing it would be difficult.
This is why we chose the CLI instead.

In order to use DRMAA with Galaxy, we should install slurm-drmaa in the pre-installed Slurm container. Is that correct?

Whichever application (Galaxy or Pulsar - you said Pulsar originally but Galaxy in your most recent message) is going to interact with Slurm needs to be installed on a system configured as a Slurm client. This means that you should be able to run `squeue`, `sinfo`, etc. from the command line on that system. Once that is the case, there are 3 steps:

1. Install slurm-drmaa on the Galaxy or Pulsar server, *not* in the slurm Controller container.
2. `pip install drmaa` into Galaxy or Pulsar's virtualenv. This is already done in the case of your Pulsar server since you have `drmaa` included in `pulsar_optional_dependencies`. For Galaxy, the `galaxyproject.galaxy` role will do this for you automatically if you have enabled a DRMAA-based job runner plugin (e.g. DRMAAJobRunner or SlurmJobRunner) in job_conf.xml.
3. Configure the Python drmaa library to find slurm-drmaa's libdrmaa.so, either via the DRMAA_LIBRARY_PATH environment variable, or in the case of Galaxy, in the `drmaa_library_path` runner plugin param, as shown in this example: https://github.com/galaxyproject/galaxy/blob/e74239e010ece4a4b22d7a6fe0f0f3d96b67001b/lib/galaxy/config/sample/job_conf.xml.sample_advanced#L25
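
As a rough illustration of the prerequisites above, here is a minimal Python sketch that checks whether a host looks like a Slurm client and whether DRMAA_LIBRARY_PATH points at an existing file. The function name `check_drmaa_prerequisites` and its injectable parameters are hypothetical helpers made up for this example; it does not load the library or contact Slurm, so passing these checks is not proof that job submission will work.

```python
import os
import shutil

# Commands mentioned in the message as evidence that the host is a Slurm client.
SLURM_CLIENT_COMMANDS = ("squeue", "sinfo")


def check_drmaa_prerequisites(environ=None, which=shutil.which, exists=os.path.isfile):
    """Return a list of human-readable problems; an empty list means the checks passed.

    Hypothetical helper: `environ`, `which`, and `exists` are injectable so the
    function can be exercised without a real Slurm installation.
    """
    environ = os.environ if environ is None else environ
    problems = []

    # The host must be able to run the Slurm client commands.
    for command in SLURM_CLIENT_COMMANDS:
        if which(command) is None:
            problems.append(f"{command} not found on PATH")

    # The Python drmaa library locates libdrmaa.so via DRMAA_LIBRARY_PATH.
    library_path = environ.get("DRMAA_LIBRARY_PATH")
    if not library_path:
        problems.append("DRMAA_LIBRARY_PATH is not set")
    elif not exists(library_path):
        problems.append(f"DRMAA_LIBRARY_PATH points to a missing file: {library_path}")

    return problems


if __name__ == "__main__":
    for problem in check_drmaa_prerequisites():
        print("PROBLEM:", problem)
```

Run it as the user that owns the Galaxy or Pulsar process, since PATH and environment variables can differ between users. For Galaxy, the `drmaa_library_path` plugin param can be used instead of the environment variable, in which case the second check does not apply.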
 
--nate


best,
Luc

------------
Luc Cornet, PhD
Bio-informatician
Mycology and Aerobiology
Sciensano

----- Original message -----
From: "Nate Coraor" <nate@bx.psu.edu>
To: "Luc Cornet" <luc.cornet@uliege.be>
Cc: "HelpGalaxy" <galaxy-dev@lists.galaxyproject.org>, "Baurain Denis" <Denis.Baurain@uliege.be>, "Pierre Becker" <Pierre.Becker@sciensano.be>, "Colignon David" <David.Colignon@uliege.be>
Sent: Wednesday, June 30, 2021 16:50:13
Subject: [galaxy-dev] Re: Galaxy install problems

On Wed, Jun 30, 2021 at 10:30 AM Luc Cornet <luc.cornet@uliege.be> wrote:


Dear Marius,

Many thanks for your feedback.

I have attached to this email the playbook, the pulsarservers.yml file, and the log of the Pulsar playbook.

The CLI plugin is the best solution for us since we have nothing to maintain. DRMAA is not actively developed for Slurm, correct?

Just to clarify, I am actively creating slurm-drmaa releases from updates that I've done plus many community contributions at https://github.com/natefoo/slurm-drmaa . We have fixed incompatibilities and bugs in slurm-drmaa with newer versions of Slurm, and have added support for new features in newer versions of Slurm. It was never a part of Slurm, if that is what you're asking, but as Marius said, it is in active use on usegalaxy.org and many other Galaxy servers, as well as in other applications.




In the playbook, we use systemd, which I think should restart Pulsar, but that might not be the case:

TASK [galaxyproject.pulsar : systemd daemon-reload and enable/start service] ****************************************************************************
ok: [HPC]

RUNNING HANDLER [galaxyproject.pulsar : default restart pulsar handler] *********************************************************************************
skipping: [HPC]
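
For reference, a `skipping` status means the handler was not executed on that host, so the output above does not show a Pulsar restart. The following is a minimal Python sketch for pulling task and handler statuses out of playbook output in the format shown above; the function name `summarize_ansible_output` and the regular expressions are assumptions made for this example and only cover the default, non-verbose output layout.

```python
import re

# Matches header lines such as "TASK [name] ****" or "RUNNING HANDLER [name] ****".
HEADER = re.compile(r"^(?:TASK|RUNNING HANDLER) \[(?P<name>.+?)\] \*+\s*$")
# Matches per-host result lines such as "ok: [HPC]" or "skipping: [HPC]".
RESULT = re.compile(r"^(?P<status>ok|changed|skipping|failed|fatal): \[(?P<host>[^\]]+)\]")


def summarize_ansible_output(log_text):
    """Return (task_or_handler_name, host, status) tuples in order of appearance."""
    results = []
    current = None
    for line in log_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group("name")
            continue
        result = RESULT.match(line)
        if result and current is not None:
            results.append((current, result.group("host"), result.group("status")))
    return results


if __name__ == "__main__":
    sample = (
        "TASK [galaxyproject.pulsar : systemd daemon-reload and enable/start service] ****\n"
        "ok: [HPC]\n"
        "\n"
        "RUNNING HANDLER [galaxyproject.pulsar : default restart pulsar handler] ****\n"
        "skipping: [HPC]\n"
    )
    for name, host, status in summarize_ansible_output(sample):
        print(f"{host}: {status}: {name}")
```

Filtering the result for names containing "restart" with a status other than `changed` or `ok` is a quick way to spot a restart that did not happen.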

Until now, we have never used DRMAA. The jobs were executed immediately on the cluster with CLI or DRMAA. We had this part in pulsarservers.yml to activate CLI:
managers:
  _default_:
    type: queued_cli
    job_plugin: slurm
    native_specification: "-p batch --tasks=1 --cpus-per-task=2 --mem-per-cpu=1000 -t 10:00"
min_polling_interval: 0.5
amqp_publish_retry: True
amqp_publish_retry_max_retries: 5
amqp_publish_retry_interval_start: 10
amqp_publish_retry_interval_step: 10
amqp_publish_retry_interval_max: 60
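
To make the CLI versus DRMAA question concrete, here is a minimal Python sketch that classifies each manager in a configuration like the one above by its `type`. The config is written as a plain dict to avoid a YAML dependency, the function name `classify_managers` is hypothetical, and the DRMAA type names (`queued_drmaa`, `queued_external_drmaa`) are Pulsar manager types as far as I know, so treat them as assumptions to verify against the Pulsar documentation.

```python
# Manager `type` values grouped by submission mechanism (assumed names, see lead-in).
CLI_TYPES = {"queued_cli"}
DRMAA_TYPES = {"queued_drmaa", "queued_external_drmaa"}


def classify_managers(config):
    """Map each manager name to 'cli', 'drmaa', or 'other' based on its `type`."""
    classification = {}
    for name, settings in config.get("managers", {}).items():
        manager_type = (settings or {}).get("type")
        if manager_type in CLI_TYPES:
            classification[name] = "cli"
        elif manager_type in DRMAA_TYPES:
            classification[name] = "drmaa"
        else:
            classification[name] = "other"
    return classification


# The settings quoted in this message, expressed as a dict.
config = {
    "managers": {
        "_default_": {
            "type": "queued_cli",
            "job_plugin": "slurm",
            "native_specification": "-p batch --tasks=1 --cpus-per-task=2 --mem-per-cpu=1000 -t 10:00",
        }
    },
    "min_polling_interval": 0.5,
}

print(classify_managers(config))  # {'_default_': 'cli'}
```

This only inspects the configuration as written; a running Pulsar process keeps using whatever it loaded at startup until it is restarted.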


Thanks for your help,
Luc


------------
Luc Cornet, PhD
Bio-informatician
Mycology and Aerobiology
Sciensano

----- Original message -----
From: "Marius van den Beek" <m.vandenbeek@gmail.com>
To: "Luc Cornet" <luc.cornet@uliege.be>
Cc: "HelpGalaxy" <galaxy-dev@lists.galaxyproject.org>, "Baurain Denis" <Denis.Baurain@uliege.be>, "Pierre Becker" <Pierre.Becker@sciensano.be>, "Colignon David" <David.Colignon@uliege.be>
Sent: Wednesday, June 30, 2021 16:02:04
Subject: [galaxy-dev] Re: Galaxy install problems

Hi Luc,

I'm sorry to hear that you're struggling to set up Galaxy to your liking.
Let me start by pointing out that usegalaxy.org uses Slurm with DRMAA; this is certainly going to be more performant and reliable than the CLI plugin.
There is little maintenance necessary, so maybe that is why activity on slurm-drmaa is low (see also https://github.com/natefoo/slurm-drmaa ).
I would be curious to know how you came to the conclusion that there is some incompatibility between DRMAA and Slurm.
Note that one of the setups we teach during the training submits via DRMAA to slurm.

Then I'd like to point out that there are a huge variety of different ways in which you can configure Galaxy and the job submission.
We teach the most common ones during the training week, with the aim that you understand how these things work together,
as well as giving you a handle on how you can manage these different settings and services using a configuration management system.
We cannot tailor a solution to your infrastructure during this week.

About your problem specifically, I had asked this on gitter before:

> Did you restart pulsar after rolling out the new config ?

to which you've answered that you re-ran the playbook, but that's not a sufficient answer.

Every playbook is different, and we cannot know if this includes a restarter service for pulsar.
Also please don't assume that everyone that could potentially help you knows ansible and the playbooks that are being taught intimately,
and in what ways you have customized your playbook.
It is much more helpful to write up the relevant settings you've changed and the logs that go with it.

You've also been asked to provide logs of the restart, which as far as I can tell you haven't provided.
You had mentioned on gitter that pulsar continues to use DRMAA to submit jobs, so you'll
want to double check whether you've really restarted pulsar after the config changes,
and look at the startup logs for pulsar, and find out how it is possible for pulsar to submit jobs
via drmaa if it is not set up to do so.

Best,
Marius




___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client. To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
%(web_page_url)s

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/