Update: Nate Coraor pointed me to the drmaa-run utility in slurm-drmaa to do more focused testing, and it looks like the issue with running Slurm jobs from Galaxy comes down to *slurm-drmaa not working with the latest version of Slurm 18 -- 18.08.8.* I created an issue on the slurm-drmaa github page here <https://github.com/natefoo/slurm-drmaa/issues/32>. Since 18.08.8 addresses a security vulnerability <https://www.schedmd.com/news.php> that is not addressed in previous versions of Slurm, it seems like this slurm-drmaa problem will be an important issue to address for all those running Galaxy jobs on Slurm clusters. If anyone finds they *can* run jobs via slurm-drmaa with Slurm 18.08.8, I'd be interested to hear it.

Phil

On Tue, Sep 3, 2019 at 2:29 PM Philip Blood <blood@psc.edu> wrote:
Hi Folks,
I'm trying to get an old instance of Galaxy (16.01) working for a user who needs to use it this week for a class he is teaching (so upgrading Galaxy is not an option at the moment). Due to a recent Slurm upgrade on our compute system to Slurm 18.08.8, we had to replace the old slurm-drmaa 1.0.7 library <http://apps.man.poznan.pl/trac/slurm-drmaa>, which doesn't work with 18.08.8, with Nate's forked slurm-drmaa library version 1.1.0 <https://github.com/natefoo/slurm-drmaa>. That built fine with Slurm 18.08.8 and (I think) we updated all the relevant pointers in the Galaxy config to point to the new slurm-drmaa 1.1.0 library.
However, now when I try to run jobs on our system I get errors (it worked fine before with slurm-drmaa 1.0.7 and the older version of slurm). So, I wanted to get a quick sanity check on whether this might be an issue with trying to use the new slurm-drmaa with an old version of Galaxy, 16.01, or if anyone has any other quick thoughts on troubleshooting this. The errors I get are below.
Best, Phil
*Short version (just the errors):*
198.91.54.159 - - [31/Aug/2019:16:31:28 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:30,366 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (2): No such file or directory*
198.91.54.159 - - [31/Aug/2019:16:31:32 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:35,372 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
198.91.54.159 - - [31/Aug/2019:16:31:37 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:40,377 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
198.91.54.159 - - [31/Aug/2019:16:31:41 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:31:45 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:45,383 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
198.91.54.159 - - [31/Aug/2019:16:31:49 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:50,388 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
198.91.54.159 - - [31/Aug/2019:16:31:53 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa ERROR 2019-08-31 16:31:55,393 (10) All attempts to submit job failed*
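For anyone wanting to produce a similar "just the errors" view from a long Galaxy log, a small filter along these lines may help. It is only a sketch: the names `RUNNER_ENTRY` and `runner_problems` are made up for illustration, and it assumes the log has one entry per line, as Galaxy writes it, with the optional asterisks being the emphasis added in this email.

```python
import re

# Matches Galaxy job-runner log entries at WARNING or ERROR level, e.g.
# "galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:30,366 (10) ..."
# A leading/trailing "*" (email emphasis) is tolerated and dropped.
RUNNER_ENTRY = re.compile(
    r"^\*?(?P<logger>galaxy\.jobs\.runners(?:\.\w+)*) "
    r"(?P<level>WARNING|ERROR) "
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<message>.*?)\s*\*?$"
)


def runner_problems(lines):
    """Yield (timestamp, level, message) for each runner WARNING/ERROR line.

    Access-log lines and DEBUG/INFO entries are skipped.
    """
    for line in lines:
        match = RUNNER_ENTRY.match(line.strip())
        if match:
            yield match["timestamp"], match["level"], match["message"]
```

Called as `runner_problems(open(path))` on the Galaxy log file (the path is site-specific), it yields only the submission warnings and the final error.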
*Full context:*
198.91.54.159 - - [31/Aug/2019:16:30:27 +0000] "GET /api/tools/squeue/build HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
galaxy.tools DEBUG 2019-08-31 16:30:30,142 Validated and populated state for tool request (4.081 ms)
galaxy.tools.actions INFO 2019-08-31 16:30:30,285 Handled output (100.616 ms)
galaxy.tools.actions INFO 2019-08-31 16:30:30,319 Verified access to datasets (0.005 ms)
galaxy.tools.execute DEBUG 2019-08-31 16:30:30,368 Tool [squeue] created job [10] (206.086 ms)
galaxy.tools.execute DEBUG 2019-08-31 16:30:30,376 Executed all jobs for tool request: (233.862 ms)
198.91.54.159 - - [31/Aug/2019:16:30:30 +0000] "POST /api/tools HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:30:30 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
galaxy.jobs DEBUG 2019-08-31 16:30:30,747 (10) Working directory for job is: /opt/packages/galaxy/galaxy01/database/job_working_directory/000/10
galaxy.jobs.handler DEBUG 2019-08-31 16:30:30,751 (10) Dispatching to slurm runner
galaxy.jobs DEBUG 2019-08-31 16:30:30,774 (10) Persisting job destination (destination id: LM4)
galaxy.jobs.runners DEBUG 2019-08-31 16:30:30,790 Job [10] queued (38.578 ms)
galaxy.jobs.handler INFO 2019-08-31 16:30:30,818 (10) Job dispatched
galaxy.tools.deps DEBUG 2019-08-31 16:30:31,012 Building dependency shell command for dependency 'slurm'
galaxy.tools.deps DEBUG 2019-08-31 16:30:31,013 Find dependency slurm version None
galaxy.tools.deps DEBUG 2019-08-31 16:30:31,013 Resolver tool_shed_packages returned <galaxy.tools.deps.resolvers.NullDependency object at 0x1b38390> (isnull? True)
galaxy.tools.deps DEBUG 2019-08-31 16:30:31,014 Resolver galaxy_packages returned <galaxy.tools.deps.resolvers.galaxy_packages.GalaxyPackageDependency object at 0x7fc78c334750> (isnull? False)
galaxy.jobs.command_factory INFO 2019-08-31 16:30:31,057 Built script [/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/tool_script.sh] for tool command[PACKAGE_BASE=/opt/packages/galaxy/galaxy01/tool_dependencies/slurm/18.08.8; export PACKAGE_BASE; . /opt/packages/galaxy/galaxy01/tool_dependencies/slurm/18.08.8/env.sh; echo "hostname:" > output; echo " " >> output; hostname >> output; echo " " >> output; env >> output; echo " " >> output; date >> output; echo " " >> output; echo "Uptime:" >> output; echo " " >> output; uptime >> output; echo " " >> output; echo "Modules:" >> output; echo " " >> output; module avail >> output 2>&1; echo " " >> output; echo "SLURM Queue Status" >> output; echo " " >> output; echo "If your job is running on the queues, it will be listed in the reports below:" >> output; echo " " >> output; echo " " >> output; echo "Normal Report: squeue" >> output; echo " " >> output; echo " " >> output; echo " " >> output; squeue >> output; date >> output; echo " " >> output; echo " " >> output; echo "*** Full Report: squeue -l ***" >> output; echo " " >> output; squeue -l >> output; echo " " >> output; echo " " >> output; date >> output; echo " " >> output; echo "Local: ${LOCAL}" >> output; echo "Ramdisk: ${RAMDISK}" >> output; workdir=`pwd`; echo "workdir is $workdir" >> output; cd $LOCAL; echo "i am in `pwd`" >> $workdir/output; cd $workdir; echo "i am in `pwd`" >> output; date >> output; echo " " >> output]
galaxy.tools.deps DEBUG 2019-08-31 16:30:31,259 Building dependency shell command for dependency 'samtools'
galaxy.tools.deps DEBUG 2019-08-31 16:30:31,259 Find dependency samtools version None
galaxy.tools.deps DEBUG 2019-08-31 16:30:31,259 Resolver tool_shed_packages returned <galaxy.tools.deps.resolvers.NullDependency object at 0x1b38390> (isnull? True)
galaxy.tools.deps DEBUG 2019-08-31 16:30:31,279 Resolver galaxy_packages returned <galaxy.tools.deps.resolvers.galaxy_packages.GalaxyPackageDependency object at 0x7fc7b01e01d0> (isnull? False)
galaxy.jobs.runners DEBUG 2019-08-31 16:30:31,284 (10) command is: /opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/tool_script.sh; return_code=$?; if [ -f /opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/output ] ; then cp /opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/output /opt/packages/galaxy/galaxy01/database/files/000/dataset_10.dat ; fi; PACKAGE_BASE=/opt/packages/galaxy/galaxy01/tool_dependencies/samtools/0.1.19; export PACKAGE_BASE; . /opt/packages/galaxy/galaxy01/tool_dependencies/samtools/0.1.19/env.sh; python "/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/set_metadata_jGYkkM.py" "/opt/packages/galaxy/galaxy01/tmp/tmpxmB5GA" "/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/galaxy.json" "/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/metadata_in_HistoryDatasetAssociation_10_u2k7qq,/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/metadata_kwds_HistoryDatasetAssociation_10_2ZmSXR,/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/metadata_out_HistoryDatasetAssociation_10_shC78c,/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/metadata_results_HistoryDatasetAssociation_10_57x96D,/opt/packages/galaxy/galaxy01/database/files/000/dataset_10.dat,/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/metadata_override_HistoryDatasetAssociation_10_AEeOfc" 5242880; sh -c "exit $return_code"
galaxy.jobs.runners.drmaa DEBUG 2019-08-31 16:30:31,356 (10) submitting file /opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/galaxy_10.sh
galaxy.jobs.runners.drmaa DEBUG 2019-08-31 16:30:31,356 (10) native specification is: -p LM -C LM -N 1 -n 4 --ntasks-per-node=4 --mem=192500 -t 24:00:00
198.91.54.159 - - [31/Aug/2019:16:30:34 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:30:38 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:30:42 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:30:47 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:30:51 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:30:55 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:30:59 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:31:03 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:31:08 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:31:12 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:31:16 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:31:20 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:31:24 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:31:28 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:30,366 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (2): No such file or directory*
198.91.54.159 - - [31/Aug/2019:16:31:32 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:35,372 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
198.91.54.159 - - [31/Aug/2019:16:31:37 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:40,377 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
198.91.54.159 - - [31/Aug/2019:16:31:41 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:31:45 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:45,383 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
198.91.54.159 - - [31/Aug/2019:16:31:49 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:50,388 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
198.91.54.159 - - [31/Aug/2019:16:31:53 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa ERROR 2019-08-31 16:31:55,393 (10) All attempts to submit job failed*
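Since the failure surfaces in drmaa.Session.runJob(), one way to reproduce it outside Galaxy is to call the same entry point directly from the drmaa Python package. The following is a rough sketch, not what Galaxy itself runs: it assumes drmaa-python is installed and that the DRMAA_LIBRARY_PATH environment variable points at the rebuilt slurm-drmaa libdrmaa.so (a site-specific path). The function name `submit_test_job` and the `/bin/hostname` test command are made up for illustration; the native specification is the one from the log above.

```python
import os

# Native specification copied from the Galaxy log above.
NATIVE_SPEC = "-p LM -C LM -N 1 -n 4 --ntasks-per-node=4 --mem=192500 -t 24:00:00"


def submit_test_job(native_spec=NATIVE_SPEC, command="/bin/hostname"):
    """Submit one trivial job through DRMAA and return its job id.

    Assumption: DRMAA_LIBRARY_PATH points at the slurm-drmaa libdrmaa.so.
    The import is done here so that a missing or misconfigured library
    fails at call time rather than when this module is loaded.
    """
    import drmaa

    with drmaa.Session() as session:
        template = session.createJobTemplate()
        try:
            template.remoteCommand = command
            template.nativeSpecification = native_spec
            template.workingDirectory = os.getcwd()
            # Same call that fails in the Galaxy log.
            job_id = session.runJob(template)
        finally:
            session.deleteJobTemplate(template)
    return job_id
```

If `submit_test_job()` raises the same slurm_submit_batch_job error, the problem is in the slurm-drmaa and Slurm combination rather than in Galaxy 16.01; if it returns a job id, the Galaxy configuration is the more likely suspect.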