Thanks, Nate. I just updated the GitHub issue. It turns out the error was caused by a site configuration issue that required a job name to be specified in the native spec passed to drmaa-run. For jobs submitted via sbatch, the name of the script is used when no job name is specified, but with drmaa-run the job name was empty unless it was specified explicitly. Once the admin changed job_script.lua to handle nil values for the job name, the tests with drmaa-run started working with Slurm 18.08.8.
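For anyone hitting the same thing on a site where the submit filter cannot be changed, one workaround is to always pass an explicit job name in the native spec. The snippet below is only a sketch of that idea: the helper name ensure_job_name is hypothetical, and it assumes your slurm-drmaa build accepts -J / --job-name in the native specification, which is worth checking against its README before relying on it.

```python
import shlex


def ensure_job_name(native_spec: str, default_name: str) -> str:
    """Return native_spec unchanged if it already names the job,
    otherwise append --job-name=<default_name>.

    Hypothetical helper: the option spelling is an assumption about
    what the local slurm-drmaa build accepts in a native spec.
    """
    for token in shlex.split(native_spec):
        # Covers "-J name", "-Jname", "--job-name=name" and "--job-name name".
        if token.startswith("-J") or token.startswith("--job-name"):
            return native_spec
    return f"{native_spec} --job-name={shlex.quote(default_name)}".strip()


if __name__ == "__main__":
    # Native spec taken from the Galaxy log in the original message.
    spec = "-p LM -C LM -N 1 -n 4 --ntasks-per-node=4 --mem=192500 -t 24:00:00"
    print(ensure_job_name(spec, "galaxy_10"))
```

The resulting string is what would go into the native specification for drmaa-run (or into the destination's native specification in the Galaxy job configuration), so the job name is never left empty.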
Unfortunately, this did not fix my related issue with submitting jobs from Galaxy using slurm-drmaa. I am still getting the same errors. Any suggestions where to look next?
Phil
On Fri, Sep 13, 2019, 11:53 AM Nate Coraor nate@bx.psu.edu wrote:
Hi Phil,
I followed up over on the Github issue, let's track it there and we can reply here for the sake of history once we figure out what's going on.
Thanks, --nate
On Thu, Sep 12, 2019 at 12:32 AM Philip Blood blood@psc.edu wrote:
Update: Nate Coraor pointed me to the drmaa-run utility in slurm-drmaa to do more focused testing, and it looks like the issue with running Slurm jobs from Galaxy comes down to *slurm-drmaa not working with the latest version of Slurm 18 -- 18.08.8.* I created an issue on the slurm-drmaa github page here https://github.com/natefoo/slurm-drmaa/issues/32.
Since 18.08.8 addresses a security vulnerability https://www.schedmd.com/news.php that is not addressed in previous versions of Slurm, it seems like this slurm-drmaa problem will be an important issue to address for all those running Galaxy jobs on Slurm clusters.
If anyone finds they *can* run jobs via slurm-drmaa with Slurm 18.08.8, I'd be interested to hear it.
Phil
On Tue, Sep 3, 2019 at 2:29 PM Philip Blood blood@psc.edu wrote:
Hi Folks,
I'm trying to get an old instance of Galaxy (16.01) working for a user who needs to use it this week for a class he is teaching (so upgrading Galaxy is not an option at the moment). Due to a recent slurm upgrade on our compute system to slurm 18.08.8, we had to replace the old slurm-drmaa 1.0.7 library http://apps.man.poznan.pl/trac/slurm-drmaa, which doesn't work with 18.08.8, with Nate's forked slurm-drmaa library version 1.1.0 https://github.com/natefoo/slurm-drmaa. That built fine with slurm 18.08.8 and (I think) we updated all the relevant pointers in the galaxy config to point to the new slurm-drmaa 1.1.0 library.
However, now when I try to run jobs on our system I get errors (it worked fine before with slurm-drmaa 1.0.7 and the older version of slurm). So, I wanted to get a quick sanity check on whether this might be an issue with trying to use the new slurm-drmaa with an old version of Galaxy, 16.01, or if anyone has any other quick thoughts on troubleshooting this. The errors I get are below.
Best, Phil
*Short version (just the errors):*
198.91.54.159 - - [31/Aug/2019:16:31:28 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:30,366 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (2): No such file or directory*
198.91.54.159 - - [31/Aug/2019:16:31:32 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:35,372 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
198.91.54.159 - - [31/Aug/2019:16:31:37 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:40,377 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
198.91.54.159 - - [31/Aug/2019:16:31:41 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:31:45 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:45,383 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
198.91.54.159 - - [31/Aug/2019:16:31:49 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:50,388 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
198.91.54.159 - - [31/Aug/2019:16:31:53 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa ERROR 2019-08-31 16:31:55,393 (10) All attempts to submit job failed*
*Full context:*
198.91.54.159 - - [31/Aug/2019:16:30:27 +0000] "GET /api/tools/squeue/build HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
galaxy.tools DEBUG 2019-08-31 16:30:30,142 Validated and populated state for tool request (4.081 ms)
galaxy.tools.actions INFO 2019-08-31 16:30:30,285 Handled output (100.616 ms)
galaxy.tools.actions INFO 2019-08-31 16:30:30,319 Verified access to datasets (0.005 ms)
galaxy.tools.execute DEBUG 2019-08-31 16:30:30,368 Tool [squeue] created job [10] (206.086 ms)
galaxy.tools.execute DEBUG 2019-08-31 16:30:30,376 Executed all jobs for tool request: (233.862 ms)
198.91.54.159 - - [31/Aug/2019:16:30:30 +0000] "POST /api/tools HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:30:30 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
galaxy.jobs DEBUG 2019-08-31 16:30:30,747 (10) Working directory for job is: /opt/packages/galaxy/galaxy01/database/job_working_directory/000/10
galaxy.jobs.handler DEBUG 2019-08-31 16:30:30,751 (10) Dispatching to slurm runner
galaxy.jobs DEBUG 2019-08-31 16:30:30,774 (10) Persisting job destination (destination id: LM4)
galaxy.jobs.runners DEBUG 2019-08-31 16:30:30,790 Job [10] queued (38.578 ms)
galaxy.jobs.handler INFO 2019-08-31 16:30:30,818 (10) Job dispatched
galaxy.tools.deps DEBUG 2019-08-31 16:30:31,012 Building dependency shell command for dependency 'slurm'
galaxy.tools.deps DEBUG 2019-08-31 16:30:31,013 Find dependency slurm version None
galaxy.tools.deps DEBUG 2019-08-31 16:30:31,013 Resolver tool_shed_packages returned <galaxy.tools.deps.resolvers.NullDependency object at 0x1b38390> (isnull? True)
galaxy.tools.deps DEBUG 2019-08-31 16:30:31,014 Resolver galaxy_packages returned <galaxy.tools.deps.resolvers.galaxy_packages.GalaxyPackageDependency object at 0x7fc78c334750> (isnull? False)
galaxy.jobs.command_factory INFO 2019-08-31 16:30:31,057 Built script [/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/tool_script.sh] for tool command[PACKAGE_BASE=/opt/packages/galaxy/galaxy01/tool_dependencies/slurm/18.08.8; export PACKAGE_BASE; . /opt/packages/galaxy/galaxy01/tool_dependencies/slurm/18.08.8/env.sh; echo "hostname:" > output; echo " " >> output; hostname >> output; echo " " >> output; env >> output; echo " " >> output; date >> output; echo " " >> output; echo "Uptime:" >> output; echo " " >> output; uptime >> output; echo " " >> output; echo "Modules:" >> output; echo " " >> output; module avail >> output 2>&1; echo " " >> output; echo "SLURM Queue Status" >> output; echo " " >> output; echo "If your job is running on the queues, it will be listed in the reports below:" >> output; echo " " >> output; echo " " >> output; echo "Normal Report: squeue" >> output; echo " " >> output; echo " " >> output; echo " " >> output; squeue >> output; date >> output; echo " " >> output; echo " " >> output; echo "*** Full Report: squeue -l ***" >> output; echo " " >> output; squeue -l >> output; echo " " >> output; echo " " >> output; date >> output; echo " " >> output; echo "Local: ${LOCAL}" >> output; echo "Ramdisk: ${RAMDISK}" >> output; workdir=`pwd`; echo "workdir is $workdir" >> output; cd $LOCAL; echo "i am in `pwd`" >> $workdir/output; cd $workdir; echo "i am in `pwd`" >> output; date >> output; echo " " >> output]
galaxy.tools.deps DEBUG 2019-08-31 16:30:31,259 Building dependency shell command for dependency 'samtools'
galaxy.tools.deps DEBUG 2019-08-31 16:30:31,259 Find dependency samtools version None
galaxy.tools.deps DEBUG 2019-08-31 16:30:31,259 Resolver tool_shed_packages returned <galaxy.tools.deps.resolvers.NullDependency object at 0x1b38390> (isnull? True)
galaxy.tools.deps DEBUG 2019-08-31 16:30:31,279 Resolver galaxy_packages returned <galaxy.tools.deps.resolvers.galaxy_packages.GalaxyPackageDependency object at 0x7fc7b01e01d0> (isnull? False)
galaxy.jobs.runners DEBUG 2019-08-31 16:30:31,284 (10) command is: /opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/tool_script.sh; return_code=$?; if [ -f /opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/output ] ; then cp /opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/output /opt/packages/galaxy/galaxy01/database/files/000/dataset_10.dat ; fi; PACKAGE_BASE=/opt/packages/galaxy/galaxy01/tool_dependencies/samtools/0.1.19; export PACKAGE_BASE; . /opt/packages/galaxy/galaxy01/tool_dependencies/samtools/0.1.19/env.sh; python "/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/set_metadata_jGYkkM.py" "/opt/packages/galaxy/galaxy01/tmp/tmpxmB5GA" "/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/galaxy.json" "/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/metadata_in_HistoryDatasetAssociation_10_u2k7qq,/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/metadata_kwds_HistoryDatasetAssociation_10_2ZmSXR,/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/metadata_out_HistoryDatasetAssociation_10_shC78c,/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/metadata_results_HistoryDatasetAssociation_10_57x96D,/opt/packages/galaxy/galaxy01/database/files/000/dataset_10.dat,/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/metadata_override_HistoryDatasetAssociation_10_AEeOfc" 5242880; sh -c "exit $return_code"
galaxy.jobs.runners.drmaa DEBUG 2019-08-31 16:30:31,356 (10) submitting file /opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/galaxy_10.sh
galaxy.jobs.runners.drmaa DEBUG 2019-08-31 16:30:31,356 (10) native specification is: -p LM -C LM -N 1 -n 4 --ntasks-per-node=4 --mem=192500 -t 24:00:00
198.91.54.159 - - [31/Aug/2019:16:30:34 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:30:38 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:30:42 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:30:47 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:30:51 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:30:55 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:30:59 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:31:03 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:31:08 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:31:12 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:31:16 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:31:20 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:31:24 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:31:28 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:30,366 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (2): No such file or directory*
198.91.54.159 - - [31/Aug/2019:16:31:32 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:35,372 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
198.91.54.159 - - [31/Aug/2019:16:31:37 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:40,377 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
198.91.54.159 - - [31/Aug/2019:16:31:41 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:31:45 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:45,383 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
198.91.54.159 - - [31/Aug/2019:16:31:49 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:50,388 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
198.91.54.159 - - [31/Aug/2019:16:31:53 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa ERROR 2019-08-31 16:31:55,393 (10) All attempts to submit job failed*
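For readers who want to pull just the submission failures out of a long Galaxy log like the one above, a small filter along these lines may help. It is a sketch, not anything shipped with Galaxy: the function name drmaa_problems is made up for illustration, and it assumes the log has one entry per line in the "logger LEVEL timestamp (job id) message" layout seen in this thread.

```python
import re

# Matches drmaa runner WARNING/ERROR entries in the layout shown in this thread.
ENTRY = re.compile(
    r"galaxy\.jobs\.runners\.drmaa (?P<level>WARNING|ERROR) "
    r"(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\((?P<job>\d+)\) (?P<msg>.*)"
)


def drmaa_problems(lines):
    """Return (time, level, job id, message) for each drmaa WARNING/ERROR line."""
    found = []
    for line in lines:
        match = ENTRY.search(line)
        if match:
            # Drop trailing emphasis asterisks left over from mail formatting.
            message = match["msg"].rstrip("* ")
            found.append((match["time"], match["level"], match["job"], message))
    return found


if __name__ == "__main__":
    import sys

    # Usage: python drmaa_problems.py < paster.log
    for time, level, job, message in drmaa_problems(sys.stdin):
        print(f"{time} job {job} {level}: {message}")
```

Web access-log lines and DEBUG/INFO entries are skipped, so what remains is the sequence of runJob() retries followed by the final "All attempts to submit job failed" error.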