Hi Phil,

I followed up on the GitHub issue. Let's track it there, and we can reply here for the sake of history once we figure out what's going on.

Thanks,
--nate

On Thu, Sep 12, 2019 at 12:32 AM Philip Blood <blood@psc.edu> wrote:
Update: Nate Coraor pointed me to the drmaa-run utility in slurm-drmaa to do more focused testing, and it looks like the issue with running Slurm jobs from Galaxy comes down to *slurm-drmaa not working with the latest version of Slurm 18 -- 18.08.8.* I created an issue on the slurm-drmaa GitHub page here <https://github.com/natefoo/slurm-drmaa/issues/32>.
Since 18.08.8 addresses a security vulnerability <https://www.schedmd.com/news.php> that is not addressed in previous versions of Slurm, it seems like this slurm-drmaa problem will be an important issue to address for all those running Galaxy jobs on Slurm clusters.
If anyone finds they *can* run jobs via slurm-drmaa with Slurm 18.08.8, I'd be interested to hear it.
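For anyone who would rather reproduce this from Python than with drmaa-run, here is a minimal sketch. It assumes the drmaa-python package is installed and that the DRMAA_LIBRARY_PATH environment variable points at the slurm-drmaa libdrmaa.so being tested. The function name submit_once, its parameters, and the session_factory hook are hypothetical and are not part of Galaxy or slurm-drmaa.

```python
import os


def submit_once(script_path, native_spec, session_factory=None):
    """Submit one script through DRMAA, outside Galaxy, and return the job id.

    session_factory is a hypothetical hook so the function can be exercised
    without a real cluster; by default it uses drmaa-python's Session.
    """
    if session_factory is None:
        # Imported lazily: drmaa-python loads the library named by
        # DRMAA_LIBRARY_PATH at import time and fails if it cannot find it.
        import drmaa
        session_factory = drmaa.Session

    session = session_factory()
    session.initialize()
    try:
        template = session.createJobTemplate()
        try:
            script_path = os.path.abspath(script_path)
            template.remoteCommand = script_path
            template.workingDirectory = os.path.dirname(script_path)
            # Same kind of string Galaxy logs as "native specification is: ..."
            template.nativeSpecification = native_spec
            return session.runJob(template)
        finally:
            session.deleteJobTemplate(template)
    finally:
        session.exit()
```

Calling submit_once("/path/to/test.sh", "-p LM -C LM -N 1 -n 4 --ntasks-per-node=4 --mem=192500 -t 24:00:00") on the Galaxy host would be expected either to return a Slurm job id or to raise a drmaa exception, which helps separate a slurm-drmaa problem from a Galaxy configuration problem.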
Phil
On Tue, Sep 3, 2019 at 2:29 PM Philip Blood <blood@psc.edu> wrote:
Hi Folks,
I'm trying to get an old instance of Galaxy (16.01) working for a user who needs to use it this week for a class he is teaching (so upgrading Galaxy is not an option at the moment). Due to a recent Slurm upgrade on our compute system to Slurm 18.08.8, we had to replace the old slurm-drmaa 1.0.7 library <http://apps.man.poznan.pl/trac/slurm-drmaa>, which doesn't work with 18.08.8, with Nate's forked slurm-drmaa library version 1.1.0 <https://github.com/natefoo/slurm-drmaa>. That built fine with Slurm 18.08.8 and (I think) we updated all the relevant pointers in the Galaxy config to point to the new slurm-drmaa 1.1.0 library.
However, now when I try to run jobs on our system I get errors (it worked fine before with slurm-drmaa 1.0.7 and the older version of slurm). So, I wanted to get a quick sanity check on whether this might be an issue with trying to use the new slurm-drmaa with an old version of Galaxy, 16.01, or if anyone has any other quick thoughts on troubleshooting this. The errors I get are below.
Best, Phil
*Short version (just the errors):*
198.91.54.159 - - [31/Aug/2019:16:31:28 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:30,366 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (2): No such file or directory*
198.91.54.159 - - [31/Aug/2019:16:31:32 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:35,372 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
198.91.54.159 - - [31/Aug/2019:16:31:37 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:40,377 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
198.91.54.159 - - [31/Aug/2019:16:31:41 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:31:45 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:45,383 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
198.91.54.159 - - [31/Aug/2019:16:31:49 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:50,388 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
198.91.54.159 - - [31/Aug/2019:16:31:53 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa ERROR 2019-08-31 16:31:55,393 (10) All attempts to submit job failed*
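The DRMAA runner messages are easy to lose among the web access lines, so a small filter can help when comparing runs. The sketch below is only an illustration: the function name drmaa_problems and the regular expression are assumptions based on the log lines quoted in this message, not a Galaxy utility.

```python
import re

# Matches the runner's WARNING/ERROR entries as they appear in the pasted log:
#   galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:30,366 (10) <message>
# The message stops at '*' or a newline because the pasted entries are wrapped
# in asterisks (mail-client bold markers).
RUNNER_ENTRY = re.compile(
    r"galaxy\.jobs\.runners\.drmaa (WARNING|ERROR) "
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\((\d+)\) ([^*\n]+)"
)


def drmaa_problems(log_text):
    """Yield (level, timestamp, galaxy_job_id, message) for each runner problem."""
    for match in RUNNER_ENTRY.finditer(log_text):
        level, timestamp, job_id, message = match.groups()
        yield level, timestamp, int(job_id), message.strip()
```

Run over the full log, this would leave only the retry warnings and the final "All attempts to submit job failed" error, which makes it easier to see that the first failure reports error (2) while the retries report error (0).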
*Full context:*
198.91.54.159 - - [31/Aug/2019:16:30:27 +0000] "GET /api/tools/squeue/build HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
galaxy.tools DEBUG 2019-08-31 16:30:30,142 Validated and populated state for tool request (4.081 ms)
galaxy.tools.actions INFO 2019-08-31 16:30:30,285 Handled output (100.616 ms)
galaxy.tools.actions INFO 2019-08-31 16:30:30,319 Verified access to datasets (0.005 ms)
galaxy.tools.execute DEBUG 2019-08-31 16:30:30,368 Tool [squeue] created job [10] (206.086 ms)
galaxy.tools.execute DEBUG 2019-08-31 16:30:30,376 Executed all jobs for tool request: (233.862 ms)
198.91.54.159 - - [31/Aug/2019:16:30:30 +0000] "POST /api/tools HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:30:30 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
galaxy.jobs DEBUG 2019-08-31 16:30:30,747 (10) Working directory for job is: /opt/packages/galaxy/galaxy01/database/job_working_directory/000/10
galaxy.jobs.handler DEBUG 2019-08-31 16:30:30,751 (10) Dispatching to slurm runner
galaxy.jobs DEBUG 2019-08-31 16:30:30,774 (10) Persisting job destination (destination id: LM4)
galaxy.jobs.runners DEBUG 2019-08-31 16:30:30,790 Job [10] queued (38.578 ms)
galaxy.jobs.handler INFO 2019-08-31 16:30:30,818 (10) Job dispatched
galaxy.tools.deps DEBUG 2019-08-31 16:30:31,012 Building dependency shell command for dependency 'slurm'
galaxy.tools.deps DEBUG 2019-08-31 16:30:31,013 Find dependency slurm version None
galaxy.tools.deps DEBUG 2019-08-31 16:30:31,013 Resolver tool_shed_packages returned <galaxy.tools.deps.resolvers.NullDependency object at 0x1b38390> (isnull? True)
galaxy.tools.deps DEBUG 2019-08-31 16:30:31,014 Resolver galaxy_packages returned <galaxy.tools.deps.resolvers.galaxy_packages.GalaxyPackageDependency object at 0x7fc78c334750> (isnull? False)
galaxy.jobs.command_factory INFO 2019-08-31 16:30:31,057 Built script
[/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/tool_script.sh]
for tool
command[PACKAGE_BASE=/opt/packages/galaxy/galaxy01/tool_dependencies/slurm/18.08.8; export PACKAGE_BASE; . /opt/packages/galaxy/galaxy01/tool_dependencies/slurm/18.08.8/env.sh; echo "hostname:" > output; echo " " >> output; hostname >> output; echo " " >> output; env >> output; echo " " >> output; date >> output; echo " " >> output; echo "Uptime:" >> output; echo " " >> output; uptime >> output; echo " " >> output; echo "Modules:" >> output; echo " " >> output; module avail >> output 2>&1; echo " " >> output; echo "SLURM Queue Status" >> output; echo " " >> output; echo "If your job is running on the queues, it will be listed in the reports below:" >> output; echo " " >> output; echo " " >> output; echo "Normal Report: squeue" >> output; echo " " >> output; echo " " >> output; echo " " >> output; squeue >> output; date >> output; echo " " >> output; echo " " >> output; echo "*** Full Report: squeue -l ***" >> output; echo " " >> output; squeue -l >> output; echo " " >> output; echo " " >> output; date >> output; echo " " >> output; echo "Local: ${LOCAL}" >> output; echo "Ramdisk: ${RAMDISK}" >> output; workdir=`pwd`; echo "workdir is $workdir" >> output; cd $LOCAL; echo "i am in `pwd`" >> $workdir/output; cd $workdir; echo "i am in `pwd`" >> output; date >> output; echo " " >> output]
galaxy.tools.deps DEBUG 2019-08-31 16:30:31,259 Building dependency shell command for dependency 'samtools'
galaxy.tools.deps DEBUG 2019-08-31 16:30:31,259 Find dependency samtools version None
galaxy.tools.deps DEBUG 2019-08-31 16:30:31,259 Resolver tool_shed_packages returned <galaxy.tools.deps.resolvers.NullDependency object at 0x1b38390> (isnull? True)
galaxy.tools.deps DEBUG 2019-08-31 16:30:31,279 Resolver galaxy_packages returned <galaxy.tools.deps.resolvers.galaxy_packages.GalaxyPackageDependency object at 0x7fc7b01e01d0> (isnull? False)
galaxy.jobs.runners DEBUG 2019-08-31 16:30:31,284 (10) command is:
/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/tool_script.sh; return_code=$?; if [ -f /opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/output ] ; then cp /opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/output /opt/packages/galaxy/galaxy01/database/files/000/dataset_10.dat ; fi; PACKAGE_BASE=/opt/packages/galaxy/galaxy01/tool_dependencies/samtools/0.1.19; export PACKAGE_BASE; . /opt/packages/galaxy/galaxy01/tool_dependencies/samtools/0.1.19/env.sh; python "/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/set_metadata_jGYkkM.py" "/opt/packages/galaxy/galaxy01/tmp/tmpxmB5GA" "/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/galaxy.json" "/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/metadata_in_HistoryDatasetAssociation_10_u2k7qq,/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/metadata_kwds_HistoryDatasetAssociation_10_2ZmSXR,/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/metadata_out_HistoryDatasetAssociation_10_shC78c,/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/metadata_results_HistoryDatasetAssociation_10_57x96D,/opt/packages/galaxy/galaxy01/database/files/000/dataset_10.dat,/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/metadata_override_HistoryDatasetAssociation_10_AEeOfc" 5242880; sh -c "exit $return_code"
galaxy.jobs.runners.drmaa DEBUG 2019-08-31 16:30:31,356 (10) submitting file
/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/galaxy_10.sh
galaxy.jobs.runners.drmaa DEBUG 2019-08-31 16:30:31,356 (10) native specification is: -p LM -C LM -N 1 -n 4 --ntasks-per-node=4 --mem=192500 -t 24:00:00
198.91.54.159 - - [31/Aug/2019:16:30:34 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:30:38 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:30:42 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:30:47 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:30:51 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:30:55 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:30:59 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:31:03 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:31:08 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:31:12 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:31:16 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:31:20 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:31:24 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:31:28 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:30,366 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (2): No such file or directory*
198.91.54.159 - - [31/Aug/2019:16:31:32 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:35,372 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
198.91.54.159 - - [31/Aug/2019:16:31:37 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:40,377 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
198.91.54.159 - - [31/Aug/2019:16:31:41 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
198.91.54.159 - - [31/Aug/2019:16:31:45 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:45,383 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
198.91.54.159 - - [31/Aug/2019:16:31:49 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:50,388 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
198.91.54.159 - - [31/Aug/2019:16:31:53 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
*galaxy.jobs.runners.drmaa ERROR 2019-08-31 16:31:55,393 (10) All attempts to submit job failed*