Thanks Nate. I just updated the GitHub issue. It turns out this error was
caused by a configuration issue that required a job name to be specified
in the native spec passed to drmaa-run. For jobs submitted via sbatch, the
script name was used when no job name was specified, but the job name was
empty unless it was passed explicitly to drmaa-run. Once the admin changed
job_script.lua to handle nil values for the job name, the tests with drmaa-run
started working with Slurm 18.08.8.
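For anyone who hits the same thing before their admin can change the site
script, here is a rough sketch of the client-side workaround: make sure the
native spec always carries a job name. The helper name ensure_job_name and the
default name "galaxy_job" are made up for illustration, and I am assuming
slurm-drmaa accepts -J / --job-name in the native specification the way sbatch
does. It only covers the drmaa-run case described above.

```python
import shlex


def ensure_job_name(native_spec, default_name="galaxy_job"):
    """Return native_spec unchanged if it already names the job,
    otherwise append a --job-name option (hypothetical helper)."""
    spec = native_spec.strip()
    for token in shlex.split(spec):
        # Covers "-J name", "-Jname", "--job-name name" and "--job-name=name".
        if (token.startswith("-J")
                or token == "--job-name"
                or token.startswith("--job-name=")):
            return spec
    flag = "--job-name=" + shlex.quote(default_name)
    return (spec + " " + flag).strip()


if __name__ == "__main__":
    # Native spec taken from the Galaxy log further down in this thread.
    spec = "-p LM -C LM -N 1 -n 4 --ntasks-per-node=4 --mem=192500 -t 24:00:00"
    print(ensure_job_name(spec))
```

The result can be passed as the native spec to drmaa-run or set on the job
destination, so the submission no longer depends on how the site script treats
a missing name.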
Unfortunately, this did not fix my related issue with submitting jobs from
Galaxy using slurm-drmaa. I am still getting the same errors. Any
suggestions where to look next?
Phil
On Fri, Sep 13, 2019, 11:53 AM Nate Coraor <nate(a)bx.psu.edu> wrote:
Hi Phil,
I followed up over on the GitHub issue. Let's track it there, and we can
reply here for the sake of history once we figure out what's going on.
Thanks,
--nate
On Thu, Sep 12, 2019 at 12:32 AM Philip Blood <blood(a)psc.edu> wrote:
> Update: Nate Coraor pointed me to the drmaa-run utility in slurm-drmaa to
> do more focused testing, and it looks like the issue with running Slurm
> jobs from Galaxy comes down to *slurm-drmaa not working with the latest
> version of Slurm 18 -- 18.08.8.* I created an issue on the slurm-drmaa
> GitHub page here <https://github.com/natefoo/slurm-drmaa/issues/32>.
>
> Since 18.08.8 addresses a security vulnerability
> <https://www.schedmd.com/news.php> that is not addressed in previous
> versions of Slurm, it seems like this slurm-drmaa problem will be an
> important issue to address for all those running Galaxy jobs on Slurm
> clusters.
>
> If anyone finds they *can* run jobs via slurm-drmaa with Slurm 18.08.8, I'd
> be interested to hear it.
>
> Phil
>
> On Tue, Sep 3, 2019 at 2:29 PM Philip Blood <blood(a)psc.edu> wrote:
>
> > Hi Folks,
> >
> > I'm trying to get an old instance of Galaxy (16.01) working for a user who
> > needs to use it this week for a class he is teaching (so upgrading Galaxy
> > is not an option at the moment). Due to a recent Slurm upgrade on our
> > compute system to Slurm 18.08.8, we had to replace the old slurm-drmaa
> > 1.0.7 library <http://apps.man.poznan.pl/trac/slurm-drmaa>, which doesn't
> > work with 18.08.8, with Nate's forked slurm-drmaa library version
> > 1.1.0 <https://github.com/natefoo/slurm-drmaa>. That built fine with
> > Slurm 18.08.8 and (I think) we updated all the relevant pointers in the
> > Galaxy config to point to the new slurm-drmaa 1.1.0 library.
> >
> > However, now when I try to run jobs on our system I get errors (it worked
> > fine before with slurm-drmaa 1.0.7 and the older version of Slurm). So, I
> > wanted to get a quick sanity check on whether this might be an issue with
> > trying to use the new slurm-drmaa with an old version of Galaxy, 16.01, or
> > if anyone has any other quick thoughts on troubleshooting this. The errors
> > I get are below.
> >
> > Best,
> > Phil
> >
> > *Short version (just the errors):*
> > 198.91.54.159 - - [31/Aug/2019:16:31:28 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > *galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:30,366 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (2): No such file or directory*
> > 198.91.54.159 - - [31/Aug/2019:16:31:32 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > *galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:35,372 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
> > 198.91.54.159 - - [31/Aug/2019:16:31:37 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > *galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:40,377 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
> > 198.91.54.159 - - [31/Aug/2019:16:31:41 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > 198.91.54.159 - - [31/Aug/2019:16:31:45 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > *galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:45,383 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
> > 198.91.54.159 - - [31/Aug/2019:16:31:49 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > *galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:50,388 (10) drmaa.Session.runJob() failed, will retry: code 1: slurm_submit_batch_job error (0): No error*
> > 198.91.54.159 - - [31/Aug/2019:16:31:53 +0000] "GET /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > *galaxy.jobs.runners.drmaa ERROR 2019-08-31 16:31:55,393 (10) All attempts to submit job failed*
> >
> >
> > *Full context:*
> > 198.91.54.159 - - [31/Aug/2019:16:30:27 +0000] "GET
> > /api/tools/squeue/build HTTP/1.1" 200 - "
>
https://galaxy.bridges.psc.edu/"
> > "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
> > Firefox/68.0"
> > galaxy.tools DEBUG 2019-08-31 16:30:30,142 Validated and populated state
> > for tool request (4.081 ms)
> > galaxy.tools.actions INFO 2019-08-31 16:30:30,285 Handled output
> (100.616
> > ms)
> > galaxy.tools.actions INFO 2019-08-31 16:30:30,319 Verified access to
> > datasets (0.005 ms)
> > galaxy.tools.execute DEBUG 2019-08-31 16:30:30,368 Tool [squeue] created
> > job [10] (206.086 ms)
> > galaxy.tools.execute DEBUG 2019-08-31 16:30:30,376 Executed all jobs for
> > tool request: (233.862 ms)
> > 198.91.54.159 - - [31/Aug/2019:16:30:30 +0000] "POST /api/tools
> HTTP/1.1"
> > 200 - "https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT
10.0;
> > Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > 198.91.54.159 - - [31/Aug/2019:16:30:30 +0000] "GET
> > /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
> >
https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0;
Win64;
> > x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > galaxy.jobs DEBUG 2019-08-31 16:30:30,747 (10) Working directory for
> job
> > is: /opt/packages/galaxy/galaxy01/database/job_working_directory/000/10
> > galaxy.jobs.handler DEBUG 2019-08-31 16:30:30,751 (10) Dispatching to
> > slurm runner
> > galaxy.jobs DEBUG 2019-08-31 16:30:30,774 (10) Persisting job
> destination
> > (destination id: LM4)
> > galaxy.jobs.runners DEBUG 2019-08-31 16:30:30,790 Job [10] queued
> (38.578
> > ms)
> > galaxy.jobs.handler INFO 2019-08-31 16:30:30,818 (10) Job dispatched
> > galaxy.tools.deps DEBUG 2019-08-31 16:30:31,012 Building dependency
> shell
> > command for dependency 'slurm'
> > galaxy.tools.deps DEBUG 2019-08-31 16:30:31,013 Find dependency slurm
> > version None
> > galaxy.tools.deps DEBUG 2019-08-31 16:30:31,013 Resolver
> > tool_shed_packages returned <galaxy.tools.deps.resolvers.NullDependency
> > object at 0x1b38390> (isnull? True)
> > galaxy.tools.deps DEBUG 2019-08-31 16:30:31,014 Resolver galaxy_packages
> > returned
> > <galaxy.tools.deps.resolvers.galaxy_packages.GalaxyPackageDependency
> object
> > at 0x7fc78c334750> (isnull? False)
> > galaxy.jobs.command_factory INFO 2019-08-31 16:30:31,057 Built script
> >
> [/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/tool_script.sh]
> > for tool
> >
> command[PACKAGE_BASE=/opt/packages/galaxy/galaxy01/tool_dependencies/slurm/18.08.8;
> > export PACKAGE_BASE; . /opt/packages/galaxy/galaxy0
> > 1/tool_dependencies/slurm/18.08.8/env.sh; echo "hostname:" >
output;
> echo
> > " " >> output; hostname >> output; echo " "
>> output; env >> output;
> echo
> > " " >> output; date >> output; echo " "
>> output; echo "Uptime:" >>
> > output; echo " " >> output; uptime >> output; echo "
" >> output; echo
> > "Module
> > s:" >> output; echo " " >> output; module avail
>> output 2>&1; echo " "
> > >> output; echo "SLURM Queue Status" >> output; echo
" " >> output; echo
> > "If your job is running on the queues, it will be listed in the reports
> > below:" >> output; echo " " >> output; echo "
" >> output; echo "Normal
> > Report: s
> > queue" >> output; echo " " >> output; echo "
" >> output; echo " " >>
> > output; squeue >> output; date >> output; echo " "
>> output; echo " "
> >>
> > output; echo "*** Full Report: squeue -l ***" >> output; echo
" " >>
> > output; squeue -l >> output; echo " " >> output; echo
" " >> output;
> date
> > >> output;
> > echo " " >> output; echo "Local: ${LOCAL}" >>
output; echo "Ramdisk:
> > ${RAMDISK}" >> output; workdir=`pwd`; echo "workdir is
$workdir" >>
> output;
> > cd $LOCAL; echo "i am in `pwd`" >> $workdir/output; cd $workdir;
echo
> "i am
> > in `pwd`" >> output; date >> output; echo " "
>> output]
> > galaxy.tools.deps DEBUG 2019-08-31 16:30:31,259 Building dependency
> shell
> > command for dependency 'samtools'
> > galaxy.tools.deps DEBUG 2019-08-31 16:30:31,259 Find dependency samtools
> > version None
> > galaxy.tools.deps DEBUG 2019-08-31 16:30:31,259 Resolver
> > tool_shed_packages returned <galaxy.tools.deps.resolvers.NullDependency
> > object at 0x1b38390> (isnull? True)
> > galaxy.tools.deps DEBUG 2019-08-31 16:30:31,279 Resolver galaxy_packages
> > returned
> > <galaxy.tools.deps.resolvers.galaxy_packages.GalaxyPackageDependency
> object
> > at 0x7fc7b01e01d0> (isnull? False)
> > galaxy.jobs.runners DEBUG 2019-08-31 16:30:31,284 (10) command is:
> >
> /opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/tool_script.sh;
> > return_code=$?; if [ -f
> >
> /opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/output
> > ] ; then cp /opt/packages/galaxy/galaxy01/databas
> > e/job_working_directory/000/10/output
> > /opt/packages/galaxy/galaxy01/database/files/000/dataset_10.dat ; fi;
> >
> PACKAGE_BASE=/opt/packages/galaxy/galaxy01/tool_dependencies/samtools/0.1.19;
> > export PACKAGE_BASE; .
> > /opt/packages/galaxy/galaxy01/tool_dependencies/samtools/0.1.19/env.sh;
> > python "/opt/packa
> >
>
ges/galaxy/galaxy01/database/job_working_directory/000/10/set_metadata_jGYkkM.py"
> > "/opt/packages/galaxy/galaxy01/tmp/tmpxmB5GA"
> >
>
"/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/galaxy.json"
> >
>
"/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/metadata_in_HistoryD
> >
> >
>
atasetAssociation_10_u2k7qq,/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/metadata_kwds_HistoryDatasetAssociation_10_2ZmSXR,/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/metadata_out_HistoryDatasetAssociation_10_shC78c,/opt/packages/galaxy/galaxy01/databa
> >
>
se/job_working_directory/000/10/metadata_results_HistoryDatasetAssociation_10_57x96D,/opt/packages/galaxy/galaxy01/database/files/000/dataset_10.dat,/opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/metadata_override_HistoryDatasetAssociation_10_AEeOfc"
> > 5242880; sh -c "exit $retur
> > n_code"
> > galaxy.jobs.runners.drmaa DEBUG 2019-08-31 16:30:31,356 (10) submitting file /opt/packages/galaxy/galaxy01/database/job_working_directory/000/10/galaxy_10.sh
> > galaxy.jobs.runners.drmaa DEBUG 2019-08-31 16:30:31,356 (10) native specification is: -p LM -C LM -N 1 -n 4 --ntasks-per-node=4 --mem=192500 -t 24:00:00
> > 198.91.54.159 - - [31/Aug/2019:16:30:34 +0000] "GET
> > /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
> >
https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0;
Win64;
> > x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > 198.91.54.159 - - [31/Aug/2019:16:30:38 +0000] "GET
> > /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
> >
https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0;
Win64;
> > x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > 198.91.54.159 - - [31/Aug/2019:16:30:42 +0000] "GET
> > /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
> >
https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0;
Win64;
> > x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > 198.91.54.159 - - [31/Aug/2019:16:30:47 +0000] "GET
> > /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
> >
https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0;
Win64;
> > x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > 198.91.54.159 - - [31/Aug/2019:16:30:51 +0000] "GET
> > /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
> >
https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0;
Win64;
> > x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > 198.91.54.159 - - [31/Aug/2019:16:30:55 +0000] "GET
> > /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
> >
https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0;
Win64;
> > x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > 198.91.54.159 - - [31/Aug/2019:16:30:59 +0000] "GET
> > /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
> >
https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0;
Win64;
> > x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > 198.91.54.159 - - [31/Aug/2019:16:31:03 +0000] "GET
> > /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
> >
https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0;
Win64;
> > x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > 198.91.54.159 - - [31/Aug/2019:16:31:08 +0000] "GET
> > /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
> >
https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0;
Win64;
> > x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > 198.91.54.159 - - [31/Aug/2019:16:31:12 +0000] "GET
> > /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
> >
https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0;
Win64;
> > x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > 198.91.54.159 - - [31/Aug/2019:16:31:16 +0000] "GET
> > /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
> >
https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0;
Win64;
> > x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > 198.91.54.159 - - [31/Aug/2019:16:31:20 +0000] "GET
> > /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
> >
https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0;
Win64;
> > x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > 198.91.54.159 - - [31/Aug/2019:16:31:24 +0000] "GET
> > /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
> >
https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0;
Win64;
> > x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > 198.91.54.159 - - [31/Aug/2019:16:31:28 +0000] "GET
> > /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
> >
https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0;
Win64;
> > x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > *galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:30,366 (10)
> > drmaa.Session.runJob() failed, will retry: code 1:
> slurm_submit_batch_job
> > error (2): No such file or directory*
> > 198.91.54.159 - - [31/Aug/2019:16:31:32 +0000] "GET
> > /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
> >
https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0;
Win64;
> > x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > *galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:35,372 (10)
> > drmaa.Session.runJob() failed, will retry: code 1:
> slurm_submit_batch_job
> > error (0): No error*
> > 198.91.54.159 - - [31/Aug/2019:16:31:37 +0000] "GET
> > /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
> >
https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0;
Win64;
> > x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > *galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:40,377 (10)
> > drmaa.Session.runJob() failed, will retry: code 1:
> slurm_submit_batch_job
> > error (0): No error*
> > 198.91.54.159 - - [31/Aug/2019:16:31:41 +0000] "GET
> > /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
> >
https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0;
Win64;
> > x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > 198.91.54.159 - - [31/Aug/2019:16:31:45 +0000] "GET
> > /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
> >
https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0;
Win64;
> > x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > *galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:45,383 (10)
> > drmaa.Session.runJob() failed, will retry: code 1:
> slurm_submit_batch_job
> > error (0): No error*
> > 198.91.54.159 - - [31/Aug/2019:16:31:49 +0000] "GET
> > /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
> >
https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0;
Win64;
> > x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > *galaxy.jobs.runners.drmaa WARNING 2019-08-31 16:31:50,388 (10)
> > drmaa.Session.runJob() failed, will retry: code 1:
> slurm_submit_batch_job
> > error (0): No error*
> > 198.91.54.159 - - [31/Aug/2019:16:31:53 +0000] "GET
> > /api/histories/8237ee2988567c1c/contents HTTP/1.1" 200 - "
> >
https://galaxy.bridges.psc.edu/" "Mozilla/5.0 (Windows NT 10.0;
Win64;
> > x64; rv:68.0) Gecko/20100101 Firefox/68.0"
> > *galaxy.jobs.runners.drmaa ERROR 2019-08-31 16:31:55,393 (10) All
> attempts
> > to submit job failed *
> >