Branch: refs/heads/release_18.05
Home:   https://github.com/galaxyproject/galaxy

Commit: 5a0f98802cab73dfcdaef01b1200e96bc6b9a8ae
    https://github.com/galaxyproject/galaxy/commit/5a0f98802cab73dfcdaef01b1200e...
Author: Nicola Soranzo <nicola.soranzo@earlham.ac.uk>
Date:   2018-06-06 (Wed, 06 Jun 2018)

Changed paths:
  M lib/galaxy/jobs/runners/slurm.py

Log Message:
-----------
Strip a spurious Slurm warning from job stderr

Reason for the Slurm warning
----------------------------

The Linux kernel memory controller (responsible for the memory cgroup) may run out of cgroup subsystem state (CSS) IDs. When a cgroup is created, it is assigned a CSS ID to manage its state. After the cgroup is removed, the corresponding state information (e.g. cache entries) may still exist, so the CSS ID is still held.

When many short-lived jobs run in quick succession on a cluster node, the available CSS IDs are exhausted and creating a new memory cgroup fails with ENOSPC ("No space left on device"). SLURM reports this with the message "unable to add task[pid=<PID>] to memory cg '(null)'", even if the jobs run fine.

There is a kernel bugfix that releases the CSS ID upon cgroup destruction, but it appears to be available only after Linux 4.4. We use CentOS 7, which is based on Linux 3.10. The only temporary solution is to reboot the affected cluster node.

Thanks to @tuxtobin for the detailed analysis above.

Reason for this patch
---------------------

Many tools rely on an empty stderr to determine if the job was successful, either because they were never updated to use `<stdio>`/`detect_errors`, or because the underlying tool returns a non-zero exit code when successful, e.g. `tranalign` from https://toolshed.g2.bx.psu.edu/view/devteam/emboss_5/832c20329690 .

Even using a `<regex>` inside `<stdio>` would not work because the Slurm warning contains the word `error`.
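
The stripping described in the log message can be sketched roughly as below. This is an illustrative sketch only, not the code in `lib/galaxy/jobs/runners/slurm.py`: the names `WARNING_RE` and `strip_spurious_warning` are hypothetical, and because the log message does not give the exact prefix Slurm prints, the pattern simply matches any line containing the quoted message.

```python
import re

# Hypothetical pattern built from the message quoted in the log message.
# The exact prefix Slurm emits is not stated there, so any line that
# contains the message is treated as the spurious warning.
WARNING_RE = re.compile(r"unable to add task\[pid=\d+\] to memory cg '\(null\)'")


def strip_spurious_warning(stderr_text):
    """Return stderr_text without the lines that contain the warning."""
    kept = [
        line
        for line in stderr_text.splitlines(True)  # True keeps line endings
        if not WARNING_RE.search(line)
    ]
    return "".join(kept)
```

With a filter of this kind, a job whose only stderr output is the warning ends up with an empty stderr, so tools that treat any stderr output as failure are no longer marked as failed because of it. Other stderr lines pass through unchanged.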

Commit: e7b7bee82e60acea478db2dbf2cc03552ebf6cd4
    https://github.com/galaxyproject/galaxy/commit/e7b7bee82e60acea478db2dbf2cc0...
Author: Nate Coraor <nate@bx.psu.edu>
Date:   2018-06-11 (Mon, 11 Jun 2018)

Changed paths:
  M lib/galaxy/jobs/runners/slurm.py

Log Message:
-----------
Merge pull request #6293 from nsoranzo/release_18.05_strip_slurm_warning

[18.05] Strip a spurious Slurm warning from job stderr

Compare: https://github.com/galaxyproject/galaxy/compare/dba21dc8a426...e7b7bee82e60

**NOTE:** This service has been marked for deprecation: https://developer.github.com/changes/2018-04-25-github-services-deprecation/

Functionality will be removed from GitHub.com on January 31st, 2019.