PP-810: cgroups v2 with systemd
Note: This page updates information from the original cgroup design in PP-325.
Overview
The release of cgroups v2 in the Linux kernel, combined with the adoption of systemd-style service management in most popular Linux distributions, means that the cgroup hook in PBS Pro must be updated to support the new capabilities. This document describes the interface changes that will be introduced.
Interface 1:
Synopsis: cgroup.cpuset.exclude_cpus
Detail: Allow the administrator to exclude cores from being assigned to jobs by adding numeric entries to a JSON list within the cpuset section of the cgroup hook configuration file. When vnode_per_numa_node is true, this setting affects the creation of vnodes: the core count (resources_available.ncpus) of each vnode is reduced to account for the excluded CPUs. When vnode_per_numa_node is false, the excluded CPUs apply to the node itself, and the core count (resources_available.ncpus) of the node is reduced instead.
Default: Empty list, no CPUs excluded
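The effect on core counts can be illustrated with a minimal sketch. It is not the hook's implementation: the function name, the topology dictionary, and the "host" label are hypothetical, and the sketch assumes that each excluded CPU is simply removed from whichever NUMA node contains it.

```python
def ncpus_after_exclusion(cpus_by_numa_node, exclude_cpus, vnode_per_numa_node):
    """Return a mapping of vnode label to remaining core count.

    Hypothetical helper: cpus_by_numa_node maps a NUMA node id to the list
    of CPU ids it contains; exclude_cpus mirrors the JSON list in the
    cpuset section of the configuration file.
    """
    excluded = set(exclude_cpus)
    remaining = {
        node: [cpu for cpu in cpus if cpu not in excluded]
        for node, cpus in cpus_by_numa_node.items()
    }
    if vnode_per_numa_node:
        # One vnode per NUMA node: each vnode loses only its own excluded CPUs.
        return {node: len(cpus) for node, cpus in remaining.items()}
    # Single vnode for the whole host: the exclusions reduce the host total.
    return {"host": sum(len(cpus) for cpus in remaining.values())}


# Assumed topology: two NUMA nodes with eight CPUs each.
topology = {0: list(range(0, 8)), 1: list(range(8, 16))}
per_numa = ncpus_after_exclusion(topology, [0, 8], vnode_per_numa_node=True)
whole_host = ncpus_after_exclusion(topology, [0, 8], vnode_per_numa_node=False)
print(per_numa)    # {0: 7, 1: 7}
print(whole_host)  # {'host': 14}
```

With the default empty list, the same function returns the unmodified counts, which matches the "no CPUs excluded" default.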
Example:
exclude_cpus
"cpuset" : { "enabled" : true, "exclude_cpus" : [0, 8], "exclude_hosts" : ["node004"], "exclude_vntypes" : ["green_node"] },
Interface 2:
Synopsis: cgroup.cpuset.mem_fences
Detail: Allow the administrator to prevent the cgroup hook from binding jobs to NUMA nodes. When set to false, the hook does not write values to cpuset.mems in the cpuset subsystem.
Default: True, cgroup hook will write values to cpuset.mems.
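The behavior of this switch can be sketched as follows. The helper name and its arguments are hypothetical, and temporary directories stand in for a job's cgroup directory; the assumption that cpuset.cpus is always written is made only for illustration.

```python
import os
import tempfile


def write_cpuset_files(cgroup_dir, cpus, mems, mem_fences=True):
    """Write cpuset files for a job; skip cpuset.mems when mem_fences is false.

    Hypothetical helper: the real hook's function names and layout may differ.
    Returns the list of file names that were written.
    """
    written = []
    with open(os.path.join(cgroup_dir, "cpuset.cpus"), "w") as handle:
        handle.write(",".join(str(cpu) for cpu in cpus))
    written.append("cpuset.cpus")
    if mem_fences:
        # Only fence memory to the given NUMA nodes when the setting is true.
        with open(os.path.join(cgroup_dir, "cpuset.mems"), "w") as handle:
            handle.write(",".join(str(node) for node in mems))
        written.append("cpuset.mems")
    return written


# Temporary directories stand in for a job's cgroup directory.
with tempfile.TemporaryDirectory() as fenced_dir:
    fenced = write_cpuset_files(fenced_dir, [1, 2, 3], [0], mem_fences=True)
    fenced_mems_exists = os.path.exists(os.path.join(fenced_dir, "cpuset.mems"))
with tempfile.TemporaryDirectory() as unfenced_dir:
    unfenced = write_cpuset_files(unfenced_dir, [1, 2, 3], [0], mem_fences=False)
    unfenced_mems_exists = os.path.exists(os.path.join(unfenced_dir, "cpuset.mems"))
print(fenced, unfenced)
```

In this sketch, setting mem_fences to false leaves cpuset.mems untouched, so the job is not bound to particular NUMA nodes by the hook.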
Example:
mem_fences
"cpuset" : { "enabled" : true, "mem_fences" : true, "exclude_hosts" : ["node004"], "exclude_vntypes" : ["green_node"] },
Interface 3:
Synopsis: cgroup.cpuset.mem_hardwall
Detail: Allow the administrator to override the value of cpuset.mem_hardwall. See the Red Hat documentation for a description of this flag.
Default: False (zero)
Example:
mem_hardwall
"cpuset" : { "enabled" : true, "mem_hardwall" : false, "exclude_hosts" : ["node004"], "exclude_vntypes" : ["green_node"] },
Interface 4:
Synopsis: cgroup.cpuset.memory_spread_page
Detail: Allow the administrator to override the value of cpuset.memory_spread_page. See the Red Hat documentation for a description of this flag.
Default: False (zero)
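As a rough sketch of how the cpuset settings on this page and their defaults might be read, the snippet below parses a configuration fragment and maps the two boolean overrides (mem_hardwall and memory_spread_page) to the "0" or "1" values that the corresponding cpuset files hold. The function names and the CPUSET_DEFAULTS table are hypothetical; only the keys and default values come from this page.

```python
import json

# Defaults as stated on this page for the four new cpuset settings.
CPUSET_DEFAULTS = {
    "exclude_cpus": [],
    "mem_fences": True,
    "mem_hardwall": False,
    "memory_spread_page": False,
}


def load_cpuset_settings(config_text):
    """Return the cpuset settings, filling in defaults for missing keys."""
    section = json.loads(config_text).get("cpuset", {})
    return {key: section.get(key, default)
            for key, default in CPUSET_DEFAULTS.items()}


def flag_file_values(settings):
    """Map the boolean overrides to the strings written to the cpuset files."""
    return {
        "cpuset.mem_hardwall": "1" if settings["mem_hardwall"] else "0",
        "cpuset.memory_spread_page": "1" if settings["memory_spread_page"] else "0",
    }


# Hypothetical configuration fragment that overrides only one flag.
example = '{"cpuset": {"enabled": true, "memory_spread_page": true}}'
settings = load_cpuset_settings(example)
values = flag_file_values(settings)
print(settings)
print(values)
```

Keeping the defaults in one table means an omitted key behaves the same as the documented default, so existing configuration files do not need to change.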
Example:
memory_spread_page
"cpuset" : { "enabled" : true, "memory_spread_page" : false, "exclude_hosts" : ["node004"], "exclude_vntypes" : ["green_node"] },
Upgrading PBS Pro with cgroup hook:
Migrating from versions of PBS Pro prior to 18.2 on systems that use systemd leaves behind subdirectories that the older cgroup hook created.
These directories are harmless to newer versions of the cgroup hook.
They are no longer present after a reboot. The cgroup hook creates new directories when it handles the exechost_startup event.