Overview

Issues with the current hook and configuration file syntax

The current cgroup hook source code and the configuration file syntax it supports make it hard to use a single cgroup hook either across several clusters that run different PBSPro versions, or on a PBSPro server whose nodes have different configurations.

The main issues faced by people with complex clusters are:

Approach

The proposed changes shall retain backward compatibility with existing cgroup hook configuration files. This design extends the format as it currently exists.

The first set of issues can be fixed fairly easily by rearranging the code and adding some guards.

The second set of issues often forces sites to import N different cgroup hooks, each with its own configuration file, with each host needing to disable all the hooks but one through those configuration files. This is extremely cumbersome, especially since the hook allows "vntype" to be used to exclude collections of hosts sharing a vntype but does not allow limiting the hook to a set of vntypes. That idiosyncrasy of the current configuration file syntax means that introducing a new vntype and hook requires the configuration of every other hook to be updated to explicitly exclude that vntype. This is extremely hard to manage, and errors in the sequence of steps often lead to failures (for example, when two competing hooks both run on a host).

That which can be expressed in the current configuration file syntax causes two main issues:

Maintaining the code is also a lot harder than necessary because references to the configuration variables that enable or disable sections are currently scattered throughout the code. It would be better to confine them to one spot that parses the configuration file, so that the rest of the code can trivially find which controllers and flags are enabled.

Prototypes of the cgroup hook running at many sites have demonstrated that slightly extending the supported configuration file syntax fixes these issues. That extension forms the crux of the proposal; the accompanying code changes are separate, and some of them exist essentially to allow the hook to function properly on most host configurations.

Expressing booleans in a flexible way to express on which hosts they should be enabled

The idea of this change is to introduce a parser that converts strings into booleans depending on the host on which the cgroup hook runs, or on its vntype. This makes most common configurations that use the current, rather idiosyncratic and asymmetric section enablers/disablers a lot simpler: they simply express when "enabled" for a section or for the whole hook should evaluate to True or False.

The proposed changes are not designed to be as general as possible; they are designed to support many use cases and have been used widely by sites. The syntax supported in the extension was also chosen to let customers easily cut and paste bits of existing configuration files containing lists in exclude_hosts, run_only_on_hosts and exclude_vntypes without having to rewrite them; in other words, it supports comma-separated lists.

If one of the prefixes used in the examples in this document ("vntype in:", "vntype not in:", "host in:" or "host not in:") is recognized at the start of a string, the rest of the string is treated as a comma-separated list of vntypes or hosts for which the variable is set to True ("in:") or False ("not in:"). For all other hosts the variable is, by definition, set to the inverse. The entries are usually plain names, but the code allows individual entries to be fnmatch.fnmatch patterns; commas are not supported in the patterns since they separate entries. Sites would mainly use this for wildcards using * or ?.
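The conversion described above can be sketched in Python as follows. This is a minimal illustration, not the hook's actual code: the function name to_bool, the PREFIXES table and the error handling are assumptions, and the four prefixes are taken from the example configuration later in this document.

```python
from fnmatch import fnmatch

# (prefix, what it is matched against, value when the host/vntype is listed)
# Hypothetical table; the prefixes mirror the example configuration.
PREFIXES = (
    ("vntype in:", "vntype", True),
    ("vntype not in:", "vntype", False),
    ("host in:", "host", True),
    ("host not in:", "host", False),
)


def to_bool(value, host, vntype):
    """Resolve a configuration value to a boolean for this host and vntype."""
    if isinstance(value, bool):
        return value  # true booleans pass through unchanged
    text = value.strip()
    for prefix, kind, value_if_listed in PREFIXES:
        if text.startswith(prefix):
            # The rest of the string is a comma-separated list of patterns.
            patterns = [p.strip() for p in text[len(prefix):].split(",")]
            patterns = [p for p in patterns if p]
            subject = (host if kind == "host" else vntype) or ""
            listed = any(fnmatch(subject, p) for p in patterns)
            # Listed entries get the stated value, all others the inverse.
            return value_if_listed if listed else not value_if_listed
    raise ValueError("not a recognized boolean expression: %r" % (value,))
```

For instance, to_bool("host not in: *keepoff", "n1keepoff", "type_a") evaluates to False on that host, while any host whose name does not end in "keepoff" gets True.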

Every section of the config file now has an "enabled" attribute, which should be set to something that can be transformed into a boolean (i.e. either one of the strings described above, or a true boolean). If "enabled" is not defined in a section, it is implicitly taken as "true" (and possibly modified by what follows).

To go back to the examples given in the overview:

Note that this can also, in a limited fashion, be used for some numeric variables through small edits to the rest of the code: numeric values are allowed, and the booleans True and False are converted to sensible numerical values (see the first example).

Most sites can use just this one way of defining "enabled".

There is one exception to the addition of "enabled" to each section: the "cgroup" section of the configuration file has always been a dictionary of dictionaries, and some portions of the existing cgroup hook code rely on all values of that dictionary being iterables themselves (which a boolean, of course, is not). Rather than forcing the rest of the code to comply with an "elegant" structure that would have "enabled" defined at all levels, the cgroup section's "enabled" is stripped, since it expresses the same thing as "enabled" in the main section (if you disable all the controllers, having a main section becomes rather pointless).
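A sketch of that stripping step, assuming the configuration has already been loaded into a Python dictionary; the helper name strip_cgroup_enabled is hypothetical and is not taken from the hook source.

```python
def strip_cgroup_enabled(config):
    """Return a copy of config whose "cgroup" section has no "enabled" key.

    Every remaining value of the "cgroup" dictionary is then itself a
    dictionary (one per controller), which is what code iterating over
    the section expects.
    """
    cleaned = dict(config)
    cgroup = dict(cleaned.get("cgroup", {}))
    cgroup.pop("enabled", None)  # no error if the key is absent
    cleaned["cgroup"] = cgroup
    return cleaned
```

The input dictionary is left untouched, so the caller can still inspect the original value if needed.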

Integration of older configuration file options to enable/disable sections and making 'exceptions' possible through their use

In most cases, the existing options for enabling/disabling sections (or the whole cgroup hook) can be replaced by the aforementioned support for strings morphed into booleans if only one option is used:

Since you can define the value for a vntype explicitly, "exclude_vntypes" has become largely redundant. The recommendation would be to deprecate it, but the current implementation modulates the "enabled" flag for the section based on exclude_vntypes. In theory you can use a host-based selection in "enabled" and then modulate it based on the vntype discovered, to exclude named hosts, but it is counterintuitive to explicitly define a host as enabled and then implicitly exclude it based on vntype.

Some sites would sometimes like to define "exceptions" without having to change vntype; for this, the other existing options (which are slightly extended) can be used in addition to the basic, but now more flexible, "enabled". These have always been lists and have up to now not supported wildcards, but each entry may now be an fnmatch.fnmatch() pattern instead of a literal string.

If vntype-based lists are used to define "enabled" for a section, then the existing exclude_hosts configuration option can be used to modulate the answer so that site admins can still define exceptions to the rules. In order to ensure consistency, a mirror image "include_hosts" is now also parsed.
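As an illustration of such exceptions, the sketch below applies exclude_hosts and then include_hosts to an already computed "enabled" value. The function names are hypothetical, and the reading of include_hosts as forcing the value to true for listed hosts is an assumption based on its description as a mirror image of exclude_hosts.

```python
from fnmatch import fnmatch


def matches_any(name, patterns):
    """True if name matches at least one fnmatch pattern in the list."""
    return any(fnmatch(name, pattern) for pattern in patterns)


def apply_host_exceptions(enabled, host, section):
    """Modulate an already computed "enabled" with per-host exceptions."""
    if matches_any(host, section.get("exclude_hosts", [])):
        enabled = False  # listed hosts are switched off
    if matches_any(host, section.get("include_hosts", [])):
        enabled = True   # assumed mirror image: listed hosts are switched on
    return enabled
```

With "enabled" resolved to true from a vntype list, a section containing "exclude_hosts" : [ "n0*" ] would still be disabled on host n07.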

Finally, run_only_on_hosts has become largely redundant. In the current implementation it modulates "enabled" in the way that most closely matches the plain meaning of the option: if "enabled" was true but run_only_on_hosts is non-empty and does not list the host, "enabled" is set to false instead. For example,

"enabled" : "vntype in: willing"
"run_only_on_hosts" : [ "able01", "able02", "able03" ]

would leave that section disabled for all 'unwilling' vntypes (including the able01..03 nodes if they are 'unwilling'), and also for all "willing" vntype nodes that are not one of the three listed nodes.

Writing combinations of these options in any order other than

  1. "enabled",
  2. "exclude_vntypes" and "exclude_hosts",
  3. "include_hosts",
  4. "run_only_on_hosts",

is, of course, strongly discouraged, because it could lead the reader to assume semantics different from those described here (where the order in which some options modulate others is well defined). Most real-life configurations that use this feature, however, seem to use only one or, more rarely, two options, in this natural order.
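The evaluation order listed above can be sketched as a single function. This is an illustration under assumptions, not the hook's implementation: the function names are hypothetical, base_enabled stands for the "enabled" value after it has been resolved to a boolean, and include_hosts is assumed to force the value to true for listed hosts.

```python
from fnmatch import fnmatch


def _listed(name, patterns):
    """True if name matches at least one fnmatch pattern in the list."""
    return any(fnmatch(name, pattern) for pattern in patterns)


def section_enabled(base_enabled, section, host, vntype):
    """Apply the modulating options in the documented order."""
    enabled = base_enabled                                   # 1. "enabled"
    if _listed(vntype, section.get("exclude_vntypes", [])):  # 2. exclusions
        enabled = False
    if _listed(host, section.get("exclude_hosts", [])):
        enabled = False
    if _listed(host, section.get("include_hosts", [])):      # 3. inclusions
        enabled = True
    only = section.get("run_only_on_hosts", [])              # 4. restriction
    if only and not _listed(host, only):
        enabled = False
    return enabled
```

Applied to the "willing"/able01..03 example, a "willing" node named able01 stays enabled, a "willing" node with any other name is disabled by run_only_on_hosts, and a node that is not "willing" is disabled regardless of its name.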

Technical details

Examples

The following example configuration file illustrates many of the features outlined in this design.

{
    "cgroup_prefix" : "pbspro",
    "enabled" : "vntype in: type_a, type_b, no_numa, no_numa_no_cpuset",
    "periodic_resc_update" : true,
    "vnode_per_numa_node" : "vntype not in: no_numa",
    "online_offlined_nodes" : "host not in: *keepoff",
    "orphan_cleanup_race_delay": 5,
    "cgroup":
    {
        "cpuacct":
        {
            "enabled" : "host in: dummy, tc72",
            "exclude_hosts" : []
        },
        "cpuset":
        {
            "enabled" : "vntype not in: *no_cpuset",
            "controller_mount" : "/sys/fs/cgroup/cpuset",
            "exclude_hosts" : [],
            "exclude_vntypes" : [],
            "memory_spread_page" : true,
            "mem_hardwall" : false,
            "mem_fences" : "vntype in: mem_fences, uv"
        },
        "devices":
        {
            "enabled" : false,
            "exclude_hosts" : [],
            "exclude_vntypes" : [],
            "allow" : ["b *:* rwm","c *:* rwm", ["mic/scif","rwm"],["nvidiactl","rwm", "*"],["nvidia-uvm","rwm"]]
        },
        "hugetlb":
        {
            "enabled" : false,
            "default" : "0MB",
            "exclude_hosts" : [],
            "exclude_vntypes" : []
        },
        "memory":
        {
            "enabled" : true,
            "default" : "256MB",
            "reserve_memory" : "2GB",
            "exclude_hosts" : [],
            "exclude_vntypes" : []
        },
        "memsw":
        {
            "enabled" : false,
            "default" : "256MB",
            "reserve_memory" : "2gb",
            "exclude_hosts" : [],
            "exclude_vntypes" : []
        }
    }
}




