Avoid device and GPU discovery on nodes where it is unnecessary


Overview

In situations that demand high job throughput, the cgroup hook's overhead can be substantial. One important part of this overhead is that when the hook instantiates the NodeUtils class, it always discovers the devices on the host and calls nvidia-smi to see whether there are GPUs to manage.

On most 'pure' compute hosts, sites do not want to manage devices at all; device management matters mainly when the hook must manage GPUs or Xeon Phi devices and you want to avoid jobs picking the wrong accelerator devices. But many sites do have some hosts where device management (and in some cases access control) is wanted, especially when there are a number of GPU hosts.

Another issue is that the hook will discover GPUs even when the "devices" subsystem is completely disabled, yet correct GPU assignment relies on that subsystem being enabled. If it is disabled, resources_available.ngpus is still set, but if a host has 2 GPUs and two jobs each request one GPU, they will be assigned the same GPU (since the hook events for the second job are unaware of the assignment for the first job, which is recorded only in the devices list of that job's cgroup). Customers are often puzzled about "why it is not working" in this case; it would be better not to report GPU resources as available unless the cgroup hook configuration is set up to manage the GPUs correctly.

Additionally, many hosts have a lot of devices, which makes debugging hooks cumbersome when the MoM $logevent mask is raised to produce more log messages. Allowing device discovery to be disabled would make it easier to read MoM logs without having to scroll through enormous log lines describing irrelevant devices.

Interface

The configuration file supports enabling the subsystem on some hosts and disabling it on others. This change simply skips device discovery when the "devices" controller/subsystem is disabled in the configuration file for the relevant host. As a result, it also avoids the misconfiguration in which resources_available.ngpus is positive while the configuration file does not enable "devices" on the GPU host: if "devices" is disabled, resources_available.ngpus is always zero.
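For example, a configuration excerpt (unrelated sections elided) that enables the devices subsystem everywhere except on the listed hosts; the per-subsystem "enabled"/"exclude_hosts"/"exclude_vntypes" keys follow the stock hook configuration layout, and the host and vnode-type names are placeholders:

    {
        "cgroup" : {
            "devices" : {
                "enabled"         : true,
                "exclude_hosts"   : ["node001", "node002"],
                "exclude_vntypes" : ["compute_only"]
            }
        }
    }

Hosts matched by the exclusion lists behave as if "devices" were disabled, so with this change they also skip device discovery.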

The only net result of skipping device discovery is that the dictionary recording the devices to manage is left empty. Since no event will actually use that dictionary to manage anything when the subsystem is disabled, this does no harm (and does a lot of good in culling irrelevant content from MoM log messages).
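A minimal sketch of the guard in Python (the hook's language); apart from NodeUtils and nvidia-smi, the names and configuration accessors below are illustrative, not the hook's actual API. It also includes the "discover_gpus" flag described below:

    import shutil
    import subprocess

    class NodeUtils:
        def __init__(self, cfg):
            self.cfg = cfg
            devcfg = cfg.get('cgroup', {}).get('devices', {})
            if not devcfg.get('enabled', False):
                # Subsystem disabled on this host: skip discovery, so
                # every hook event sees an empty device dictionary.
                self.devices = {}
            else:
                self.devices = self._discover_devices()

        def _discover_devices(self):
            devices = {}
            # ... walk /dev, sysfs, etc. to populate 'devices' ...
            if self.cfg.get('discover_gpus', True) and shutil.which('nvidia-smi'):
                out = subprocess.check_output(['nvidia-smi', '-L'], text=True)
                for line in out.splitlines():
                    # nvidia-smi -L prints e.g. "GPU 0: Tesla V100 (UUID: ...)"
                    if line.startswith('GPU '):
                        key, desc = line.split(':', 1)
                        devices[key] = desc.strip()
            return devices

A host with "devices" disabled thus pays for neither the device walk nor the nvidia-smi fork.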

For sites that wish to enable the devices subsystem on hosts without GPUs, this change introduces another flag in the main section of the configuration file, "discover_gpus", set to True by default. If it is set to False ("false" in JSON) or 0, then device discovery will not be suppressed, but nvidia-smi will not be called.
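For example, to keep the devices subsystem enabled for access control while skipping the nvidia-smi call (other keys elided; "discover_gpus" sits alongside the other main-section flags):

    {
        "discover_gpus" : false,
        "cgroup" : {
            "devices" : {
                "enabled" : true
            }
        }
    }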
