(Selectively) disable 'ngpus' resource management in the cgroup hook
Links
- Link to discussion on Developer Forum: https://community.openpbs.org/t/selectively-disable-ngpus-resource-management-in-the-cgroup-hook/2781
- Link to issue:
- Link to pull request: <PR link if available>
Overview
Some customers would like the cgroup hook to stop managing the ngpus resource completely on some nodes but continue to manage it on others.
One use case is a site whose test nodes each have a single GPU that is to be shared by more than one job, mainly for testing. The site administrators plan to set resources_available.ngpus to a fixed number larger than one, using either v2 config files or qmgr, to control how far the GPU is oversubscribed.
The current hook does not allow this. It acts as the manager of the ngpus resource on vnodes and sets resources_available.ngpus to the number of GPUs it can assign correctly. Consequently, if the 'devices' controller is disabled or 'discover_gpus' is disabled, it actively sets resources_available.ngpus to 0 to indicate that no GPUs are available to assign to jobs, which in turn means that jobs requesting the ngpus resource will never be placed on vnodes of that host.
The current hook therefore offers no way to oversubscribe ngpus resources: resources_available.ngpus is set to the number of GPUs the hook knows it can manage and assign, and each job that requests ngpus is assigned one GPU per requested ngpus for its exclusive use.
This proposal introduces a new flag 'ngpus_ext_managed' in the main section of the configuration file; enabling the flag indicates that resources_available.ngpus is managed externally to the hook and that the hook should not assign GPUs to jobs when jobs request the ngpus resource.
By default it is false. If set to 'true' in the pbs_cgroups.CF configuration file:
- discover_gpus will be disabled, since there is no point in discovering GPUs if the hook is instructed not to manage GPU assignment
- the hook will never set resources_available.ngpus on the vnodes of the host, allowing a site to use v2 config files or qmgr to set it to any value (including values that do not correspond to the number of GPUs on the node)
- the hook will not attempt to assign individual GPUs to the job; it will allow a job on a node regardless of the 'ngpus' resources the job requests, leaving management of the vnode's 'ngpus' resource to the external manager of resources_available.ngpus and to the scheduler and server (which manages resources_assigned.ngpus). Note: this entails that if the 'devices' subsection is enabled and configured for device isolation, all GPUs must be listed among the allowed devices for jobs to be able to use them, unless a separate hook adds the GPU devices to the lists of devices allowed. The simplest way to deal with this is to disable the 'devices' controller selectively on nodes where 'ngpus_ext_managed' is enabled.
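The three effects listed above can be summarised in a short Python sketch. It is not taken from the hook's source: the function names, the flat dictionary layout, the 'devices_enabled' key and the assumed default of true for 'discover_gpus' are hypothetical and only illustrate the intended decision logic.

```python
def effective_settings(cfg):
    """Return the settings the hook would act on once the proposed flag is applied."""
    # Proposed flag; false by default.
    ext = bool(cfg.get("ngpus_ext_managed", False))
    return {
        "ngpus_ext_managed": ext,
        # Discovery is forced off when GPU assignment is handled externally.
        # (The default of True here is an assumption for the sketch.)
        "discover_gpus": bool(cfg.get("discover_gpus", True)) and not ext,
        # Hypothetical flattened stand-in for the 'devices' controller switch.
        "devices_enabled": bool(cfg.get("devices_enabled", False)),
    }


def ngpus_to_publish(cfg, discovered_gpus):
    """Value to write to resources_available.ngpus, or None to leave it untouched."""
    s = effective_settings(cfg)
    if s["ngpus_ext_managed"]:
        return None  # the site sets the value via v2 config files or qmgr
    if not s["devices_enabled"] or not s["discover_gpus"]:
        return 0  # current behaviour: advertise that no GPUs can be assigned
    return discovered_gpus


def gpus_to_assign(cfg, requested_ngpus):
    """Number of individual GPUs the hook would bind to a job."""
    if effective_settings(cfg)["ngpus_ext_managed"]:
        return 0  # the job is admitted, but no GPU devices are assigned by the hook
    return requested_ngpus
```

Returning None rather than 0 in the externally managed case is the important distinction in this sketch: 0 would overwrite the value chosen by the site, whereas None stands for "do not touch the vnode attribute".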
Since this is a boolean flag, it is possible to use the current config file syntax to enable ngpus to be managed by the hook on some nodes and to be externally managed on some other nodes.
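As an illustration of per-node selection, the sketch below evaluates a boolean flag for a given host. It assumes the conditional string form for boolean values looks like "host in: ..." or "host not in: ..."; that form, the helper name flag_for_host, and the host names are assumptions for this example, and the parser is a simplified stand-in rather than the hook's own code.

```python
import fnmatch


def flag_for_host(value, hostname):
    """Resolve a boolean config value for one host (simplified, hypothetical parser)."""
    if isinstance(value, bool):
        return value
    text = str(value).strip()
    for prefix, negate in (("host not in:", True), ("host in:", False)):
        if text.startswith(prefix):
            # Comma-separated host patterns; shell-style globs are allowed here.
            patterns = [p.strip() for p in text[len(prefix):].split(",") if p.strip()]
            matched = any(fnmatch.fnmatch(hostname, p) for p in patterns)
            return matched != negate
    return text.lower() == "true"


# Hypothetical main-section entry: only the two test nodes are externally managed.
config = {"ngpus_ext_managed": "host in: testgpu01, testgpu02"}

for host in ("testgpu01", "compute17"):
    print(host, flag_for_host(config["ngpus_ext_managed"], host))
```

With a value of this kind, the hook keeps managing ngpus on the regular compute nodes while the listed test nodes take their resources_available.ngpus from v2 config files or qmgr.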