Nvidia MIG Support
Follow the PBS Pro Design Document Guidelines.
Links
Link to discussion on Developer Forum: https://community.openpbs.org/t/nvidia-mig-support/2382
Link to issue: <issue link if available>
Link to pull request: Initial support for MIG-enabled GPUs by vstumpf · Pull Request #2142 · openpbs/openpbs
Link to pull request updating the MIG UUID format: Switch to obtaining nvidia a100 MIG identifier/UUID from nvidia-smi -L directly, rather than constructing tuple format · Pull Request #2452 · openpbs/openpbs
Overview
From the NVIDIA documentation, "the new Multi-Instance GPU (MIG) feature allows the NVIDIA A100 GPU to be securely partitioned into up to seven separate GPU Instances for CUDA applications".
PBS will recognize each GPU Instance as a separate GPU.
Pre-requisites
MIG must be enabled on the GPU. See the NVIDIA documentation for how to enable it.
The NVIDIA kernel module parameter nv_cap_enable_devfs must be enabled (set to 1).
The admin must create GPU Instances and Compute Instances before starting the MoM. A sketch of the setup commands is shown below.
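As a hedged illustration only (the GPU index, profile name, and exact flags are assumptions; check nvidia-smi mig --help and the NVIDIA MIG documentation for your driver version), the setup might look like:
# Enable MIG mode on GPU 0 (takes effect after a GPU reset or reboot)
sudo nvidia-smi -i 0 -mig 1
# Persist the nvidia-caps kernel module parameter
echo "options nvidia nv_cap_enable_devfs=1" | sudo tee /etc/modprobe.d/nvidia-caps.conf
# Create a GPU Instance with the 1g.5gb profile and its default Compute Instance (-C)
sudo nvidia-smi mig -i 0 -cgi 1g.5gb -C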
Terminology
MIG = Multi Instance GPU
GI = GPU Instance (A MIG GPU can have multiple GIs)
CI = Compute Instance (A GI can have multiple CIs)
How it works
The cgroups hook currently loads the GPU information via nvidia-smi. At this point, it will also note whether MIG is enabled on any GPUs. If a GPU has MIG enabled, the hook will look up the GIs and replace the physical GPU with the GIs it finds.
This means that if a node has one GPU split into 7 GIs, the hook will replace the 1 physical GPU with the 7 GIs, and ngpus will be 7.
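As a rough sketch of what the hook derives (the query fields here are taken from nvidia-smi's own documentation, not from the hook's exact invocation):
# Check which GPUs have MIG enabled
nvidia-smi --query-gpu=index,mig.mode.current --format=csv,noheader
# 0, Enabled
# Enumerate the GIs that replace the physical GPU in the ngpus count
sudo nvidia-smi mig -lgi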
Now, in order for a job to be able to use a GI, one or more CIs need to be created for that GI. Follow the NVIDIA documentation on how to do this; a hedged example follows.
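For example (the profile name and IDs are assumptions; nvidia-smi mig -lcip lists the CI profiles available on a given GI):
# Create a Compute Instance on GPU 0 / GPU Instance 7 with the 1g.5gb profile
sudo nvidia-smi mig -i 0 -gi 7 -cci 1g.5gb
# Verify
sudo nvidia-smi mig -lci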
The job’s cgroup needs multiple devices allowed in order to use the GI. It requires the following devices (a sketch of the resulting devices.allow entries follows this list):
The GI (look through /dev/nvidia-caps)
All CIs that are in the GI (look through /dev/nvidia-caps)
The GPU device that the GIs are created on (/dev/nvidia0, /dev/nvidia1, etc.)
The nvidiactl device (required for the GPU device)
Even though the job has access to the GPU device, because MIG is enabled it doesn’t have access to ALL the GIs on the system.
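As a hedged sketch (the nvidia-caps major is allocated dynamically, the minor values shown are illustrative, and <job_cgroup> is a placeholder for the job's devices-cgroup directory):
# The nvidia-caps character device major is dynamic; look it up
grep nvidia-caps /proc/devices
# 238 nvidia-caps
# The driver lists the cap-device minors for each GI and CI
cat /proc/driver/nvidia-caps/mig-minors
# Resulting devices.allow entries for a job using GI 7 (with CI 0) on GPU 0
echo 'c 195:0 rwm'   > <job_cgroup>/devices.allow  # /dev/nvidia0
echo 'c 195:255 rwm' > <job_cgroup>/devices.allow  # /dev/nvidiactl
echo 'c 238:13 rwm'  > <job_cgroup>/devices.allow  # the GI's cap device
echo 'c 238:14 rwm'  > <job_cgroup>/devices.allow  # each CI's cap device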
External Dependencies
This relies on the nvidia-smi command. It uses the output of nvidia-smi mig -lgi to list the GIs and nvidia-smi mig -lci to list the CIs.
Unfortunately, these commands only output in table format, like so:
[vstumpf@gpusrv-01 ~]$ sudo nvidia-smi mig -lgi
+----------------------------------------------------+
| GPU instances:                                     |
| GPU   Name          Profile  Instance   Placement  |
|                     ID       ID         Start:Size |
|====================================================|
|   0  MIG 1g.5gb       19        7          0:1     |
+----------------------------------------------------+
The hook will use regex to match this output, but if the output format changes, a patch will be required.
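To illustrate the matching (this is a sketch of the approach, not the hook's actual pattern; the hook parses the same rows in Python):
# Match the data rows of the GI table with a regex...
sudo nvidia-smi mig -lgi | grep -E '^\|[[:space:]]+[0-9]+[[:space:]]+MIG'
# ...or pull the fields apart directly
sudo nvidia-smi mig -lgi | awk '$3 == "MIG" {
    print "gpu=" $2, "name=" $3 " " $4, "profile_id=" $5, "gi_id=" $6, "placement=" $7
}'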
On the two machines I tested on, nvidia-smi mig -lci had different formats:
[vstumpf@gpusrv-01 ~]$ nvidia-smi -h
NVIDIA System Management Interface -- v455.45.01
[vstumpf@gpusrv-01 ~]$ sudo nvidia-smi mig -lci
+--------------------------------------------------------------------+
| Compute instances:                                                 |
| GPU     GPU       Name             Profile   Instance   Placement  |
|       Instance                     ID        ID         Start:Size |
|         ID                                                         |
|====================================================================|
|   0      7       MIG 1g.5gb        0         0          0:1        |
+--------------------------------------------------------------------+
versus
# nvidia-smi -h
NVIDIA System Management Interface -- v450.51.06
# nvidia-smi mig -lci
+-------------------------------------------------------+
| Compute instances:                                    |
| GPU     GPU       Name             Profile   Instance |
|       Instance                     ID        ID       |
|         ID                                            |
|=======================================================|
|   0      7       MIG 1g.5gb        0         0        |
+-------------------------------------------------------+
It also uses the nvidia-smi -L command to list out the UUIDs of each MIG device. This command is used to update the $CUDA_VISIBLE_DEVICES environment variable, which is used to specify which CIs a particular job would run on.
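For reference, on a MIG-enabled GPU the output looks roughly like this (UUIDs replaced with placeholders):
$ nvidia-smi -L
GPU 0: A100-SXM4-40GB (UUID: GPU-<gpu-uuid>)
  MIG 1g.5gb Device 0: (UUID: MIG-<mig-uuid>)
  MIG 1g.5gb Device 1: (UUID: MIG-<mig-uuid>)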
CUDA_VISIBLE_DEVICES
Instead of CUDA_VISIBLE_DEVICES being filled with the UUIDs of the GPUs, it will instead be filled with the UUIDs of the CIs.
Previously, compute instances were identified via the format MIG-GPU-<GPU_UUID>/<GI_ID>/<CI_ID>, but now each compute instance in each GI has its own UUID of the format MIG-<MIG_UUID>.
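For example (placeholder values):
# Old constructed tuple format
CUDA_VISIBLE_DEVICES=MIG-GPU-<GPU_UUID>/<GI_ID>/<CI_ID>
# New format, taken directly from nvidia-smi -L
CUDA_VISIBLE_DEVICES=MIG-<MIG_UUID>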
External Interface Changes
There are no changes to the external interface. If MIG is enabled and there are GPU Instances created, the hook will automatically use them.