Add MIG setup to the GPU page #311

nwneisen · 2025-02-17T19:17:02Z

This change is still waiting for tickets and configuration to be finalized.

Document MIG (Multi-Instance GPU) functionality
https://mirantis.jira.com/browse/BOP-1310
https://github.com/MirantisContainers/mke/pull/437
https://github.com/MirantisContainers/mke-operator/pull/194

MIG is a GPU setting that partitions a GPU so that multiple workloads can run in parallel without interfering with one another. Nvidia has two strategies for doing this:

single - The GPU partitions are all the same size
mixed - The GPU partitions can be of various sizes

The initial implementation only supports the single strategy to match MKE3 functionality.
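To illustrate what the single strategy means for a workload, here is a minimal sketch of a Pod that requests one MIG partition. It assumes MKE follows the upstream NVIDIA device plugin behavior, where the single strategy exposes each partition under the regular nvidia.com/gpu resource name. The Pod name, container name, and image are hypothetical placeholders.

```yaml
# Hypothetical Pod; the name and image are placeholders, not taken from MKE.
apiVersion: v1
kind: Pod
metadata:
  name: mig-example
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: "<your-cuda-image>"      # placeholder: any image that ships nvidia-smi
      command: ["nvidia-smi", "-L"]   # lists the GPU devices visible to the container
      resources:
        limits:
          nvidia.com/gpu: 1           # assumption: one MIG partition under the single strategy
```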

Configuration
MIG is disabled by default in MKE. To enable it, set the enabled field of the mig section to true in the mkeconfig.

Official documentation: https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-operator-mig.html#about-multi-instance-gpu

Note
Enabling MIG on a GPU that was already running causes the pods to restart, even if the GPU has workloads running.

devicePlugins:
  nvidiaGPU:
    enabled: true
    mig:
      enabled: true

A profile can then be applied to a node by setting the nvidia.com/mig.config: <MIG profile> label on it. Supported profiles vary by GPU and are listed at: https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#getting-started-with-mig
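As a sketch, the label would appear in the node's metadata roughly as follows. The profile name all-1g.10gb is only an example and an assumption here; it may not exist for your GPU, so check the supported profiles first.

```yaml
# Fragment of a Node object; only the MIG label is shown.
metadata:
  labels:
    # Example profile (assumption); replace with one your GPU supports.
    nvidia.com/mig.config: all-1g.10gb
```

The same label can typically be set from the command line with kubectl label node <node-name> nvidia.com/mig.config=<MIG profile> --overwrite.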


nwneisen added the GPU label Feb 17, 2025

KoryKessel-Mirantis self-assigned this Feb 17, 2025

KoryKessel-Mirantis added the Docs-B Should Do label Feb 17, 2025

KoryKessel-Mirantis linked a pull request Feb 18, 2025 that will close this issue

MIG GPU topic #313

Draft
