Add MIG setup to the GPU page #311

Open
nwneisen opened this issue Feb 17, 2025 · 0 comments · May be fixed by #313
Assignees
Labels
Docs-B Should Do GPU

nwneisen commented Feb 17, 2025

This change is still waiting for tickets and configuration to be finalized.

Document MIG (Multi-Instance GPU) functionality
https://mirantis.jira.com/browse/BOP-1310
https://github.com/MirantisContainers/mke/pull/437
https://github.com/MirantisContainers/mke-operator/pull/194

MIG is a GPU setting that partitions a GPU so that multiple workloads can run in parallel without interfering with one another. NVIDIA has two strategies for doing this:

  • single - The GPU partitions are all the same size
  • mixed - The GPU partitions can be of various sizes

The initial implementation only supports the single strategy to match MKE3 functionality.

Configuration
MIG is disabled by default in MKE. It can be enabled by setting the enabled field of the mig section to true in the mkeconfig.

Official documentation: https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-operator-mig.html#about-multi-instance-gpu

Note
Enabling MIG on a GPU that was previously running will cause its pods to restart, even if the GPU has workloads running.

devicePlugins:
  nvidiaGPU:
    enabled: true
    mig:
      enabled: true
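
Before picking a node for a profile, it may help to see which nodes report MIG-capable GPUs. The sketch below assumes the NVIDIA GPU Operator's feature discovery is running and has set the nvidia.com/mig.capable label on nodes; whether MKE surfaces that label the same way has not been confirmed in this issue. The function name and the KUBECTL override variable are hypothetical and only there for illustration.

```shell
# Hypothetical helper: list nodes that GPU feature discovery has
# labelled as MIG-capable. Assumes the nvidia.com/mig.capable label
# is present (set by the NVIDIA GPU Operator, not by this issue).
# KUBECTL can be overridden to point at a different kubectl binary.
list_mig_capable_nodes() {
  "${KUBECTL:-kubectl}" get nodes -l nvidia.com/mig.capable=true
}
```

Calling list_mig_capable_nodes against a cluster prints the matching nodes; an empty list suggests either no MIG-capable GPUs or that the label has not been applied yet.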

A profile can then be applied to a node by setting the node label nvidia.com/mig.config: <MIG profile>. Supported profiles vary by GPU and are listed at: https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#getting-started-with-mig
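
As a rough sketch of applying the label from the command line, the helpers below wrap kubectl label and then read back the nvidia.com/mig.config.state label that the NVIDIA GPU Operator's MIG manager sets (for example pending, success, or failed). The function names, the KUBECTL override variable, the node name gpu-node-1, and the profile all-1g.5gb are illustrative assumptions; use a profile that your GPU model actually supports.

```shell
# Hypothetical helper: apply a MIG profile to a node by setting the
# nvidia.com/mig.config label. --overwrite replaces any existing value.
#   $1 = node name, $2 = MIG profile (e.g. all-1g.5gb, an assumed example)
apply_mig_profile() {
  "${KUBECTL:-kubectl}" label node "$1" "nvidia.com/mig.config=$2" --overwrite
}

# Hypothetical helper: print the MIG manager's state label for a node.
# Dots inside the label key are escaped for kubectl's jsonpath syntax.
#   $1 = node name
mig_config_state() {
  "${KUBECTL:-kubectl}" get node "$1" \
    -o 'jsonpath={.metadata.labels.nvidia\.com/mig\.config\.state}'
}
```

For example, apply_mig_profile gpu-node-1 all-1g.5gb followed by mig_config_state gpu-node-1 shows whether the reconfiguration has finished. Keep in mind the note above: workloads on that GPU are restarted when the profile changes.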

@nwneisen nwneisen added the GPU label Feb 17, 2025
@KoryKessel-Mirantis KoryKessel-Mirantis self-assigned this Feb 17, 2025
@KoryKessel-Mirantis KoryKessel-Mirantis added the Docs-B Should Do label Feb 17, 2025
@KoryKessel-Mirantis KoryKessel-Mirantis linked a pull request Feb 18, 2025 that will close this issue