
Commit d551d45

Merge pull request kubernetes#5432 from ndixita/pod-level-resources-ippr-alpha
KEP-5419: Add non-goals, and risks and mitigations
2 parents 7f93d68 + 9279011

1 file changed (+60, -1 lines):

keps/sig-node/5419-pod-level-resources-in-place-resize/README.md

@@ -5,8 +5,10 @@
- [Summary](#summary)
- [Motivation](#motivation)
- [Goals](#goals)
- [Non Goals](#non-goals)
- [Proposal](#proposal)
- [Notes/Constraints/Caveats](#notesconstraintscaveats)
- [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
- [Design Principles](#design-principles)
- [Components/Features changes](#componentsfeatures-changes)

@@ -102,7 +104,29 @@ This proposal aims to:
1. Extend the In-Place Pod Resize (IPPR) functionality to support dynamic
   adjustments of pod-level CPU and Memory resources.
2. Ensure compatibility and proper interaction between pod-level IPPR and
   existing container-level IPPR mechanisms.
3. Provide clear mechanisms for tracking and reporting the actual allocated
   pod-level resources in PodStatus.

### Non Goals

This KEP focuses solely on extending IPPR to pod-level resources, so the
non-goals are largely the same as [IPPR's
non-goals](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1287-in-place-update-pod-resources#non-goals).
These include:

1. This KEP covers only in-place resizing of core compute resources (CPU and
   Memory) at the pod level. Extending this functionality to other resource types
   (e.g., GPUs, network bandwidth) is outside the current scope.

2. This KEP does not aim to implement dynamic changes to a pod's QoS class based
   on in-place resource resize operations.

3. No dynamic adjustments for Init Containers that have already finished and
   can't be restarted.

4. No automatic removal of lower-priority pods to make room for a pod that's
   resizing its resources.

5. This KEP doesn't aim to fix the complex timing issues between the Kubelet and
   the scheduler during resizes that already exist in
   [KEP#1287](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/1287-in-place-update-pod-resources/README.md).

## Proposal

### Notes/Constraints/Caveats

@@ -118,6 +142,41 @@ This proposal aims to:

3. This feature relies on the PodLevelResources, InPlacePodVerticalScaling and InPlacePodLevelResourcesVerticalScaling feature gates being enabled.
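
   For illustration, assuming the kubelet is configured through a
   `KubeletConfiguration` file, enabling the three gates on a node could look
   like the sketch below; the same gate names would also need to be enabled on
   the kube-apiserver and kube-scheduler via their `--feature-gates` flags:

   ```yaml
   # Sketch: enabling the required feature gates on a node via the kubelet
   # configuration file. The same gates must also be enabled on the control
   # plane, e.g.:
   # --feature-gates=PodLevelResources=true,InPlacePodVerticalScaling=true,InPlacePodLevelResourcesVerticalScaling=true
   apiVersion: kubelet.config.k8s.io/v1beta1
   kind: KubeletConfiguration
   featureGates:
     PodLevelResources: true
     InPlacePodVerticalScaling: true
     InPlacePodLevelResourcesVerticalScaling: true
   ```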

### Risks and Mitigations

This KEP focuses solely on extending IPPR to pod-level resources, so the risks
are largely the same as [IPPR's risks and
mitigations](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1287-in-place-update-pod-resources#risks-and-mitigations)
and [Pod-Level Resources' risks and
mitigations](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2837-pod-level-resource-spec/README.md#risks-and-mitigations).
These include:

1. Backward compatibility: For pods with pod-level resources, Pod.Spec.Resources
   becomes representative of desired state, and the pod's actual resource
   configuration is tracked in Pod.Status.Resources. Applications that query
   PodSpec and rely on Resources in PodSpec to determine resource configurations
   will therefore see values that may not represent the actual configuration
   (see the first sketch after this list). As a mitigation, this change needs to
   be documented and highlighted in the release notes and in top-level
   Kubernetes documents.

2. Scheduler race condition: If a resize happens concurrently with the scheduler
   evaluating the node where the pod is resized, it can result in a node being
   over-scheduled, which will cause the pod to be rejected with an OutOfCPU or
   OutOfMemory error. Solving this race condition is out of scope for this KEP,
   but a general solution may be considered in the future.

3. Since Pod Level Resource Specifications is an opt-in feature, merging the
   feature-related changes won't impact existing workloads. Moreover, the feature
   will be rolled out gradually, beginning with an alpha release for testing and
   gathering feedback. This will be followed by beta and GA releases as the
   feature matures and potential problems and improvements are addressed.

4. While this feature doesn't alter the existing cgroup structure, it does change
   how pod-level cgroup values are determined. Currently, Kubernetes derives these
   values from the container-level cgroup settings. With Pod Level Resource
   Specifications enabled, however, pod-level cgroup settings will be set directly
   from the values specified in the pod's resource spec stanza, if set (see the
   second sketch after this list). This change in behavior could potentially affect:

   - Workloads or tools that rely on reading cgroup values: any workloads or tools
     that depend on reading or interpreting cgroup values might observe different
     derived values if pod-level resources are specified without container-level
     settings.
   - Third-party schedulers or tools that make assumptions about pod-level resource
     calculation: these tools might require adjustments to accommodate the new way
     pod-level resources are determined.

   To mitigate potential issues, the feature documentation will clearly highlight
   this change and its potential impact. This will allow users to:

   - Adjust their pod-level and container-level resource settings as needed.
   - Modify any custom schedulers or tools to align with the new resource
     calculation method.
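
To make the backward-compatibility risk concrete, here is a minimal sketch of a
pod mid-resize, assuming the alpha API shape described above; all values are
illustrative only. `spec.resources` already holds the desired state while
`status.resources` still reports the actual allocated resources, so consumers
should read the status rather than the spec:

```yaml
# Illustrative only: a pod-level resize in flight. spec.resources is the
# desired state; status.resources tracks the actual allocated resources.
apiVersion: v1
kind: Pod
metadata:
  name: pod-level-resize-demo
spec:
  resources:            # desired: resized up to 2 CPU / 2Gi
    requests:
      cpu: "2"
      memory: 2Gi
    limits:
      cpu: "2"
      memory: 2Gi
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
status:
  resources:            # actual: still running with 1 CPU / 1Gi
    requests:
      cpu: "1"
      memory: 1Gi
    limits:
      cpu: "1"
      memory: 1Gi
```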
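
And a sketch of the cgroup-derivation change: with pod-level resources set, the
pod-level cgroup values would be taken directly from `spec.resources` instead of
being derived from container-level settings. The cgroup v2 file names in the
comments illustrate the idea and are not a specification of the implementation:

```yaml
# Illustrative only: where pod-level cgroup values come from with
# Pod Level Resource Specifications enabled.
apiVersion: v1
kind: Pod
metadata:
  name: cgroup-derivation-demo
spec:
  resources:
    limits:
      cpu: "1"        # pod cgroup cpu.max set from here (cgroup v2)
      memory: 1Gi     # pod cgroup memory.max set from here (cgroup v2)
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
    # no container-level resources: tools that read derived cgroup values
    # will observe the pod-level values above, not container-derived ones
```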

## Design Details

### Design Principles
