New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kubelet: more resilient node allocatable ephemeral-storage data getter #101882
kubelet: more resilient node allocatable ephemeral-storage data getter #101882
Conversation
/test pull-kubernetes-e2e-capz-ha-control-plane |
/test pull-kubernetes-e2e-capz-ha-control-plane |
/test pull-kubernetes-e2e-capz-ha-control-plane |
1 similar comment
/test pull-kubernetes-e2e-capz-ha-control-plane |
45d8bb5
to
6dff31a
Compare
/test pull-kubernetes-e2e-capz-ha-control-plane |
@@ -1060,7 +1060,32 @@ func isKernelPid(pid int) bool { | |||
return err != nil && os.IsNotExist(err) | |||
} | |||
|
|||
// GetCapacity returns node capacity data for "cpu", "memory", "ephemeral-storage", and "huge-pages*" | |||
// At present this method is only invoked when introspecting ephemeral storage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this comment to emphasize the practical need to work harder to deliver ephemeral-storage data: This GetCapacity
method is only called to get ephemeral storage data. See here where we refer to it when defining the set of MachineInfo
setter functions:
https://github.com/kubernetes/kubernetes/blob/v1.21.0/pkg/kubelet/kubelet_node_status.go#L617
Those set of funcs are then assigned to the main kubelet instance:
https://github.com/kubernetes/kubernetes/blob/v1.21.0/pkg/kubelet/kubelet.go#L848
And then those status funcs are invoked every time kubelet sets the node status for a node:
https://github.com/kubernetes/kubernetes/blob/v1.21.0/pkg/kubelet/kubelet_node_status.go#L583
What's interesting here is that when we invoke MachineInfo
with GetCapacity()
as the capacityFunc
, we see that we only use capacityFunc
to derive ephemeral storage data, and nothing else:
https://github.com/kubernetes/kubernetes/blob/v1.21.0/pkg/kubelet/nodestatus/setters.go#L332
Thus, we should improve the resiliency/reliability of GetCapacity()
to actually return real ephemeral storage allocatable data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so where cpu and memory is retrieved from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CPU and memory are stored in the containerManager instance's capacity
property when containerManager is instantiated, right here:
https://github.com/kubernetes/kubernetes/blob/v1.21.0/pkg/kubelet/cm/container_manager_linux.go#L246
However, storage is not stored at that time, which is why the race condition occurs, and why we may need to call it "just in time" during node info retrieval/node registration. (See this comment: https://github.com/kubernetes/kubernetes/blob/v1.21.0/pkg/kubelet/cm/container_manager_linux.go#L245)
CPU/memory info is gathered using another function, see here:
https://github.com/kubernetes/kubernetes/blob/v1.21.0/pkg/kubelet/nodestatus/setters.go#L296
there are 2 tl;dr points:
- storage and CPU/memory are retrieved discretely, using individual code paths
- CPU/memory is gathered immediately when the kubelet runtime starts, and has its own error handling (see code); storage is gathered a little later than CPU/memory, and thus node info events may occur before storage has been gathered, leading to node registrations with "0" allocatable ephemeral-storage.
This PR accomplishes the following:
If we detect that allocatable ephemeral-storage has not yet been gathered during a node info/node registration operation, we attempt to eagerly gather that storage data "just in time" in order to prevent registering a node with "0" allocatable ephemeral-storage.
Hope that makes sense!
@Random-Liu @sjenning can you kindly take a look at this PR for final approval/merge? thanks! |
Another issue related to what this PR does: |
/check-cla |
/easycla |
gentle ping @mrunalp @tallclair |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jackfrancis, jsafrane, mrunalp The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
5374e97
to
1dc066f
Compare
1dc066f
to
b2d8cd1
Compare
/lgtm |
Looks like I encountered #107414 /retest |
b2d8cd1
to
ab14cba
Compare
/lgtm |
Yes, just had to clean up |
/retest |
thx again @mrunalp for the approval merry-go-round 🙏 |
This was approved before the code freeze, but not merged for CI failures. 😂 |
What type of PR is this?
What this PR does / why we need it:
This PR modifies the
GetCapacity
containerManager method that is used to derive allocatable ephemeral storage node data. This change adds a "just-in-time" cAdvisor call to gather rootfs storage data in case the containerManager runtime instance hasn't yet stored that data.This change deals with a race that is documented in code here:
https://github.com/kubernetes/kubernetes/blob/v1.21.0/pkg/kubelet/cm/container_manager_linux.go#L245
In summary: (1) the containerManager is instantiated, and then (2) Starts; but, node info may have been gathered between steps 1 and 2 in some instances, which leads to outcomes in which the node data is missing ephemeral storage allocatable data. Additionally, that "incomplete" node representation is subject to being submitted to the apiserver during node registration, leading to further side-effects during cluster bootstrapping where the authoritative node data declares that it has zero allocatable ephemeral storage.
Which issue(s) this PR fixes:
Fixes #99305
Fixes #102715
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: