Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubelet/stats: update cadvisor stats provider with new log location #108115

Merged
merged 3 commits into from May 12, 2022

Conversation

haircommander
Copy link
Contributor

What type of PR is this?

/kind bug

What this PR does / why we need it:

in #74441,
the namespace and name were added to the pod log location.

However, cAdvisor stats provider wasn't correspondingly updated.

since CRI-O uses cAdvisor stats provider by default, despite being a CRI implementation,
eviction with ephemeral storage and container logs doesn't work as expected, until now!

Signed-off-by: Peter Hunt pehunt@redhat.com

Which issue(s) this PR fixes:

Fixes #107447

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fix a bug where CRI implementations that use cAdvisor stats provider (CRI-O) don't evict pods when their logs exceed ephemeral storage limit.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 14, 2022
@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 14, 2022
@haircommander
Copy link
Contributor Author

/sig node
/priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 14, 2022
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 14, 2022
@haircommander
Copy link
Contributor Author

a unit test is likely in order, but I'll need to tackle that tomorrow

in kubernetes#74441,
the namespace and name were added to the pod log location.

However, cAdvisor stats provider wasn't correspondingly updated.

since CRI-O uses cAdvisor stats provider by default, despite being a CRI implementation,
eviction with ephemeral storage and container logs doesn't work as expected, until now!

Signed-off-by: Peter Hunt <pehunt@redhat.com>
…hemeral stats

this commit updates checkEphemeralStorage to be able to add container log stats, if applicable.

It also updates the old check when container log stats aren't found to be more accurate.
Specifically, this check previously worked because of a fluke programming accident:

according to this block in pkg/kubelet/stats/helper.go:113
```
if result.Rootfs != nil {
    rootfsUsage := *cfs.BaseUsageBytes
    result.Rootfs.UsedBytes = &rootfsUsage
}
```

BaseUsageBytes should be the value added, not TotalUsageBytes. However, since in this case
one also needs to account for the calculated log size, which is TotalUsageBytes - BaseUsageBytes
using TotalUsageBytes value accidentally worked.

Updating the case to use the correct value AND log offset fixes this accident and makes
the behavior more in line with what happens when calculating ephemeral storage.

Signed-off-by: Peter Hunt <pehunt@redhat.com>
Signed-off-by: Peter Hunt <pehunt@redhat.com>
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 15, 2022
@haircommander
Copy link
Contributor Author

/retest

@ehashman
Copy link
Member

/triage accepted
/assign @bobbypage

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 24, 2022
@ehashman ehashman moved this from Triage to Needs Reviewer in SIG Node PR Triage Mar 24, 2022
@bobbypage
Copy link
Member

Thanks for the fix!

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 31, 2022
@ehashman ehashman moved this from Needs Reviewer to Needs Approver in SIG Node PR Triage Apr 4, 2022
@RA489
Copy link

RA489 commented May 12, 2022

/assign @dashpole for approval.

@dashpole
Copy link
Contributor

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dashpole, haircommander

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 12, 2022
@k8s-ci-robot k8s-ci-robot merged commit 3688442 into kubernetes:master May 12, 2022
SIG Node PR Triage automation moved this from Needs Approver to Done May 12, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.25 milestone May 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Development

Successfully merging this pull request may close these issues.

Rotated container log files size not counted towards ephemeral-storage's limit on CRI-O
6 participants