Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add container probe duration metrics #104484

Merged
merged 1 commit into from Jul 29, 2022

Conversation

jackfrancis
Copy link
Contributor

@jackfrancis jackfrancis commented Aug 20, 2021

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR adds duration metrics for kubelet container probes.

Which issue(s) this PR fixes:

Fixes #101035

Special notes for your reviewer:

Does this PR introduce a user-facing change?

add container probe duration metrics

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

add container probe duration metrics

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Aug 20, 2021
@jackfrancis
Copy link
Contributor Author

@SergeyKanzhelev how does this look as a draft change for probe duration metrics? are we on the right track?

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 20, 2021
@pacoxu pacoxu added this to Waiting on Author in SIG Node PR Triage Sep 2, 2021
@pacoxu pacoxu moved this from Waiting on Author to Triage in SIG Node PR Triage Sep 2, 2021
@mmiranda96
Copy link
Contributor

/triage accepted
/priority important-soon

Could you update the proper unit tests for the new feature? Also, I'm not 100% sure on this, but it looks like the jobs are flaking. Rerunning them...
/retest

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Sep 20, 2021
@mmiranda96 mmiranda96 moved this from Triage to Waiting on Author in SIG Node PR Triage Sep 20, 2021
@mmiranda96
Copy link
Contributor

Also, please edit the section Does this PR introduce a user-facing change?, since it does introduce a new change.

@pacoxu

This comment has been minimized.

@pacoxu

This comment has been minimized.

@k8s-ci-robot
Copy link
Contributor

@pacoxu: /release-note-edit must be used with a release note block.

In response to this:

/release-note-edit

add container probe duration metrics

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@pacoxu
Copy link
Member

pacoxu commented Sep 26, 2021

/release-note-edit

add container probe duration metrics

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Sep 26, 2021
@pacoxu pacoxu moved this from Waiting on Author to Needs Reviewer in SIG Node PR Triage Sep 26, 2021
@ehashman
Copy link
Member

/sig instrumentation

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 21, 2022
@@ -474,3 +475,50 @@ func TestStartupProbeDisabledByStarted(t *testing.T) {
expectContinue(t, w, w.doProbe(), msg)
expectResult(t, w, results.Success, msg)
}

func TestGetPodLabelName(t *testing.T) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dgrisonnet thx for the continual feedback!

Does the below set of cases prove out everything we want?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gentle ping @dgrisonnet @SergeyKanzhelev

This PR is ready for another "consider for merge" review

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new changes looks good from my side, but I would love to have a second pair of eyes on these assumptions: #104484 (comment)

@pacoxu
Copy link
Member

pacoxu commented Feb 10, 2022

/assign

@jackfrancis
Copy link
Contributor Author

gentle ping @pacoxu

thank you!

@pacoxu
Copy link
Member

pacoxu commented Feb 16, 2022

I did some testing. Here are four examples including static pod and general pods.

[root@daocloud ~]# kubectl get pod --show-labels -A
NAMESPACE          NAME                                       READY   STATUS             RESTARTS         AGE     LABELS
default            liveness-exec                              0/1     CrashLoopBackOff   23 (54s ago)     75m     test=liveness
calico-system      calico-kube-controllers-59c45ff85c-x2gts   1/1     Running            10 (27d ago)     56d     k8s-app=calico-kube-controllers,pod-template-hash=59c45ff85c
calico-system      calico-node-d2jpf                          1/1     Running            4 (27d ago)      56d     controller-revision-hash=86b6bcc6dc,k8s-app=calico-node,pod-template-generation=1
calico-system      calico-typha-65cdbb598d-644x5              1/1     Running            7 (27d ago)      56d     k8s-app=calico-typha,pod-template-hash=65cdbb598d
kube-system        kube-apiserver-daocloud                    1/1     Running            0                81m     component=kube-apiserver,tier=control-plane

The metrics for pods above

prober_probe_duration_seconds_sum{container="liveness",namespace="default",pod="liveness-exec",probe_type="Liveness"} 6.563395632
prober_probe_duration_seconds_sum{container="calico-kube-controllers",namespace="calico-system",pod="calico-kube-controllers",probe_type="Liveness"} 25.634286655000015
prober_probe_duration_seconds_sum{container="calico-kube-controllers",namespace="calico-system",pod="calico-kube-controllers",probe_type="Readiness"} 26.072259959999986
prober_probe_duration_seconds_sum{container="calico-node",namespace="calico-system",pod="calico-node",probe_type="Liveness"} 0.6949132140000004
prober_probe_duration_seconds_sum{container="calico-node",namespace="calico-system",pod="calico-node",probe_type="Readiness"} 60.58489301800003
prober_probe_duration_seconds_sum{container="calico-typha",namespace="calico-system",pod="calico-typha",probe_type="Liveness"} 0.7015105350000005
prober_probe_duration_seconds_sum{container="calico-typha",namespace="calico-system",pod="calico-typha",probe_type="Readiness"} 0.6845843959999995
prober_probe_duration_seconds_sum{container="kube-apiserver",namespace="kube-system",pod="kube-apiserver-daocloud",probe_type="Liveness"} 3.810594282999996
prober_probe_duration_seconds_sum{container="kube-apiserver",namespace="kube-system",pod="kube-apiserver-daocloud",probe_type="Readiness"} 44.2331933840001
prober_probe_duration_seconds_sum{container="kube-apiserver",namespace="kube-system",pod="kube-apiserver-daocloud",probe_type="Startup"} 0.006615101

Copy link
Member

@pacoxu pacoxu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 16, 2022
@ehashman ehashman moved this from Needs Reviewer to Needs Approver in SIG Node PR Triage Feb 28, 2022
@jackfrancis
Copy link
Contributor Author

@SergeyKanzhelev can we merge this?

@jackfrancis
Copy link
Contributor Author

@SergeyKanzhelev @pacoxu @dgrisonnet should we take advantage of the beginning of the 1.25 cycle and add this to the milestone and merge?

@pacoxu
Copy link
Member

pacoxu commented Jun 28, 2022

kindly ping @dgrisonnet
/cc @dashpole @mrunalp
Do you think this metric is acceptable?

@dgrisonnet
Copy link
Member

I somehow missed the pings, sorry about that. I think this PR is good to go now that my initial concerns have been addressed.

/unhold
/lgtm

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 28, 2022
@jackfrancis
Copy link
Contributor Author

gentle ping @mrunalp

@dashpole
Copy link
Contributor

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dashpole, jackfrancis, logicalhan, mrunalp, SergeyKanzhelev

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 29, 2022
@mrunalp mrunalp moved this from Needs Approver to Done in SIG Node PR Triage Jul 29, 2022
@k8s-ci-robot k8s-ci-robot merged commit 126c076 into kubernetes:master Jul 29, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.25 milestone Jul 29, 2022
@jackfrancis jackfrancis deleted the prober-duration-metrics branch November 4, 2022 18:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Development

Successfully merging this pull request may close these issues.

Create a duration metric for probes