Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Copy recordPluginMetrics in CycleState.Clone #108727

Merged
merged 1 commit into from May 6, 2022

Conversation

sanposhiho
Copy link
Member

@sanposhiho sanposhiho commented Mar 16, 2022

What type of PR is this?

/kind bug

What this PR does / why we need it:

We currently don't copy recordPluginMetrics field in CycleState.Clone() so recordPluginMetrics in the copied CycleCtate is always false.
But if we do so, for example, the following part will always be false. Is this the expected behavior..?

if !state.ShouldRecordPluginMetrics() {

Ref: #108724 (comment)

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fix a bug where metrics are not recorded during Preemption(PostFilter).

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 16, 2022
@k8s-ci-robot
Copy link
Contributor

@sanposhiho: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Mar 16, 2022
@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 16, 2022
Copy link
Member

@denkensk denkensk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add an UT or add one case in exist UT for it?

@sanposhiho
Copy link
Member Author

/retest

@sanposhiho
Copy link
Member Author

/retest

1 similar comment
@sanposhiho
Copy link
Member Author

/retest

@sanposhiho
Copy link
Member Author

@Huang-Wei your comment is for #108724, right?
Copied your comment in #108724 (comment)

@alculquicondor
Copy link
Member

We only Clone the state when we are doing preemption. I think we don't care about metrics within each sub-step of preemption, as preemption itself has an overall latency metric.

@Huang-Wei
Copy link
Member

@Huang-Wei your comment is for #108724, right?

ah, yes...

@Huang-Wei
Copy link
Member

Huang-Wei commented Mar 21, 2022

Hmm.. seems we had a bug that the following metrics were never recorded:

  • scheduler_plugin_execution_duration_seconds{extension_point=PreFilterExtensionAddPod}
  • scheduler_plugin_execution_duration_seconds{extension_point=PreFilterExtensionRemovePod}

as the cloned state always return false on ShouldRecordPluginMetrics(): runPreFilterExtensionAddPod and runPreFilterExtensionRemovePod.

I think we don't care about metrics within each sub-step of preemption, as preemption itself has an overall latency metric.

The impact is on RunPreFilterExtensionAddPod() and RunPreFilterExtensionRemovePod(). I think we should copy the value. See the explanation above.

@alculquicondor
Copy link
Member

Right, but should we record the metrics for Filter/Score that run inside preemption?

@Huang-Wei
Copy link
Member

Huang-Wei commented Mar 21, 2022

Right, but should we record the metrics for Filter/Score that run inside preemption?

I think so. Actually plugin_execution_duration_seconds is not that useful to diagnose how a pod's overall scheduling looks like. I interpreted it simply to represent a plugin's running at a particular extension point, so it's more "stateless", and thus whether preemption is involved doesn't quite matter. Another note to read the plugin metrics as "stateless" is the samplePluginRate (10%).

@@ -41,6 +41,7 @@ func TestCycleStateClone(t *testing.T) {
data: data1,
}
state.Write(key, originalValue)
state.SetRecordPluginMetrics(true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we test both true and false?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
Changed the whole test to be table-driven test and added the test for that as a test case.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Mar 24, 2022
@k8s-ci-robot k8s-ci-robot removed the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Mar 24, 2022
@sanposhiho
Copy link
Member Author

/retest


// we need to compare *sync.Map,
// because we cannot define cmp.Comparer for sync.Map or CycleState due to the issue from go vet.
if diff := cmp.Diff(&state.storage, &stateCopy.storage, cmpOpts...); diff != "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I think nothing is checking recordPluginMetrics?

is it a lot of changes if you make CycleState.storage a pointer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I think nothing is checking recordPluginMetrics?

Ah,, right...

if you make CycleState.storage a pointer?

nice idea. No concerns from me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@sanposhiho
Copy link
Member Author

/retest

Copy link
Member

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

This can wait for 1.25

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 31, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, Huang-Wei, sanposhiho

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Huang-Wei
Copy link
Member

since it's difficult to use cmp.Diff to compare sync.Map

I see. #108727 (comment) was based on the old implementation with regular map and mutex, now with sync.map, it seems we need extra work to compare the state.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 10, 2022
@sanposhiho sanposhiho force-pushed the fix-bug-clone branch 4 times, most recently from c6f312b to cadfa7b Compare April 10, 2022 12:22
@sanposhiho
Copy link
Member Author

/retest

@Huang-Wei
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 14, 2022
@sanposhiho
Copy link
Member Author

@Huang-Wei @alculquicondor

Is it okay to /unhold?

Copy link
Member

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 6, 2022
@sanposhiho
Copy link
Member Author

sanposhiho commented May 6, 2022

/retest

(#109863 #109864)

@k8s-ci-robot k8s-ci-robot merged commit 03a46ac into kubernetes:master May 6, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.25 milestone May 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants