Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supply denominators #110164

Merged
merged 1 commit into from Jul 25, 2022
Merged

Conversation

MikeSpreitzer
Copy link
Member

@MikeSpreitzer MikeSpreitzer commented May 23, 2022

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR fixes the metric denominator problems noted in issues #109846 and #110160 .
There is more APF metrics cleanup to follow in later PRs.

Which issue(s) this PR fixes:

Fixes #109846
Fixes #110160

Special notes for your reviewer:

Does this PR introduce a user-facing change?

The `priority_level_request_utilization` metric histogram is adjusted so that for the cases where `phase=waiting` the denominator is the cumulative capacity of all of the priority level's queues.
The `read_vs_write_current_requests` metric histogram is adjusted, in the case of using API Priority and Fairness instead of max-in-flight, to divide by the relevant limit: sum of queue capacities for waiting requests, sum of seat limits for executing requests.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


/sig api-machinery
/sig instrumentation
/cc @dgrisonnet
/cc @wojtek-t
@tkashem
@lavalamp

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/apiserver area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels May 23, 2022
@MikeSpreitzer
Copy link
Member Author

/retest

@MikeSpreitzer
Copy link
Member Author

@cyang49

@MikeSpreitzer
Copy link
Member Author

/retest

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 24, 2022
@leilajal
Copy link
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 24, 2022
@logicalhan
Copy link
Member

/assign @dgrisonnet

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 14, 2022
@MikeSpreitzer
Copy link
Member Author

/retest

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jul 15, 2022
@MikeSpreitzer
Copy link
Member Author

The latest force-push is a rebase onto master, now that that has all the pre-reqs.

@@ -162,6 +164,9 @@ type configController struct {
// name to the state for that level. Every name referenced from a
// member of `flowSchemas` has an entry here.
priorityLevelStates map[string]*priorityLevelState

// totalLimitConsumers are to be kept appraised of the overall limits
totalLimitConsumers []func(maxWaitingRequests, maxExecutingRequests int)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm still struggling with understand why we need to complicate the code so much for metrics.

What I would do is simply in lines 546-548 hardcode the metrics (which are P&F metrics anyway) and do something like:

fcmetrics.ReadWriteConcurrencyGaugeVec.NewForLabelValuesSafe(0, 1, []string{fcmetrics.LabelValueWaiting, epmetrics.ReadOnlyKind}),SetDenominator(meal.MaxWaiting)
fcmetrics.ReadWriteConcurrencyGaugeVec.NewForLabelValuesSafe(0, 1, []string{fcmetrics.LabelValueWaiting, epmetrics.MutatingKind}).SetDenominator(meal.MaxWaiting)
(and same for mutating)

I think that passing thing around, all those fields etc. is making this code very hard to follow...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that lines 546--548 in apf_controller.go need to do what is primarily an up-call, to the higher layer in priority-and-fairness.go that updates the watermark data structures there and in maxinflight.go. You have proposed a solution that folds in the knowledge of how those watermark data structures are composed, lessening the modularity. That is not a deal-breaker, but something to consider.

I am struggling to remember why apf_filter.go:Interface allows registration of a multiplicity of request limit consumers; I think only one is needed. That would itself provide some simplification.

Note that every call to NewForLabelValuesSafe returns a distinct object --- with its own denominator variable. For this reason, the replacement you suggested would not have the desired effect.

I will think about ways to simplify.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The vector structure there is not really needed. We statically know that it has exactly four members. We can replace the vector with four individuals; that will eliminate the calls to NewForLabelValuesSafe with their attendant new object every time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, wait, we need to use a vector so that all four time series show up as one metric with different label sets.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I revised, using a combination of the not-incorrect ideas above.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 22, 2022
)

// Gaue of number of readonly requests waiting / limit on those. Can be used directly after GetReadOnlyWaitingConcurrency()
var ReadOnlyWaitingConcurrency RatioedGauge
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make those vars private (we want people to use Get* methods to ensure those get initialized correctly before).

})
return MutatingExecutingConcurrency
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And please squash the commits - this looks great now!

@wojtek-t
Copy link
Member

Thank you Mike! In my opinion this looks much better now and it's much easier to follow!

@@ -78,6 +76,9 @@ func WithPriorityAndFairness(
klog.Warningf("priority and fairness support not found, skipping")
return handler
}
waitingMark.readOnlyObserver = fcmetrics.GetReadOnlyWaitingConcurrency()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I did not think that writes of the same value would be a problem. Easily fixed...

@MikeSpreitzer MikeSpreitzer force-pushed the supply-denominators branch 4 times, most recently from c9d6c72 to d37e1b5 Compare July 23, 2022 03:24
@MikeSpreitzer
Copy link
Member Author

/retest

@MikeSpreitzer
Copy link
Member Author

/test pull-kubernetes-unit

}, observationMaintenancePeriod, stopCh)
var initMaxinflightOnce sync.Once

func initMaxinflight(nonMutatingLimit, mutatingLimit int) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: for consistency with other places in this file: initMaxInFlight ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

watermark.readOnlyObserver.SetDenominator(float64(nonMutatingLimit))
klog.V(2).InfoS("Set denominator for readonly requests", "limit", nonMutatingLimit)
} else {
klog.V(2).InfoS("Did not set denominator for readonly requests")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest removing this - the lack of log is also an information. (Same below).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
initMaxinflight(nonMutatingLimit, mutatingLimit)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that synchronization on once is relatively cheap, but still I think we should avoid that in the handler function.
Why not calling it directly from WithMaxInFlightLimit - all metrics should be initialized at this point anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was unsure of that.
I was not confident that this piece of code should rely on that knowledge.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -79,6 +80,13 @@ func WithPriorityAndFairness(
return handler
}
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
initAPFOnce.Do(func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as for max-in-flight.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Co-authored-by: JUN YANG <yang.jun22@zte.com.cn>
@MikeSpreitzer
Copy link
Member Author

The force-pushes to 631c6950eae and fdd921c make the changes suggested above and add logging of whether the efficient or inefficient case happened.

@wojtek-t
Copy link
Member

/lgtm
/approve

Thanks!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 25, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MikeSpreitzer, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 25, 2022
@k8s-ci-robot k8s-ci-robot merged commit d48c067 into kubernetes:master Jul 25, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.25 milestone Jul 25, 2022
@MikeSpreitzer MikeSpreitzer deleted the supply-denominators branch July 26, 2022 00:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
7 participants