Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-3327: Add CPUManager policy option to align CPUs by Socket instead of by NUMA node #111278

Merged
merged 3 commits into from Aug 2, 2022

Conversation

arpitsardhana
Copy link

@arpitsardhana arpitsardhana commented Jul 20, 2022

What type of PR is this?

/kind feature
/sig node

What this PR does / why we need it:

This PR implements logic of KEP-3327

Which issue(s) this PR fixes:

Enhancement KEP-3327

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Add a new `align-by-socket` policy option to cpu manager `static` policy. When enabled CPU's to be aligned at socket boundary rather than NUMA boundary.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/3327-align-by-socket

@k8s-ci-robot k8s-ci-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Jul 20, 2022
@k8s-ci-robot
Copy link
Contributor

@arpitsardhana: You must be a member of the kubernetes/milestone-maintainers GitHub team to set the milestone. If you believe you should be able to issue the /milestone command, please contact your Milestone Maintainers Team and have them propose you as an additional delegate for this responsibility.

In response to this:

What type of PR is this?

/kind feature
/sig node
/milestone v1.25

What this PR does / why we need it:

This PR implements logic of KEP-3327

Which issue(s) this PR fixes:

Enhancement KEP-3327

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Add a new `align-by-socket` policy option to cpu manager `static` policy. When enabled CPU's to be aligned at socket boundary rather than NUMA boundary.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/3327-align-by-socket

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. sig/node Categorizes an issue or PR as relevant to SIG Node. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 20, 2022
@k8s-ci-robot
Copy link
Contributor

Welcome @arpitsardhana!

It looks like this is your first PR to kubernetes/kubernetes 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kubernetes has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Jul 20, 2022
@k8s-ci-robot
Copy link
Contributor

Hi @arpitsardhana. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. do-not-merge/contains-merge-commits Indicates a PR which contains merge commits. labels Jul 20, 2022
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/contains-merge-commits Indicates a PR which contains merge commits. label Jul 20, 2022
@endocrimes
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 20, 2022
@pacoxu pacoxu added this to Triage in SIG Node PR Triage Jul 20, 2022
@endocrimes
Copy link
Member

@arpitsardhana Looks like this has a little bit of work to do to get basic tests/builds working before it's ready for review, but looks like a good start, thanks!

@arpitsardhana
Copy link
Author

Thanks @endocrimes I did run quick-release on my system(mac) and verified the unit test are working ok.
I shall check above failures and fix them

@ffromani
Copy link
Contributor

/cc
/cc @swatisehgal
/priority important-soon
/triage accepted
priority important-soon because this is a 1.25 feature - couldn't think of a better way to represent it

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 22, 2022
@mrunalp
Copy link
Contributor

mrunalp commented Jul 29, 2022

/assign @klueska

@@ -115,11 +115,11 @@ func NewStaticPolicyOptions(policyOptions map[string]string) (StaticPolicyOption

func ValidateStaticPolicyOptions(opts StaticPolicyOptions, topology *topology.CPUTopology, topologyManager topologymanager.Store) error {
if opts.AlignBySocket == true {
//1. not compatible with topology manager single numa policy option
//1. Not compatible with topology manager single-numa-node policy option.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the numbers from these comments

if opts.AlignBySocket == true {
//1. Not compatible with topology manager single-numa-node policy option.
if topologyManager.GetPolicy().Name() == topologymanager.PolicySingleNumaNode {
return fmt.Errorf("Topolgy manager Single numa policy is incompatible with CPUManager Align by socket policy option")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typos in the string, please also use a %s to pull in the actual name of the policy as stored in topologymanager.PolicySingleNumaNode instead of writing it out yourself.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in latest commit

@@ -99,3 +112,17 @@ func NewStaticPolicyOptions(policyOptions map[string]string) (StaticPolicyOption
}
return opts, nil
}

func ValidateStaticPolicyOptions(opts StaticPolicyOptions, topology *topology.CPUTopology, topologyManager topologymanager.Store) error {
if opts.AlignBySocket == true {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove == true

@@ -115,6 +115,10 @@ func NewStaticPolicy(topology *topology.CPUTopology, numReservedCPUs int, reserv
if err != nil {
return nil, err
}
err = ValidateStaticPolicyOptions(opts, topology, affinity)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think part of the problem here is that this new flag probably belongs on the topologymanager as a "topology-manager-policy-option", not the CPUManager (for which we don't have any infrastructure yet). Let's leave it as is for the time being until we have time to evaluate if it makes sense moving this flag to the topologymanager in a future release (which may just have the opposite problem for the 2nd check going on which is CPUManager specific).

// socket aligned hint. It will ensure that first socket aligned available CPUs are
// allocated before we try to find CPUs across socket to satify allocation request.
if p.options.AlignBySocket {
socketBits := p.topology.CPUDetails.SocketsInNUMANodes(numaBits[:]...).ToSliceNoSort()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you making a copy of the numaBits slice before passing it? Is it necessary for some reason?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially, i had some difficulty in calling SocketInNumaNodes since it is variadic function. Intent was to convert numBits into slice and then pass.
However, fixed in latest commit.

} else {
for _, numaNodeID := range numaBits {
alignedCPUs = alignedCPUs.Union(allocatableCPUs.Intersection(p.topology.CPUDetails.CPUsInNUMANodes(numaNodeID)))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's nice to avoid else statements when possible, but I think this is fine for now. We can revisit this as a follow-up.

//isHintSocketAligned function return true if numa nodes in hint are within one socket.
func (p *staticPolicy) isHintSocketAligned(hint bitmask.BitMask) bool {
numaNodes := hint.GetBits()
if p.topology.CPUDetails.SocketsInNUMANodes(numaNodes[:]...).Size() == 1 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really necessary to make a copy of the numaNodes slice before passing it in here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed it in latest commit

if hints[i].NUMANodeAffinity.Count() == minAffinitySize {
hints[i].Preferred = true
}
}

return hints
}

//isHintSocketAligned function return true if numa nodes in hint are within one socket.
func (p *staticPolicy) isHintSocketAligned(hint bitmask.BitMask) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the name it seems this should take a hint as a paramater, not a bitmask.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed function input parameter

@@ -95,6 +95,7 @@ type HintProvider interface {
//Store interface is to allow Hint Providers to retrieve pod affinity
type Store interface {
GetAffinity(podUID string, containerName string) TopologyHint
GetPolicy() Policy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a strange addition to an interface called Store. I never actually liked the name Store to begin with but it was part of the original design before I got involved. I feel fairly strongly that this new function needs to live elsewhere.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the interface which is passed to CPUManager from where i can deduce anything about Topology manager allowing me to fetch topology manager policy and perform validation checks in policy options for compatibility.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I took a closer look at this.

I think the issue is that we should not have an interface called Store in the topologymanager, and instead should call this interface something more generic (I don't have a good suggestion off the top of my head though). Let's leave it in here for now as you have done, and do a follow-up PR to rename this interface to something more appropriate.

@arpitsardhana
Copy link
Author

Tried my best to address all comments.

Things to track on Beta:
--> Best place to perform CPU policy option validation especially if there is dependency of topology manager policy & cpu m manager policy option
--> Code readability improvement in allocateCPU(). Explore possibilities to avoid else statements in code
--> Explore if topology manager is correct place to get socket aligned hint. Either Topology manager policy or policy option of best-effort to generate hint which is more likely socket aligned from perspective of hint providers

@ffromani
Copy link
Contributor

ffromani commented Aug 1, 2022

I'll have another pass today/ASAPish, but I agree with Kevin's comments and we seem to be in agreement about the general direction.

Comment on lines 334 to 347
numaBits := numaAffinity.GetBits()
// If align-by-socket policy option is enabled, NUMA based hint is expanded to
// socket aligned hint. It will ensure that first socket aligned available CPUs are
// allocated before we try to find CPUs across socket to satisfy allocation request.
if p.options.AlignBySocket {
socketBits := p.topology.CPUDetails.SocketsInNUMANodes(numaBits...).ToSliceNoSort()
for _, socketID := range socketBits {
alignedCPUs = alignedCPUs.Union(allocatableCPUs.Intersection(p.topology.CPUDetails.CPUsInSockets(socketID)))
}
} else {
for _, numaNodeID := range numaBits {
alignedCPUs = alignedCPUs.Union(allocatableCPUs.Intersection(p.topology.CPUDetails.CPUsInNUMANodes(numaNodeID)))
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way to avoid an if/else and make this a little cleaner here would be to introduce a new function as

func (p *staticPolicy) getAlignedCPUs(numaAffinity bitmask.BitMask) {
	alignedCPUs := cpuset.NewCPUSet()
	if p.options.AlignBySocket {
		socketBits := p.topology.CPUDetails.SocketsInNUMANodes(numaAffinity.GetBits()...).ToSliceNoSort()
		for _, socketID := range socketBits {
			alignedCPUs = alignedCPUs.Union(allocatableCPUs.Intersection(p.topology.CPUDetails.CPUsInSockets(socketID)))
		}
		return alignedCPUs
	}

	for _, numaNodeID := range numaBits {
		alignedCPUs = alignedCPUs.Union(allocatableCPUs.Intersection(p.topology.CPUDetails.CPUsInNUMANodes(numaNodeID)))
	}
	
	return alignedCPUs
}

And then at the callsite here, just do:

alignedCPUs := p.getAlignedCPUs(numaAffinity)

@@ -514,10 +558,18 @@ func TestStaticPolicyAdd(t *testing.T) {
for _, testCase := range smtalignOptionTestCases {
runStaticPolicyTestCase(t, testCase)
}
for _, testCase := range alignBySocketOptionTestCases {
defer featuregatetesting.SetFeatureGateDuringTest(t, utilfeature.DefaultFeatureGate, pkgfeatures.CPUManagerPolicyAlphaOptions, true)()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's odd to put the defer in the middle of a for loop like this. Defers are function scoped and will only be run once this entire function exits. You are essentially adding a new function to run at function exit each time around the loop. Is this your intention here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will likely need to add a wrapping function around runStaticPolicyTestCase() called something like runStaticPolicyTestCaseWithFeatureGates() that sets up the necessary feature gates around the test.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in latest commit

}

func runStaticPolicyTestCase(t *testing.T, testCase staticPolicyTest) {
policy, _ := NewStaticPolicy(testCase.topo, testCase.numReservedCPUs, cpuset.NewCPUSet(), topologymanager.NewFakeManager(), testCase.options)
topoManager := topologymanager.NewFakeManager()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just call the variable tm if you don't want to spell it out completely. topoManager is awkward.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in latest commit

Comment on lines +46 to +50
// NewFakeManagerWithPolicy returns an instance of fake topology manager with specified policy
func NewFakeManagerWithPolicy(policy Policy) Manager {
klog.InfoS("NewFakeManagerWithPolicy")
return &fakeManager{
policy: policy,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a huge fan of introducing yet another NewFakeManager() function. What if we want a FakeManager with a hint and a policy set? We need to come up with something more generic.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally there should be a generic FakeManager(). I added since we already have one withAffinity, however underlying structure is same. I can track it for beta to improve FakeManager()

Comment on lines 43 to 45
func (m *mockAffinityStore) GetPolicy() topologymanager.Policy {
return m.policy
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this ever get exercised anyhwere? It looks like you just always set the policy to nil in all test cases.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in latest commit

},
{
NUMANodeAffinity: m0011,
Preferred: true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is correct. When setting this flag, think of a socket as a super-NUMA-node for the purpose of alignment. So long as the set of bits in the NUMA mask all point to the same socket, then alignment should be preferred.

}
activePods = append(activePods, &pod)
}
defer featuregatetesting.SetFeatureGateDuringTest(t, utilfeature.DefaultFeatureGate, pkgfeatures.CPUManagerPolicyAlphaOptions, true)()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again -- it's weird to have this here in the middle of the function like this. If you want it set during this test, jut put it at the top of the function, not in the middle.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in latest commit

})
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point it would be good to add unit tests in pkg/kubelet/cm that can test across all of the hint-providers (with their various options) instead of just testing each of them in isolation like this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will track this item for beta.
I think testcases under pkg/kubelet/cm are definitely required and should be tracked via separate issue

Comment on lines +85 to +90
{
option: AlignBySocketOption,
featureGate: pkgfeatures.CPUManagerPolicyAlphaOptions,
featureGateEnable: true,
expectedAvailable: true,
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add one more test case that ensures this is not available if CPUManagerPolicyBetaOptions is set.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

// isHintSocketAligned function return true if numa nodes in hint are within one socket.
func (p *staticPolicy) isHintSocketAligned(hint topologymanager.TopologyHint) bool {
numaNodesBitMask := hint.NUMANodeAffinity.GetBits()
return p.topology.CPUDetails.SocketsInNUMANodes(numaNodesBitMask...).Size() == 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it sufficient to just check Size() == 1? What if minAffinitySize is greater than the number of numa nodes on a socket?

Should this really be something like:

minSockets := (minAffinitySize + numaNodesPerSocket - 1) / numaNodesPerSocket
return p.topology.CPUDetails.SocketsInNUMANodes(numaNodesBitMask...).Size() == minSockets

We should also add a test case for this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give an example topology where above will apply?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is independent of the topology. It depends on the request size. Imagine a 2 socket machine with 4 NUMA nodes per socket and 20 CPUs per NUMA node. If a request comes in for 81 CPUs, this requires at least 5 NUMA nodes (meaning minAffinity will be 5) and (by definition) at least 2 sockets.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thought process i had is:
If allocation need to spread across two sockets, CPUs by definition are not socket aligned. Assumption is by socket alignment, we mean CPUs from one socket are preferred for allocation over cross-socket allocation.
In 2 socket architecture, With above logic, if allocation spreads two sockets then all hints become preferred and the whole point of finding optimized hint may become moot.

In my view, considering minSockets may be relevant for architecture where there are more than 2 sockets(viz 4 socket or 6 sockets etc) where we want the CPUs to be localized to let say 2 sockets.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intent is to have simplicity, in sense that CPU's will be considered aligned if they can come from one-socket. In my opinion, Making align-by-socket to consider CPUs as aligned within minAffinitySockets(2 or more ) may bring complexity whose performance returns may not be much. Since anyway allocation is anyway spreading across sockets

I could be wrong and open to suggestions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If allocation need to spread across two sockets, CPUs by definition are not socket aligned.

This is true, but I don't see how that is relevant here. The point of the align-by-socket flag is not to try and force alignment to a single socket, but rather to generate hints at the socket boundary rather than the NUMA boundary.

If an allocation request has so many CPUs that it must always pull from more than one socket to satisfy its allocation (meaning, the topology dictates this, not the current allocation to other pods), then we need to generate a preferred hint that is representative of this.

Not doing so would make this flag inconsistent with its purpose -- i.e. to generate hints as if we were working at socket boundaries instead of NUMA boundaries.

Copy link
Contributor

@klueska klueska Aug 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the reason we need to do this loop on the other side (instead of just assuming things always come from one socket):

	if p.options.AlignBySocket {
		socketBits := p.topology.CPUDetails.SocketsInNUMANodes(numaBits...).ToSliceNoSort()
		for _, socketID := range socketBits {
			alignedCPUs = alignedCPUs.Union(allocatableCPUs.Intersection(p.topology.CPUDetails.CPUsInSockets(socketID)))
		}
		return alignedCPUs
	}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in latest commit

func (p *staticPolicy) getAlignedCPUs(numaAffinity bitmask.BitMask, allocatableCPUs cpuset.CPUSet) cpuset.CPUSet {
alignedCPUs := cpuset.NewCPUSet()
numaBits := numaAffinity.GetBits()
// If align-by-socket policy option is enabled, NUMA based hint is expanded to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: line break here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in latest commit

Comment on lines 503 to 505
if err != nil {
panic(err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just remove the panic here and ignore the error. If an erorr is returned here as part of test declaration we have bigger issues.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in latest commit


for _, testCase := range testCases {
t.Run(testCase.description, func(t *testing.T) {
defer featuregatetesting.SetFeatureGateDuringTest(t, utilfeature.DefaultFeatureGate, pkgfeatures.CPUManagerPolicyAlphaOptions, true)()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: line break after this

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in latest commit

@@ -32,13 +32,18 @@ import (
)

type mockAffinityStore struct {
hint topologymanager.TopologyHint
hint topologymanager.TopologyHint
policy topologymanager.Policy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove policy from here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in latest commit

Copy link
Contributor

@klueska klueska left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One final comment, but it is not a blocker for merging.

Thanks for being patient with the comments / addressing them so quickly.

/lgtm
/approve

Comment on lines +590 to +593
numaNodesPerSocket := p.topology.NumNUMANodes / p.topology.NumSockets
if numaNodesPerSocket == 0 {
return false
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically this can never happen because we gate calling this function by p.options.AlignBySocket which will error out at startup if p.topology.NumNUMANodes > p.topology.NumSockets.

I thin it's fine to leave it here for completeness though.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah! Prefer to leave it there, In future other functions might leverage isHintSocketAligned and we may run into div by 0 issues

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 2, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: arpitsardhana, klueska

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 2, 2022
@klueska
Copy link
Contributor

klueska commented Aug 2, 2022

/retest

@k8s-ci-robot
Copy link
Contributor

@arpitsardhana: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-e2e-kind-ipv6 d92fd83 link unknown /test pull-kubernetes-e2e-kind-ipv6

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@klueska
Copy link
Contributor

klueska commented Aug 2, 2022

/test pull-kubernetes-e2e-kind-ipv6

@k8s-ci-robot k8s-ci-robot merged commit 9fb1f67 into kubernetes:master Aug 2, 2022
SIG Node PR Triage automation moved this from Waiting on Author to Done Aug 2, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.25 milestone Aug 2, 2022
@@ -571,10 +572,50 @@ func (p *staticPolicy) generateCPUTopologyHints(availableCPUs cpuset.CPUSet, reu
// to the minAffinitySize. Only those with an equal number of bits set (and
// with a minimal set of numa nodes) will be considered preferred.
for i := range hints {
if p.options.AlignBySocket && p.isHintSocketAligned(hints[i], minAffinitySize) {
hints[i].Preferred = true
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that if the AlignBySocket option is enabled, the number of NUMA nodes in the final CPU allocation may not be minimal (although there may be CPU allocations with fewer NUMA nodes, and within a socket)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It may not be minimal.

This option makes it so that allocation decisions are made at the socket boundary, rather than the NUMA node boundary. All CPUs will come from the same socket, but not necessarily the same NUMA node.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to also consider the number of NUMA nodes if the AlignBySocket option is enabled? First the CPUs should be in a socket, then the minimum number of NUMA nodes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is not guarantees by the API (because if you are using this setting, you should not be caring about NUMA alignment, but only socket alignment), but under the hood, CPUs will in fact be handed out on one NUMA node first before moving onto the next.

This ensures all CPUs in the same socket are considered aligned:

socketBits := p.topology.CPUDetails.SocketsInNUMANodes(numaBits...).ToSliceNoSort()

And this (depending on the setting of DistributeCPUsAcrossNUMA) will make sure to pull those CPUs from a minimal set of NUMA nodes:

func (p *staticPolicy) takeByTopology(availableCPUs cpuset.CPUSet, numCPUs int) (cpuset.CPUSet, error) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I got it. Does this mean that the final allocated CPUs in the Allocate() method may be different from the TopologyHint returned by the GetTopologyHints() method and merged by the Topology Manager?

Assume NUMA node 0 and NUMA node 1 are on the same socket and they both have 2 CPUs. AlignBySocket is enabled while DistributeCPUsAcrossNUMA is not enabled. An incoming container requests 2 CPUs.

  1. In the GetTopologyHints() method, the returned TopologyHints are {{01, true}, {10, true}, {11, true}}
  2. In Topology Manager, the merged result is {11, true}
  3. In the Allocate() method, the final allocated CPUs to the container are both taken from NUMA node 0.

Is my understanding correct?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Development

Successfully merging this pull request may close these issues.

None yet

8 participants