Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore EndpointSlices that are marked for deletion #109624

Merged

Conversation

aryan9600
Copy link
Member

@aryan9600 aryan9600 commented Apr 22, 2022

Signed-off-by: Sanskar Jaiswal jaiswalsanskar078@gmail.com

What type of PR is this?

/kind bug

What this PR does / why we need it:

Ignores EndpointSlices that have been marked for deletion, to prevent the controller from getting stuck while calculating stale slices for a service.

Which issue(s) this PR fixes:

Fixes #108779

Special notes for your reviewer:

Related issue: #108267

Does this PR introduce a user-facing change?

EndpointSlices marked for deletion are now ignored during reconciliation.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
@k8s-ci-robot k8s-ci-robot added the release-note-none Denotes a PR that doesn't merit a release note. label Apr 22, 2022
@k8s-ci-robot
Copy link
Contributor

Please note that we're already in Test Freeze for the release-1.24 branch. This means every merged PR has to be cherry-picked into the release branch to be part of the upcoming v1.24.0 release.

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 22, 2022
@k8s-ci-robot
Copy link
Contributor

@aryan9600: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 22, 2022
@k8s-ci-robot
Copy link
Contributor

Hi @aryan9600. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Apr 22, 2022
@aryan9600
Copy link
Member Author

/sig network

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 22, 2022
@k8s-ci-robot k8s-ci-robot added the sig/apps Categorizes an issue or PR as relevant to SIG Apps. label Apr 22, 2022
@danwinship
Copy link
Contributor

/assign @robscott

Copy link
Member

@robscott robscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work on this @aryan9600!

@@ -100,6 +100,9 @@ func (est *EndpointSliceTracker) StaleSlices(service *v1.Service, endpointSlices
providedSlices[endpointSlice.UID] = endpointSlice.Generation
g, ok := gfs[endpointSlice.UID]
if ok && (g == deletionExpected || g > endpointSlice.Generation) {
if endpointSlice.DeletionTimestamp != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this just pushes the problem further down. This will allow the sync to continue which is good, but we also need a change in the reconciler (or similar area) to ensure that the reconcile logic also ignores EndpointSlices that are terminating, otherwise we'll be in a worse place than when we started.

@robscott
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 25, 2022
@aojea
Copy link
Member

aojea commented Apr 25, 2022

/cc

@k8s-ci-robot k8s-ci-robot requested a review from aojea April 25, 2022 19:50
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Apr 25, 2022
@aryan9600 aryan9600 force-pushed the fix-endpointslice-deletion branch 2 times, most recently from ac8d9f6 to 3e5ef5a Compare April 26, 2022 07:03
@aryan9600 aryan9600 requested a review from robscott April 26, 2022 19:50
@aryan9600 aryan9600 force-pushed the fix-endpointslice-deletion branch 2 times, most recently from f28b044 to b473a9e Compare April 27, 2022 18:58
@@ -557,3 +560,14 @@ func trackSync(err error) {
}
endpointslicemetrics.EndpointSliceSyncs.WithLabelValues(metricLabel).Inc()
}

func dropEndpointSlicesPendingDeletion(endpointSlices []*discovery.EndpointSlice) []*discovery.EndpointSlice {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forget my previous comment, this is right

@@ -364,6 +364,9 @@ func (c *Controller) syncService(key string) error {
return err
}

// Drop EndpointSlices that have been marked for deletion to prevent the controller from getting stuck.
endpointSlices = dropEndpointSlicesPendingDeletion(endpointSlices)

if c.endpointSliceTracker.StaleSlices(service, endpointSlices) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robscott I can't remember all the details now of the tracker, is there anything else we should do in the endpointSliceTracker?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commenting for future us - this is right, explained by https://github.com/kubernetes/kubernetes/pull/109624/files#r860592618.

Copy link
Member

@robscott robscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aryan9600! This change LGTM, just a couple tiny nits but will approve after those are addressed.

Comment on lines 282 to 283
// The EndpointSlice marked for deletion should be ignored by the controller, and thus
// should not result in only one action from the client (on the initial EndpointSlice).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful to explain which action we're actually expecting here. I'm assuming it's just a read from the controller?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes sure. It's not a read, it's an update where the non-terminating endpointslice's trigger time annotation is removed.

Comment on lines 1878 to 1901
assert.Len(t, result, 2)
for _, endpointSlice := range endpointSlices {
if endpointSlice.Name == "epSlice1" {
t.Errorf("Expected EndpointSlice marked for deletion to be dropped.")
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more complicated, but it would be nice to have some kind of test that actually verified the precise contents of the remaining list to be sure there's nothing funny happening like the objects being emptied out. (I know that's impossible with the current code, just would likely slightly more thorough tests here).

Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 6, 2022
@robscott
Copy link
Member

robscott commented Jun 9, 2022

Thanks @aryan9600! I think this bug fix is worth cherry picking back to previous versions as well - I can approve if you create the cherry picks.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 9, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aryan9600, robscott

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 9, 2022
@aryan9600
Copy link
Member Author

/retest-required

@k8s-ci-robot k8s-ci-robot merged commit e8d6b76 into kubernetes:master Jun 9, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.25 milestone Jun 9, 2022
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Jun 9, 2022
k8s-ci-robot added a commit that referenced this pull request Jun 9, 2022
…09624-upstream-release-1.24

Automated cherry pick of #109624: Ignore EndpointSlices that are already marked for deletion
k8s-ci-robot added a commit that referenced this pull request Jun 9, 2022
…09624-upstream-release-1.22

Automated cherry pick of #109624: Ignore EndpointSlices that are already marked for deletion
k8s-ci-robot added a commit that referenced this pull request Jun 9, 2022
…09624-upstream-release-1.23

Automated cherry pick of #109624: Ignore EndpointSlices that are already marked for deletion
dgrisonnet pushed a commit to dgrisonnet/kubernetes that referenced this pull request Sep 2, 2022
…/fix-endpointslice-deletion

Ignore EndpointSlices that are marked for deletion

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/network Categorizes an issue or PR as relevant to SIG Network. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

EndpointSlice controllers should ignore EndpointSlices that are in a terminating state
5 participants