Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix pod stuck in termination state when mount fails or gets skipped after kubelet restart #110670

Merged
merged 6 commits into from Jul 27, 2022

Conversation

gnufied
Copy link
Member

@gnufied gnufied commented Jun 21, 2022

This PR partially uses changes from - #110575

And fixes #96635 and #70044

This PR:

  1. Keeps tracks of volumes which were skipped during reconstruction with the assumption that since they were present in DSOW, they will make it to ASOW.
  2. Enhances the check that makes volumes uncertain on unmounted - https://github.com/kubernetes/kubernetes/pull/110670/files#diff-3e8f8a6e5eca57db8e84bcc6c896f0a3b10289fe55d8e999b4bdf1cfdc0de394L785 to ensure that, we are not accidentally marking volumes which were added via reconsturction as unmounted because mount is failing after kubelet restart.

There are two important things in this PR:

  1. It adds volumes skipped during reconstruction as uncertain in ASOW.
  2. But above change alone is not enough to fix the bug (or unmount the volume), because - when kubelet will retry to mount the volume, mount will fail and it will mark the volume we added as uncertain in previous step as "unmounted" - https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/util/operationexecutor/operation_generator.go#L788 . So basically if mount operation fails, kubelet will undo our hard work of adding volumes which we added as uncertain in step#1.

Now it is clear that - we need to add volumes we read from disk as uncertain, so as kubelet can unmount them properly. But there are two ways to fix the second problem (i.e where kubelet is marking mount point as unmounted, if mount fails for a volume which was marked uncertain before).

  1. In this PR - I am keeping track of such reconstructed volumes in ASOW, so as even if mount fails for the volume it must not mark the volume was unmounted, if volume was added via reconstruction.
  2. There is another easier way - which is to make all volumes uncertain in ASOW if mount fails. That is what - Make mount errors uncertain #110650 PR does. This code however breaks an existing behaviour where if kubelet starts mounting a volume and may be operation timed-out (marked uncertain) and kubelet retries and mount fails then volume is marked as unmounted and a pod that was using this volume can be deleted without unmounting the volume. If we treat every mount failure as uncertain mount, then volume must be unmounted for pod to be deleted.
Unmount volumes correctly for reconstructed volumes even if mount operation fails after kubelet restart

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. sig/storage Categorizes an issue or PR as relevant to SIG Storage. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jun 21, 2022
@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. labels Jun 21, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gnufied

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 21, 2022
@aojea
Copy link
Member

aojea commented Jun 21, 2022

/assign @smarterclayton @bobbypage

pkg/volume/util/operationexecutor/operation_executor.go Outdated Show resolved Hide resolved
pkg/volume/util/operationexecutor/operation_generator.go Outdated Show resolved Hide resolved
pkg/volume/util/operationexecutor/operation_generator.go Outdated Show resolved Hide resolved
}

if uncertainVolumeCount > 0 {
// If the volume has device to mount, we mark its device as mounted.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"mark its device as uncertain"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@pacoxu pacoxu added this to Needs Reviewer in SIG Node PR Triage Jun 24, 2022
@pacoxu pacoxu moved this from Needs Reviewer to Waiting on Author in SIG Node PR Triage Jun 24, 2022
@pacoxu pacoxu moved this from Waiting on Author to Triage in SIG Node PR Triage Jun 24, 2022
@gnufied gnufied force-pushed the fix-pod-deletion-terminating branch from c1623ee to 3801ad1 Compare June 29, 2022 20:21
@jingxu97
Copy link
Contributor

How about during reconstruction, we put all reconstructed volumes into actual state with uncertain mode even if desired state has this information (we might need to sync between desired state and actual state which already there)?

@gnufied
Copy link
Member Author

gnufied commented Jun 30, 2022

@jingxu97 we could and that is what #108180 PR by @jsafrane does. But adding all volumes to ASW in uncertain state during reconstruction brings new challenges, such as if we add a volume to ASW before it gets added to DSW, then we could accidently unmount the volume. So it has all these additional checks to safeguard against similar failure modes - https://github.com/kubernetes/kubernetes/pull/108180/files#diff-2acbd2d6c088f99f9c7e9042b3449418b4824675d2692350c332d0f276489268R40 ( @jsafrane correct me if I am wrong).

But that alone will not fix the issue, because we still need to avoid marking volumes which we added via recontruction to ASW as "unmounted" (from previous "uncertan" state if mount fails) - f291291#diff-3e8f8a6e5eca57db8e84bcc6c896f0a3b10289fe55d8e999b4bdf1cfdc0de394R787

So we have this middle ground PR of mine, which avoids those problems and still fixes issues we set out to fix.

@gnufied
Copy link
Member Author

gnufied commented Jun 30, 2022

/retest

@jsafrane
Copy link
Member

As Hemant pointed out. will need all reconstructed volumes in ASW for my SELinux work. But I can have it behind a feature gate + we don't need to worry about backporting it as a bugfix to older releases. (And yes, I need to update it with the latest Hemant's fixes from this PR.) This PR can be somewhere in middle and could be backported to older releases.

@gnufied
Copy link
Member Author

gnufied commented Jun 30, 2022

Yeah eventually once jan's PR gets merged and enabled by default - we could delete the separate processReconstructedVolumes function and have reconstruction add all volumes. We will still need part#2 of this PR - which tracks reconstructed volumes in ASOW, so as we are not accidently flipping mount state of uncertain volumes which were added to ASOW via reconstruction on mount failures.

@k8s-ci-robot k8s-ci-robot removed the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 30, 2022
Add unit tests for removing reconstructed volumes from ASOW
Also only remove volumes from skippedDuringReconstruction only if
volume was marked as attached.
@gnufied gnufied force-pushed the fix-pod-deletion-terminating branch from ee28639 to 835e8cc Compare July 23, 2022 11:36
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 23, 2022
@gnufied
Copy link
Member Author

gnufied commented Jul 24, 2022

/retest

@jsafrane
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 27, 2022
@gnufied
Copy link
Member Author

gnufied commented Jul 27, 2022

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 27, 2022
@k8s-ci-robot k8s-ci-robot merged commit 9ad4c5c into kubernetes:master Jul 27, 2022
SIG Node PR Triage automation moved this from Triage to Done Jul 27, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.25 milestone Jul 27, 2022
@rphillips
Copy link
Member

Created a cherry pick for 1.24 here.

k8s-ci-robot added a commit that referenced this pull request Aug 10, 2022
…10670-upstream-release-1.24

Automated cherry pick of #110670: Keep track of each pod that uses a volume during
jsafrane added a commit to jsafrane/kubernetes that referenced this pull request Nov 4, 2022
To preserve fix in kubernetes#110670,
add an unit test that check a volume is *uncertain* even after final mount
error when it was reconstructed.

And actually fix a regression introduced in the previous patch.
jaehnri pushed a commit to jaehnri/kubernetes that referenced this pull request Jan 3, 2023
To preserve fix in kubernetes#110670,
add an unit test that check a volume is *uncertain* even after final mount
error when it was reconstructed.

And actually fix a regression introduced in the previous patch.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Development

Successfully merging this pull request may close these issues.

kubelet is not able to delete pod with mounted secret/configmap after restart
9 participants