Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubeadm: perform dockershim cleanup for 1.25 #110022

Conversation

neolit123
Copy link
Member

@neolit123 neolit123 commented May 12, 2022

/kind cleanup

What this PR does / why we need it:

Given kubeadm 1.25 only supports kubelet 1.25 and 1.24,
1.23 related logic around dockershim can be removed.

  • Don't clean the directories
    /var/lib/dockershim, /var/runkubernetes, /var/lib/cni
  • Pass the CRISocket directly to the kubelet
    --container-runtime-endpoint flag without extra handling
    of dockershim
  • No longer apply the --container-runtime=remote flag
    as that is the only possible value in 1.24 and 1.25
  • Update unit tests

Note: we are still passing --pod-infra-container-image
to avoid the pause image to be GCed by the kubelet.
#106893

Which issue(s) this PR fixes:

xref kubernetes/kubeadm#2626

Special notes for your reviewer:

NONE

Does this PR introduce a user-facing change?

kubeadm: perform additional dockershim cleanup. Treat all container runtimes as remote by using the flag "--container-runtime=remote", given dockershim was removed in 1.24 and given kubeadm 1.25 supports a kubelet version of 1.24 and 1.25. The flag "--network-plugin" will no longer be used for new clusters. Stop cleaning up the following dockershim related directories on "kubeadm reset": "/var/lib/dockershim", "/var/runkubernetes", "/var/lib/cni"

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


Given kubeadm 1.25 only supports kubelet 1.25 and 1.24,
1.23 related logic around dockershim can be removed.

- Don't clean the directories
/var/lib/dockershim, /var/runkubernetes, /var/lib/cni
- Pass the CRISocket directly to the kubelet
--container-runtime-endpoint flag without extra handling
of dockershim
- No longer apply the --container-runtime=remote flag
as that is the only possible value in 1.24 and 1.25
- Update unit tests


Note: we are still passing --pod-infra-container-image
to avoid the pause image to be GCed by the kubelet.
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels May 12, 2022
@neolit123
Copy link
Member Author

/triage accepted
/priority important-soon

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels May 12, 2022
@k8s-ci-robot k8s-ci-robot added area/kubeadm sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 12, 2022
@pacoxu
Copy link
Member

pacoxu commented May 13, 2022

var defaultKnownCRISockets = []string{
constants.CRISocketContainerd,
constants.CRISocketCRIO,
constants.CRISocketDocker,
}

Should the CRISocketDocker constant be removed and should we remove it from the default-known array?

Copy link
Member

@pacoxu pacoxu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 13, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: neolit123, pacoxu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@neolit123
Copy link
Member Author

neolit123 commented May 13, 2022

Should the CRISocketDocker constant be removed and should we remove it from the default-known array?

It has the cri-dockerd socket so we should keep it, also keep it as part of autodetection:

CRISocketDocker = "unix:///var/run/cri-dockerd.sock"

@pacoxu
Copy link
Member

pacoxu commented May 13, 2022

Got it.

@k8s-ci-robot k8s-ci-robot merged commit 9d85e18 into kubernetes:master May 13, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.25 milestone May 13, 2022
@neolit123
Copy link
Member Author

neolit123 commented May 13, 2022

looks like this PR broke:
https://k8s-testgrid.appspot.com/sig-cluster-lifecycle-kubeadm#kubeadm-kinder-kubelet-1-23-on-latest

problem appears to be that we are testing kubeadm at master (1.25) against a 1.23 kubelet.
the 1.23 kubelet requires the --container-runtime=remote flag, otherwise it will default to docker.
at least that's what i think it is.

May 13 22:06:35 kinder-xony-control-plane-1 kubelet[295]: I0513 22:06:35.095953 295 container_manager_linux.go:321] "Creating device plugin manager" devicePluginEnabled=true
May 13 22:06:35 kinder-xony-control-plane-1 kubelet[295]: I0513 22:06:35.096072 295 state_mem.go:36] "Initialized new in-memory state store"
May 13 22:06:35 kinder-xony-control-plane-1 kubelet[295]: I0513 22:06:35.096147 295 kubelet.go:313] "Using dockershim is deprecated, please consider using a full-fledged CRI implementation"
May 13 22:06:35 kinder-xony-control-plane-1 kubelet[295]: I0513 22:06:35.096182 295 client.go:80] "Connecting to docker on the dockerEndpoint" endpoint="unix:///var/run/docker.sock"
May 13 22:06:35 kinder-xony-control-plane-1 kubelet[295]: I0513 22:06:35.096196 295 client.go:99] "Start docker client with request timeout" timeout="2m0s"
May 13 22:06:35 kinder-xony-control-plane-1 kubelet[295]: E0513 22:06:35.096551 295 server.go:302] "Failed to run kubelet" err="failed to run Kubelet: failed to get docker version: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"

https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-kubeadm-kinder-kubelet-1-23-on-latest/1525234968983244800/artifacts/kinder-xony-control-plane-1/kubelet.log

--container-runtime string Default: docker

https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/

i guess, what we can do is bring the flag --container-runtime=remote back in kubeadm 1.25 and remove it in 1.26 or later. maybe when it's removed from the kubelet (as it's currently deprecated).

https://github.com/kubernetes/kubernetes/pull/110022/files#diff-971a1f41476f9fc507a738327091a497a3bd193ad7b17ca46454e975dfcb62e4L114

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubeadm cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants