fix kube-proxy bug with multiple LB IPs and source ranges #109826

Merged
k8s-ci-robot merged 3 commits into kubernetes:master from the multi-load-balancer branch on May 6, 2022

Conversation

danwinship
Contributor

What type of PR is this?

/kind bug

What this PR does / why we need it:

Due to incorrect nesting of the LB-related loops in syncProxyRules, a LoadBalancer service with multiple LoadBalancer IPs and a LoadBalancerSourceRanges entry that overlapped the node IP would get incorrectly generated rules (with the result that traffic from the node to the second LB IP would probably be dropped rather than accepted).

For example, given LoadBalancer IPs 1.2.3.4 and 5.6.7.8, node IP 192.168.0.2, and source ranges [192.168.0.0/24, 203.0.113.0/25], we would generate:

-A KUBE-FW-XPGD46QRK7WJZT7O -s 192.168.0.0/24 -j KUBE-EXT-XPGD46QRK7WJZT7O
-A KUBE-FW-XPGD46QRK7WJZT7O -s 203.0.113.0/25 -j KUBE-EXT-XPGD46QRK7WJZT7O
-A KUBE-FW-XPGD46QRK7WJZT7O -s 1.2.3.4 -j KUBE-EXT-XPGD46QRK7WJZT7O
-A KUBE-FW-XPGD46QRK7WJZT7O -j KUBE-MARK-DROP
-A KUBE-FW-XPGD46QRK7WJZT7O -s 192.168.0.0/24 -j KUBE-EXT-XPGD46QRK7WJZT7O
-A KUBE-FW-XPGD46QRK7WJZT7O -s 203.0.113.0/25 -j KUBE-EXT-XPGD46QRK7WJZT7O
-A KUBE-FW-XPGD46QRK7WJZT7O -s 5.6.7.8 -j KUBE-EXT-XPGD46QRK7WJZT7O
-A KUBE-FW-XPGD46QRK7WJZT7O -j KUBE-MARK-DROP

rather than the desired

-A KUBE-FW-XPGD46QRK7WJZT7O -s 192.168.0.0/24 -j KUBE-EXT-XPGD46QRK7WJZT7O
-A KUBE-FW-XPGD46QRK7WJZT7O -s 203.0.113.0/25 -j KUBE-EXT-XPGD46QRK7WJZT7O
-A KUBE-FW-XPGD46QRK7WJZT7O -s 1.2.3.4 -j KUBE-EXT-XPGD46QRK7WJZT7O
-A KUBE-FW-XPGD46QRK7WJZT7O -s 5.6.7.8 -j KUBE-EXT-XPGD46QRK7WJZT7O
-A KUBE-FW-XPGD46QRK7WJZT7O -j KUBE-MARK-DROP

(The allow-from-LB-IP rule is added only when the source ranges overlap the node IP, and is explained in the code as:

	// For VIP-like LBs, the VIP is often added as a local
	// address (via an IP route rule).  In that case, a request
	// from a node to the VIP will not hit the loadbalancer but
	// will loop back with the source IP set to the VIP.  We
	// need the following rule to allow requests from this node.

)
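The overlap condition mentioned above (the source ranges containing the node IP) can be illustrated with the Go standard library. This is only a minimal sketch: `nodeInSourceRanges` is a hypothetical helper name invented for illustration, not the function kube-proxy actually uses, and the inputs are the example values from this description.

```go
package main

import (
	"fmt"
	"net"
)

// nodeInSourceRanges reports whether nodeIP falls inside any of the given
// CIDR source ranges. Hypothetical helper for illustration only; it is not
// taken from the kube-proxy source.
func nodeInSourceRanges(nodeIP string, sourceRanges []string) (bool, error) {
	ip := net.ParseIP(nodeIP)
	if ip == nil {
		return false, fmt.Errorf("invalid node IP %q", nodeIP)
	}
	for _, cidr := range sourceRanges {
		_, ipnet, err := net.ParseCIDR(cidr)
		if err != nil {
			return false, fmt.Errorf("invalid source range %q: %w", cidr, err)
		}
		if ipnet.Contains(ip) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Example values from the PR description.
	ranges := []string{"192.168.0.0/24", "203.0.113.0/25"}
	ok, err := nodeInSourceRanges("192.168.0.2", ranges)
	fmt.Println(ok, err)
}
```

With the example values, the node IP 192.168.0.2 lies inside 192.168.0.0/24, so the allow-from-LB-IP rules would be needed.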

This is not a regression; our refactorings in 1.24 correctly preserved the existing behavior, which has apparently always been wrong. (It's probably uncommon to have multiple LB IPs for the same service, so no one ever noticed?)

Which issue(s) this PR fixes:

none; discovered while refactoring for another PR

Does this PR introduce a user-facing change?

Fixed a long-standing but very obscure bug involving Services of type LoadBalancer with multiple IPs and LoadBalancerSourceRanges that overlap the node IP.

/sig network
/priority important-soon

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. sig/network Categorizes an issue or PR as relevant to SIG Network. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 5, 2022
@k8s-ci-robot
Contributor

@danwinship: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 5, 2022
The LoadBalancer rules change if the node IP is in one of the
LoadBalancerSourceRange subnets, so make sure to set nodeIP on the
fake proxier so we can test this, and add a second source range to
TestLoadBalancer containing the node IP. (This changes the result of
one flow test that previously expected that node-to-LB would be
dropped.)

The various loops in the LoadBalancer rule section were mis-nested
such that if a service had multiple LoadBalancer IPs, we would write
out the firewall rules multiple times (and the allowFromNode rule for
the second and later IPs would end up being written after the "else
DROP" rule from the first IP).
@aojea
Member

aojea commented May 5, 2022

/lgtm
indeed, very subtle

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 5, 2022
@pacoxu
Member

pacoxu commented May 6, 2022

/retest

@k8s-triage-robot

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does not have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

@k8s-ci-robot k8s-ci-robot merged commit 2b3508e into kubernetes:master May 6, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.25 milestone May 6, 2022
@danwinship danwinship deleted the multi-load-balancer branch May 6, 2022 11:28