
@danwinship
Contributor

@danwinship danwinship commented May 27, 2025

  • One-line PR description: Post-KEP-3866, figure out what to do about "default" kube-proxy backend and deprecated backends

(This is WIP, but ready for review; the WIP-iness is about figuring out the scope/details of what we want to do.)

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 27, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels May 27, 2025
@k8s-ci-robot k8s-ci-robot requested a review from aojea May 27, 2025 02:09
@k8s-ci-robot k8s-ci-robot added the sig/network Categorizes an issue or PR as relevant to SIG Network. label May 27, 2025
@k8s-ci-robot k8s-ci-robot requested a review from thockin May 27, 2025 02:09
@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label May 27, 2025
Comment on lines +177 to +191
Another possibility would be to deprecate the existing multi-mode
kube-proxy binary in favor of having separate `kube-proxy-iptables`,
`kube-proxy-ipvs`, and `kube-proxy-nftables` binaries (and perhaps,
eventually, separate images). That would also work well with the plan
to deprecate `ipvs` mode (and would allow us to completely remove
the existing deprecated CLI options)...
Member

this is painful for developers and users; more binaries means more images to maintain and to release

Contributor Author

multiple binaries in the same image (built from nearly the same sources) would not really be much more work for anyone.

we could maybe even do the argv[0] hack and just have a single binary, but have it behave differently depending on whether you invoke it as kube-proxy or kube-proxy-nftables...
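
For illustration, a minimal sketch of that argv[0] dispatch, assuming hypothetical name suffixes derived from how the binary is invoked (this is not existing kube-proxy code):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	// Pick a backend based on the name this binary was invoked under.
	// An unsuffixed "kube-proxy" keeps today's multi-backend behavior.
	backend := "" // "" = fall back to --proxy-mode / the config file
	switch name := filepath.Base(os.Args[0]); {
	case strings.HasSuffix(name, "-iptables"):
		backend = "iptables"
	case strings.HasSuffix(name, "-ipvs"):
		backend = "ipvs"
	case strings.HasSuffix(name, "-nftables"):
		backend = "nftables"
	}
	fmt.Printf("invoked as %q, selected backend %q\n", os.Args[0], backend)
	// ...hand off to the proxier implementation for the selected backend...
}
```

Shipping one binary with `kube-proxy-iptables`/`kube-proxy-ipvs`/`kube-proxy-nftables` symlinks pointing at it would then give the per-mode names without multiplying images.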

Member

I see, I was assuming independent artifacts

Contributor Author

Well, both options (multiple binaries, one image / multiple binaries, multiple images) are things to consider. Clearly the single image version is simpler though.

Contributor Author

after having to revert the nodemanager code because of a totally sane simplification we made (and of course the same thing earlier with the config/CLI flag changes), I'm starting to think that "deprecate the existing kube-proxy binary so we can replace it with a new one that doesn't have years of weird backward-compatibility constraints" is actually a really great idea... (@aroradaman)

Member

@aojea aojea Aug 25, 2025

@danwinship do you mean creating new binaries for each flavor?
... the problem is doing this in a backwards-compatible way: either you install a wrapper with the same name, or you keep one version (the default) under the default name, since there is a lot of tooling that will depend on the existing image name ... also, kube-proxy is an official release artifact, so we also need to talk with SIG Release about that in advance. But I'm +1 on the idea personally; I went through this problem recently for kube-network-policies, and after trying multiple approaches this sounds like the best solution

Contributor Author

My theory is that we would keep shipping a kube-proxy binary that would work exactly like it does now (except for maybe eventually not supporting ipvs), and then three new binaries, kube-proxy-iptables, kube-proxy-ipvs, and kube-proxy-nftables, that would each only support 1 backend, and could get rid of all of the legacy CLI/config options. (And new config options would only be added to the new binaries, to force people to eventually migrate.)
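
A rough sketch of what a single-backend `kube-proxy-nftables` entrypoint could look like under that plan; the flag set and the `runNFTablesProxier` helper are hypothetical, and the point is just that the legacy multi-backend options never exist in the new binaries:

```go
package main

import (
	"flag"
	"log"
)

// runNFTablesProxier is a stand-in for the real proxier setup and sync loop.
func runNFTablesProxier(kubeconfig, nodeName string) error {
	log.Printf("starting nftables proxier (kubeconfig=%q, node=%q)", kubeconfig, nodeName)
	return nil
}

func main() {
	// Only the options this backend needs are exposed; the deprecated
	// and backend-selection flags simply do not exist in this binary.
	kubeconfig := flag.String("kubeconfig", "", "path to a kubeconfig file")
	nodeName := flag.String("hostname-override", "", "node name to use instead of the detected hostname")
	flag.Parse()

	if err := runNFTablesProxier(*kubeconfig, *nodeName); err != nil {
		log.Fatal(err)
	}
}
```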


- Moving `kube-proxy-ipvs` to a staged repository.

- Moving `kube-proxy-ipvs` to a non-staged repository.
Member

@aojea aojea May 30, 2025

I would just fork the entire code into its own repo and open it up to new maintainers; basically advocating for this option and for what you conclude in the next paragraph

@aojea
Member

aojea commented May 30, 2025

a good exercise for people willing to help would be to create a standalone repo with the ipvs proxy and windows proxy from the existing code in k/k, to show feasibility ... I think that if that works we just start the deprecation period and point the people that want to use them to this new repo ...

@shaneutt
Member

shaneutt commented Jun 5, 2025

/cc

@k8s-ci-robot k8s-ci-robot requested a review from shaneutt June 5, 2025 16:11

## Proposal

### Determine a timeline for declaring `nftables` to be the "preferred" backend
Member

Spitballing an idea here:
As a GKE user, we just use whatever GKE gives us. Last I checked on v1.31 clusters, we get iptables. I'd very much like us to move to nftables, exposing me and our team to nftables, and also helping shake out any bugs that we may hit.

In other KEPs I've seen graduation criteria include conditions saying that enough clouds need to use a feature before it can be declared GA... I wonder if we need to do something similar here? Something along the lines of "Go convince X providers to make nftables the default".
I assume a cloud provider will need some sort of motivation to do so, so I guess we may need to give them a reason to do that too?

Contributor Author

Graduation criteria like that are generally for features that are implemented outside of k/k. You can't declare a NetworkPolicy or LoadBalancer feature GA if network plugins / cloud providers haven't implemented it.

The motivation for clouds/distros/etc to move to nftables is that it's faster and more efficient. If nobody was moving to nftables by default, that would probably be a good signal that there's something wrong and we need to deal with it before declaring nftables default. But I don't think it quite works in the other direction: the fact that Amazon and Google have decided to ship new-enough kernels that they can support nftables doesn't imply that everyone else is running new-enough kernels that they can support nftables...


- Figure out the situation around "the default kube-proxy backend".

- Figure out a plan for deprecating the `ipvs` backend.
Member

On the sig-network call, I asked if it was possible to move this item into its own KEP, so we could move on it faster.
What was the reason that that wasn't a good idea?

Contributor Author

It's not necessarily not a good idea. I was saying the ipvs plan may want to be informed by what we decide about making nftables the default, but I guess we're already telling people to stop using it either way, so maybe it does make sense to split it out.

Member

> I was saying the ipvs plan may want to be informed by what we decide about making nftables the default, but I guess we're already telling people to stop using it either way, so maybe it does make sense to split it out.

Since ipvs is opt-in (i.e. it's not the default), I assume people have a reason they want to use ipvs. My assumption is that the reason is performance. So in theory nftables fixes that for them, and no matter what we do with the default, they can choose to use nftables too.

I also assume that iptables is here to stay, for now.

Contributor Author

Right, but nftables only fixes it for them if they can run nftables, and currently the kernel requirement for being able to run nftables is much more recent than for kubernetes as a whole (and we don't currently have good information about what percentage of kubernetes users have a new-enough kernel to support nftables kube-proxy).

Also FWIW, the most recent "maybe you shouldn't use ipvs" issue was filed against 1.27, which is before even alpha nftables...
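
As a point of reference, here is a small, self-contained check of a node's kernel against the nftables mode's documented baseline; the 5.13 minimum is taken from the current kube-proxy documentation and is an assumption of this sketch, not something this KEP defines:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Assumed minimum kernel for the nftables kube-proxy mode (per current docs).
const minMajor, minMinor = 5, 13

func main() {
	raw, err := os.ReadFile("/proc/sys/kernel/osrelease") // e.g. "6.1.0-18-amd64"
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read kernel version:", err)
		os.Exit(1)
	}
	var major, minor int
	if _, err := fmt.Sscanf(strings.TrimSpace(string(raw)), "%d.%d", &major, &minor); err != nil {
		fmt.Fprintln(os.Stderr, "cannot parse kernel version:", err)
		os.Exit(1)
	}
	ok := major > minMajor || (major == minMajor && minor >= minMinor)
	fmt.Printf("kernel %d.%d: nftables kube-proxy supported: %v\n", major, minor, ok)
}
```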

@aroradaman
Member

/cc

shared kube-proxy code into the kube-proxy staging repo
(`k8s.io/kube-proxy`) so that it could be more easily shared by
out-of-tree service proxy implementations. That KEP was never merged,
though there was general agreement with the high-level idea.
Member

@aojea aojea Aug 25, 2025

more than agreement, it was about some of us expressing our concerns about whether it is worth doing ... kpng also thought there were a lot of people who were going to use a shared library that implemented Services, and talked about an existing demand for it ... but the anecdotal evidence is that it is more a developer story than a user story, and there was no such demand in the industry

Contributor Author

@danwinship danwinship Aug 25, 2025

Yes, it is mostly a developer story, but there is potentially user benefit. Cilium and ovn-kubernetes have both lagged behind in implementing new Service features, and that lag could be reduced by having more shared code that they could make use of. Assuming we believe that users want the new Service features, then having alternative service proxies implement those features sooner rather than later is a good thing for users. (CategorizeEndpoints is an example of a service functionality building block that can easily be shared by arbitrary backends. Admittedly, some features inevitably require mostly backend-specific code and would not be helped by a "proxy library".)
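
As a purely hypothetical illustration of that kind of backend-agnostic building block (this is not the real `CategorizeEndpoints` API from `pkg/proxy`, just a sketch of the shape such shared code could take):

```go
package sharedproxy

// Endpoint is a minimal stand-in for whatever endpoint abstraction a
// shared proxy library would expose.
type Endpoint interface {
	IsReady() bool
	IsLocal() bool // running on the same node as this proxy instance
}

// SplitEndpoints partitions ready endpoints into the set usable for
// cluster traffic policy and the subset usable for local traffic policy.
// Any backend (iptables, nftables, eBPF, ...) could consume the result
// when building its own rules.
func SplitEndpoints[E Endpoint](eps []E) (cluster, local []E) {
	for _, ep := range eps {
		if !ep.IsReady() {
			continue
		}
		cluster = append(cluster, ep)
		if ep.IsLocal() {
			local = append(local, ep)
		}
	}
	return cluster, local
}
```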

Member

That's a good point. After working on both codebases for Services I have to admit this will be hard to accomplish in the short term, but it would be a clear benefit for Kubernetes and the ecosystem if we can achieve it. +1

- Figure out (in conjunction with SIG Windows) what we want to do with
the `winkernel` backend.

Assuming we end up deciding we want to move any proxy implementation
Member

kubernetes is batteries-included and we need to release with all the core components. kube-proxy implements Services, which is a core feature, and I do not see how we can get kube-proxy out of the monorepo without having another implementation of Services in-tree ... the monorepo ensures that all development, APIs, and implementations of those APIs evolve in sync. Having things separate causes a lot of supportability problems and additional development friction, since every feature would mean adding changes to k/k, then to k/proxy, and then syncing both ... plus the risk of deadlocks and regressions

Contributor Author

k/k is not batteries included; it does not include a pod networking implementation. (It used to, but we removed it!)

But anyway, removing kube-proxy from k/k is explicitly a Non-Goal of the KEP.

Member

@aojea aojea Aug 25, 2025

You are right, my bad; kindnet is filling that gap today in most kubernetes projects... should we also think about bringing it into kubernetes-sigs?

Contributor Author

(Not related to this KEP, but yes, I definitely support having kindnet be an official reference implementation of pod networking.)

@princepereira

princepereira commented Sep 22, 2025

> a good exercise for people willing to help would be to create a standalone repo with the ipvs proxy and windows proxy from the existing code in k/k, to show feasibility ... I think that if that works we just start the deprecation period and point the people that want to use them to this new repo ...

Just wondering whether the approach has been finalized. Will it be a separate repository within the Kubernetes project, for example: kubernetes/kube-proxy-windows?

Thanks,

@danwinship
Contributor Author

Nothing has been decided yet...

@adrianmoisey adrianmoisey changed the title WIP: KEP-5343: Updates to kube-proxy-backends KEP-5343: Updates to kube-proxy-backends Oct 6, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 6, 2025
@adrianmoisey adrianmoisey changed the title KEP-5343: Updates to kube-proxy-backends WIP: KEP-5343: Updates to kube-proxy-backends Oct 6, 2025
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 6, 2025
@kannon92
Contributor

kannon92 commented Oct 9, 2025

I see that #5343 is on the 1.35 enhancement board but I'm not sure if this KEP is ready for 1.35?

Should we remove this from the milestone?

@danwinship
Contributor Author

yeah, we still don't have a plan here

@danwinship
Contributor Author

Updated the discussion of IPVS deprecation. PTAL.

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 8, 2025