WIP: KEP-5343: Updates to kube-proxy-backends #5344
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: danwinship. The full list of commands accepted by this bot can be found here. The pull request process is described here.
> Another possibility would be to deprecate the existing multi-mode
> kube-proxy binary in favor of having separate `kube-proxy-iptables`,
> `kube-proxy-ipvs`, and `kube-proxy-nftables` binaries (and perhaps,
> eventually, separate images). That would also work well with the plan
> to deprecate `ipvs` mode (and would allow us to completely remove
> the existing deprecated CLI options)...
This is painful for developers and users; more binaries means more images to maintain and to release.
Multiple binaries in the same image (built from nearly the same sources) would not really be much more work for anyone.
We could maybe even do the argv[0] hack and just have a single binary, but have it behave differently depending on whether you invoke it as `kube-proxy` or `kube-proxy-nftables`...
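As a rough sketch of that argv[0] dispatch (the `runProxy` and `modeFromConfig` helpers below are hypothetical placeholders, not actual kube-proxy code):

```go
// Illustrative only: a hypothetical single kube-proxy binary that picks its
// backend from the name it was invoked as (argv[0]).
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// runProxy stands in for a shared entry point that runs one backend.
func runProxy(backend string) error {
	fmt.Printf("starting proxy with %s backend\n", backend)
	return nil
}

// modeFromConfig stands in for today's --proxy-mode / config-file handling.
func modeFromConfig() string { return "iptables" }

func main() {
	var backend string
	// Dispatch on argv[0]: the same binary behaves differently depending on
	// whether it is invoked as kube-proxy, kube-proxy-iptables, etc.
	switch filepath.Base(os.Args[0]) {
	case "kube-proxy-iptables":
		backend = "iptables"
	case "kube-proxy-ipvs":
		backend = "ipvs"
	case "kube-proxy-nftables":
		backend = "nftables"
	default:
		// Plain "kube-proxy": keep the existing multi-mode behavior.
		backend = modeFromConfig()
	}
	if err := runProxy(backend); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

The image could then ship one real binary with the per-backend names as symlinks to it, so there would still be only one artifact to build and release.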
I see, I was assuming independent artifacts
Well, both options (multiple binaries, one image / multiple binaries, multiple images) are things to consider. Clearly the single image version is simpler though.
After having to revert the nodemanager code because of a totally sane simplification we made (and of course the same thing earlier with the config/CLI flag changes), I'm starting to think that "deprecate the existing kube-proxy binary so we can replace it with a new one that doesn't have years of weird backward-compatibility constraints" is actually a really great idea... (@aroradaman)
@danwinship do you mean create new binaries for each flavor?
... the problem is doing this in a backwards-compatible way: either you install a wrapper with the same name, or you keep one version (the default) with the default name; there is a lot of tooling that will depend on the existing image name ... also, kube-proxy is an official release artifact, so we also need to talk with SIG Release about that in advance. But I'm +1 on the idea personally; I went through this problem recently for kube-network-policies, and after trying multiple approaches this sounds like the best solution.
My theory is that we would keep shipping a `kube-proxy` binary that would work exactly like it does now (except for maybe eventually not supporting ipvs), and then three new binaries, `kube-proxy-iptables`, `kube-proxy-ipvs`, and `kube-proxy-nftables`, that would each only support one backend and could get rid of all of the legacy CLI/config options. (And new config options would only be added to the new binaries, to force people to eventually migrate.)
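To make that concrete, here is a hedged sketch of what a single-backend `kube-proxy-nftables` main could look like; all names and flags below are illustrative assumptions, not the real kube-proxy command line:

```go
// Illustrative only: a hypothetical main for a single-backend
// "kube-proxy-nftables" binary.
package main

import (
	"flag"
	"fmt"
	"os"
)

type config struct {
	configFile  string
	healthzAddr string
}

// run would construct and start the nftables proxier; stubbed here.
func run(cfg config) error {
	fmt.Println("kube-proxy-nftables starting, config file:", cfg.configFile)
	return nil
}

func main() {
	var cfg config
	flag.StringVar(&cfg.configFile, "config", "", "path to the configuration file")
	flag.StringVar(&cfg.healthzAddr, "healthz-bind-address", "0.0.0.0:10256", "healthz listen address")
	// Deliberately no --proxy-mode and none of the legacy/deprecated options:
	// the backend is fixed at build time and the flag surface starts clean.
	flag.Parse()

	if err := run(cfg); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

The interesting property is what is absent: no mode selection and no deprecated flags, so the new binaries start from a clean compatibility surface.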
> - Moving `kube-proxy-ipvs` to a staged repository.

> - Moving `kube-proxy-ipvs` to a non-staged repository.
I will just fork the entire code into its own repo and open it up for new maintainers; basically I am advocating for this option and for what you conclude in the next paragraph.
A good exercise for people willing to help would be to create a standalone repo with the ipvs proxy and the windows proxy from the existing code in k/k, to show feasibility ... I think that if that works we just start the deprecation period and point the people that want to use them to this new repo ...
/cc
> ## Proposal

> ### Determine a timeline for declaring `nftables` to be the "preferred" backend
Spitballing an idea here:
As a GKE user, we just use whatever GKE give us. Last I checked on v1.31 clusters, we get iptables. I'd very much like us to move to nftables, exposing me and our team to nftables, and also helping shake out any bugs that we may hit.
In other KEPs I've seen graduation criteria include conditions saying that enough clouds need to use a feature before it can be declared GA... I wonder if we need to do something similar here? Something along the lines of "Go convince X providers to set nftables as the default".
I assume a cloud provider will need some sort of motivation to do so, so I guess we may need to give them a reason to do that too?
Graduation criteria like that are generally for features that are implemented outside of k/k. You can't declare a NetworkPolicy or LoadBalancer feature GA if network plugins / cloud providers haven't implemented it.
The motivation for clouds/distros/etc to move to nftables is that it's faster and more efficient. If nobody was moving to nftables by default, that would probably be a good signal that there's something wrong and we need to deal with it before declaring nftables default. But I don't think it quite works in the other direction: the fact that Amazon and Google have decided to ship new-enough kernels that they can support nftables doesn't imply that everyone else is running new-enough kernels that they can support nftables...
> - Figure out the situation around "the default kube-proxy backend".

> - Figure out a plan for deprecating the `ipvs` backend.
On the sig-network call, I asked if it was possible to move this item into its own KEP, so we could move on it faster.
What was the reason that that wasn't a good idea?
It's not necessarily not a good idea. I was saying the ipvs plan may want to be informed by what we decide about making nftables the default, but I guess we're already telling people to stop using it either way, so maybe it does make sense to split it out.
> I was saying the ipvs plan may want to be informed by what we decide about making nftables the default, but I guess we're already telling people to stop using it either way, so maybe it does make sense to split it out.
Since ipvs is opt-in (i.e. it's not the default), I assume people have a reason they want to use ipvs. My assumption is that the reason is performance. So in theory nftables fixes that for them, and no matter what we do with the default, they can choose to use nftables too.
I also assume that iptables is here to stay, for now.
Right, but nftables only fixes it for them if they can run nftables, and currently the kernel requirement for being able to run nftables is much more recent than for kubernetes as a whole (and we don't currently have good information about what percentage of kubernetes users have a new-enough kernel to support nftables kube-proxy).
Also FWIW, the most recent "maybe you shouldn't use ipvs" issue was filed against 1.27, which is before even alpha nftables...
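For illustration of the kernel-requirement point above: the nftables mode documentation cites kernel 5.13 as the minimum, so a rough node-side check might look like the sketch below. This is purely illustrative; it is not how kube-proxy actually probes for nftables support, and the 5.13 threshold should be treated as an assumption to verify against the current docs.

```go
// Illustrative only: check whether the running kernel meets an assumed
// minimum version for the nftables backend.
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// kernelAtLeast reports whether the kernel's major.minor version is at least
// the given values, based on /proc/sys/kernel/osrelease (e.g. "5.15.0-105-generic").
func kernelAtLeast(major, minor int) (bool, error) {
	raw, err := os.ReadFile("/proc/sys/kernel/osrelease")
	if err != nil {
		return false, err
	}
	parts := strings.SplitN(strings.TrimSpace(string(raw)), ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected kernel version %q", raw)
	}
	maj, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, err
	}
	// The minor component may carry a suffix (e.g. "15-rc1"); keep the leading digits.
	digits := strings.FieldsFunc(parts[1], func(r rune) bool { return r < '0' || r > '9' })
	if len(digits) == 0 {
		return false, fmt.Errorf("unexpected kernel version %q", raw)
	}
	minVer, err := strconv.Atoi(digits[0])
	if err != nil {
		return false, err
	}
	return maj > major || (maj == major && minVer >= minor), nil
}

func main() {
	// 5.13 is an assumed minimum for the nftables backend; check current docs.
	ok, err := kernelAtLeast(5, 13)
	if err != nil {
		fmt.Fprintln(os.Stderr, "could not determine kernel version:", err)
		os.Exit(1)
	}
	fmt.Println("kernel new enough for the nftables backend:", ok)
}
```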
/cc
> shared kube-proxy code into the kube-proxy staging repo
> (`k8s.io/kube-proxy`) so that it could be more easily shared by
> out-of-tree service proxy implementations. That KEP was never merged,
> though there was general agreement with the high-level idea.
More than agreement, it was about some of us expressing concerns about whether it is worth doing ... kpng also thought there were a lot of people going to use a shared library that implemented Services, and talked about an existing demand for it ... but the anecdotal evidence is that it is more a developer story than a user story, and there was no such demand in the industry.
Yes, it is mostly a developer story, but there is potentially user benefit. Cilium and ovn-kubernetes have both lagged behind in implementing new Service features, and that lag could be reduced by having more shared code that they could make use of. Assuming we believe that users want the new Service features, then having alternative service proxies implement those features sooner rather than later is a good thing for users. (CategorizeEndpoints is an example of a service functionality building block that can easily be shared by arbitrary backends. Admittedly, some features inevitably require mostly backend-specific code and would not be helped by a "proxy library".)
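As an illustration of the "building block" idea (the `Endpoint` type and `splitEndpoints` helper below are hypothetical stand-ins, not the actual `CategorizeEndpoints` API in k/k): backend-neutral Service logic lives in a shared library, and each backend only translates the result into its own rules.

```go
// Illustrative only: a hypothetical shared "categorize endpoints" helper of
// the kind a proxy library could offer.
package main

import "fmt"

// Endpoint is a minimal, backend-agnostic view of a Service endpoint.
type Endpoint struct {
	IP      string
	Ready   bool
	IsLocal bool
}

// splitEndpoints keeps the backend-neutral logic (readiness, locality) in one
// place; each backend would only translate the returned slices into its own
// iptables/nftables/eBPF rules. Real shared code would also cover traffic
// policies, terminating endpoints, topology hints, etc.
func splitEndpoints(eps []Endpoint) (cluster, local []Endpoint) {
	for _, ep := range eps {
		if !ep.Ready {
			continue
		}
		cluster = append(cluster, ep)
		if ep.IsLocal {
			local = append(local, ep)
		}
	}
	return cluster, local
}

func main() {
	eps := []Endpoint{
		{IP: "10.0.0.1", Ready: true, IsLocal: true},
		{IP: "10.0.0.2", Ready: true},
		{IP: "10.0.0.3", Ready: false},
	}
	cluster, local := splitEndpoints(eps)
	fmt.Printf("cluster-traffic endpoints: %d, local-traffic endpoints: %d\n", len(cluster), len(local))
}
```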
That's a good point. After working on both codebases for Services, I have to admit this will be hard to accomplish in the short term, but it will be a clear benefit for Kubernetes and the ecosystem if we can achieve it. +1
> - Figure out (in conjunction with SIG Windows) what we want to do with
>   the `winkernel` backend.

> Assuming we end up deciding we want to move any proxy implementation
Kubernetes is batteries-included and we need to release with all the core components; kube-proxy implements Services, which is a core feature. I do not see how we can get kube-proxy out of the monorepo without having another implementation of Services in-tree ... the monorepo ensures that all development, APIs, and implementations of those APIs evolve in sync. Having things separate causes a lot of supportability problems and additional development friction from having to sync both projects for any feature developed; it would mean adding changes to k/k, then to k/proxy, and then syncing both ... plus the risk of deadlocking and regressions.
k/k is not batteries included; it does not include a pod networking implementation. (It used to, but we removed it!)
But anyway, removing kube-proxy from k/k is explicitly a Non-Goal of the KEP.
You are right, my bad; kindnet is filling that gap today in most Kubernetes projects... should we also think about bringing it into kubernetes-sigs?
(Not related to this KEP, but yes, I definitely support having kindnet be an official reference implementation of pod networking.)
Just wondering whether the approach has been finalized. Will it be a separate repository within the Kubernetes project, for example kubernetes/kube-proxy-windows? Thanks.
Nothing has been decided yet...
I see that #5343 is on the 1.35 enhancement board, but I'm not sure this KEP is ready for 1.35. Should we remove it from the milestone?
yeah, we still don't have a plan here |
Force-pushed from d08b62d to b9f058c.
Updated the discussion of IPVS deprecation. PTAL.
Force-pushed from b9f058c to b823da5.
(This is WIP, but ready for review; the WIP-iness is about figuring out the scope/details of what we want to do.)