
Introduce compatibility version for Kubernetes control-plane upgrades #4330

Open · 4 tasks done
logicalhan opened this issue Nov 6, 2023 · 60 comments · Fixed by kubernetes/kubernetes#122891
Labels
  • sig/api-machinery: Categorizes an issue or PR as relevant to SIG API Machinery.
  • sig/architecture: Categorizes an issue or PR as relevant to SIG Architecture.
  • stage/alpha: Denotes an issue tracking an enhancement targeted for Alpha status.
  • tracked/yes: Denotes an enhancement issue is actively being tracked by the Release Team.

Comments

@logicalhan
Member

logicalhan commented Nov 6, 2023

Enhancement Description

  • One-line enhancement description (can be used as a release note):

Introduce a compatibility version in Kubernetes components to make Kubernetes control-plane upgrades safer (a minimal sketch of the idea is included below). See safer Kubernetes upgrades for more details.

Please keep this description up to date. This will help the Enhancement Team to track the evolution of the enhancement efficiently.
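For orientation, here is a minimal sketch of the core idea in Go, with hypothetical types (the real mechanism lives in Kubernetes component-base): feature availability is decided against an emulated compatibility version that may trail the binary version.

```go
// Hypothetical sketch: features are gated on an emulated (compatibility)
// version rather than on the binary version itself.
package main

import "fmt"

type version struct{ major, minor int }

func (v version) atLeast(o version) bool {
	return v.major > o.major || (v.major == o.major && v.minor >= o.minor)
}

func main() {
	binary := version{1, 31}       // the version actually running
	emulated := version{1, 30}     // the version it is asked to behave like
	introducedIn := version{1, 31} // when some hypothetical feature was added

	// While emulating 1.30, a feature introduced in 1.31 stays disabled, so
	// upgrading to the 1.31 binary does not change behavior until the
	// compatibility version is raised in a separate, revertible step.
	fmt.Printf("binary=%v emulated=%v featureEnabled=%v\n",
		binary, emulated, emulated.atLeast(introducedIn))
}
```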

@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Nov 6, 2023
@jeremyrickard
Contributor

/sig api-machinery

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 7, 2023
@jpbetz
Contributor

jpbetz commented Jan 19, 2024

/sig architecture

@k8s-ci-robot k8s-ci-robot added the sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. label Jan 19, 2024
@jpbetz
Contributor

jpbetz commented Jan 23, 2024

/milestone 1.30

@k8s-ci-robot
Contributor

@jpbetz: You must be a member of the kubernetes/milestone-maintainers GitHub team to set the milestone. If you believe you should be able to issue the /milestone command, please contact your Milestone Maintainers Team and have them propose you as an additional delegate for this responsibility.

In response to this:

/milestone 1.30

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

jpbetz added a commit to jpbetz/kubernetes that referenced this issue Jan 23, 2024
We won't need to do this manually forever. Long term kubernetes/enhancements#4330 will set it.  But for now we bump it for each release.
@jpbetz
Contributor

jpbetz commented Jan 23, 2024

/milestone v1.30

@johnbelamaric
Member

/lead-opted-in

@jpbetz
Contributor

jpbetz commented Feb 6, 2024

/label lead-opted-in

@k8s-ci-robot k8s-ci-robot added the lead-opted-in Denotes that an issue has been opted in to a release label Feb 6, 2024
@mickeyboxell mickeyboxell moved this to At Risk for Enhancements Freeze in 1.30 Enhancements Tracking Feb 8, 2024
@mickeyboxell

Hello @logicalhan 👋, Enhancements team here.

Just checking in as we approach enhancements freeze today (02:00 UTC Friday 9th February 2024 / 18:00 PDT Thursday 8th February 2024, https://everytimezone.com/s/1ade3dca).

This enhancement is targeting stage alpha for v1.30 (correct me if otherwise).

Here's where this enhancement currently stands:

  • Make sure to update your status to implementable in the kep.yaml file.
  • Please let me know when this enhancement is

The status of this enhancement is marked as at risk for enhancement freeze. Please keep the issue description up-to-date with appropriate stages as well. Thank you!

@mickeyboxell

mickeyboxell commented Feb 9, 2024

Hello 👋, v1.30 Enhancements team here.

Unfortunately, this enhancement did not meet requirements for enhancements freeze.

If you still wish to progress this enhancement in v1.30, please file an exception request. Thanks!

  • It looks like you still need to update the KEP status to implementable for latest-milestone: 1.30.

@mickeyboxell mickeyboxell moved this from At Risk for Enhancements Freeze to Removed from Milestone in 1.30 Enhancements Tracking Feb 9, 2024
@salehsedghpour

/milestone clear

@k8s-ci-robot k8s-ci-robot removed this from the v1.30 milestone Feb 9, 2024
@jpbetz
Contributor

jpbetz commented Feb 9, 2024

This is due to a clerical error. #4395 was intended to be the KEP PR, but we neglected to notice it was still marked as provisional. #4502 fixes this.

@salehsedghpour

@jpbetz Please make sure that #4502 gets merged, and also consider filing an exception request.

@jpbetz
Contributor

jpbetz commented Feb 9, 2024

@jpbetz Please make sure that #4502 gets merged, and also consider filing an exception request.

#4502 is merged. Do we need a full exception for a clerical error like this?

@sftim

sftim commented Jan 12, 2025

I have a suggestion for the title of this KEP.

How about "Backwards compatibility for Kubernetes control-plane"? That's the feature we'd be adding.

@siyuanfoundation
Contributor

siyuanfoundation commented Jan 22, 2025

@neolit123
Member

neolit123 commented Feb 25, 2025

q: what would the k8s project's stance be if a user wishes to emulate a version of k8s that is out of support? with this proposal going to beta (n-3), the scenario becomes possible.

for example, 1.35 is supported for 3 releases, but 1.36 can emulate 1.35 for two more releases before 1.36 goes out of support, and finally 1.37 can emulate 1.35 for one additional release before 1.37 goes out of support.

this results in the possibility of a user staying on 1.35 for 2 years / 6 releases, thinking they might be on a supported release.
the KEP's goals and non-goals do not address related LTS topics, such as whether an emulated version should receive support.
at the same time, i do see some mentions of this KEP in the wg-lts agenda document.

@aojea
Member

aojea commented Feb 25, 2025

I don't think "emulate" means fully supporting that version or fully reproducing it; it means enabling, in the current version, the codepaths that were running in that version. A lot of other code has changed completely, so treating an emulated version as a "fully supported version" is unrealistic.

@pohly
Contributor

pohly commented Feb 25, 2025

Should the user be made aware of being out of support when they ask to emulate a release that is too old?

@sftim

sftim commented Feb 25, 2025

Should the user be made aware of being out of support when they ask to emulate a release that is too old?

Kubernetes code doesn't (currently) know about support EOL dates.

@liggitt
Member

liggitt commented Feb 25, 2025

Should the user be made aware of being out of support when they ask to emulate a release that is too old?

There are already internal mechanical bounds on the versions that can be requested to be emulated. We would not change the behavior of 1.32 emulating 1.31 when 1.31 reaches EOL... the 1.32 binaries would continue to function exactly the same way when asked to emulate 1.31
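For illustration, a hypothetical sketch of such a mechanical bound, assuming the n-3 window discussed earlier in this thread (the KEP defines the actual window per stage):

```go
package main

import "fmt"

// validEmulation is a hypothetical bound check: a binary at minor version
// n may be asked to emulate at most n-3 (assumed window; the KEP defines
// the real rule per stage).
func validEmulation(binaryMinor, emulatedMinor int) bool {
	return emulatedMinor <= binaryMinor && binaryMinor-emulatedMinor <= 3
}

func main() {
	fmt.Println(validEmulation(36, 35)) // true: 1.36 may emulate 1.35
	fmt.Println(validEmulation(39, 35)) // false: outside the assumed window
}
```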

@sdodson

sdodson commented Feb 25, 2025

Seems like support status should follow the binary version rather than the compatibility version; if not, then the flexibility of this feature will be pretty limited.

@liggitt
Member

liggitt commented Feb 25, 2025

Seems like support status should follow the binary version rather than the compatibility version; if not, then the flexibility of this feature will be pretty limited.

Yes, that is how this works

@maxin93

maxin93 commented Feb 27, 2025

Hello, I found a problem (k3s-io/k3s#11853) related to this feature: in k3s, the feature gates of the control-plane components cannot be applied normally.

The likely cause is that DefaultComponentGlobalsRegistry is a global object. The commands of the control-plane components are initialized in parallel, and each component invokes AddFlags(fs *pflag.FlagSet); the apiserver's command happens to be executed before kube-controller-manager and the scheduler. The reference to this shared object is used in the Set interface of ColonSeparatedMultimapStringString, so componentGlobalsRegistry.featureGatesConfig gets overwritten: after the apiserver's --feature-gates value is set, the other components' calls to AddFlags(fs *pflag.FlagSet) can reset the object to empty, and the feature gates printed from the flag end up empty.

Can someone help?
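For readers following along, here is a much-simplified, hypothetical model of the overwrite described above; the names are illustrative only, not the real k8s.io/component-base types:

```go
package main

import "fmt"

// shared stands in for the process-wide
// DefaultComponentGlobalsRegistry.featureGatesConfig.
var shared map[string]string

// addFlagsAndSet models one component calling AddFlags (which re-binds
// the --feature-gates flag to the shared config, resetting it) and then
// parsing its own flag value into it.
func addFlagsAndSet(component, gates string) {
	shared = map[string]string{} // the problematic reset on every AddFlags
	shared[component] = gates
}

func main() {
	// In a combined binary like k3s, each component registers its flags:
	addFlagsAndSet("kube-apiserver", "FeatureA=true")
	addFlagsAndSet("kube-controller-manager", "FeatureB=true")

	// The apiserver's parsed gates were wiped by the later AddFlags call.
	fmt.Println(shared) // map[kube-controller-manager:FeatureB=true]
}
```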

@neolit123
Member

these global vars do seem problematic for the 'binary smashing' that k3s is doing. if you have a pr in mind you could send it. i think there is also potential for other concurrency issues.

@maxin93

maxin93 commented Feb 27, 2025

these global vars do seem problematic for the 'binary smashing' that k3s is doing. if you have a pr in mind you could send it. i think there is also potential for other concurrency issues.

I just have a rough idea: can we separate this global var by component?

@neolit123
Member

these global vars do seem problematic for the 'binary smashing' that k3s is doing. if you have a pr in mind you could send it. i think there is also potential for other concurrency issues.

I just have a rough idea: can we separate this global var by component?

deferring to @siyuanfoundation

@siyuanfoundation
Contributor

siyuanfoundation commented Feb 27, 2025

Yes, it is possible to separate this global var by component, but that would require touching a lot of code, because all call sites use the globalDefaultFeatureGate when querying feature gates.

We are working on a fix that avoids resetting the feature values when AddFlags() is called multiple times: kubernetes/kubernetes#130079 (comment). That should fix the bug as long as there are no conflicting flags between different components. Is that sufficient? @maxin93
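A hypothetical sketch of the idea behind that fix, with illustrative names (kubernetes/kubernetes#130079 is the real change): initialize the shared config once and let repeated AddFlags-style calls reuse it instead of resetting it.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	initOnce    sync.Once
	gatesConfig map[string]string
)

// addFlags models an AddFlags-style registration that initializes the
// shared config only the first time; later calls from other components
// bind to the same map rather than replacing it.
func addFlags(component, gates string) {
	initOnce.Do(func() { gatesConfig = map[string]string{} })
	gatesConfig[component] = gates
}

func main() {
	addFlags("kube-apiserver", "FeatureA=true")
	addFlags("kube-controller-manager", "FeatureB=true")
	fmt.Println(gatesConfig) // both components' values survive
}
```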

@maxin93

maxin93 commented Mar 3, 2025

Sorry, let me describe my understanding, and let me know if it is correct: the premise of this fix is that different components do not share the same flags. But different components actually do share flags (v, feature-gates, metric, ...), so they would still conflict, right?

@siyuanfoundation
Contributor

Sorry, let me describe my understanding, and let me know if it is correct: the premise of this fix is that different components do not share the same flags. But different components actually do share flags (v, feature-gates, metric, ...), so they would still conflict, right?

With the fix in kubernetes/kubernetes#130079, different components can have the same flag, as long as they do not conflict with each other, for example by setting featureA=true in one component and featureA=false in another.
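To illustrate what "conflict" means here, a hypothetical sketch: merging per-component gate settings fails only when the same gate is set to different values by different components.

```go
package main

import "fmt"

// merge folds one component's feature-gate settings into the combined
// view, rejecting a gate that another component already set differently.
func merge(dst, src map[string]bool) error {
	for gate, val := range src {
		if prev, ok := dst[gate]; ok && prev != val {
			return fmt.Errorf("conflicting values for feature gate %s", gate)
		}
		dst[gate] = val
	}
	return nil
}

func main() {
	merged := map[string]bool{}
	_ = merge(merged, map[string]bool{"FeatureA": true})     // apiserver
	err := merge(merged, map[string]bool{"FeatureA": false}) // scheduler disagrees
	fmt.Println(merged, err)
}
```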

@maxin93

maxin93 commented Mar 4, 2025

I found a new issue that is not addressed by that fix: in current k3s, the controller-manager and scheduler are launched in goroutines at almost the same time. Each parses its own feature-gate flag into ComponentGlobalsRegistry.featureGatesConfig, so the value of this global variable is overwritten by whichever component runs later. Both components then call ComponentGlobalsRegistry.Set() and use the same feature gates. As a result, the feature gates of the component that started first do not take effect.

@liggitt
Member

liggitt commented Mar 4, 2025

In current k3s, the controller-manager and scheduler are launched in goroutines at almost the same time

The code inside kubernetes/kubernetes structures these commands as separate binaries, which is how we test and release them. If k3s is reworking them into a single binary, working through the assumptions that breaks, and keeping up with new assumptions it breaks, is the responsibility of the k3s implementation (in this case, something simple like a synchronization point between those goroutine commands, after parsing flags and after setting global feature gates, might be sufficient for k3s).

We won't deliberately do things that cause issues for a rework like the one k3s has done, but we can't reasonably support, or block on, issues caused by significant out-of-tree restructurings of the implementation either.
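As an illustration of such a synchronization point, a hypothetical k3s-side sketch (not code from either project): serialize each command's flag parsing and global feature-gate setup before letting the components run concurrently.

```go
package main

import (
	"fmt"
	"sync"
)

var setupMu sync.Mutex // serializes flag parsing and global gate setup

// runComponent models one control-plane command started in its own
// goroutine: the phase that mutates shared globals runs under the lock,
// and the long-running phase runs concurrently afterwards.
func runComponent(name string, wg *sync.WaitGroup) {
	defer wg.Done()

	setupMu.Lock()
	// parse flags and apply global feature gates here
	fmt.Println(name, "flags parsed and feature gates set")
	setupMu.Unlock()

	// start the component's servers/controllers from here on
}

func main() {
	var wg sync.WaitGroup
	for _, c := range []string{"kube-apiserver", "kube-controller-manager", "kube-scheduler"} {
		wg.Add(1)
		go runComponent(c, &wg)
	}
	wg.Wait()
}
```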

@dims
Member

dims commented Mar 4, 2025

@maxin93 in addition to what @liggitt said, do you mind working with the k3s folks directly? Thanks!

@maxin93

maxin93 commented Mar 5, 2025

Yes, I'm trying to work with k3s. Thanks for your suggestion.

Projects
Status: Deferred
Status: Tracked for Doc Freeze