-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP-4330: Change API availability to forward compatibility #5038
base: master
Are you sure you want to change the base?
KEP-4330: Change API availability to forward compatibility #5038
Conversation
siyuanfoundation
commented
Jan 14, 2025
•
edited
Loading
edited
- One-line PR description: change api availability to forward compatible to make it easier to handle API graduation in compatibility version.
- Issue link: Introduce compatibility version for Kubernetes control-plane upgrades #4330
- Other comments: more discussions can be found in [Compatibility Version] Guidance on API version promotion kubernetes#127791
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: siyuanfoundation The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
6abb021
to
fbbc5f7
Compare
Instead, we are proposing to keep forward compatibility of all AIPs in compatibility version mode: | ||
All APIs group-versions declarations would be modified | ||
to track which Kubernetes version the API group-versions are removed at. APIs introduced between the emulation version and | ||
the binary version are still available if the stability level is no less than the APIs introduced before or at the emulation version and enabled at the emulation version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of this is referring to API enablement/disablement on a group-version level. However, API lifecycles are generally individual on the group-version-resource level. I think it would make more sense to discuss the individual beta->GA resources rather than the entire group version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated.
|
||
Alpha APIs may not be enabled in conjunction with emulation version. | ||
|
||
The forward compatibility of APIs should not affect data compatibility because storage version is still controlled by the `MinCompatibilityVersion` regardless of whether the data are created through future versions of the API endpoints. Webhooks should also work fine if the matching policy is `Equivalent`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add a section on how MinCompatibilityVersion
will change storage version since the policy has always been n-1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in #5006
to track which Kubernetes version the API group-versions are introduced (or | ||
removed) at. | ||
removed) at. But in practice, that would make code changes of API graduation intractable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
intractable only for withholding API introductions. API removals don't need special handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the whole section. Please review again see if it is more clear.
@@ -610,21 +623,25 @@ The storage version of each group-resource is the newest | |||
|
|||
### API availability | |||
|
|||
Similar to feature flags, all APIs group-versions declarations will be modified | |||
Ideally, similar to feature flags, all APIs group-versions declarations should be modified |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this referring to the prerelease lifecycle files or a different mechanism? prerelease lifecycle operates on the types instead of the group version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is referring to the the prerelease lifecycle.
fbbc5f7
to
ebd2f55
Compare
removed) at: if an API is introduced after (or removed before) the emulation version, it should not be available at the emulation version. But in practice, that would make code changes of API graduation that wants to use newer APIs intractable. | ||
For example, if a controller is switching from Beta to GA API at the binary version, a full emulation would mean the whole controller code needs to be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that we're doing this specifically for controllers. Are there other use cases or should we simplify the wording here and just state up front that we're doing this to support controllers specifically?
the emulation version is set to. I.e. `--runtime-config=api/<version>` needs | ||
to be able to turn on APIs exactly like it did for each Kubernetes version that | ||
emulation version can be set to. | ||
If an API has GAed and has not been removed at the emulated version, it would be enabled by default at the emulation version. If a newer stable version of the GA API has been introduced between the emulation version and the binary version, the new GA API would also be enabled at the emulation version along with the old GA API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comparison (maybe in a table), or how emulation works for normal APIs vs how it works for un-emulatable (non-emulatable?) features might make this information easier to digest.
|
||
GA APIs should match the exact set of APIs enabled in GA for the Kubernetes version | ||
the emulation version is set to. | ||
Instead, we are proposing to keep forward compatibility of all APIs in compatibility version mode: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're proposing that non-emulatable feature controllers to update to use the newest available API, ignoring emulation API enablement, right? Maybe say this more directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, mainly for controllers for now. May expand to kublet if we want to add compat version in node. Updated the section.
f8bce11
to
961b0c9
Compare
961b0c9
to
9d67760
Compare
|
||
With the `NonEmulatableFeatures` list, the server would fail to start if | ||
1. the `EmulationVersion != BinaryVersion` and | ||
1. any feature in the `NonEmulatableFeatures` list is disabled by default at the emulation version and explicitly enabled by the `--feature-gates` flag. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What alternatives were considered here? It's not clear to me WHY we're deciding to refuse to start apiservers that have a non-emulatable beta on vs. allowing the beta to be turned on but not guaranteeing emulation for the feature.
Having the apiserver refuse to start is a potential footgun for anyone with automation that uses emulation version, and also offers the capability to turn on beta features/APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Giving more thoughts to it, I think the alternative is the better way to go. Updated the section with pros and cons.
69475d9
to
32e11d2
Compare
32e11d2
to
0cdaece
Compare
0cdaece
to
a101848
Compare
``` | ||
To fully emulate the controller for an older version, anywhere v1 api/type is referenced, it would need to switch to the v1beta version if the emulation version is older than the binary version. This would mean a lot of extra work, complicated testing rules, and high maintenance cost even for simple API graduations, while the emulation fidelity is unreliable with the extra complexity. | ||
|
||
So instead of truly emulating the feature controllers and API availability at the emulation version, we are proposing to keep forward compatibility of all APIs in compatibility version mode and have the non-emulatable feature controllers to update to use the newest available API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's say we add or remove field when we graduate from v1beta1 in 1.32 to v1 in 1.33, we may not be able to emulate in the 1.33 binary the behavior of the 1.32 controller ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we add a v1 in 1.33, we still need to support clients using v1beta1. So we still should have the code and thus be able to emulate a "v1beta1 only" mode of operation.
One implication of this KEP is probably that we need to keep supporting older versions and behavior for longer than before, but I haven't looked into the details of that yet and could be wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if a 1.32 controller does thing/A. A 1.33 controller does thing/A and also does wrongThing/B. Does this meant that switching to an emulation version for 1.32 does not stop the 1.33 controller from doing wrongThing/B?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one thing is to support fields and other support the behaviors associated to those fields, what I'm trying to point out is that the controller in 1.33 has a different behavior than the one 1.32, that will not now to use this new field ... however, this is solved with this https://docs.google.com/document/d/1eLulBZ988bWNwrwnz9V70xVbQmHOYjrQpXRwiGmgcFM/edit?disco=AAABYEhCDY0 , if beta to GA does not have change on behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a difference between supporting fields in APIs and supporting the behaviors. API field changes should always be backward compatible to have data compatibility in storage (https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api_changes.md).
For behavior changes in the code itself, unless we have versioned controllers, it is always best effort: we can either feature gate the new changes, or follow the examples in https://github.com/kubernetes/enhancements/tree/master/keps/sig-architecture/4330-compatibility-versions#feature-gating-changes. Behavior changes generally are localized, and should be doable with a limited number of if ... else ...
switches. If it is big changes that touch a lot of codes, should it really be done for a beta API?
So yes, it is possible if a 1.32 controller does thing/A. A 1.33 controller does thing/A and also does wrongThing/B. switching to an emulation version for 1.32 does not stop the 1.33 controller from doing wrongThing/B if the code to do wrongThing/B is not properly gated based on emulation version.
The proposed API availability changes is meant to only solve the API endpoint problem that is prevalent in API graduations.
`v1: introduced=1.28, removed=1.32`<br>`v2beta1: introduced=1.31, removed=1.32`<br>`v2: introduced=1.32` | 1.30 | `api/v1` | ||
`v1: introduced=1.28, removed=1.32`<br>`v2beta1: introduced=1.31, removed=1.32`<br>`v2: introduced=1.32` | 1.31 | `api/v1` always available, `api/v2beta1` and `api/v2` available only when `--runtime-config=api/v2beta1=true` | ||
`v1: introduced=1.28, removed=1.32`<br>`v2beta1: introduced=1.31, removed=1.32`<br>`v2: introduced=1.32` | 1.33 | `api/v2` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't remove v1 apis, https://kubernetes.io/docs/reference/using-api/deprecation-policy/#deprecating-parts-of-the-api
Rule #4a: API lifetime is determined by the API stability level
GA API versions may be marked as deprecated, but must not be removed within a major version of Kubernetes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated the table
the emulation version is set to. I.e. `--runtime-config=api/<version>` needs | ||
to be able to turn on APIs exactly like it did for each Kubernetes version that | ||
emulation version can be set to. | ||
For the use case of upgrading control plane binary version without changing the emulation version, this would mean api servers should be upgraded first before any other components, because an api server of the old binary version would not be able to serve a controller of the new binary version even though the emulation version is the same as the old binary version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is not the case today? apiserver should be always be upgraded first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
today although apiserver should be upgraded first according to the doc, you can still upgrade all the control plane components in one node first - apiserver1, cm1, scheduler1, before moving on to the next node. That is not a problem today because apis support N-1, N, N+1. But with emulation version, this might not work anymore.
|
||
Alpha APIs may not be enabled in conjunction with emulation version. | ||
#### Alternatives to API forward compatibility | ||
To make API graduation workable for controller code change under compatibility version, we have considered and rejected the following alternative options: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also always assume that the logic in the controllers is the same between versions, but that may not be the case, if the controller and the API change between versions, or also when an existing GA API add a new field and the controller depends on that new field, you'll never being able to emulate the previous behavior ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar to #5038 (comment)
For these cases, we should try to emulate these changing features at best effort, and make sure the clients are aware of the risks of enabling default-off Beta features with emulation version. | ||
|
||
##### Alternatives | ||
An alternative would be to keep a exception list of `NonEmulatableFeatures`, and make the server fail to start if any of the `NonEmulatableFeatures` are enabled in compatibility mode. This approach has the benefits of predictable outcomes. The downsides are: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we already have this exception list, all the feature gates that are beta and disabled by default are NonEmulatableFeatures
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think not all the feature gates that are beta and disabled by default would break emulation necessarily, like a new field in an existing api, e.g. SELinuxMountReadWriteOncePod
. But the users should be aware of the risks if they explicitly enable the disabled-by-default beta features
``` | ||
To fully emulate the controller for an older version, anywhere v1 api/type is referenced, it would need to switch to the v1beta version if the emulation version is older than the binary version. This would mean a lot of extra work, complicated testing rules, and high maintenance cost even for simple API graduations, while the emulation fidelity is unreliable with the extra complexity. | ||
|
||
So instead of truly emulating the feature controllers and API availability at the emulation version, we are proposing to keep forward compatibility of all APIs in compatibility version mode and have the non-emulatable feature controllers to update to use the newest available API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if a 1.32 controller does thing/A. A 1.33 controller does thing/A and also does wrongThing/B. Does this meant that switching to an emulation version for 1.32 does not stop the 1.33 controller from doing wrongThing/B?
`v1alpha1: introduced=1.30, removed=1.31`<br>`v1beta1: introduced=1.31, removed=1.32`<br>`v1: introduced=1.32` | 1.30 | NA | ||
`v1alpha1: introduced=1.30, removed=1.31`<br>`v1beta1: introduced=1.31, removed=1.32`<br>`v1: introduced=1.32` | 1.31 | `api/v1beta1` and `api/v1` available only when `--runtime-config=api/v1beta1=true` | ||
`v1alpha1: introduced=1.30, removed=1.31`<br>`v1beta1: introduced=1.31, removed=1.32`<br>`v1: introduced=1.33` | 1.33 | `api/v1` | ||
`v1beta1: introduced=1.31, removed=1.32`<br>`v1beta2: introduced=1.32` | 1.31 | `api/v1beta1` and `api/v1beta2` available only when `--runtime-config=api/v1beta1=true` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is v1beta2 on in this scenario since it wasn't even introduced in 1.31?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
v1beta2 and v1beta1 have the same stability level, so the controller may have moved on to v1beta2 in the binary. By forward compatibility, we are exposing apis that are introduced later than the emulation version that are the same / more stable then the apis enabled at the emulation version.
Signed-off-by: Siyuan Zhang <[email protected]>
a101848
to
32a06b7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this feature described as Removed: 1.31
in the 1.30 release below?
enhancements/keps/sig-architecture/4330-compatibility-versions/README.md
Lines 438 to 449 in 41a49ff
The steps to remove the Beta feature would be: | |
| Release | Stage | Feature tracking information | | |
| ------- | ----- | ------------------------------------------------- | | |
| 1.26 | beta | Beta: 1.26 | | |
| 1.27 | beta | Beta: 1.26, Deprecated: 1.27 | | |
| 1.28 | beta | Beta: 1.26, Deprecated: 1.27 | | |
| 1.29 | beta | Beta: 1.26, Deprecated: 1.27 | | |
| 1.30 | - | Beta: 1.26, Deprecated: 1.27, Removed: 1.31 | | |
| 1.31 | - | Beta: 1.26, Deprecated: 1.27, Removed: 1.31 | | |
| 1.32 | - | Beta: 1.26, Deprecated: 1.27, Removed: 1.31 | | |
| 1.33 | - | **`if featureGate enabled { // implement feature }` code may be removed at this step** | |
-----|-----|----- | ||
`v1alpha1: introduced=1.30, removed=1.31`<br>`v1beta1: introduced=1.31, removed=1.32`<br>`v1: introduced=1.32` | 1.30 | NA because we do not support emulating alpha apis. | ||
`v1alpha1: introduced=1.30, removed=1.31`<br>`v1beta1: introduced=1.31, removed=1.32`<br>`v1: introduced=1.32` | 1.31 | `api/v1beta1` and `api/v1` available only when `--runtime-config=api/v1beta1=true` | ||
`v1alpha1: introduced=1.30, removed=1.31`<br>`v1beta1: introduced=1.31, removed=1.32`<br>`v1: introduced=1.33` | 1.33 | `api/v1` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There appears to be a gap here: beta removed in 1.32, but v1 introduced in 1.33.
`v1alpha1: introduced=1.30, removed=1.31`<br>`v1beta1: introduced=1.31, removed=1.32`<br>`v1: introduced=1.33` | 1.33 | `api/v1` | |
`v1alpha1: introduced=1.30, removed=1.31`<br>`v1beta1: introduced=1.31, removed=1.32`<br>`v1: introduced=1.32` | 1.33 | `api/v1` |
Beta API graduated to GA|off-by-default|on|API available only when `--runtime-config=api/v1beta1=true`|API `api/v1` available | ||
Beta API removed|off-by-default|-|API available only when `--runtime-config=api/v1beta1=true`|API `api/v1beta1` does not exist | ||
on-by-default Beta API removed|on-by-default|-|API available unless `--runtime-config=api/v1beta1=false`|API `api/v1beta1` does not exist | ||
Beta API graduated to GA|off-by-default|on|API `api/v1beta1` and `api/v1` available only when `--runtime-config=api/v1beta1=true`|API `api/v1` available |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does "graduated to GA" imply that the Beta API is removed (from N)?
Should --runtime-config=api/v1beta1=true
have some effect when emulation-version=N?
on-by-default Beta API removed|on-by-default|-|API available unless `--runtime-config=api/v1beta1=false`|API `api/v1beta1` does not exist | ||
Beta API graduated to GA|off-by-default|on|API `api/v1beta1` and `api/v1` available only when `--runtime-config=api/v1beta1=true`|API `api/v1` available | ||
Beta API removed|off-by-default|-|API `api/v1beta1` available only when `--runtime-config=api/v1beta1=true`|API `api/v1beta1` does not exist | ||
on-by-default Beta API removed|on-by-default|-|API `api/v1beta1` available unless `--runtime-config=api/v1beta1=false`|API `api/v1beta1` does not exist | ||
API Storage version changed|v1beta1|v1|Resources stored as v1beta1|Resources stored as v1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the storage version depend on the compatibility version? If so, should it be mentioned here?
What happens in this case for Story 2, administrator needs to roll back?
- 1pm: emulation=N-1, everything is stored v1beta1
- 2pm: emulation=N, so some items are stored as v1
- 3pm: emulation=N-1, so...?