Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-4330: Change API availability to forward compatibility #5038

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

siyuanfoundation
Copy link
Contributor

@siyuanfoundation siyuanfoundation commented Jan 14, 2025

  • One-line PR description: change api availability to forward compatible to make it easier to handle API graduation in compatibility version.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 14, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: siyuanfoundation
Once this PR has been reviewed and has the lgtm label, please assign dims for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 14, 2025
@siyuanfoundation
Copy link
Contributor Author

siyuanfoundation commented Jan 14, 2025

@siyuanfoundation siyuanfoundation force-pushed the compat-versions-api branch 2 times, most recently from 6abb021 to fbbc5f7 Compare January 14, 2025 05:37
Instead, we are proposing to keep forward compatibility of all AIPs in compatibility version mode:
All APIs group-versions declarations would be modified
to track which Kubernetes version the API group-versions are removed at. APIs introduced between the emulation version and
the binary version are still available if the stability level is no less than the APIs introduced before or at the emulation version and enabled at the emulation version.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of this is referring to API enablement/disablement on a group-version level. However, API lifecycles are generally individual on the group-version-resource level. I think it would make more sense to discuss the individual beta->GA resources rather than the entire group version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.


Alpha APIs may not be enabled in conjunction with emulation version.

The forward compatibility of APIs should not affect data compatibility because storage version is still controlled by the `MinCompatibilityVersion` regardless of whether the data are created through future versions of the API endpoints. Webhooks should also work fine if the matching policy is `Equivalent`.
Copy link
Member

@Jefftree Jefftree Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add a section on how MinCompatibilityVersion will change storage version since the policy has always been n-1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in #5006

to track which Kubernetes version the API group-versions are introduced (or
removed) at.
removed) at. But in practice, that would make code changes of API graduation intractable.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

intractable only for withholding API introductions. API removals don't need special handling.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the whole section. Please review again see if it is more clear.

@@ -610,21 +623,25 @@ The storage version of each group-resource is the newest

### API availability

Similar to feature flags, all APIs group-versions declarations will be modified
Ideally, similar to feature flags, all APIs group-versions declarations should be modified
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this referring to the prerelease lifecycle files or a different mechanism? prerelease lifecycle operates on the types instead of the group version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is referring to the the prerelease lifecycle.

Comment on lines 628 to 629
removed) at: if an API is introduced after (or removed before) the emulation version, it should not be available at the emulation version. But in practice, that would make code changes of API graduation that wants to use newer APIs intractable.
For example, if a controller is switching from Beta to GA API at the binary version, a full emulation would mean the whole controller code needs to be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that we're doing this specifically for controllers. Are there other use cases or should we simplify the wording here and just state up front that we're doing this to support controllers specifically?

the emulation version is set to. I.e. `--runtime-config=api/<version>` needs
to be able to turn on APIs exactly like it did for each Kubernetes version that
emulation version can be set to.
If an API has GAed and has not been removed at the emulated version, it would be enabled by default at the emulation version. If a newer stable version of the GA API has been introduced between the emulation version and the binary version, the new GA API would also be enabled at the emulation version along with the old GA API.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comparison (maybe in a table), or how emulation works for normal APIs vs how it works for un-emulatable (non-emulatable?) features might make this information easier to digest.


GA APIs should match the exact set of APIs enabled in GA for the Kubernetes version
the emulation version is set to.
Instead, we are proposing to keep forward compatibility of all APIs in compatibility version mode:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're proposing that non-emulatable feature controllers to update to use the newest available API, ignoring emulation API enablement, right? Maybe say this more directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, mainly for controllers for now. May expand to kublet if we want to add compat version in node. Updated the section.

@siyuanfoundation siyuanfoundation force-pushed the compat-versions-api branch 3 times, most recently from f8bce11 to 961b0c9 Compare January 14, 2025 22:39
@siyuanfoundation siyuanfoundation changed the title KEP-4330: Change API availability to forward compatibility and add UnEmulatableFeatures KEP-4330: Change API availability to forward compatibility and add NonEmulatableFeatures Jan 15, 2025

With the `NonEmulatableFeatures` list, the server would fail to start if
1. the `EmulationVersion != BinaryVersion` and
1. any feature in the `NonEmulatableFeatures` list is disabled by default at the emulation version and explicitly enabled by the `--feature-gates` flag.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What alternatives were considered here? It's not clear to me WHY we're deciding to refuse to start apiservers that have a non-emulatable beta on vs. allowing the beta to be turned on but not guaranteeing emulation for the feature.

Having the apiserver refuse to start is a potential footgun for anyone with automation that uses emulation version, and also offers the capability to turn on beta features/APIs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Giving more thoughts to it, I think the alternative is the better way to go. Updated the section with pros and cons.

@siyuanfoundation siyuanfoundation force-pushed the compat-versions-api branch 3 times, most recently from 69475d9 to 32e11d2 Compare January 16, 2025 19:42
@siyuanfoundation siyuanfoundation changed the title KEP-4330: Change API availability to forward compatibility and add NonEmulatableFeatures KEP-4330: Change API availability to forward compatibility Jan 16, 2025
@siyuanfoundation
Copy link
Contributor Author

/cc @deads2k @johnbelamaric

```
To fully emulate the controller for an older version, anywhere v1 api/type is referenced, it would need to switch to the v1beta version if the emulation version is older than the binary version. This would mean a lot of extra work, complicated testing rules, and high maintenance cost even for simple API graduations, while the emulation fidelity is unreliable with the extra complexity.

So instead of truly emulating the feature controllers and API availability at the emulation version, we are proposing to keep forward compatibility of all APIs in compatibility version mode and have the non-emulatable feature controllers to update to use the newest available API.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's say we add or remove field when we graduate from v1beta1 in 1.32 to v1 in 1.33, we may not be able to emulate in the 1.33 binary the behavior of the 1.32 controller ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we add a v1 in 1.33, we still need to support clients using v1beta1. So we still should have the code and thus be able to emulate a "v1beta1 only" mode of operation.

One implication of this KEP is probably that we need to keep supporting older versions and behavior for longer than before, but I haven't looked into the details of that yet and could be wrong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if a 1.32 controller does thing/A. A 1.33 controller does thing/A and also does wrongThing/B. Does this meant that switching to an emulation version for 1.32 does not stop the 1.33 controller from doing wrongThing/B?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one thing is to support fields and other support the behaviors associated to those fields, what I'm trying to point out is that the controller in 1.33 has a different behavior than the one 1.32, that will not now to use this new field ... however, this is solved with this https://docs.google.com/document/d/1eLulBZ988bWNwrwnz9V70xVbQmHOYjrQpXRwiGmgcFM/edit?disco=AAABYEhCDY0 , if beta to GA does not have change on behavior

Copy link
Contributor Author

@siyuanfoundation siyuanfoundation Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a difference between supporting fields in APIs and supporting the behaviors. API field changes should always be backward compatible to have data compatibility in storage (https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api_changes.md).

For behavior changes in the code itself, unless we have versioned controllers, it is always best effort: we can either feature gate the new changes, or follow the examples in https://github.com/kubernetes/enhancements/tree/master/keps/sig-architecture/4330-compatibility-versions#feature-gating-changes. Behavior changes generally are localized, and should be doable with a limited number of if ... else ... switches. If it is big changes that touch a lot of codes, should it really be done for a beta API?

So yes, it is possible if a 1.32 controller does thing/A. A 1.33 controller does thing/A and also does wrongThing/B. switching to an emulation version for 1.32 does not stop the 1.33 controller from doing wrongThing/B if the code to do wrongThing/B is not properly gated based on emulation version.

The proposed API availability changes is meant to only solve the API endpoint problem that is prevalent in API graduations.

Comment on lines 675 to 677
`v1: introduced=1.28, removed=1.32`<br>`v2beta1: introduced=1.31, removed=1.32`<br>`v2: introduced=1.32` | 1.30 | `api/v1`
`v1: introduced=1.28, removed=1.32`<br>`v2beta1: introduced=1.31, removed=1.32`<br>`v2: introduced=1.32` | 1.31 | `api/v1` always available, `api/v2beta1` and `api/v2` available only when `--runtime-config=api/v2beta1=true`
`v1: introduced=1.28, removed=1.32`<br>`v2beta1: introduced=1.31, removed=1.32`<br>`v2: introduced=1.32` | 1.33 | `api/v2`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't remove v1 apis, https://kubernetes.io/docs/reference/using-api/deprecation-policy/#deprecating-parts-of-the-api

Rule #4a: API lifetime is determined by the API stability level
GA API versions may be marked as deprecated, but must not be removed within a major version of Kubernetes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the table

the emulation version is set to. I.e. `--runtime-config=api/<version>` needs
to be able to turn on APIs exactly like it did for each Kubernetes version that
emulation version can be set to.
For the use case of upgrading control plane binary version without changing the emulation version, this would mean api servers should be upgraded first before any other components, because an api server of the old binary version would not be able to serve a controller of the new binary version even though the emulation version is the same as the old binary version.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is not the case today? apiserver should be always be upgraded first

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

today although apiserver should be upgraded first according to the doc, you can still upgrade all the control plane components in one node first - apiserver1, cm1, scheduler1, before moving on to the next node. That is not a problem today because apis support N-1, N, N+1. But with emulation version, this might not work anymore.


Alpha APIs may not be enabled in conjunction with emulation version.
#### Alternatives to API forward compatibility
To make API graduation workable for controller code change under compatibility version, we have considered and rejected the following alternative options:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also always assume that the logic in the controllers is the same between versions, but that may not be the case, if the controller and the API change between versions, or also when an existing GA API add a new field and the controller depends on that new field, you'll never being able to emulate the previous behavior ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar to #5038 (comment)

For these cases, we should try to emulate these changing features at best effort, and make sure the clients are aware of the risks of enabling default-off Beta features with emulation version.

##### Alternatives
An alternative would be to keep a exception list of `NonEmulatableFeatures`, and make the server fail to start if any of the `NonEmulatableFeatures` are enabled in compatibility mode. This approach has the benefits of predictable outcomes. The downsides are:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we already have this exception list, all the feature gates that are beta and disabled by default are NonEmulatableFeatures

Copy link
Contributor Author

@siyuanfoundation siyuanfoundation Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think not all the feature gates that are beta and disabled by default would break emulation necessarily, like a new field in an existing api, e.g. SELinuxMountReadWriteOncePod. But the users should be aware of the risks if they explicitly enable the disabled-by-default beta features

```
To fully emulate the controller for an older version, anywhere v1 api/type is referenced, it would need to switch to the v1beta version if the emulation version is older than the binary version. This would mean a lot of extra work, complicated testing rules, and high maintenance cost even for simple API graduations, while the emulation fidelity is unreliable with the extra complexity.

So instead of truly emulating the feature controllers and API availability at the emulation version, we are proposing to keep forward compatibility of all APIs in compatibility version mode and have the non-emulatable feature controllers to update to use the newest available API.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if a 1.32 controller does thing/A. A 1.33 controller does thing/A and also does wrongThing/B. Does this meant that switching to an emulation version for 1.32 does not stop the 1.33 controller from doing wrongThing/B?

`v1alpha1: introduced=1.30, removed=1.31`<br>`v1beta1: introduced=1.31, removed=1.32`<br>`v1: introduced=1.32` | 1.30 | NA
`v1alpha1: introduced=1.30, removed=1.31`<br>`v1beta1: introduced=1.31, removed=1.32`<br>`v1: introduced=1.32` | 1.31 | `api/v1beta1` and `api/v1` available only when `--runtime-config=api/v1beta1=true`
`v1alpha1: introduced=1.30, removed=1.31`<br>`v1beta1: introduced=1.31, removed=1.32`<br>`v1: introduced=1.33` | 1.33 | `api/v1`
`v1beta1: introduced=1.31, removed=1.32`<br>`v1beta2: introduced=1.32` | 1.31 | `api/v1beta1` and `api/v1beta2` available only when `--runtime-config=api/v1beta1=true`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is v1beta2 on in this scenario since it wasn't even introduced in 1.31?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v1beta2 and v1beta1 have the same stability level, so the controller may have moved on to v1beta2 in the binary. By forward compatibility, we are exposing apis that are introduced later than the emulation version that are the same / more stable then the apis enabled at the emulation version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this feature described as Removed: 1.31 in the 1.30 release below?

The steps to remove the Beta feature would be:
| Release | Stage | Feature tracking information |
| ------- | ----- | ------------------------------------------------- |
| 1.26 | beta | Beta: 1.26 |
| 1.27 | beta | Beta: 1.26, Deprecated: 1.27 |
| 1.28 | beta | Beta: 1.26, Deprecated: 1.27 |
| 1.29 | beta | Beta: 1.26, Deprecated: 1.27 |
| 1.30 | - | Beta: 1.26, Deprecated: 1.27, Removed: 1.31 |
| 1.31 | - | Beta: 1.26, Deprecated: 1.27, Removed: 1.31 |
| 1.32 | - | Beta: 1.26, Deprecated: 1.27, Removed: 1.31 |
| 1.33 | - | **`if featureGate enabled { // implement feature }` code may be removed at this step** |

-----|-----|-----
`v1alpha1: introduced=1.30, removed=1.31`<br>`v1beta1: introduced=1.31, removed=1.32`<br>`v1: introduced=1.32` | 1.30 | NA because we do not support emulating alpha apis.
`v1alpha1: introduced=1.30, removed=1.31`<br>`v1beta1: introduced=1.31, removed=1.32`<br>`v1: introduced=1.32` | 1.31 | `api/v1beta1` and `api/v1` available only when `--runtime-config=api/v1beta1=true`
`v1alpha1: introduced=1.30, removed=1.31`<br>`v1beta1: introduced=1.31, removed=1.32`<br>`v1: introduced=1.33` | 1.33 | `api/v1`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There appears to be a gap here: beta removed in 1.32, but v1 introduced in 1.33.

Suggested change
`v1alpha1: introduced=1.30, removed=1.31`<br>`v1beta1: introduced=1.31, removed=1.32`<br>`v1: introduced=1.33` | 1.33 | `api/v1`
`v1alpha1: introduced=1.30, removed=1.31`<br>`v1beta1: introduced=1.31, removed=1.32`<br>`v1: introduced=1.32` | 1.33 | `api/v1`

Beta API graduated to GA|off-by-default|on|API available only when `--runtime-config=api/v1beta1=true`|API `api/v1` available
Beta API removed|off-by-default|-|API available only when `--runtime-config=api/v1beta1=true`|API `api/v1beta1` does not exist
on-by-default Beta API removed|on-by-default|-|API available unless `--runtime-config=api/v1beta1=false`|API `api/v1beta1` does not exist
Beta API graduated to GA|off-by-default|on|API `api/v1beta1` and `api/v1` available only when `--runtime-config=api/v1beta1=true`|API `api/v1` available
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does "graduated to GA" imply that the Beta API is removed (from N)?

Should --runtime-config=api/v1beta1=true have some effect when emulation-version=N?

on-by-default Beta API removed|on-by-default|-|API available unless `--runtime-config=api/v1beta1=false`|API `api/v1beta1` does not exist
Beta API graduated to GA|off-by-default|on|API `api/v1beta1` and `api/v1` available only when `--runtime-config=api/v1beta1=true`|API `api/v1` available
Beta API removed|off-by-default|-|API `api/v1beta1` available only when `--runtime-config=api/v1beta1=true`|API `api/v1beta1` does not exist
on-by-default Beta API removed|on-by-default|-|API `api/v1beta1` available unless `--runtime-config=api/v1beta1=false`|API `api/v1beta1` does not exist
API Storage version changed|v1beta1|v1|Resources stored as v1beta1|Resources stored as v1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the storage version depend on the compatibility version? If so, should it be mentioned here?

What happens in this case for Story 2, administrator needs to roll back?

  1. 1pm: emulation=N-1, everything is stored v1beta1
  2. 2pm: emulation=N, so some items are stored as v1
  3. 3pm: emulation=N-1, so...?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants