
📖 Add In-place updates proposal #11029

Open

g-gaston wants to merge 15 commits into main from in-place-updates-proposal

Conversation

g-gaston
Contributor

@g-gaston g-gaston commented Aug 7, 2024

What this PR does / why we need it:
Proposal doc for In-place updates written by the In-place updates feature group.

Starting this as a draft to collect early feedback on the main ideas and high level flow. APIs and some other lower level details are left purposefully as TODOs to focus the conversation on the rest of the doc, speed up consensus and avoid rework.

Fixes #9489

/area documentation

@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. area/documentation Issues or PRs related to documentation cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 7, 2024
@g-gaston g-gaston force-pushed the in-place-updates-proposal branch from c77a225 to be97dc6 on August 7, 2024 16:55
Member

@neolit123 neolit123 left a comment

thanks for the write up.
i left some comments, but i did not do a detailed review of the controller interaction (diagrams) part.


Both `KCP` and `MachineDeployment` controllers follow a similar pattern around updates: they first detect if an update is required and then, based on the configured strategy, follow the appropriate update logic (note that today there is only one valid strategy, `RollingUpdate`).

With `ExternalUpdate` strategy, CAPI controllers will compute the set of desired changes and iterate over the registered external updaters, requesting through the Runtime Hook the set of changes each updater can handle. The changes supported by an updater can be the complete set of desired changes, a subset of them or an empty set, signaling it cannot handle any of the desired changes.
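As an illustration of the iteration described above, here is a minimal Go sketch of how a controller could compose updater responses. Every type and function name here is hypothetical, since the proposal intentionally leaves the concrete API as a TODO:

```go
// Package updateplan is a hypothetical sketch, not part of the proposal.
package updateplan

// DesiredChange identifies a single field that differs between the current
// and desired machine state, e.g. "kubeadmconfig/spec.files".
type DesiredChange string

// Updater stands in for a registered external updater reached through the
// Runtime Hook. CanUpdate returns the subset of the given changes it can
// handle in place: all of them, some of them, or none.
type Updater interface {
	Name() string
	CanUpdate(changes []DesiredChange) []DesiredChange
}

// planUpdate queries each registered updater in turn and assigns it the
// changes it covers. If changes remain unassigned at the end, the machine
// cannot be updated externally and the controller would fall back (e.g. to
// replacing the machine).
func planUpdate(desired []DesiredChange, updaters []Updater) (map[string][]DesiredChange, bool) {
	remaining := append([]DesiredChange(nil), desired...)
	plan := map[string][]DesiredChange{}
	for _, u := range updaters {
		covered := map[DesiredChange]bool{}
		for _, c := range u.CanUpdate(remaining) {
			covered[c] = true
		}
		var next []DesiredChange
		for _, c := range remaining {
			if covered[c] {
				plan[u.Name()] = append(plan[u.Name()], c)
			} else {
				next = append(next, c)
			}
		}
		remaining = next
	}
	return plan, len(remaining) == 0
}
```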
Member

If we're falling back to rolling update, to @neolit123's point, it doesn't make sense to me that ExternalUpdate is a rollout strategy on its own; rather, it should be a field, or set of fields, within rolling update that control its behavior?

Note that technically a rolling update doesn't have to be a replace operation; it can be done in place, so imo it can be expanded.

Contributor Author

That's an interesting point. I'm not against representing external updates as a subtype of rolling update strategy. You are right that with what we are proposing here, CAPI is following a rolling update process, except it delegates the machine update instead of replacing the machine by itself. But CAPI orchestrates the rolling process.

As long as we can represent the fallback as optional, I'm ok with this if folks think it makes more sense.


We propose a pluggable update strategy architecture that allows an External Update Extension to handle the update process. The design decouples core CAPI controllers from the specific extension implementation responsible for updating a machine. The External Update Strategy will be configured by introducing a new type of strategy called `ExternalUpdate`, reusing the existing strategy field in the KCP and MD resources. This allows us to provide a consistent user experience: the interaction with the CAPI resources is the same as in rolling updates.

This proposal introduces a Lifecycle Hook named `ExternalUpdate` for communication between CAPI and external update implementers. Multiple external updaters can be registered, each of them covering only a subset of machine changes. The CAPI controllers will ask the external updaters what kind of changes they can handle and, based on the response, compose and orchestrate them to achieve the desired state.
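For illustration only, selecting the new strategy might look like this on a MachineDeployment. The type and field are today's v1beta1 API; the `"ExternalUpdate"` value is an assumption, since the proposal leaves the concrete API as a TODO:

```go
package main

import (
	"fmt"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

func main() {
	// Existing API type and field; only the "ExternalUpdate" value would be
	// new, as a sibling to today's single valid value, RollingUpdate.
	strategy := clusterv1.MachineDeploymentStrategy{
		Type: clusterv1.MachineDeploymentStrategyType("ExternalUpdate"),
	}
	fmt.Printf("rollout strategy: %s\n", strategy.Type)
}
```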
Member

The proposal is missing details on how the external updater logic would work, and how the "kind of changes they can handle" part is handled. How is that going to work?

Member

I think it'd be good for the proposal to include a reference external updater implementation and shape it around one common/trivial driving use case, e.g. performing an in-place rolling update of the Kubernetes version for a pool of Nodes. Then we can grasp and discuss design implications for RBAC, drain...

Contributor

@enxebre In the 'test plan' section we mention a "CAPD Kubeadm Updater", which will be a reference implementation and also used for testing.

Contributor Author

@vincepri

What do you mean by "how is that going to work?"? Are you referring to how the external updater knows what the desired changes are? Or to how the external updater computes which changes it can perform and which it can't?

Trying to give a generic answer here, the external updater will receive something like "current state" and "desired state" for a particular machine (including machine, infra machine and bootstrap) in the CanUpdateRequest. Then it will respond with something like an array of fields for those objects (kubeadmconfig -> ["spec.files", "spec.mounts"]), which would signal the subset of fields that it can update.
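A minimal Go sketch of what such a request/response pair could look like, under the assumptions in the comment above. All type and field names here are hypothetical, not part of the proposal:

```go
// Package externalupdate sketches a hypothetical CanUpdate hook payload.
package externalupdate

import "encoding/json"

// CanUpdateRequest carries the "current state" and "desired state" for one
// machine, covering the Machine, the infra machine and the bootstrap config.
type CanUpdateRequest struct {
	Current MachineState `json:"current"`
	Desired MachineState `json:"desired"`
}

// MachineState groups the objects describing a machine. Raw JSON is used for
// brevity; a real hook would likely carry typed objects.
type MachineState struct {
	Machine         json.RawMessage `json:"machine"`
	InfraMachine    json.RawMessage `json:"infraMachine"`
	BootstrapConfig json.RawMessage `json:"bootstrapConfig"`
}

// CanUpdateResponse lists, per object, the field paths the updater can change
// in place, e.g. {"kubeadmconfig": ["spec.files", "spec.mounts"]}. An empty
// map signals that the updater cannot handle any of the desired changes.
type CanUpdateResponse struct {
	AcceptedChanges map[string][]string `json:"acceptedChanges"`
}
```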

Contributor Author

@enxebre
The idea of opening the draft at this stage for review is to get feedback on the core ideas and high level flow before we invest more time in this direction. Unless you think that a reference implementation is necessary to have these discussions, I would prefer to avoid that work.

That said, I totally get that it's possible that the lack of detail in certain areas is making it difficult to have the high level discussion. If that's the case, we are happy to add that detail wherever needed.

Member

Trying to give a generic answer here, the external updater will receive something like "current state" and "desired state" for a particular machine (including machine, infra machine and bootstrap) in the CanUpdateRequest. Then it will respond with something like an array of fields for those objects (kubeadmconfig -> ["spec.files", "spec.mounts"]), which would signal the subset of fields that it can update.

These details must be part of the proposal. The details of how the entire flow goes from the MachineDeployment, to the external request, back to the Machine, and how status is reflected are not present, which makes it hard to understand the technical flow and/or to propose alternative solutions.

@g-gaston g-gaston force-pushed the in-place-updates-proposal branch from 5eb6664 to 472a336 on September 10, 2024 20:48
@t-lo
Contributor

t-lo commented Sep 12, 2024

Hey folks 👋

@g-gaston Dropping by from the Flatcar Container Linux project - we're a container optimised Linux distro; we joined the CNCF a few weeks ago (incubating).

We've been driving implementation spikes of in-place OS and Kubernetes updates in ClusterAPI for some time - at the OS level. Your proposal looks great from our point of view.

While progress has been slower in the recent months due to project resource constraints, Flatcar has working proof-of-concept implementations for both in-place updating the OS and Kubernetes - independently. Our implementation is near production ready on the OS level, update activation can be coordinated via kured, and the worker cluster control plane picks up the correct versions. We do lack any signalling to the management cluster as well as more advanced features like coordinated roll-backs (though this would be easy to implement on the OS level).

In theory, our approach to in-place Kubernetes updates is distro agnostic (given the "mutable sysext" changes in recent versions of systemd, starting with release 256).

We presented our work in a CAPZ office hours call earlier this year: https://youtu.be/Fpn-E9832UQ?feature=shared&t=164 (slide deck: https://drive.google.com/file/d/1MfBQcRvGHsb-xNU3g_MqvY4haNJl-WY2/view).

We hope our work can provide some insights that help to further flesh out this proposal. Happy to chat if folks are interested.

(CC: @tormath1 for visibility)

EDIT after initial feedback from @neolit123: in-place updates of Kubernetes in CAPI are at the "proof of concept" stage. Just using sysexts to ship Kubernetes (with and without CAPI) has been in production on (at least) Flatcar for quite some time. Several CAPI providers (OpenStack, Linode) use sysexts as the preferred mechanism for Flatcar worker nodes.

@neolit123
Member

neolit123 commented Sep 12, 2024

systemd-sysext

i don't think i've seen usage of sysext with k8s. its provisioning of image extensions seems like something users can do, but they might as well stick to the vanilla way of using the k8s package registries and employing update scripts for e.g. containerd.

the kubeadm upgrade docs just leverage the package manager upgrade approach:
https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/

one concern that i have with systemd-sysext is that you still have an intermediate build process for the extension, while the k8s package build process is already done by the k8s release folks.

@t-lo
Contributor

t-lo commented Sep 12, 2024

On Flatcar, sysexts are the preferred way to run Kubernetes. "Packaging" is straightforward - create a filesystem from a subdirectory - and does not require any distro specific information. The resulting sysext can be used across many distros.

I'd argue that the overhead is negligible: download release binaries into a sub-directory and run mksquashfs. We might even evangelise sysext releases with k8s upstream if this is a continued concern.
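As a sketch of that packaging step (written in Go for consistency with the other examples here; the download URL pattern is the official Kubernetes release location, while the version, directory layout, and metadata handling are assumptions):

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// buildSysext fetches the release binaries into a directory tree and packs it
// with mksquashfs, as described above. A real sysext also needs an
// extension-release file under usr/lib/extension-release.d/ before
// systemd-sysext will merge it; that step is omitted here.
func buildSysext(version, root string) error {
	bindir := root + "/usr/bin"
	if err := os.MkdirAll(bindir, 0o755); err != nil {
		return err
	}
	for _, bin := range []string{"kubeadm", "kubelet", "kubectl"} {
		url := fmt.Sprintf("https://dl.k8s.io/release/%s/bin/linux/amd64/%s", version, bin)
		if err := exec.Command("curl", "-fsSLo", bindir+"/"+bin, url).Run(); err != nil {
			return err
		}
	}
	// Pack the tree into a squashfs image that systemd-sysext can merge.
	return exec.Command("mksquashfs", root, "kubernetes-"+version+".raw", "-all-root").Run()
}

func main() {
	if err := buildSysext("v1.31.0", "sysext-tree"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```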

Drawbacks of the package-manager based update process, by comparison, are:

  • intermediate state: no atomic updates, recovery required if update process fails
  • distro specific: needs to be re-implemented for every distro
  • no easy roll-back: going back to a previous version (e.g. because a new release causes issues with user workloads) is complicated and risky (again, intermediate state)

Sysexts are already used by the ClusterAPI OpenStack and the Linode providers with Flatcar (though without in-place updates).

@neolit123
Member

On Flatcar, sysexts are the preferred way to run Kubernetes. "Packaging" is straightforward - create a filesystem from a subdirectory - and does not require any distro specific information. The resulting sysext can be used across many distros.

the kubeadm and kubelet systemd drop-in files (in the official k8s packages) have some distro specific nuances like Debian vs RedHat paths. are sysexts capable of managing different drop-in files if the target distro is different, perhaps even detecting that automatically?

@t-lo
Contributor

t-lo commented Sep 12, 2024

the kubeadm and kubelet systemd drop-in files (in the official k8s packages) have some distro specific nuances like Debian vs RedHat paths. are sysexts capable of managing different drop-in files if the target distro is different, perhaps even detecting that automatically?

Sysexts focus on shipping application bits (Kubernetes in the case at hand); configuration is usually supplied by separate means. That said, a complementary image-based configuration mechanism ("confext") exists for /etc. Both approaches have their pros and cons; I'd say it depends on the specifics (I'm not very familiar with kubeadm on Debian vs. Red Hat, I'm more of an OS person :) ). But this should by no means be a blocker.

(Sorry for the sysext nerd sniping. I think we should stick to the topic of this PR - I merely wanted to raise that we have a working PoC of in-place Kubernetes updates. Happy to discuss Kubernetes sysexts elsewhere)

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 5, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign enxebre for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@anmazzotti

/lgtm

@k8s-ci-robot
Contributor

@anmazzotti: changing LGTM is restricted to collaborators

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@sbueringer
Member

Would it make sense to move this PR out of draft status?

@sbueringer
Member

I think this goes in the right direction

Member

@fabriziopandini fabriziopandini left a comment

Awesome work, in-place WG team!
Mostly a few nits and cleanup from my side. If we keep iterating quickly on feedback, I really think we can get this merged after KubeCon

@elmiko
Contributor

elmiko commented Mar 26, 2025

discussing this today at the office hours; the plan is to merge by lazy consensus 1 or 2 weeks after kubecon.

@g-gaston g-gaston marked this pull request as ready for review March 28, 2025 15:45
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 28, 2025
@k8s-ci-robot k8s-ci-robot requested a review from JoelSpeed March 28, 2025 15:45
@fabriziopandini
Member

FYI we discussed the topic at KubeCon and all the people present confirmed that there are no blockers to merging in 1 or 2 weeks.

@g-gaston please close as many comments as possible

@g-gaston
Contributor Author

g-gaston commented Apr 8, 2025

FYI we discussed the topic at KubeCon and all the people present confirmed that there are no blockers to merging in 1 or 2 weeks.

@g-gaston please close as many comments as possible

addressed all comments :)


- To provide rollbacks in case of an in-place update failure. Failed updates need to be fixed manually by the user on the machine or by replacing the machine.
- Introduce any changes to KCP (or any other control plane provider), MachineDeployment, MachineSet, Machine APIs.
- Maintain a coherent user experience for both rolling and in-place updates.
Member

Should this really be a non-goal now?

- To provide rollbacks in case of an in-place update failure. Failed updates need to be fixed manually by the user on the machine or by replacing the machine.
- Introduce any changes to KCP (or any other control plane provider), MachineDeployment, MachineSet, Machine APIs.
- Maintain a coherent user experience for both rolling and in-place updates.
- Allow in-place updates for single-node clusters without the requirement to reprovision hosts (future goal).
Member

I think this comment is still valid

What about the OnDelete strategy of MDs?
Probably we just shouldn't try in-place if OnDelete is configured? (so maybe it's a non-goal?)
#11029 (comment)

(just opened a new comment to make it easier to deal with, considering GitHub UI limitations...)

Development

Successfully merging this pull request may close these issues.

Supporting an Inplace Update Rollout Strategy for upgrading Workload Clusters