
design-proposal: split kubernetes package and add Talos backends #8

Draft
kvaps wants to merge 1 commit into cozystack:main from kvaps:proposal/kubernetes-nodes-split

Conversation

@kvaps (Member) commented May 4, 2026

Summary

Adds a design proposal for extracting worker node pools from the kubernetes application into a sibling kubernetes-nodes application, modelled on the existing vm-instance / vm-disk precedent.

The split decouples control-plane and node-pool lifecycles, and introduces a backend abstraction so a single tenant cluster can mix:

  • kubevirt-kubeadm — the existing flow, kept as default for backward compatibility.
  • kubevirt-talos — new, Talos workers on KubeVirt VMs via clastix/talos-csr-signer.
  • cloud-talos-hetzner and cloud-talos-azure — new, no Cluster API; uses native cloud-autoscaler with Talos machineconfig injected via cloud-init, mirroring the model already proven for the management cluster (/docs/v1.3/operations/multi-location/autoscaling/).
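
To make the backend abstraction concrete, here is a rough sketch of what a `kubernetes-nodes` values file might look like; every field name is illustrative only and not part of the proposed schema:

```yaml
# Hypothetical values.yaml for a kubernetes-nodes release. All field names are
# illustrative; the final schema is one of the proposal's open questions.
kubernetesCluster: tenant-cluster-1   # parent `kubernetes` release, linked by name

backend: kubevirt-talos               # kubevirt-kubeadm | kubevirt-talos | cloud-talos-hetzner | cloud-talos-azure

nodePool:
  minReplicas: 1
  maxReplicas: 5
  instanceType: u1.medium

# Consumed only by cloud-talos-* backends
cloud:
  credentialsSecret: hetzner-credentials
  location: fsn1
```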

Includes:

  • Linkage by name (matching the vm-instance/vm-disk precedent)
  • Talos machineconfig template + user-overlay pattern (system layer hidden from the user)
  • Per-pool cluster-autoscaler running in the management cluster
  • Tenant-side node lifecycle controller (NLC) reused from cozystack/local-ccm for cloud-talos-* backends
  • Phased migration strategy with a long parallel period and an idempotent migration script
  • Explicit decision not to use Sidero Metal (officially deprecated upstream)

Looking for feedback on the open questions, especially long-term CAPI removal scope, per-pool vs cluster-wide Talos token model, and NLC reuse vs fork.

Test plan

This is a design proposal; no code yet. Implementation testing is scoped in the proposal:

  • Unit tests for chart rendering across each backend
  • Schema validation tests for kubernetes-nodes values.yaml
  • Migration script tests with idempotence and rollback
  • Integration tests with kind + stub KubeVirt
  • E2E with Hetzner and Azure cloud accounts
  • Failure-injection tests (talos-csr-signer restart, autoscaler crash, credentials Secret deletion)

Propose extracting node pools from the kubernetes application into a
sibling kubernetes-nodes application, modelled on the vm-instance/vm-disk
split. Add a backend abstraction that supports the existing
KubeVirt+kubeadm flow alongside new Talos backends: KubeVirt+Talos via
clastix/talos-csr-signer, and cloud-talos for Hetzner and Azure without
Cluster API.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

@gemini-code-assist (Bot) left a comment


Code Review

This pull request proposes a significant architectural change to split the monolithic kubernetes package into separate control-plane and node-pool components, enabling independent lifecycles and multi-backend support for KubeVirt and cloud-native Talos workers. The review feedback identifies a technical error regarding the gRPC protocol (TCP vs UDP), suggests consolidating the node-lifecycle-controller and Talos tokens at the cluster level for better efficiency and simpler configuration, and requests further implementation details for the dependency discovery mechanism in the proposed admission webhook.


Renders the same CAPI/CAPK objects as above, but with a `TalosConfigTemplate` (from `cluster-api-bootstrap-provider-talos`) replacing `KubeadmConfigTemplate`. Worker VMs boot from a Talos image. Bootstrap fetches the Talos machineconfig from CAPI and joins the cluster via standard Talos PKI.

The tenant's `KamajiControlPlane` carries an `additionalContainers` entry running `clastix/talos-csr-signer` listening on UDP/50001, exposed alongside `:6443` on the tenant API LoadBalancer. This is what allows `talosctl` to operate against worker nodes whose control-plane is Kamaji rather than Talos.
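
A rough sketch of how that `additionalContainers` entry might be rendered; the exact field path, image reference, and port details are assumptions and have not been verified against the KamajiControlPlane schema:

```yaml
apiVersion: controlplane.cluster.x-k8s.io/v1alpha1
kind: KamajiControlPlane
metadata:
  name: tenant-cluster-1
spec:
  deployment:                     # assumed field path for pod-level customization
    additionalContainers:
      - name: talos-csr-signer
        image: ghcr.io/clastix/talos-csr-signer:latest   # image reference is illustrative
        ports:
          - name: trustd
            containerPort: 50001   # exposed alongside :6443 on the tenant API LoadBalancer
```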

medium

The proposal mentions clastix/talos-csr-signer listening on UDP/50001. Since this is a gRPC-based service (as noted in line 40), it should be TCP/50001 to match standard Talos trustd and gRPC requirements.


When `cluster-autoscaler` scales a `cloud-talos-*` pool down, it deletes the cloud VM. The tenant's apiserver still has a `Node` object that will linger until something deletes it. CAPI was previously the agent doing this; without CAPI, we need an equivalent.

The `node-lifecycle-controller` from `cozystack/local-ccm` is a good fit for this role. The `kubernetes-nodes` chart for `cloud-talos-*` backends renders an NLC Deployment that runs in the management cluster but uses a kubeconfig pointing to the **tenant** apiserver. It watches Node objects with the `ToBeDeletedByClusterAutoscaler:NoSchedule` taint and removes them after a configurable grace period and unreachability check.
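
A sketch of what the rendered NLC Deployment could look like; the container flags and Secret layout are assumptions rather than the actual `local-ccm` interface:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tenant-cluster-1-workers-nlc
  namespace: tenant-foo                       # management-cluster namespace of the tenant
spec:
  replicas: 1
  selector:
    matchLabels: {app: node-lifecycle-controller}
  template:
    metadata:
      labels: {app: node-lifecycle-controller}
    spec:
      containers:
        - name: node-lifecycle-controller
          image: ghcr.io/cozystack/local-ccm:latest        # illustrative image tag
          args:
            - --kubeconfig=/etc/tenant/kubeconfig          # points at the tenant apiserver
            - --grace-period=5m                            # hypothetical flag for the grace period
          volumeMounts:
            - {name: tenant-kubeconfig, mountPath: /etc/tenant, readOnly: true}
      volumes:
        - name: tenant-kubeconfig
          secret:
            secretName: tenant-cluster-1-admin-kubeconfig  # assumed Secret name
```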

medium

Deploying a separate node-lifecycle-controller (NLC) for each kubernetes-nodes release leads to redundant processes watching the same tenant API server. A more efficient approach would be to manage a single NLC instance per tenant cluster (e.g., within the kubernetes control-plane chart) that handles node cleanup for all associated node pools.

## Failure and edge cases

- **`kubernetes-nodes` HelmRelease created before its parent `kubernetes` HelmRelease** → chart `fail`s the render with a clear error message identifying the missing parent. No partial CAPI/autoscaler resources created.
- **Parent `kubernetes` HelmRelease deleted while children exist** → all `kubernetes-nodes` HelmReleases for that cluster fail subsequent reconciles. An admission webhook on `kubernetes` HelmRelease delete blocks the operation if any `kubernetes-nodes` references it.
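
A minimal Helm-template sketch of the parent check described in the first bullet, assuming the parent is discovered by looking up the Flux HelmRelease named in values (the apiVersion shown may differ by Flux version):

```yaml
{{- /* Fail the render early if the parent kubernetes release does not exist. */}}
{{- $parent := lookup "helm.toolkit.fluxcd.io/v2" "HelmRelease" .Release.Namespace .Values.kubernetesCluster }}
{{- if not $parent }}
  {{- fail (printf "kubernetes-nodes %q: parent kubernetes HelmRelease %q not found in namespace %q" .Release.Name .Values.kubernetesCluster .Release.Namespace) }}
{{- end }}
```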

medium

The proposed admission webhook for blocking kubernetes HelmRelease deletion requires a mechanism to discover name-based dependencies. Clarifying where this logic will reside and how it will perform discovery would strengthen the proposal, especially since the linkage doesn't use standard Kubernetes owner references.

**System layer (chart-managed, not exposed to user):**

- Cluster CA, machine CA, apiserver endpoint — read at template time via `lookup` from the tenant's `KamajiControlPlane`.
- Talos token — generated once per pool, stored alongside the machineconfig.
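
A sketch of how the system layer might read those inputs at template time; the resource field paths and Secret names are placeholders, not the actual KamajiControlPlane schema:

```yaml
{{- $kcp := lookup "controlplane.cluster.x-k8s.io/v1alpha1" "KamajiControlPlane" .Release.Namespace .Values.kubernetesCluster }}
{{- if not $kcp }}{{- fail "parent KamajiControlPlane not found" }}{{- end }}
{{- $endpoint := dig "status" "controlPlaneEndpoint" "" $kcp }}   {{- /* placeholder field path */}}
{{- $ca := lookup "v1" "Secret" .Release.Namespace (printf "%s-ca" .Values.kubernetesCluster) }}   {{- /* assumed Secret name */}}
```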

medium

Generating a TALOS_TOKEN per pool while running a single talos-csr-signer sidecar per cluster (as discussed in Open Question #3 on line 253) introduces a configuration challenge for the signer, as it must be aware of all active tokens. Using a single TALOS_TOKEN per tenant cluster would simplify the sidecar configuration and the kubernetes-nodes chart logic while maintaining sufficient isolation between tenants.

@coderabbitai (Bot) commented May 4, 2026

Review skipped: draft detected. To trigger a single review, invoke the @coderabbitai review command.



kvaps added a commit to kvaps/cozystack-community that referenced this pull request May 5, 2026
Adjust the proposal to reflect that the controller will be developed as
an independent project under the kilo-io organization, per confirmed
interest from Kilo maintainer @squat. Generalize the CRD from a
tenant-specific TenantMeshLink to a tenant-agnostic ClusterMesh that
references peer clusters through a map of kubeconfig Secrets. Move all
tenant semantics into a dedicated Cozystack integration section that
also accounts for the kubernetes-nodes split (PR cozystack#8) so a single
ClusterMesh covers multi-location, multi-backend tenants.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
