Merged
17 commits
3ef28d0
fix(chart): default kubelet.validSubnets from discovered primary subnet
lexfrei Apr 18, 2026
10b117e
fix(chart): default etcd.advertisedSubnets from discovered primary su…
lexfrei Apr 18, 2026
b86b402
fix(chart): require cluster endpoint in values.yaml
lexfrei Apr 18, 2026
b4546fc
chore(values): drop misleading placeholder subnet + clarify endpoint
lexfrei Apr 18, 2026
0ed96c0
test(engine): cover discovery-based subnet fallbacks and required end…
lexfrei Apr 18, 2026
ab2ac02
fix(values): blank default endpoint so required fires on fresh install
lexfrei Apr 18, 2026
3299cb5
test(engine): adapt existing tests to empty-endpoint default; add fre…
lexfrei Apr 18, 2026
28894d1
fix(chart): canonical subnet form + required guard on empty discovery
lexfrei Apr 18, 2026
9ab71c7
docs: couple endpoint+floatingIP in values.yaml and README getting-st…
lexfrei Apr 18, 2026
2ad6e20
fix(values): blank default floatingIP so VIP never ships a placeholder
lexfrei Apr 18, 2026
9472dfe
fix(chart): dedupe subnet fallback; reword required messages
lexfrei Apr 18, 2026
a6b3444
docs+test: align README example; add VIP + dedupe + floatingIP tests
lexfrei Apr 18, 2026
d7e602e
test: unit-test cidrNetwork + strengthen empty-discovery assertion
lexfrei Apr 18, 2026
42967b7
fix(chart): use fail instead of required for unconditional discovery …
lexfrei Apr 23, 2026
3a8c0a5
fix(chart): use fail instead of required for unconditional discovery …
lexfrei Apr 23, 2026
6295e02
test(engine): deep-copy chart values in render helpers
lexfrei Apr 23, 2026
d67d0d1
docs(readme): use RFC 5737 documentation IPs in getting-started walkt…
lexfrei Apr 24, 2026
31 changes: 25 additions & 6 deletions README.md
@@ -59,16 +59,35 @@ cd newcluster
talm init -p cozystack -N myawesomecluster
```

Boot Talos Linux node, let's say it has address `1.2.3.4`
Edit `values.yaml` to set your cluster's control-plane endpoint. This
is the URL every node's kubelet and kube-proxy will dial. The chart
leaves it empty on purpose so a missed override fails loudly instead
of silently embedding a placeholder. For cozystack VIP setups set
`endpoint` and `floatingIP` together (same IP, single shared VIP);
for single-node clusters use that node's routable IP and leave
`floatingIP` blank; for multi-node with an external load balancer
use the LB URL and leave `floatingIP` blank. Subnet-selector fields
(`kubelet.validSubnets`, `etcd.advertisedSubnets`) are derived
automatically from the node's default-gateway-bearing link, so no
override is needed unless you have a multi-homed node that requires
a specific subnet pinned.

Boot Talos Linux node, let's say it has address `192.0.2.4`. Then:

```yaml
# values.yaml (single-node example matching the 192.0.2.4 node below)
endpoint: "https://192.0.2.4:6443"
floatingIP: ""
```

Gather node information:
```bash
talm -n 1.2.3.4 -e 1.2.3.4 template -t templates/controlplane.yaml -i > nodes/node1.yaml
talm -n 192.0.2.4 -e 192.0.2.4 template -t templates/controlplane.yaml -i > nodes/node1.yaml
```

Edit `nodes/node1.yaml` file:
```yaml
# talm: nodes=["1.2.3.4"], endpoints=["1.2.3.4"], templates=["templates/controlplane.yaml"]
# talm: nodes=["192.0.2.4"], endpoints=["192.0.2.4"], templates=["templates/controlplane.yaml"]
machine:
network:
# -- Discovered interfaces:
@@ -89,10 +108,10 @@ machine:
interfaces:
- interface: enx9c6b0047066c
addresses:
- 1.2.3.4/26
- 192.0.2.4/26
routes:
- network: 0.0.0.0/0
gateway: 1.2.3.1
gateway: 192.0.2.1
nameservers:
- 8.8.8.8
- 8.8.4.4
@@ -113,7 +132,7 @@ machine:
cluster:
clusterName: talm
controlPlane:
endpoint: https://192.168.0.1:6443
endpoint: https://192.0.2.4:6443
```

> **Note:** The output format depends on the Talos version configured in `Chart.yaml` (`templateOptions.talosVersion`) or via the `--talos-version` CLI flag.
38 changes: 37 additions & 1 deletion charts/cozystack/templates/_helpers.tpl
@@ -18,7 +18,27 @@ machine:
kubelet:
nodeIP:
validSubnets:
{{- if .Values.advertisedSubnets }}
{{- toYaml .Values.advertisedSubnets | nindent 8 }}
Comment on lines +21 to 22
Contributor

operator-override path skips canonicalization

When the operator sets advertisedSubnets: ["192.168.201.10/24"] (host form) in values.yaml, the chart emits it verbatim via toYaml, while the fallback path runs every entry through cidrNetwork so the YAML is in canonical network form. That inconsistency means two operators with the same underlying intent — one relying on discovery, one with an explicit override — get differently-formatted machine configs, which is noisy in kubectl diff/talm diff output and surprising in code review. Consider canonicalizing in both branches:

      validSubnets:
        {{- $sources := .Values.advertisedSubnets }}
        {{- if not $sources }}
        {{- $sources = fromJsonArray (include "talm.discovered.default_addresses_by_gateway" .) }}
        {{- if not $sources }}
        {{- fail "values.yaml: `advertisedSubnets` was left empty and talm could not derive a default from discovery. ..." }}
        {{- end }}
        {{- end }}
        {{- $subnets := list }}
        {{- range $sources }}
        {{- $subnets = append $subnets (. | cidrNetwork) }}
        {{- end }}
        {{- range uniq $subnets }}
        - {{ . }}
        {{- end }}

This would also let the validSubnets and etcd.advertisedSubnets blocks collapse into a shared talos.discovered.subnet_list define, a natural fit for the etcd block's comment, which already says "handled the same way as validSubnets above." If you'd rather keep the operator-override path as a passthrough (e.g., to preserve operator-supplied formatting verbatim), call this out in a comment so future readers don't try to "fix" the inconsistency.
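To make the proposed behavior concrete, here is the same logic sketched in plain Go. This is a hypothetical mirror, not talm code: `canonicalSubnets` is an illustrative name, and it assumes discovered addresses arrive as CIDR strings (as the `fromJsonArray` call implies). It uses the same `net/netip` `ParsePrefix` + `Masked` pair that backs `cidrNetwork`, and it dedupes keeping the first occurrence in input order, the same way Sprig's `uniq` does.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// canonicalSubnets is a hypothetical Go mirror of the proposed template:
// use the operator override if present, otherwise fall back to discovered
// addresses, fail if both are empty, then mask and dedupe every entry.
func canonicalSubnets(override, discovered []string) ([]string, error) {
	sources := override
	if len(sources) == 0 {
		sources = discovered
	}
	if len(sources) == 0 {
		return nil, errors.New("advertisedSubnets is empty and discovery found no default-gateway-bearing link")
	}

	seen := make(map[string]bool, len(sources))
	out := make([]string, 0, len(sources))
	for _, s := range sources {
		p, err := netip.ParsePrefix(s)
		if err != nil {
			return nil, fmt.Errorf("cidrNetwork: %w", err)
		}
		// Masked() zeroes the host bits: 192.168.201.10/24 -> 192.168.201.0/24.
		n := p.Masked().String()
		// Keep only the first occurrence, preserving input order.
		if !seen[n] {
			seen[n] = true
			out = append(out, n)
		}
	}
	return out, nil
}

func main() {
	// Host-form override: canonicalized exactly like the discovery path.
	fmt.Println(canonicalSubnets([]string{"192.168.201.10/24"}, nil))
	// Two addresses in one subnet on the discovered link: collapsed to one entry.
	fmt.Println(canonicalSubnets(nil, []string{"192.168.201.10/24", "192.168.201.11/24"}))
	// Neither source available: an error, like the template's fail.
	fmt.Println(canonicalSubnets(nil, nil))
}
```

Returning an error instead of an empty slice is the Go analogue of `fail`. A caller cannot accidentally emit an empty subnet list.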

{{- else }}
{{- /* Fall back to the subnet of the node's default-gateway-bearing
link. cidrNetwork masks host bits so the emitted YAML is the
canonical network form (192.168.201.0/24) rather than the
host form (192.168.201.10/24). Dedupe after masking because
a link with a secondary address in the same subnet would
otherwise produce duplicate list entries. */ -}}
{{- $addrs := fromJsonArray (include "talm.discovered.default_addresses_by_gateway" .) }}
{{- if not $addrs }}
{{- fail "values.yaml: `advertisedSubnets` was left empty and talm could not derive a default from discovery. No default-gateway-bearing link was found on the node. This field is a cluster-wide subnet selector fed to kubelet and etcd; `talm template` is invoked once per node and cannot merge per-node values into one cluster value. Either set advertisedSubnets explicitly in values.yaml, or ensure the node has a default route before running `talm template`." }}
Contributor

fail message field-name disambiguation

The error fires from inside the validSubnets: rendering block, but the message references advertisedSubnets. The values.yaml key is advertisedSubnets, so the message is correct, but a user staring at the rendered (broken) output sees validSubnets: and may chase the wrong field. A short parenthetical in the message would close the gap:

"values.yaml: `advertisedSubnets` (which feeds both kubelet validSubnets and etcd advertisedSubnets) was left empty and talm could not derive a default from discovery. ..."

The same change applies to the copy in the generic chart.

{{- end }}
{{- $subnets := list }}
{{- range $addrs }}
{{- $subnets = append $subnets (. | cidrNetwork) }}
{{- end }}
{{- range uniq $subnets }}
- {{ . }}
{{- end }}
{{- end }}
extraConfig:
cpuManagerPolicy: static
maxPods: 512
@@ -85,7 +105,7 @@ cluster:
{{- toYaml .Values.serviceSubnets | nindent 6 }}
clusterName: "{{ .Chart.Name }}"
controlPlane:
endpoint: "{{ .Values.endpoint }}"
endpoint: {{ required "values.yaml: `endpoint` must be set to the cluster control-plane URL (e.g. https://<vip>:6443). This field is cluster-wide: every node's kubelet and kube-proxy dials it, so it cannot be auto-derived from the current node's IP -- `talm template` runs once per node and has no way to reconcile per-node IPs into a single shared endpoint. For multi-node setups use a VIP (cozystack floatingIP) or an external load balancer; for single-node clusters the node's routable IP works." .Values.endpoint | quote }}
{{- if eq .MachineType "controlplane" }}
allowSchedulingOnControlPlanes: true
controllerManager:
@@ -119,7 +139,23 @@ cluster:
enabled: false
etcd:
advertisedSubnets:
{{- if .Values.advertisedSubnets }}
{{- toYaml .Values.advertisedSubnets | nindent 6 }}
{{- else }}
{{- /* Fall back to the subnet of the node's default-gateway-bearing
link; cidrNetwork masks host bits to emit canonical network
form. Dedupe handled the same way as validSubnets above.
Empty discovery already errored via validSubnets' required()
Contributor

stale "required()" wording in comment

The most recent commit (3a8c0a5) switched from required to fail for the empty-discovery path, but the comment in the etcd block still references the old guard name. Same wording lives in charts/generic/templates/_helpers.tpl:71.

      {{- /* Fall back to the subnet of the node's default-gateway-bearing
             link; cidrNetwork masks host bits to emit canonical network
             form. Dedupe handled the same way as validSubnets above.
             Empty discovery already errored via validSubnets' fail
             guard, so we reach this block only when at least one address
             was resolved. */ -}}

guard, so we reach this block only when at least one address
was resolved. */ -}}
{{- $subnets := list }}
{{- range fromJsonArray (include "talm.discovered.default_addresses_by_gateway" .) }}
{{- $subnets = append $subnets (. | cidrNetwork) }}
{{- end }}
{{- range uniq $subnets }}
- {{ . }}
{{- end }}
{{- end }}
Comment on lines 141 to +158
Contributor

etcd block's empty-discovery handling is implicit

The etcd advertisedSubnets block deliberately omits the empty-discovery fail because validSubnets renders first and errors out earlier. That's true today, but fragile: any future refactor that splits talos.config.machine.common and talos.config.cluster into different render passes (or skips the machine block for some path) would silently emit an empty advertisedSubnets: list, which is exactly the silent-failure mode this PR is trying to eliminate.

A defensive duplicate of the same if not $addrs / fail check costs five lines and removes the cross-block dependency. Same applies to charts/generic/templates/_helpers.tpl:65-82.

      {{- else }}
      {{- $addrs := fromJsonArray (include "talm.discovered.default_addresses_by_gateway" .) }}
      {{- if not $addrs }}
      {{- fail "values.yaml: `advertisedSubnets` was left empty and talm could not derive a default from discovery. ..." }}
      {{- end }}
      {{- $subnets := list }}
      {{- range $addrs }}
      {{- $subnets = append $subnets (. | cidrNetwork) }}
      {{- end }}
      {{- range uniq $subnets }}
      - {{ . }}
      {{- end }}
      {{- end }}

{{- end }}
{{- end }}

48 changes: 44 additions & 4 deletions charts/cozystack/values.yaml
@@ -1,13 +1,53 @@
endpoint: "https://192.168.100.10:6443"
# REQUIRED. Cluster control-plane endpoint. Left empty intentionally
# so the chart's `required` check fires loudly on a fresh install.
# A placeholder default would silently embed a broken endpoint in
# every rendered machine config. No auto-discovery is possible: this
# is a cluster-wide value that every node's kubelet and kube-proxy
# dials, not a per-node one.
#
# For cozystack VIP setups, set `endpoint` AND `floatingIP` below to
# the SAME IP. floatingIP drives the per-node VIP (Layer2VIPConfig)
# that endpoint points at; leaving them mismatched produces a cluster
# that talks to an IP no node actually claims.
#
# For single-node clusters without a VIP, set `endpoint` to that
# node's routable IP and leave `floatingIP` blank.
#
# For multi-node with an external load balancer, set `endpoint` to
# the LB URL and leave `floatingIP` blank.
#
# Example: endpoint: "https://192.168.0.1:6443"
endpoint: ""

clusterDomain: cozy.local
floatingIP: 192.168.100.10
# Layer-2 VIP for cozystack multi-node setups. When set, the chart
# emits a Layer2VIPConfig document pinning this IP as a floating
# address on the node's primary link. MUST equal the host portion
# of `endpoint` above, otherwise the cluster dials an IP that no
# node actually claims. Blank by default so the shipped value never
# silently embeds a wrong VIP -- fill in only if you want a VIP.
# Single-node clusters and external-LB topologies leave it blank.
# Example: floatingIP: 192.168.0.1
floatingIP: ""
Comment on lines +30 to +31
Contributor

coupling note in floatingIP comment is great, suggest explicit cross-link

The floatingIP comment says "MUST equal the host portion of endpoint above" but does not point back to the symmetric note in the endpoint comment (which does say "set endpoint AND floatingIP below to the SAME IP"). The two comments together tell the right story, but a reader who hits one comment first without scrolling may miss the coupling. Consider a one-line cross-reference at the top of floatingIP:

# See `endpoint` above — these two values are coupled and MUST match
# for cozystack VIP setups.
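The endpoint/floatingIP coupling is also easy to check mechanically. The sketch below is a hypothetical pre-flight check (`checkVIPCoupling` is not part of talm or the chart). It assumes that whenever `floatingIP` is set, `endpoint` is a URL whose host is a literal IP, which is what "MUST equal the host portion of endpoint" implies.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
	"net/url"
)

// checkVIPCoupling is a hypothetical pre-flight check, not part of talm.
// It enforces the rule from values.yaml: when floatingIP is set, it must
// equal the host portion of endpoint.
func checkVIPCoupling(endpoint, floatingIP string) error {
	if endpoint == "" {
		return errors.New("endpoint must be set")
	}
	if floatingIP == "" {
		// Single-node or external-LB topology: nothing to couple.
		return nil
	}
	u, err := url.Parse(endpoint)
	if err != nil {
		return fmt.Errorf("endpoint: %w", err)
	}
	// Hostname() drops the port and any IPv6 brackets.
	host, err := netip.ParseAddr(u.Hostname())
	if err != nil {
		return fmt.Errorf("endpoint host %q is not an IP literal: %w", u.Hostname(), err)
	}
	vip, err := netip.ParseAddr(floatingIP)
	if err != nil {
		return fmt.Errorf("floatingIP: %w", err)
	}
	if host != vip {
		return fmt.Errorf("endpoint host %s does not match floatingIP %s; no node will claim the endpoint IP", host, vip)
	}
	return nil
}

func main() {
	fmt.Println(checkVIPCoupling("https://192.0.2.10:6443", "192.0.2.10"))
	fmt.Println(checkVIPCoupling("https://192.0.2.10:6443", "192.0.2.11"))
	fmt.Println(checkVIPCoupling("https://192.0.2.4:6443", ""))
}
```

Comparing `netip.Addr` values instead of raw strings means an IPv6 endpoint written as `https://[2001:db8::10]:6443` still matches a `floatingIP` of `2001:db8::10`. A DNS-name endpoint that resolves to the VIP would be rejected by this sketch. Whether that is desirable is a policy choice.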

image: "ghcr.io/cozystack/cozystack/talos:v1.12.6"
podSubnets:
- 10.244.0.0/16
serviceSubnets:
- 10.96.0.0/16
advertisedSubnets:
- 192.168.100.0/24

# Optional override for machine.kubelet.nodeIP.validSubnets and
# cluster.etcd.advertisedSubnets. When left empty the chart derives
# the value from the node's default-gateway-bearing link at render
# time (via talm.discovered.default_addresses_by_gateway), so the
# generated machine config matches the node's actual network without
# any values.yaml edit. Set this only when you want to pin a specific
# subnet — typically for multi-homed nodes where the default-gateway
# link is not the subnet you want kubelet/etcd to use.
# Example:
# advertisedSubnets:
# - "10.0.0.0/8"
advertisedSubnets: []

oidcIssuerUrl: ""
certSANs: []
nr_hugepages: 0
38 changes: 37 additions & 1 deletion charts/generic/templates/_helpers.tpl
@@ -13,7 +13,27 @@ machine:
kubelet:
nodeIP:
validSubnets:
{{- if .Values.advertisedSubnets }}
{{- toYaml .Values.advertisedSubnets | nindent 8 }}
{{- else }}
{{- /* Fall back to the subnet of the node's default-gateway-bearing
link. cidrNetwork masks host bits so the emitted YAML is the
canonical network form (192.168.201.0/24) rather than the
host form (192.168.201.10/24). Dedupe after masking because
a link with a secondary address in the same subnet would
otherwise produce duplicate list entries. */ -}}
{{- $addrs := fromJsonArray (include "talm.discovered.default_addresses_by_gateway" .) }}
{{- if not $addrs }}
{{- fail "values.yaml: `advertisedSubnets` was left empty and talm could not derive a default from discovery. No default-gateway-bearing link was found on the node. This field is a cluster-wide subnet selector fed to kubelet and etcd; `talm template` is invoked once per node and cannot merge per-node values into one cluster value. Either set advertisedSubnets explicitly in values.yaml, or ensure the node has a default route before running `talm template`." }}
{{- end }}
{{- $subnets := list }}
{{- range $addrs }}
{{- $subnets = append $subnets (. | cidrNetwork) }}
{{- end }}
{{- range uniq $subnets }}
- {{ . }}
{{- end }}
{{- end }}
{{- with .Values.certSANs }}
certSANs:
{{- toYaml . | nindent 2 }}
@@ -33,7 +53,7 @@ cluster:
{{- toYaml .Values.serviceSubnets | nindent 6 }}
clusterName: "{{ .Chart.Name }}"
controlPlane:
endpoint: "{{ .Values.endpoint }}"
endpoint: {{ required "values.yaml: `endpoint` must be set to the cluster control-plane URL (e.g. https://<vip>:6443). This field is cluster-wide: every node's kubelet and kube-proxy dials it, so it cannot be auto-derived from the current node's IP -- `talm template` runs once per node and has no way to reconcile per-node IPs into a single shared endpoint. For multi-node setups use a VIP or an external load balancer; for single-node clusters the node's routable IP works." .Values.endpoint | quote }}
{{- if eq .MachineType "controlplane" }}
apiServer:
{{- with .Values.certSANs }}
@@ -42,7 +62,23 @@ cluster:
{{- end }}
etcd:
advertisedSubnets:
{{- if .Values.advertisedSubnets }}
{{- toYaml .Values.advertisedSubnets | nindent 6 }}
{{- else }}
{{- /* Fall back to the subnet of the node's default-gateway-bearing
link; cidrNetwork masks host bits to emit canonical network
form. Dedupe handled the same way as validSubnets above.
Empty discovery already errored via validSubnets' required()
guard, so we reach this block only when at least one address
was resolved. */ -}}
{{- $subnets := list }}
{{- range fromJsonArray (include "talm.discovered.default_addresses_by_gateway" .) }}
{{- $subnets = append $subnets (. | cidrNetwork) }}
{{- end }}
{{- range uniq $subnets }}
- {{ . }}
{{- end }}
{{- end }}
{{- end }}
{{- end }}

27 changes: 24 additions & 3 deletions charts/generic/values.yaml
@@ -1,8 +1,29 @@
endpoint: "https://192.168.100.10:6443"
# REQUIRED. Cluster control-plane endpoint. Left empty intentionally
# so the chart's `required` check fires loudly on a fresh install.
# A placeholder default would silently embed a broken endpoint in
# every rendered machine config. No auto-discovery is possible: this
# is a cluster-wide value that every node's kubelet and kube-proxy
# dials, not a per-node one. For single-node clusters set it to that
# node's routable IP; for multi-node set it to a VIP or external LB.
# Example: endpoint: "https://192.168.0.1:6443"
endpoint: ""

podSubnets:
- 10.244.0.0/16
serviceSubnets:
- 10.96.0.0/16
advertisedSubnets:
- 192.168.100.0/24

# Optional override for machine.kubelet.nodeIP.validSubnets and
# cluster.etcd.advertisedSubnets. When left empty the chart derives
# the value from the node's default-gateway-bearing link at render
# time (via talm.discovered.default_addresses_by_gateway), so the
# generated machine config matches the node's actual network without
# any values.yaml edit. Set this only when you want to pin a specific
# subnet — typically for multi-homed nodes where the default-gateway
# link is not the subnet you want kubelet/etcd to use.
# Example:
# advertisedSubnets:
# - "10.0.0.0/8"
advertisedSubnets: []

certSANs: []
14 changes: 14 additions & 0 deletions pkg/engine/helm/engine.go
@@ -19,6 +19,7 @@ package engine
import (
"fmt"
"log"
"net/netip"
"path"
"path/filepath"
"regexp"
@@ -218,6 +219,19 @@ func (e Engine) initFunMap(t *template.Template) {
}
}

// cidrNetwork canonicalizes a CIDR string to its network form
// ("192.168.201.10/24" -> "192.168.201.0/24"), matching what
// operators see in Talos docs and upstream examples. Sprig ships
// no equivalent; net/netip's ParsePrefix + Masked handles both
// IPv4 and IPv6 without any host-bit arithmetic in the template.
funcMap["cidrNetwork"] = func(cidr string) (string, error) {
p, err := netip.ParsePrefix(cidr)
if err != nil {
return "", fmt.Errorf("cidrNetwork: %w", err)
}
Comment on lines +222 to +231
Contributor

cidrNetwork is fine; consider also exposing the bare network address

Not a request, just a flag for future iterations: callers may eventually want the network address without the prefix length (192.168.201.0 rather than 192.168.201.0/24) — for example when constructing CIDR list entries with explicit prefix logic. If a cidrNetworkAddress companion is ever needed, the netip.Prefix.Masked().Addr().String() form is one line. Out of scope for this PR.
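If a `cidrNetworkAddress` companion were ever added, it might look like the following. The name is the one floated above and is hypothetical. The body is the suggested one-liner wrapped in the same `ParsePrefix` error handling that `cidrNetwork` uses.

```go
package main

import (
	"fmt"
	"net/netip"
)

// cidrNetworkAddress is hypothetical: a possible companion to cidrNetwork
// that returns the network address without the prefix length.
func cidrNetworkAddress(cidr string) (string, error) {
	p, err := netip.ParsePrefix(cidr)
	if err != nil {
		return "", fmt.Errorf("cidrNetworkAddress: %w", err)
	}
	// Masked() zeroes host bits; Addr() drops the "/bits" suffix.
	return p.Masked().Addr().String(), nil
}

func main() {
	for _, in := range []string{"192.168.201.10/24", "2001:db8::1/64", "192.168.201.10"} {
		out, err := cidrNetworkAddress(in)
		fmt.Printf("%s -> %q (err: %v)\n", in, out, err)
	}
}
```

Because it returns `(string, error)`, it would register in the existing `funcMap` the same way `cidrNetwork` does, and `text/template` would abort the render on a parse error instead of emitting an empty string.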

return p.Masked().String(), nil
}

t.Funcs(funcMap)
}

53 changes: 53 additions & 0 deletions pkg/engine/helm/engine_test.go
@@ -1218,3 +1218,56 @@ func TestTalosVersionConcurrentRender(t *testing.T) {
}
wg.Wait()
}

// TestCidrNetworkTemplateFunc exercises the cidrNetwork template
// function directly (bypassing chart rendering) so a future refactor
// that breaks parsing or masking — for either IPv4 or IPv6 inputs —
// is caught without needing to boot the whole helm engine.
func TestCidrNetworkTemplateFunc(t *testing.T) {
renderExpr := func(expr string) (string, error) {
chrt := &chart.Chart{
Metadata: &chart.Metadata{Name: "cidrtest"},
Templates: []*chart.File{{Name: "templates/out.yaml", Data: []byte(expr)}},
Values: map[string]any{},
}
var eng Engine
out, err := eng.Render(chrt, chartutil.Values{"Values": map[string]any{}})
if err != nil {
return "", err
}
return out["cidrtest/templates/out.yaml"], nil
}

tests := []struct {
name string
input string
want string
wantErr bool
}{
{"ipv4 host form", "192.168.201.10/24", "192.168.201.0/24", false},
{"ipv4 already canonical", "10.0.0.0/8", "10.0.0.0/8", false},
{"ipv4 narrow prefix", "192.168.201.10/31", "192.168.201.10/31", false},
{"ipv6 host form", "2001:db8::1/64", "2001:db8::/64", false},
{"ipv6 already canonical", "fd00::/8", "fd00::/8", false},
{"malformed missing prefix", "192.168.201.10", "", true},
{"malformed garbage", "not-a-cidr", "", true},
}

for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got, err := renderExpr(fmt.Sprintf(`{{ cidrNetwork %q }}`, tt.input))
if tt.wantErr {
if err == nil {
t.Errorf("expected error for input %q, got output %q", tt.input, got)
}
return
}
if err != nil {
t.Fatalf("unexpected error for %q: %v", tt.input, err)
}
if got != tt.want {
t.Errorf("cidrNetwork(%q) = %q, want %q", tt.input, got, tt.want)
}
})
}
}