-
Notifications
You must be signed in to change notification settings - Fork 24
fix(chart): discovery-based defaults for subnets; required endpoint and VIP #130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
3ef28d0
10b117e
b86b402
b4546fc
0ed96c0
ab2ac02
3299cb5
28894d1
9ab71c7
2ad6e20
9472dfe
a6b3444
d7e602e
42967b7
3a8c0a5
6295e02
d67d0d1
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -18,7 +18,27 @@ machine: | |
| kubelet: | ||
| nodeIP: | ||
| validSubnets: | ||
| {{- if .Values.advertisedSubnets }} | ||
| {{- toYaml .Values.advertisedSubnets | nindent 8 }} | ||
| {{- else }} | ||
| {{- /* Fall back to the subnet of the node's default-gateway-bearing | ||
| link. cidrNetwork masks host bits so the emitted YAML is the | ||
| canonical network form (192.168.201.0/24) rather than the | ||
| host form (192.168.201.10/24). Dedupe after masking because | ||
| a link with a secondary address in the same subnet would | ||
| otherwise produce duplicate list entries. */ -}} | ||
| {{- $addrs := fromJsonArray (include "talm.discovered.default_addresses_by_gateway" .) }} | ||
| {{- if not $addrs }} | ||
| {{- fail "values.yaml: `advertisedSubnets` was left empty and talm could not derive a default from discovery. No default-gateway-bearing link was found on the node. This field is a cluster-wide subnet selector fed to kubelet and etcd; `talm template` is invoked once per node and cannot merge per-node values into one cluster value. Either set advertisedSubnets explicitly in values.yaml, or ensure the node has a default route before running `talm template`." }} | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. fail message field-name disambiguation The error fires from inside the Same in the generic copy. |
||
| {{- end }} | ||
| {{- $subnets := list }} | ||
| {{- range $addrs }} | ||
| {{- $subnets = append $subnets (. | cidrNetwork) }} | ||
| {{- end }} | ||
| {{- range uniq $subnets }} | ||
| - {{ . }} | ||
| {{- end }} | ||
| {{- end }} | ||
| extraConfig: | ||
| cpuManagerPolicy: static | ||
| maxPods: 512 | ||
|
|
@@ -85,7 +105,7 @@ cluster: | |
| {{- toYaml .Values.serviceSubnets | nindent 6 }} | ||
| clusterName: "{{ .Chart.Name }}" | ||
| controlPlane: | ||
| endpoint: "{{ .Values.endpoint }}" | ||
| endpoint: {{ required "values.yaml: `endpoint` must be set to the cluster control-plane URL (e.g. https://<vip>:6443). This field is cluster-wide: every node's kubelet and kube-proxy dials it, so it cannot be auto-derived from the current node's IP -- `talm template` runs once per node and has no way to reconcile per-node IPs into a single shared endpoint. For multi-node setups use a VIP (cozystack floatingIP) or an external load balancer; for single-node clusters the node's routable IP works." .Values.endpoint | quote }} | ||
| {{- if eq .MachineType "controlplane" }} | ||
| allowSchedulingOnControlPlanes: true | ||
| controllerManager: | ||
|
|
@@ -119,7 +139,23 @@ cluster: | |
| enabled: false | ||
| etcd: | ||
| advertisedSubnets: | ||
| {{- if .Values.advertisedSubnets }} | ||
| {{- toYaml .Values.advertisedSubnets | nindent 6 }} | ||
| {{- else }} | ||
| {{- /* Fall back to the subnet of the node's default-gateway-bearing | ||
| link; cidrNetwork masks host bits to emit canonical network | ||
| form. Dedupe handled the same way as validSubnets above. | ||
| Empty discovery already errored via validSubnets' required() | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. stale "required()" wording in comment The most recent commit ( {{- /* Fall back to the subnet of the node's default-gateway-bearing
link; cidrNetwork masks host bits to emit canonical network
form. Dedupe handled the same way as validSubnets above.
Empty discovery already errored via validSubnets' fail
guard, so we reach this block only when at least one address
was resolved. */ -}} |
||
| guard, so we reach this block only when at least one address | ||
| was resolved. */ -}} | ||
| {{- $subnets := list }} | ||
| {{- range fromJsonArray (include "talm.discovered.default_addresses_by_gateway" .) }} | ||
| {{- $subnets = append $subnets (. | cidrNetwork) }} | ||
| {{- end }} | ||
| {{- range uniq $subnets }} | ||
| - {{ . }} | ||
| {{- end }} | ||
| {{- end }} | ||
|
Comment on lines
141
to
+158
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. etcd block's empty-discovery handling is implicit The etcd A defensive duplicate of the same {{- else }}
{{- $addrs := fromJsonArray (include "talm.discovered.default_addresses_by_gateway" .) }}
{{- if not $addrs }}
{{- fail "values.yaml: `advertisedSubnets` was left empty and talm could not derive a default from discovery. ..." }}
{{- end }}
{{- $subnets := list }}
{{- range $addrs }}
{{- $subnets = append $subnets (. | cidrNetwork) }}
{{- end }}
{{- range uniq $subnets }}
- {{ . }}
{{- end }}
{{- end }} |
||
| {{- end }} | ||
| {{- end }} | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,13 +1,53 @@ | ||
| endpoint: "https://192.168.100.10:6443" | ||
| # REQUIRED. Cluster control-plane endpoint. Left empty intentionally | ||
| # so the chart's `required` check fires loudly on a fresh install. | ||
| # A placeholder default would silently embed a broken endpoint in | ||
| # every rendered machine config. No auto-discovery is possible: this | ||
| # is a cluster-wide value that every node's kubelet and kube-proxy | ||
| # dials, not a per-node one. | ||
| # | ||
| # For cozystack VIP setups, set `endpoint` AND `floatingIP` below to | ||
| # the SAME IP. floatingIP drives the per-node VIP (Layer2VIPConfig) | ||
| # that endpoint points at; leaving them mismatched produces a cluster | ||
| # that talks to an IP no node actually claims. | ||
| # | ||
| # For single-node clusters without a VIP, set `endpoint` to that | ||
| # node's routable IP and leave `floatingIP` blank. | ||
| # | ||
| # For multi-node with an external load balancer, set `endpoint` to | ||
| # the LB URL and leave `floatingIP` blank. | ||
| # | ||
| # Example: endpoint: "https://192.168.0.1:6443" | ||
| endpoint: "" | ||
|
|
||
| clusterDomain: cozy.local | ||
| floatingIP: 192.168.100.10 | ||
| # Layer-2 VIP for cozystack multi-node setups. When set, the chart | ||
| # emits a Layer2VIPConfig document pinning this IP as a floating | ||
| # address on the node's primary link. MUST equal the host portion | ||
| # of `endpoint` above, otherwise the cluster dials an IP that no | ||
| # node actually claims. Blank by default so the shipped value never | ||
| # silently embeds a wrong VIP -- fill in only if you want a VIP. | ||
| # Single-node clusters and external-LB topologies leave it blank. | ||
| # Example: floatingIP: 192.168.0.1 | ||
| floatingIP: "" | ||
|
Comment on lines
+30
to
+31
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. coupling note in The # See `endpoint` above — these two values are coupled and MUST match
# for cozystack VIP setups. |
||
| image: "ghcr.io/cozystack/cozystack/talos:v1.12.6" | ||
| podSubnets: | ||
| - 10.244.0.0/16 | ||
| serviceSubnets: | ||
| - 10.96.0.0/16 | ||
| advertisedSubnets: | ||
| - 192.168.100.0/24 | ||
|
|
||
| # Optional override for machine.kubelet.nodeIP.validSubnets and | ||
| # cluster.etcd.advertisedSubnets. When left empty the chart derives | ||
| # the value from the node's default-gateway-bearing link at render | ||
| # time (via talm.discovered.default_addresses_by_gateway), so the | ||
| # generated machine config matches the node's actual network without | ||
| # any values.yaml edit. Set this only when you want to pin a specific | ||
| # subnet — typically for multi-homed nodes where the default-gateway | ||
| # link is not the subnet you want kubelet/etcd to use. | ||
| # Example: | ||
| # advertisedSubnets: | ||
| # - "10.0.0.0/8" | ||
| advertisedSubnets: [] | ||
|
|
||
| oidcIssuerUrl: "" | ||
| certSANs: [] | ||
| nr_hugepages: 0 | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,8 +1,29 @@ | ||
| endpoint: "https://192.168.100.10:6443" | ||
| # REQUIRED. Cluster control-plane endpoint. Left empty intentionally | ||
| # so the chart's `required` check fires loudly on a fresh install. | ||
| # A placeholder default would silently embed a broken endpoint in | ||
| # every rendered machine config. No auto-discovery is possible: this | ||
| # is a cluster-wide value that every node's kubelet and kube-proxy | ||
| # dials, not a per-node one. For single-node clusters set it to that | ||
| # node's routable IP; for multi-node set it to a VIP or external LB. | ||
| # Example: endpoint: "https://192.168.0.1:6443" | ||
| endpoint: "" | ||
|
|
||
| podSubnets: | ||
| - 10.244.0.0/16 | ||
| serviceSubnets: | ||
| - 10.96.0.0/16 | ||
| advertisedSubnets: | ||
| - 192.168.100.0/24 | ||
|
|
||
| # Optional override for machine.kubelet.nodeIP.validSubnets and | ||
| # cluster.etcd.advertisedSubnets. When left empty the chart derives | ||
| # the value from the node's default-gateway-bearing link at render | ||
| # time (via talm.discovered.default_addresses_by_gateway), so the | ||
| # generated machine config matches the node's actual network without | ||
| # any values.yaml edit. Set this only when you want to pin a specific | ||
| # subnet — typically for multi-homed nodes where the default-gateway | ||
| # link is not the subnet you want kubelet/etcd to use. | ||
| # Example: | ||
| # advertisedSubnets: | ||
| # - "10.0.0.0/8" | ||
| advertisedSubnets: [] | ||
|
|
||
| certSANs: [] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -19,6 +19,7 @@ package engine | |
| import ( | ||
| "fmt" | ||
| "log" | ||
| "net/netip" | ||
| "path" | ||
| "path/filepath" | ||
| "regexp" | ||
|
|
@@ -218,6 +219,19 @@ func (e Engine) initFunMap(t *template.Template) { | |
| } | ||
| } | ||
|
|
||
| // cidrNetwork canonicalizes a CIDR string to its network form | ||
| // ("192.168.201.10/24" -> "192.168.201.0/24"), matching what | ||
| // operators see in Talos docs and upstream examples. Sprig ships | ||
| // no equivalent; net/netip's ParsePrefix + Masked handles both | ||
| // IPv4 and IPv6 without any host-bit arithmetic in the template. | ||
| funcMap["cidrNetwork"] = func(cidr string) (string, error) { | ||
| p, err := netip.ParsePrefix(cidr) | ||
| if err != nil { | ||
| return "", fmt.Errorf("cidrNetwork: %w", err) | ||
| } | ||
|
Comment on lines
+222
to
+231
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Not a request, just a flag for future iterations: callers may eventually want the network address without the prefix length ( |
||
| return p.Masked().String(), nil | ||
| } | ||
|
|
||
| t.Funcs(funcMap) | ||
| } | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
operator-override path skips canonicalization
When the operator sets
advertisedSubnets: ["192.168.201.10/24"](host form) in values.yaml, the chart emits it verbatim viatoYaml, while the fallback path runs every entry throughcidrNetworkso the YAML is in canonical network form. That inconsistency means two operators with the same underlying intent — one relying on discovery, one with an explicit override — get differently-formatted machine configs, which is noisy inkubectl diff/talm diffoutput and surprising in code review. Consider canonicalizing in both branches:validSubnets: {{- $sources := .Values.advertisedSubnets }} {{- if not $sources }} {{- $sources = fromJsonArray (include "talm.discovered.default_addresses_by_gateway" .) }} {{- if not $sources }} {{- fail "values.yaml: `advertisedSubnets` was left empty and talm could not derive a default from discovery. ..." }} {{- end }} {{- end }} {{- $subnets := list }} {{- range $sources }} {{- $subnets = append $subnets (. | cidrNetwork) }} {{- end }} {{- range uniq $subnets }} - {{ . }} {{- end }}This also de-duplicates the validSubnets and etcd.advertisedSubnets blocks down to a shared
talos.discovered.subnet_listdefine, which is a sympathetic refactor for the comment in the etcd block that already says "handled the same way as validSubnets above." If you'd rather keep the operator-override path passthrough (e.g., to preserve operator-supplied formatting verbatim), call this out in a comment so future readers don't try to "fix" the inconsistency.