fix(chart): discovery-based defaults for subnets; required endpoint and VIP#130
fix(chart): discovery-based defaults for subnets; required endpoint and VIP#130
Conversation
When .Values.advertisedSubnets is unset in values.yaml, fall back to the CIDR of the node's default-gateway-bearing link (as returned by talm.discovered.default_addresses_by_gateway). Previously the chart emitted whatever was in values.yaml verbatim, so a stale placeholder like 192.168.100.0/24 silently landed in machine configs for nodes on completely different networks. Talos' validSubnets matching uses net.ParseCIDR + IPNet.Contains, so emitting the node's IP-with-prefix (e.g. 192.168.201.10/24) is semantically equivalent to the network form (192.168.201.0/24) — no network-alignment math needed. Applied symmetrically to both cozystack and generic presets. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
…bnet Same shape as the kubelet.validSubnets fallback: when .Values.advertisedSubnets is unset, pick the CIDR of the node's default-gateway-bearing link via talm.discovered.default_addresses_by_gateway instead of silently emitting the stale 192.168.100.0/24 placeholder baked into values.yaml. etcd matches advertisedSubnets against local addresses using the same CIDR semantics as kubelet, so the discovered IP/prefix form is safe to emit directly. Applied symmetrically to both cozystack and generic presets. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
Replace the bare "{{ .Values.endpoint }}" interpolation with
Helm's required() so an unset or empty endpoint produces a clear
error at render time instead of silently embedding a stale
192.168.100.10:6443 placeholder in machine configs.
Unlike kubelet validSubnets and etcd advertisedSubnets, endpoint
has no per-node fallback: it is the cluster-wide URL every node
dials for the API server, so it must come from operator input
(VIP, LB, single-node IP) — there is nothing to discover locally.
Applied symmetrically to both cozystack and generic presets.
Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Two related changes to the chart's default values file: - advertisedSubnets: was pre-populated with 192.168.100.0/24. Every user not running on that subnet got a stale value silently baked into their machine config. Now defaults to an empty list so the chart's new fallback path (default_addresses_by_gateway) kicks in. The in-file comment explains when to override. - endpoint: the placeholder https://192.168.100.10:6443 is kept as an example but the comment is rewritten to mark the field REQUIRED and explain why no auto-discovery is possible (cluster-wide value, not per-node). The chart now uses required() so an unset or empty value fails loudly rather than silently embedding this placeholder. Applied symmetrically to cozystack and generic presets. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
…point Five new tests pin the behavior introduced by the chart changes in this branch: - TestMultiDocCozystack_ValidSubnetsFallsBackToDiscovery — empty advertisedSubnets in values, lookup yields 192.168.201.10/24 on the gateway-bearing link; rendered kubelet.validSubnets must contain the discovered CIDR and must NOT contain the historical 192.168.100.0/24 placeholder. - TestMultiDocCozystack_AdvertisedSubnetsFallsBackToDiscovery — same setup, assertion on etcd.advertisedSubnets. - TestMultiDocCozystack_ValuesAdvertisedSubnetsOverridesDiscovery — explicit [10.0.0.0/8] in values; rendered config must use the operator's value in both subnet-selector fields, discovered CIDR must not leak into either. Counts occurrences of the override (expect ≥2 — once per consumer) and rejects the discovered CIDR in a subnet-list context (excluding LinkConfig address lines). - TestMultiDocCozystack_EndpointRequired — empty endpoint in values; Render must fail with Helm's required() error mentioning endpoint, instead of silently embedding the stale placeholder. - TestMultiDocGeneric_ValidSubnetsFallsBackToDiscovery — equivalent smoke check for the generic preset. Adds simpleNicLookup() — a minimal single-physical-interface lookup fixture with a deliberately non-100.* subnet so the tests can distinguish 'discovered' from 'historical default' in the output. Adds renderCozystackWith / renderGenericWith — thin wrappers over the existing chart-render pattern used by TestMultiDoc* suites, keeping the new tests readable. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
Previously 'endpoint' kept its https://192.168.100.10:6443 placeholder even after the chore(values) commit dropped advertisedSubnets — meaning Helm's required() guard never fired for a fresh user who kept defaults; the chart silently embedded the wrong endpoint. Set endpoint to the empty string so required() surfaces the missing value loudly. Comment block above the key explains the motivation and points at an example for operators to copy. Same change applied symmetrically to cozystack and generic presets. Also adds a commented format-example under advertisedSubnets so operators overriding discovery have a syntax reference. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
…sh-default + worker coverage Two mechanical changes + two new tests motivated by the fix(values) commit that blanks the shipped endpoint: 1. Existing TestMultiDoc*, TestLegacyCozystack_NrHugepages and the renderChartTemplate helper previously relied on chrt.Values carrying a non-empty endpoint default. With the default blank they all hit the required() guard and fail. Inject a testEndpoint constant (https://talm-test.invalid:6443) in tests that are not specifically exercising the guard, so the rest of the chart is still reachable. 2. renderCozystackWith / renderGenericWith helpers gain an auto-inject so all new tests going through them don't have to spell out the endpoint. Two new tests: - TestMultiDocCozystack_ShippedDefaultsFailFresh — renders with chrt.Values exactly as shipped, expects a required() error. This test would fail if a future commit reintroduced a placeholder endpoint into values.yaml. Pins the fix(values) commit's intent at the test level. - TestMultiDocCozystack_WorkerValidSubnetsFallsBackToDiscovery — kubelet.validSubnets lives in the shared talos.config.machine.common block, so it is emitted for workers too. This test guards against a regression that would only break the worker path. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
Two correctness fixes to the discovery fallback: 1. Emit the network form of the CIDR (192.168.201.0/24), not the host form (192.168.201.10/24). Talos matches both equivalently via netip.Prefix.Contains, but the ticket reporter explicitly asked for the canonical network form, and every Talos doc and third-party example uses it. Host-form output in a subnet-selector field is a trap for anyone comparing rendered configs against upstream references. Adds a small cidrNetwork template function to pkg/engine/helm that wraps net/netip.ParsePrefix+Masked and handles IPv4/IPv6 uniformly. Sprig ships no equivalent. The chart helpers pipe the discovered address through it in the fallback branches. 2. When the operator leaves advertisedSubnets empty AND discovery yields no default-gateway-bearing link (offline render, pre-boot node, no default route), the previous fallback produced an empty validSubnets list silently. A silent empty field is worse than the old broken default because nothing surfaces the problem. Wrap the fallback in a required() guard that fires on empty discovery with a clear remediation message pointing at both options (set values.yaml explicitly, or fix the node's default route). The etcd fallback reuses the validSubnets guard's fire path — same chart render, same .Values state — so a second required() there would just add noise without extra signal. Test updates: - All existing assertions on 192.168.201.10/24 in the subnet- selector fields switched to 192.168.201.0/24 (the masked form). LinkConfig's 'address: 192.168.201.10/24' stays host-form, which is correct for interface addresses. - New TestMultiDocCozystack_EmptyDiscoveryErrors exercises the new required() guard: advertisedSubnets empty + empty lookup → assert render fails with advertisedSubnets in the error message. - The override-precedence leak check updated to scan for the fallback's canonical form rather than the host form. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
…arted Two adjacent docs fixes: - README getting-started previously said to run talm init then talm template -t ... > nodes/node1.yaml. With the new required-endpoint guard, that second command fails with a required() error because values.yaml ships endpoint empty. Add an explicit 'edit values.yaml' step between init and template, covering the endpoint + floatingIP coupling for cozystack deployments and the single-node / external-LB alternatives. Subnet fields are now covered in the same paragraph since they discover automatically. - charts/cozystack/values.yaml gained a long endpoint comment earlier in this branch but still left a silent trap: users who set 'endpoint: https://192.168.100.10:6443' by copying the example without also editing floatingIP ended up with an endpoint pointing at an IP no node VIP-claims. Comment now makes the coupling explicit and tells single-node / external-LB users to blank floatingIP. The floatingIP key itself gained a one-line comment pointing back at endpoint. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
Same reasoning as the earlier endpoint-blanking commit: floatingIP shipped as 192.168.100.10, which for any user whose network was not 192.168.100.0/24 silently embedded a Layer2VIPConfig pointing at an IP no node claims. Even worse than the endpoint case because Talos actively advertises the bogus VIP on the primary link, visibly broken without any kube-proxy round trip. Blank the default so operators opt into VIP explicitly. Single-node clusters and external-LB topologies leave it blank and get no VIP document; cozystack multi-node setups set it to match endpoint. Regression guards: TestMultiDocCozystack_NoVIPOnFreshDefaults pins the shipped-default behavior (no VIP); existing TestMultiDocCozystack_ControlPlane loses its pre-PR 'expects 192.168.100.10' assertion; TestMultiDocCozystack_ Layer2VIPConfigWhenFloatingIPSet covers the opt-in path. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
Two small-but-meaningful improvements to the subnet fallback and
error wording:
1. Dedupe subnets derived from multiple addresses on the same link.
validSubnets is a set semantically (Talos uses it as a CIDR match
list), so a link with a secondary address in the same subnet
previously emitted two identical list entries. The new pass
collects cidrNetwork results and runs them through sprig's uniq
before emitting. TestMultiDocCozystack_DedupesDuplicateSubnets-
FromMultipleAddresses pins this.
2. Reword required() messages to explain why auto-discovery is not
possible ('talm template runs once per node and cannot reconcile
per-node IPs into one cluster value') instead of the dismissive
'no auto-discovery is possible'. The original ticket author
explicitly asked for auto-derivation, so the message should
explain the constraint instead of just refusing. Swap em-dashes
for -- since Helm's error wrapping is known to mangle em-dashes
in some terminals.
3. advertisedSubnets docstring in charts/generic/values.yaml cleaned
up to match the wording of charts/cozystack/values.yaml for
symmetry.
Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
README getting-started block previously showed endpoint=192.168.0.1 followed by a tutorial node at 1.2.3.4 -- inconsistent for a reader walking the example. Rewrite the example to use 1.2.3.4 throughout as a single-node scenario (endpoint=node IP, floatingIP blank), which also demonstrates the 'blank floatingIP for single-node' path that the new default enables. Test additions and updates on the engine side: - TestMultiDocCozystack_Layer2VIPConfigWhenFloatingIPSet covers the opt-in VIP path (user sets floatingIP, chart emits VIP doc). - TestMultiDocCozystack_NoVIPOnFreshDefaults pins the shipped-empty default behavior: no VIP on a vanilla 'talm init' render. - TestMultiDocCozystack_DedupesDuplicateSubnetsFromMultipleAddresses covers the new uniq-pass in the subnet fallback: two addresses in the same subnet collapse to one list entry per consumer (validSubnets + advertisedSubnets = 2 total occurrences). - TestMultiDocCozystack_ControlPlane (pre-existing) retired its 'expect 192.168.100.10 VIP' assertion in favor of an explicit 'no VIP on fresh defaults' assertion, reflecting the new shipped behavior. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
Two test-quality polish items from review:
- TestCidrNetworkTemplateFunc covers cidrNetwork via a trivial chart
that just renders {{ cidrNetwork INPUT }}: happy-path IPv4/IPv6
host-form masking, already-canonical round-trip, narrow prefix
round-trip, and malformed inputs. Runs in ~0ms each. Catches a
regression in the function without needing a full chart render.
- TestMultiDocCozystack_EmptyDiscoveryErrors now asserts both the
advertisedSubnets field name AND the 'default route' diagnostic
phrase. Two independent signals pin the error's guidance shape,
so a future reword cannot silently drop half the message.
Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 53 minutes and 16 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📝 WalkthroughWalkthroughThe pull request removes hardcoded cluster networking defaults from Helm charts and implements discovery-based subnet derivation with explicit endpoint validation. Users must now configure control-plane endpoints and floating IP values, while subnet defaults are automatically computed from discovered network interfaces when not explicitly provided. Changes
Sequence DiagramsequenceDiagram
participant User as User (values.yaml)
participant Template as Helm Template
participant Discovery as Discovery Data
participant Config as Talos Config
User->>Template: endpoint (required), advertisedSubnets (optional)
alt User provides advertisedSubnets
Template->>Config: Use provided advertisedSubnets
else advertisedSubnets empty/unset
Template->>Discovery: Fetch talm.discovered.default_addresses_by_gateway
alt Discovery has default-gateway addresses
Discovery-->>Template: Return gateway addresses
Template->>Template: Mask each address via cidrNetwork
Template->>Template: Deduplicate CIDR results
Template->>Config: Use derived subnet CIDRs
else No default-gateway addresses
Discovery-->>Template: Empty result
Template->>Template: Hard fail with error
end
end
Template->>Template: Validate endpoint via required()
alt endpoint provided
Template->>Config: Inject endpoint
else endpoint empty
Template->>Template: Fail validation with error
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes The changes span multiple interrelated layers—chart values, template logic with fallback subnet derivation and validation, new Helm engine functions, and extensive test coverage (572 additional test lines)—with heterogeneous logic across discovery integration, CIDR masking, and error handling that each requires separate reasoning. Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request implements automatic discovery for network subnets in the cozystack and generic Helm charts, falling back to the node's default gateway when advertisedSubnets is not explicitly defined. It also enforces mandatory configuration for the cluster control-plane endpoint by removing default placeholders and utilizing Helm's required function. To support these changes, a new cidrNetwork template function was added to the engine along with comprehensive test cases. Review feedback suggests using the fail function for more idiomatic error reporting in templates and recommends using deep copies for chart values in tests to avoid side effects from shallow copying.
| otherwise produce duplicate list entries. */ -}} | ||
| {{- $addrs := fromJsonArray (include "talm.discovered.default_addresses_by_gateway" .) }} | ||
| {{- if not $addrs }} | ||
| {{- required "values.yaml: `advertisedSubnets` was left empty and talm could not derive a default from discovery. No default-gateway-bearing link was found on the node. This field is a cluster-wide subnet selector fed to kubelet and etcd; `talm template` is invoked once per node and cannot merge per-node values into one cluster value. Either set advertisedSubnets explicitly in values.yaml, or ensure the node has a default route before running `talm template`." "" }} |
There was a problem hiding this comment.
While using required with an empty string works to throw an error, it is more idiomatic in Helm templates to use the fail function for unconditional errors with custom messages. Since fail is already implemented in the engine and used elsewhere, consider using it here for better clarity.
{{- fail "values.yaml: `advertisedSubnets` was left empty and talm could not derive a default from discovery. No default-gateway-bearing link was found on the node. This field is a cluster-wide subnet selector fed to kubelet and etcd; `talm template` is invoked once per node and cannot merge per-node values into one cluster value. Either set advertisedSubnets explicitly in values.yaml, or ensure the node has a default route before running `talm template`." }}
There was a problem hiding this comment.
Good call — swapped to fail in 42967b7. required "" was a hack to express unconditional failure, fail says the same thing directly and matches how the Helm engine already registers both helpers side by side.
| otherwise produce duplicate list entries. */ -}} | ||
| {{- $addrs := fromJsonArray (include "talm.discovered.default_addresses_by_gateway" .) }} | ||
| {{- if not $addrs }} | ||
| {{- required "values.yaml: `advertisedSubnets` was left empty and talm could not derive a default from discovery. No default-gateway-bearing link was found on the node. This field is a cluster-wide subnet selector fed to kubelet and etcd; `talm template` is invoked once per node and cannot merge per-node values into one cluster value. Either set advertisedSubnets explicitly in values.yaml, or ensure the node has a default route before running `talm template`." "" }} |
There was a problem hiding this comment.
Similar to the cozystack chart, consider using the fail function instead of required with an empty string for throwing this discovery error. This is more idiomatic for unconditional failures in Helm templates.
{{- fail "values.yaml: `advertisedSubnets` was left empty and talm could not derive a default from discovery. No default-gateway-bearing link was found on the node. This field is a cluster-wide subnet selector fed to kubelet and etcd; `talm template` is invoked once per node and cannot merge per-node values into one cluster value. Either set advertisedSubnets explicitly in values.yaml, or ensure the node has a default route before running `talm template`." }}
There was a problem hiding this comment.
Fixed symmetrically in 3a8c0a5 — the generic preset now uses fail too, keeping the two presets identical.
| values := make(map[string]any) | ||
| maps.Copy(values, chrt.Values) |
There was a problem hiding this comment.
The use of maps.Copy performs a shallow copy of the chart values. If any test modifies a nested map or slice within the values map, it will inadvertently modify the original chrt.Values, potentially leading to flaky tests or side effects in subsequent test cases. While the current tests only seem to modify top-level keys, using a deep copy mechanism would be more robust for future-proofing the test suite.
There was a problem hiding this comment.
Fair point for future-proofing — switched in 6295e02. Added a small cloneValues helper that recursively deep-copies maps and slices (primitives are immutable so they can stay shared) and routed renderCozystackWith, renderGenericWith, and renderChartTemplate through it. The remaining maps.Copy is only for merging caller-supplied overrides, and those maps are built fresh per test so they're already safe.
…error Address review feedback from gemini-code-assist on charts/cozystack/templates/_helpers.tpl:32: The empty-discovery branch used `required "msg" ""` to fail — the literal empty string as value makes the check unconditionally-true, which is semantically confusing since required() is for optional-vs- required checks. `fail "msg"` expresses the same intent in one function call and matches the engine's existing helpers (the render harness already registers both required and fail). No behavior change; same error message, same trigger condition. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
…error (generic) Address review feedback from gemini-code-assist on charts/generic/templates/_helpers.tpl:27: Mirror the cozystack-preset change — swap `required "msg" ""` for `fail "msg"` in the empty-discovery branch. Keeps the two presets symmetric. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
Address review feedback from gemini-code-assist on pkg/engine/render_test.go:71: maps.Copy does a shallow copy of chrt.Values. Today's tests only mutate top-level keys, but a future contributor writing something like `values["podSubnets"] = append(values["podSubnets"].([]any), x)` could corrupt chrt.Values via the shared slice's backing array, and downstream tests loading the same chart would pick up the mutation. Introduce cloneValues — a small recursive deep-copy handling maps, slices, and primitives — and use it in the renderCozystackWith, renderGenericWith, and renderChartTemplate helpers where chrt.Values is the starting point. The existing maps.Copy for merging in caller- supplied overrides is kept; overrides maps are built fresh per test so they are already safe. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/engine/render_test.go`:
- Around line 1403-1433: The test fixture is inconsistent: addressesList returns
two addresses but the nodeaddress/default response only contains one, so update
the resource-returning closure in render_test.go (the anonymous function that
switches on resource) to make the "nodeaddress"/"default" case return a
spec.addresses list containing both "192.168.201.10/24" and "192.168.201.11/24"
(matching addressesList) so talm.discovered.default_addresses_by_gateway sees
duplicate input; ensure you modify the map returned in the nodeaddress/default
branch rather than the addressesList variable itself.
In `@README.md`:
- Around line 75-81: The README's rendered config sample still shows the old
placeholder endpoint "https://192.168.0.1:6443" which contradicts the new
values.yaml example using "https://1.2.3.4:6443"; update the generated config
examples in the README (the sample that renders the endpoint and the later block
around lines referred to as 132-135) to use "https://1.2.3.4:6443" (or switch
both to a consistent placeholder like "https://<NODE_IP>:6443") so the endpoint
in the config sample matches the values.yaml example and the walkthrough; search
for the literal "https://192.168.0.1:6443" in README and replace it with the
consistent address/placeholder.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7649c732-ca1f-4842-816f-d751c547bc33
📒 Files selected for processing (8)
README.mdcharts/cozystack/templates/_helpers.tplcharts/cozystack/values.yamlcharts/generic/templates/_helpers.tplcharts/generic/values.yamlpkg/engine/helm/engine.gopkg/engine/helm/engine_test.gopkg/engine/render_test.go
| addressesList := map[string]any{ | ||
| "apiVersion": "v1", | ||
| "kind": "List", | ||
| "items": []any{ | ||
| map[string]any{"spec": map[string]any{ | ||
| "linkName": "eth0", "address": "192.168.201.10/24", | ||
| "family": "inet4", "scope": "global", | ||
| }}, | ||
| map[string]any{"spec": map[string]any{ | ||
| "linkName": "eth0", "address": "192.168.201.11/24", | ||
| "family": "inet4", "scope": "global", | ||
| }}, | ||
| }, | ||
| } | ||
| return func(resource, namespace, id string) (map[string]any, error) { | ||
| switch resource { | ||
| case "routes": | ||
| return routesList, nil | ||
| case "links": | ||
| if id == "eth0" { | ||
| return eth0, nil | ||
| } | ||
| if id == "" { | ||
| return map[string]any{"apiVersion": "v1", "kind": "List", "items": []any{eth0}}, nil | ||
| } | ||
| case "addresses": | ||
| return addressesList, nil | ||
| case "nodeaddress": | ||
| if id == "default" { | ||
| return map[string]any{"spec": map[string]any{"addresses": []any{"192.168.201.10/24"}}}, nil | ||
| } |
There was a problem hiding this comment.
Make the duplicate-subnet fixture consistent across discovery resources.
This test publishes two addresses through the addresses lookup, but nodeaddress/default still exposes only one. If talm.discovered.default_addresses_by_gateway reads from nodeaddress/default, the dedupe regression test can pass without exercising duplicate input.
🧪 Proposed fixture fix
case "nodeaddress":
if id == "default" {
- return map[string]any{"spec": map[string]any{"addresses": []any{"192.168.201.10/24"}}}, nil
+ return map[string]any{"spec": map[string]any{"addresses": []any{
+ "192.168.201.10/24",
+ "192.168.201.11/24",
+ }}}, nil
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| addressesList := map[string]any{ | |
| "apiVersion": "v1", | |
| "kind": "List", | |
| "items": []any{ | |
| map[string]any{"spec": map[string]any{ | |
| "linkName": "eth0", "address": "192.168.201.10/24", | |
| "family": "inet4", "scope": "global", | |
| }}, | |
| map[string]any{"spec": map[string]any{ | |
| "linkName": "eth0", "address": "192.168.201.11/24", | |
| "family": "inet4", "scope": "global", | |
| }}, | |
| }, | |
| } | |
| return func(resource, namespace, id string) (map[string]any, error) { | |
| switch resource { | |
| case "routes": | |
| return routesList, nil | |
| case "links": | |
| if id == "eth0" { | |
| return eth0, nil | |
| } | |
| if id == "" { | |
| return map[string]any{"apiVersion": "v1", "kind": "List", "items": []any{eth0}}, nil | |
| } | |
| case "addresses": | |
| return addressesList, nil | |
| case "nodeaddress": | |
| if id == "default" { | |
| return map[string]any{"spec": map[string]any{"addresses": []any{"192.168.201.10/24"}}}, nil | |
| } | |
| case "addresses": | |
| return addressesList, nil | |
| case "nodeaddress": | |
| if id == "default" { | |
| return map[string]any{"spec": map[string]any{"addresses": []any{ | |
| "192.168.201.10/24", | |
| "192.168.201.11/24", | |
| }}}, nil | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/engine/render_test.go` around lines 1403 - 1433, The test fixture is
inconsistent: addressesList returns two addresses but the nodeaddress/default
response only contains one, so update the resource-returning closure in
render_test.go (the anonymous function that switches on resource) to make the
"nodeaddress"/"default" case return a spec.addresses list containing both
"192.168.201.10/24" and "192.168.201.11/24" (matching addressesList) so
talm.discovered.default_addresses_by_gateway sees duplicate input; ensure you
modify the map returned in the nodeaddress/default branch rather than the
addressesList variable itself.
There was a problem hiding this comment.
The dedupe path here runs through talm.discovered.default_addresses_by_gateway, which reads the addresses resource (see charts/talm/templates/_helpers.tpl:75) — not nodeaddress/default. The addresses fixture already publishes two entries on eth0 in the same /24, so the duplicate input is exercised as intended. nodeaddress/default feeds talm.discovered.default_addresses, a separate helper that is not on the subnet-derivation path, so the single-address entry there does not weaken the regression test.
There was a problem hiding this comment.
@lexfrei, thanks for the clear explanation — you're right, and my comment was incorrect. Since talm.discovered.default_addresses_by_gateway reads from the addresses resource (not nodeaddress/default), the two-entry addressesList fixture already exercises the duplicate-input path. The inconsistency in nodeaddress/default is intentional and harmless because that resource feeds a completely separate helper. The dedupe regression test is correctly structured as-is.
✏️ Learnings added
Learnt from: lexfrei
Repo: cozystack/talm PR: 130
File: pkg/engine/render_test.go:1403-1433
Timestamp: 2026-04-24T10:57:14.996Z
Learning: In the talm chart helpers (`charts/talm/templates/_helpers.tpl`), `talm.discovered.default_addresses_by_gateway` reads from the `addresses` lookup resource (around line 75), NOT from `nodeaddress/default`. The `nodeaddress/default` resource feeds `talm.discovered.default_addresses`, a separate helper that is not on the subnet-derivation/dedupe path used by `kubelet.validSubnets` and `etcd.advertisedSubnets`. These two helpers are distinct and should not be conflated when reviewing test fixtures for subnet-derivation or deduplication behavior.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: lexfrei
Repo: cozystack/talm PR: 116
File: charts/cozystack/templates/_helpers.tpl:127-131
Timestamp: 2026-04-14T15:32:01.453Z
Learning: In `charts/cozystack/templates/_helpers.tpl` and `charts/generic/templates/_helpers.tpl`, the `talos.config.network.multidoc` template intentionally ignores `talm.discovered.existing_interfaces_configuration`. That helper emits a legacy `machine.network.interfaces` YAML fragment that has no safe 1:1 translation to Talos v1.12 multi-doc resources (LinkConfig / BondConfig / VLANConfig / Layer2VIPConfig). The multidoc path rebuilds all required network documents directly from discovery resources instead. Users relying on custom legacy interface state must stay on Talos v1.11.x until they migrate their overrides to the new v1.12 resource types.
…hrough Address review feedback from coderabbitai on README.md:135: The rendered config sample showed the old placeholder https://192.168.0.1:6443, which contradicted the values.yaml example. Swap the walkthrough to the RFC 5737 TEST-NET-1 range (192.0.2.0/24) for both node and gateway so the example uses IPs reserved for documentation and cannot accidentally collide with a real network. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
e4ef7c4 to
d67d0d1
Compare
myasnikovdaniil
left a comment
There was a problem hiding this comment.
Here is my draft review.
Draft review for PR #130 — fix(chart): discovery-based defaults for subnets; required endpoint and VIP
Summary
This PR closes a real foot-gun: charts shipping with 192.168.100.* placeholder defaults silently produced broken machine configs for everyone not on that subnet. The fix is in three layers — empty-by-default required values for cluster-wide fields (endpoint, floatingIP), a discovery fallback wired through a new cidrNetwork template function for advertisedSubnets, and rewritten user-facing docs that explain the cozystack-VIP / single-node / external-LB decision tree. Test coverage is thorough (override precedence, empty-discovery guard, dedupe, fresh-defaults regression guard, IPv4 + IPv6 cidrNetwork unit tests, worker path, generic path). The breaking change for existing users is acknowledged and the error messages enumerate remediation, so the migration story is reasonable. A few minor cleanups noted below; nothing blocking.
Migration safety check — schema validation as a nice-to-have
This isn't a blocker, but since endpoint is now hard-required and the failure mode is "render-time error with a long message," adding a values.schema.json to each chart that marks endpoint as required would surface the issue at helm lint time too — earlier in the workflow than a render error. Out of scope for this PR; flagging only because the PR is the natural place to do it next.
Recommendation
Comment — the change is sound and well-tested. The findings above are nits and small defensive-programming suggestions; none of them block merge. I'd particularly encourage fixing the stale required()-vs-fail comment (5-second change) and adding the duplicate empty-discovery guard to the etcd block (5-line change) before merge. The CIDR canonicalization consistency and the schema-validation suggestion can land later.
Stopping here — this is a draft for your review, not posted anywhere.
cloneValues / deepClone justification reads well (pkg/engine/render_test.go)
The deep-copy helper is well-justified by the comment ("maps.Copy is a shallow copy — mutating a nested map or slice in a test would leak into chrt.Values and corrupt subsequent renders"). Nice.
| {{- /* Fall back to the subnet of the node's default-gateway-bearing | ||
| link; cidrNetwork masks host bits to emit canonical network | ||
| form. Dedupe handled the same way as validSubnets above. | ||
| Empty discovery already errored via validSubnets' required() |
There was a problem hiding this comment.
stale "required()" wording in comment
The most recent commit (3a8c0a5) switched from required to fail for the empty-discovery path, but the comment in the etcd block still references the old guard name. Same wording lives in charts/generic/templates/_helpers.tpl:71.
{{- /* Fall back to the subnet of the node's default-gateway-bearing
link; cidrNetwork masks host bits to emit canonical network
form. Dedupe handled the same way as validSubnets above.
Empty discovery already errored via validSubnets' fail
guard, so we reach this block only when at least one address
was resolved. */ -}}| advertisedSubnets: | ||
| {{- if .Values.advertisedSubnets }} | ||
| {{- toYaml .Values.advertisedSubnets | nindent 6 }} | ||
| {{- else }} | ||
| {{- /* Fall back to the subnet of the node's default-gateway-bearing | ||
| link; cidrNetwork masks host bits to emit canonical network | ||
| form. Dedupe handled the same way as validSubnets above. | ||
| Empty discovery already errored via validSubnets' required() | ||
| guard, so we reach this block only when at least one address | ||
| was resolved. */ -}} | ||
| {{- $subnets := list }} | ||
| {{- range fromJsonArray (include "talm.discovered.default_addresses_by_gateway" .) }} | ||
| {{- $subnets = append $subnets (. | cidrNetwork) }} | ||
| {{- end }} | ||
| {{- range uniq $subnets }} | ||
| - {{ . }} | ||
| {{- end }} | ||
| {{- end }} |
There was a problem hiding this comment.
etcd block's empty-discovery handling is implicit
The etcd advertisedSubnets block deliberately omits the empty-discovery fail because validSubnets renders first and errors out earlier. That's true today, but fragile: any future refactor that splits talos.config.machine.common and talos.config.cluster into different render passes (or skips the machine block for some path) silently emits an empty advertisedSubnets: list — exactly the silent-fail mode this PR is trying to eliminate.
A defensive duplicate of the same if not $addrs / fail check costs five lines and removes the cross-block dependency. Same applies to charts/generic/templates/_helpers.tpl:65-82.
{{- else }}
{{- $addrs := fromJsonArray (include "talm.discovered.default_addresses_by_gateway" .) }}
{{- if not $addrs }}
{{- fail "values.yaml: `advertisedSubnets` was left empty and talm could not derive a default from discovery. ..." }}
{{- end }}
{{- $subnets := list }}
{{- range $addrs }}
{{- $subnets = append $subnets (. | cidrNetwork) }}
{{- end }}
{{- range uniq $subnets }}
- {{ . }}
{{- end }}
{{- end }}| {{- if .Values.advertisedSubnets }} | ||
| {{- toYaml .Values.advertisedSubnets | nindent 8 }} |
There was a problem hiding this comment.
operator-override path skips canonicalization
When the operator sets advertisedSubnets: ["192.168.201.10/24"] (host form) in values.yaml, the chart emits it verbatim via toYaml, while the fallback path runs every entry through cidrNetwork so the YAML is in canonical network form. That inconsistency means two operators with the same underlying intent — one relying on discovery, one with an explicit override — get differently-formatted machine configs, which is noisy in kubectl diff/talm diff output and surprising in code review. Consider canonicalizing in both branches:
validSubnets:
{{- $sources := .Values.advertisedSubnets }}
{{- if not $sources }}
{{- $sources = fromJsonArray (include "talm.discovered.default_addresses_by_gateway" .) }}
{{- if not $sources }}
{{- fail "values.yaml: `advertisedSubnets` was left empty and talm could not derive a default from discovery. ..." }}
{{- end }}
{{- end }}
{{- $subnets := list }}
{{- range $sources }}
{{- $subnets = append $subnets (. | cidrNetwork) }}
{{- end }}
{{- range uniq $subnets }}
- {{ . }}
{{- end }}This also de-duplicates the validSubnets and etcd.advertisedSubnets blocks down to a shared talos.discovered.subnet_list define, which is a sympathetic refactor for the comment in the etcd block that already says "handled the same way as validSubnets above." If you'd rather keep the operator-override path passthrough (e.g., to preserve operator-supplied formatting verbatim), call this out in a comment so future readers don't try to "fix" the inconsistency.
| otherwise produce duplicate list entries. */ -}} | ||
| {{- $addrs := fromJsonArray (include "talm.discovered.default_addresses_by_gateway" .) }} | ||
| {{- if not $addrs }} | ||
| {{- fail "values.yaml: `advertisedSubnets` was left empty and talm could not derive a default from discovery. No default-gateway-bearing link was found on the node. This field is a cluster-wide subnet selector fed to kubelet and etcd; `talm template` is invoked once per node and cannot merge per-node values into one cluster value. Either set advertisedSubnets explicitly in values.yaml, or ensure the node has a default route before running `talm template`." }} |
There was a problem hiding this comment.
fail message field-name disambiguation
The error fires from inside the validSubnets: rendering block but the message references advertisedSubnets. The values.yaml key is advertisedSubnets, so the message is correct, but a user staring at the rendered (broken) output sees validSubnets: and may chase the wrong field. One word in the message would close the gap:
"values.yaml: `advertisedSubnets` (which feeds both kubelet validSubnets and etcd advertisedSubnets) was left empty and talm could not derive a default from discovery. ..."
Same in the generic copy.
| // cidrNetwork canonicalizes a CIDR string to its network form | ||
| // ("192.168.201.10/24" -> "192.168.201.0/24"), matching what | ||
| // operators see in Talos docs and upstream examples. Sprig ships | ||
| // no equivalent; net/netip's ParsePrefix + Masked handles both | ||
| // IPv4 and IPv6 without any host-bit arithmetic in the template. | ||
| funcMap["cidrNetwork"] = func(cidr string) (string, error) { | ||
| p, err := netip.ParsePrefix(cidr) | ||
| if err != nil { | ||
| return "", fmt.Errorf("cidrNetwork: %w", err) | ||
| } |
There was a problem hiding this comment.
cidrNetwork is fine; consider also exposing the bare network address
Not a request, just a flag for future iterations: callers may eventually want the network address without the prefix length (192.168.201.0 rather than 192.168.201.0/24) — for example when constructing CIDR list entries with explicit prefix logic. If a cidrNetworkAddress companion is ever needed, the netip.Prefix.Masked().Addr().String() form is one line. Out of scope for this PR.
| # Example: floatingIP: 192.168.0.1 | ||
| floatingIP: "" |
There was a problem hiding this comment.
coupling note in floatingIP comment is great, suggest explicit cross-link
The floatingIP comment says "MUST equal the host portion of endpoint above" but doesn't show the symmetric note in the endpoint comment (it does mention "set endpoint AND floatingIP below to the SAME IP"). The two comments together tell the right story, but a reader who hits one comment first without scrolling may miss the coupling. Consider a one-line cross-reference at the top of floatingIP:
# See `endpoint` above — these two values are coupled and MUST match
# for cozystack VIP setups.
Summary
Close the loop on the
192.168.100.*placeholder problem:talm templateon a node in any subnet other than 192.168.100.0/24 silently produced machine configs pointing at network IPs no node actually owns.Closes #25
What changes
1. Subnet fields derive from discovery
machine.kubelet.nodeIP.validSubnetsandcluster.etcd.advertisedSubnetsnow fall back to the CIDR of the node's default-gateway-bearing link whenvalues.yamlleavesadvertisedSubnetsempty. The chart pipes the discovered address (192.168.201.10/24) through a newcidrNetworktemplate function —net/netip'sParsePrefix().Masked()in 6 lines — so the emitted YAML is the canonical network form (192.168.201.0/24) rather than the host form. Multiple addresses on the same link are deduped via sprig'suniq.Override precedence is preserved: if the operator sets
advertisedSubnetsexplicitly invalues.yaml, that wins.When both paths fail (operator didn't set it, discovery returned nothing), the chart fails loudly with a
required()error that explains both remediations (set it explicitly, or fix the node's default route) — not silently emit an empty list.2. Cluster-wide fields are now
requiredinstead of placeholder-defaultedThree fields shipped with
192.168.100.*placeholder defaults that silently produced broken configs for anyone not on that subnet:endpoint— what every node's kubelet and kube-proxy dials.floatingIP(cozystack preset) — drives the Layer-2 VIP document.All three are now blank by default.
endpointtriggers arequired()guard with a verbose message that explains why auto-derivation isn't possible (talm templateruns once per node and has no way to reconcile per-node IPs into a single shared endpoint) and enumerates the viable options (cozystack VIP, external LB, single-node routable IP).floatingIPemits noLayer2VIPConfigdocument when blank, which is the correct behavior for single-node and external-LB topologies.3. Documentation matches reality
charts/cozystack/values.yamlandcharts/generic/values.yaml— rewritten comments explain the decision tree (cozystack VIP vs single-node vs external LB) and theendpoint/floatingIPcoupling.README.md"Getting Started" — adds the requiredvalues.yamledit step betweentalm initandtalm template. Example values match the tutorial's node IP so a reader walking the example doesn't see placeholder-vs-tutorial drift.4. Test coverage
pkg/engine/render_test.gogains tests for:validSubnetsandadvertisedSubnetsdiscovery fallback (controlplane + worker + generic preset).values.yamlwins.required()guard fires with both the field name and the remediation phrase in the error.chrt.Valuesrender fails, catching any future commit that re-introduces a placeholder.floatingIP.pkg/engine/helm/engine_test.gogainsTestCidrNetworkTemplateFunc— direct IPv4/IPv6 unit test of the new template function (7 cases including already-canonical, narrow prefix, malformed inputs).Migration notes for existing users
talm init'd before and leftvalues.yamldefaults now sees arequirederror on nexttalm template. Remediation is in the error message.endpointandadvertisedSubnetsinvalues.yamlis unaffected.floatingIP: 192.168.100.10default (unlikely, given it was network-incorrect for most users) now needs to set it explicitly.Verification
go test ./...on macOS — passGOOS=windows GOARCH=amd64 go build ./...— cleangolangci-lint run ./...— 0 issuesSummary by CodeRabbit
Release Notes
Documentation
Configuration Changes
Improvements