Skip to content

fix(network): correct multi-NIC discovery and add per-link helpers#127

Merged
kvaps merged 9 commits intomainfrom
fix/discovery-multi-nic-helpers
Apr 28, 2026
Merged

fix(network): correct multi-NIC discovery and add per-link helpers#127
kvaps merged 9 commits intomainfrom
fix/discovery-multi-nic-helpers

Conversation

@lexfrei
Copy link
Copy Markdown
Contributor

@lexfrei lexfrei commented Apr 16, 2026

Summary

Three independent fixes/improvements in the discovery helpers and network templates, all touching the same area:

  1. default_*_by_gateway helpers now return a single deterministic value. On multi-NIC nodes with several default routes (typical with DHCP), the helpers iterated every matching route and concatenated the outputs — eth0eth1 for default_link_name_by_gateway, two MACs glued together for the address variant, two busPath: ... YAML fragments for the selector. Downstream templates used the joined string as the interface name, so Talos either rejected the config or applied it to the wrong NIC. They now filter by table=main and break after the first match. Also fixes a long-standing copy-paste bug in default_link_bus_by_gateway, which was returning the link's MAC address instead of its busPath.

  2. Empty interface emission removed from cozystack/generic templates. When discovery returned no default route (offline render, isolated node, custom networking), the templates emitted Talos documents with empty names: LinkConfig/BondConfig/VLANConfig/Layer2VIPConfig with empty name: (multidoc), or interfaces: followed by - interface: with an empty value (legacy). Talos v1.12 rejects the multidoc form; the legacy form produced invalid YAML in some environments. The per-interface documents are now skipped when no default link is discovered.

  3. New per-link discovery helpers for multi-NIC templates. The existing default_*_by_gateway family exposes only the primary link. Templates for nodes with secondary NICs (storage uplink on a control-plane, second public link, etc.) had no way to enumerate every NIC or read addresses/gateway/MAC by name. Adds physical_link_names, configurable_link_names, addresses_by_link, gateway_by_link (IPv4 only by convention), routes_by_link, mac_by_link, bus_by_link, link_selector_by_name. Default cozystack/generic templates are unchanged; the new helpers are building blocks for user templates.

Verification

  • New unit tests cover all three changes: TestDefaultLinkByGatewayHelpers_MultiNIC, TestNetworkMultidoc_NoDiscovery / TestNetworkLegacy_NoDiscovery, TestSecondaryNicHelpers. The multi-NIC test fixture deliberately mixes routes across table=main and a private table to exercise both the table filter and the early break; the secondary-NIC fixture orders an IPv6 default route ahead of the IPv4 one to catch a missing family filter; the route fixture includes both an entry with absent fields and one with priority: 0.
  • Existing offline tests that previously asserted broken-empty LinkConfig output now use simpleNicLookup so they continue to validate the populated render path. TestMultiDocCozystack_BondTopology and _VlanOnBondTopology are byte-for-byte identical and still pass.
  • golangci-lint run is clean.

Closes

Summary by CodeRabbit

Release Notes

  • New Features

    • Added multi-NIC network helper utilities for enhanced network configuration discovery and management.
  • Bug Fixes

    • Improved network discovery to accurately select the correct network interface when multiple default routes exist.
    • Enhanced network interface configuration rendering to gracefully handle scenarios where network discovery provides no data.
  • Tests

    • Added comprehensive regression tests for multi-NIC and network discovery scenarios to ensure robust network configuration handling.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 16, 2026

📝 Walkthrough

Walkthrough

Conditional guards added to avoid emitting empty network YAML when no default link is discovered; multi‑NIC discovery helpers were added and default-route selection restricted to spec.table == "main" with early termination; tests updated to support lookup-driven template rendering and regression cases.

Changes

Cohort / File(s) Summary
Template conditional guards
charts/cozystack/templates/_helpers.tpl, charts/generic/templates/_helpers.tpl
Guard generation of VLAN/bond/link/L2VIP and legacy interfaces: blocks on presence of a discovered default link name; compute default link once and skip blocks when absent.
TALM discovery helpers
charts/talm/templates/_helpers.tpl
Restrict default-route helpers to spec.table == "main" with break on first match; return busPath for bus helper; add multi‑NIC by-name helpers (physical_link_names, configurable_link_names, addresses_by_link, gateway_by_link, routes_by_link, mac_by_link, bus_by_link, link_selector_by_name).
Test harness & regression tests
pkg/engine/render_test.go
Add renderChartTemplateWithLookup() to control Helm lookup in tests; add NIC lookup fixtures and regression tests verifying deterministic default-link selection, per-link outputs, and absence of network blocks when discovery yields no default route.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Poem

🐰 I hopped through templates bright and spry,

Found default routes that would not lie,
I counted links from bus to MAC,
Guarded outputs, never backtrack,
Now YAML sings and tests reply.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 78.57% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(network): correct multi-NIC discovery and add per-link helpers' accurately summarizes the main changes: correcting multi-NIC discovery behavior and adding new per-link helper functions.
Linked Issues check ✅ Passed The PR addresses all linked issues: fixes multi-NIC discovery to return single deterministic values [#108], prevents empty interface blocks when no default route exists [#58], and adds per-link helpers for advanced multi-NIC templates.
Out of Scope Changes check ✅ Passed All changes are directly related to the linked issues: fixing discovery helpers, adding conditional rendering in templates, and expanding the helper library for multi-NIC support. No unrelated modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/discovery-multi-nic-helpers

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces multi-NIC discovery helpers and ensures deterministic link selection by filtering for the main routing table and stopping at the first match. It updates Helm templates to conditionally render network interfaces only when discovery data is available, preventing validation errors in Talos. Test coverage is expanded with new mock lookup functions and regression tests for multi-NIC and offline scenarios. Feedback identifies a copy-paste error in the bus path helper and suggests improving the handling of nil fields in JSON and string conversions to avoid producing '' strings.

Comment thread charts/talm/templates/_helpers.tpl Outdated
{{- range (lookup "routes" "" "").items }}
{{- if and (eq .spec.dst "") (not (eq .spec.gateway "")) }}
{{- if and (eq .spec.dst "") (not (eq .spec.gateway "")) (eq .spec.table "main") }}
{{- (lookup "links" "" .spec.outLinkName).spec.hardwareAddr }}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The talm.discovered.default_link_bus_by_gateway helper is incorrectly returning the hardwareAddr (MAC address) instead of the busPath. This appears to be a copy-paste error from the default_link_address_by_gateway helper defined just above. Since you are already correcting these helpers to be deterministic, this should be fixed to return the correct field.

{{- (lookup "links" "" .spec.outLinkName).spec.busPath }}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b3729ad. Same diagnosis — the helper had been returning .spec.hardwareAddr since 2024-05 instead of .spec.busPath. Changed exactly as suggested. Coverage added in TestDefaultLinkByGatewayHelpers_MultiNIC, which asserts bus=pci-0000:00:1f.0 and explicitly rejects the MAC.

Comment thread charts/talm/templates/_helpers.tpl Outdated
{{- $routes := list -}}
{{- range (lookup "routes" "" "").items -}}
{{- if and (eq .spec.outLinkName $linkName) (not (eq .spec.dst "")) -}}
{{- $entry := dict "dst" .spec.dst "gateway" (.spec.gateway | toString) "family" (.spec.family | toString) "table" (.spec.table | toString) "priority" (.spec.priority | toString) -}}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using toString on fields like gateway, priority, or table is problematic because if these fields are missing (nil), toString will return the literal string "<no value>". This string will then be included in the JSON output, which is likely not what downstream consumers expect. It is better to omit toString and let toJson handle the values; nil values will correctly become null in the JSON, and numeric types like priority will be preserved as numbers.

{{- $entry := dict "dst" .spec.dst "gateway" .spec.gateway "family" .spec.family "table" .spec.table "priority" .spec.priority -}}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in b3729ad with a slightly different choice. Each optional field is now wrapped in kindIs "invalid" and emitted via printf "%v", so absent fields become "" (never "<no value>" or "<nil>") and present values — including integer 0 (e.g. priority: 0) — round-trip correctly. Values are kept as strings rather than letting toJson emit null for missing fields, because consumer templates iterate via fromJsonArray + dig and benefit from a uniform string contract. TestSecondaryNicHelpers covers both the absent-fields and priority: 0 paths and asserts no <nil>/<no value> leaks.

Comment thread charts/talm/templates/_helpers.tpl Outdated
Comment on lines +354 to +355
{{- if $link -}}
{{- $link.spec.hardwareAddr | toString -}}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The check if $link is always true because the Helm lookup function returns an empty map map[] if the resource is not found, and maps are considered truthy in Helm templates. If the link is not found, $link.spec will be nil, and | toString will produce the string "<no value>". It is safer to check for .spec and avoid toString on potentially nil fields to ensure the helper returns an empty string when the link or field is missing.

{{- if $link.spec -}}
{{- $link.spec.hardwareAddr -}}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b3729ad. Both helpers now check and $link $link.spec, so an unknown link name yields an empty string in either regime (real Helm lookup returns nil, test mocks may return {}). TestSecondaryNicHelpers asserts mac_unknown= and bus_unknown= against an unknown link to pin the contract.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@charts/talm/templates/_helpers.tpl`:
- Around line 126-133: The helper define
"talm.discovered.default_link_bus_by_gateway" incorrectly returns the MAC
address; change the returned property from (lookup "links" ""
.spec.outLinkName).spec.hardwareAddr to the Link CRD's bus path field (replace
.spec.hardwareAddr with the correct bus field, e.g. .spec.bus or .spec.busPath)
so the helper returns the bus path for the matching route; keep the existing
lookup("links", "" .spec.outLinkName) and break logic intact.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 70dedfa0-107c-4675-aa4e-5cea83075cd2

📥 Commits

Reviewing files that changed from the base of the PR and between 00358fd and 9add120.

📒 Files selected for processing (4)
  • charts/cozystack/templates/_helpers.tpl
  • charts/generic/templates/_helpers.tpl
  • charts/talm/templates/_helpers.tpl
  • pkg/engine/render_test.go

Comment thread charts/talm/templates/_helpers.tpl
@lexfrei lexfrei self-assigned this Apr 17, 2026
Copy link
Copy Markdown
Contributor

@myasnikovdaniil myasnikovdaniil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below is the draft review. Tests pass locally on the PR HEAD (combined with 00358fd from origin/main, which the PR is rebased on); flagged a couple of minor items but nothing blocking.


Review of PR #127fix(network): correct multi-NIC discovery and add per-link helpers

Summary

Three independent and well-motivated changes in the discovery layer:

  1. The default_*_by_gateway family is fixed to return a single deterministic value — adds the missing table=main filter and breaks after the first match, ending the eth0eth1 concatenation on multi-NIC DHCP nodes. Also fixes a real copy-paste bug in default_link_bus_by_gateway that returned the MAC instead of the busPath.
  2. Empty LinkConfig/BondConfig/VLANConfig/Layer2VIPConfig documents (multidoc) and bare - interface: blocks (legacy) are no longer emitted when discovery returns nothing — Talos v1.12 was rejecting these.
  3. New per-link helpers (physical_link_names, addresses_by_link, gateway_by_link, routes_by_link, mac_by_link, bus_by_link, link_selector_by_name, configurable_link_names) so user templates can configure secondary NICs without monkey-patching the existing primary-link wrappers.

The bugfixes are real and the test fixtures are thoughtfully constructed (mixed routing tables, IPv6 ordered before IPv4, priority: 0 and missing-field round-trip). All tests pass locally; the existing offline TestMultiDocCozystack_* tests are correctly migrated to use a populated simpleNicLookup so they stop relying on the broken-empty render path.

A couple of minor observations below — none are blockers.

Recommendation

Comment (lean approve). The three core fixes are correct and well-tested; the minor items above are tightening suggestions, not blockers. I'd at least drop the .claude/worktrees/ .gitignore entry before merging.

Comment thread .gitignore Outdated
@@ -1,2 +1,3 @@
talm
dist/
.claude/worktrees/
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

contributor-specific entry

.claude/worktrees/

This is Claude-Code-specific scratch state and shouldn't live in the project's .gitignore — it leaks one contributor's tooling onto everyone. Recommend putting it in your personal ~/.config/git/ignore (or core.excludesFile) instead and dropping the commit 06581ec from this PR.

talm
dist/

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in ab48e7f and folded the entry into my personal ~/.config/git/ignore per your suggestion, so future runs don't re-surface the path in git status and re-tempt me to commit it.

Comment on lines +380 to +385
{{- define "talm.discovered.bus_by_link" -}}
{{- $link := lookup "links" "" . -}}
{{- if and $link $link.spec -}}
{{- $link.spec.busPath | toString -}}
{{- end -}}
{{- end -}}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bus_by_link / mac_by_link can emit <nil> for virtual links

{{- define "talm.discovered.bus_by_link" -}}
{{- $link := lookup "links" "" . -}}
{{- if and $link $link.spec -}}
{{- $link.spec.busPath | toString -}}
{{- end -}}
{{- end -}}

When the link exists with a spec but the specific field is absent — concretely, bus_by_link "bond0" on the secondary-NIC fixture, since virtual links don't have a busPath$link.spec.busPath is nil and nil | toString renders as <nil>. The doc string promises "empty if the link or its spec is missing", but a present spec with a missing field falls through. The global assertNotContains "<nil>" in TestSecondaryNicHelpers doesn't exercise this exact call.

Suggested tighten:

{{- define "talm.discovered.bus_by_link" -}}
{{- $link := lookup "links" "" . -}}
{{- if and $link $link.spec $link.spec.busPath -}}
{{- $link.spec.busPath | toString -}}
{{- end -}}
{{- end -}}

Same shape for mac_by_link (guard on $link.spec.hardwareAddr).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tightened both helpers in 28f37e0 — the if guard now extends to the specific field ($link.spec.busPath / $link.spec.hardwareAddr), so a present-spec / absent-field link returns empty rather than dereferencing through nil | toString. The bond0 fixture in secondaryNicLookup already had hardwareAddr but no busPath; added explicit bus_bond0 / mac_bond0 template inclusions to TestSecondaryNicHelpers so the regression is pinned directly rather than relying on the global assertNotContains "<nil>" sweep. Updated the doc comments to spell out the present-spec / absent-field case.

Comment on lines +349 to +367
{{- define "talm.discovered.routes_by_link" -}}
{{- $linkName := . -}}
{{- $routes := list -}}
{{- range (lookup "routes" "" "").items -}}
{{- if and (eq .spec.outLinkName $linkName) (not (eq .spec.dst "")) -}}
{{- $gateway := "" -}}
{{- if not (kindIs "invalid" .spec.gateway) -}}{{- $gateway = printf "%v" .spec.gateway -}}{{- end -}}
{{- $family := "" -}}
{{- if not (kindIs "invalid" .spec.family) -}}{{- $family = printf "%v" .spec.family -}}{{- end -}}
{{- $table := "" -}}
{{- if not (kindIs "invalid" .spec.table) -}}{{- $table = printf "%v" .spec.table -}}{{- end -}}
{{- $priority := "" -}}
{{- if not (kindIs "invalid" .spec.priority) -}}{{- $priority = printf "%v" .spec.priority -}}{{- end -}}
{{- $entry := dict "dst" .spec.dst "gateway" $gateway "family" $family "table" $table "priority" $priority -}}
{{- $routes = append $routes $entry -}}
{{- end -}}
{{- end -}}
{{- toJson $routes -}}
{{- end -}}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

routes_by_link returns priority as a string

{{- $priority := "" -}}
{{- if not (kindIs "invalid" .spec.priority) -}}{{- $priority = printf "%v" .spec.priority -}}{{- end -}}
{{- $entry := dict "dst" .spec.dst "gateway" $gateway "family" $family "table" $table "priority" $priority -}}

The result is "priority":"0" / "priority":"100" — string-typed in the JSON. Functional and documented in the helper comment, but downstream callers that want metric/priority ordering will have to atoi the value. If you'd rather keep it native:

{{- $entry := dict "dst" .spec.dst "gateway" $gateway "family" $family "table" $table -}}
{{- if not (kindIs "invalid" .spec.priority) -}}
{{- $entry = merge $entry (dict "priority" .spec.priority) -}}
{{- end -}}

Not a blocker; flagging because priority is the one field where the lossy stringification is most likely to bite a future consumer.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Declining this one. The helper's invariant is "all fields are present strings; absent fields are empty strings", per the existing doc and pinned by the assertContains '"priority":"0"' assertion. Switching priority alone to native int introduces a heterogeneous shape — "string for dst, int for priority, empty string for absent fields" — which forces every consumer to add a type/presence check. The single-field atoi cost on the consumer side is real but bounded.

Happy to revisit if a downstream consumer surfaces concrete metric-ordering pain — at that point the better answer is probably a sibling helper that emits native types throughout, leaving this one as the string-uniform version.

Comment thread charts/talm/templates/_helpers.tpl Outdated
Comment on lines +344 to +348
{{- /* JSON list of non-default routes on the given link. Each entry is a flat
map {dst, gateway, family, table, priority} so consumers can
fromJsonArray + range + dig. Absent fields are emitted as empty
strings; present fields including the integer 0 (e.g. priority: 0)
round-trip through toString. */ -}}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

routes_by_link doc could mention the table behaviour

The helper doc comment lists the returned fields but doesn't say it returns routes from all tables (not just main). Since gateway_by_link filters to table=main but routes_by_link doesn't, calling out the asymmetry would save a future reader a grep. One-line suggestion:

{{- /* JSON list of non-default routes on the given link, across all routing
       tables (the `table` field is included so callers can filter). Each
       entry is a flat map {dst, gateway, family, table, priority} ...
*/ -}}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 28f37e0 — the doc now calls out the all-tables behaviour and the asymmetry with gateway_by_link (which narrows to table=main).

lexfrei added 9 commits April 28, 2026 15:31
… value

The helpers iterating routes for the primary link (default_link_name_by_gateway,
default_link_address_by_gateway, default_link_bus_by_gateway,
default_link_selector_by_gateway) had no break and did not filter by routing
table. On a multi-NIC node with several default routes (typical with DHCP),
they iterated every match and concatenated the outputs into a single string,
producing values like 'eth0eth1' that downstream templates use as the
interface name. Talos validation then fails or the config is applied to the
wrong NIC.

Filter by table=main and break after the first match so each helper returns
exactly one value. Same break added to the route lookup inside
default_addresses_by_gateway.

Fixes #108

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
…overed

When discovery returned no default route (offline render, isolated node, or
custom networking), the cozystack and generic templates emitted Talos config
documents with empty values: a LinkConfig/BondConfig/VLANConfig/Layer2VIPConfig
with empty 'name:', or — in the legacy machine.network path — an
'interfaces:' key followed by '- interface:' with an empty name. Talos v1.12
rejects the multi-doc form with [networking.os.device.interface] required;
the legacy form produced invalid YAML in some environments.

Wrap the multidoc per-interface documents in 'if $defaultLinkName' and emit
the legacy 'interfaces:' key only when a default link was found or when
existing_interfaces_configuration carries content forward. Hostname and
resolver documents stay (they are independent of link discovery).

Fixes #58

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
The existing default_*_by_gateway helpers expose only the primary link (the
one carrying the default route). Templates targeting nodes with secondary
NICs — a storage uplink on a control-plane, a second public link, etc. — had
no way to enumerate every physical NIC or read addresses, gateway, MAC, or
bus path by link name.

Add the following helpers under talm.discovered:

- physical_links: JSON list of physical NIC names (eno|eth|enp|enx|ens with
  a busPath)
- configurable_links: physical NICs plus bond/vlan/bridge top-level links
- addresses_by_link: JSON list of CIDRs configured on the link, excluding
  host scope
- gateway_by_link: scalar gateway IP for the default route on the link, in
  the main table
- routes_by_link: JSON list of non-default routes on the link as flat
  {dst, gateway, family, table, priority} objects
- mac_by_link, bus_by_link: scalar accessors
- link_selector_by_name: YAML 'busPath: ...' fragment for use as a Talos
  deviceSelector

Default cozystack/generic templates are unchanged; the new helpers are
building blocks for user templates.

Refs #125

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Add regression coverage for the discovery-helper changes:

- TestDefaultLinkByGatewayHelpers_MultiNIC: a node with two table=main default
  routes plus a third in an alternate routing table. Asserts that
  default_link_name_by_gateway and default_link_address_by_gateway return a
  single value (the first matching main-table route) and never concatenate.
- TestNetworkMultidoc_NoDiscovery / TestNetworkLegacy_NoDiscovery: render the
  cozystack chart with empty discovery and assert no LinkConfig/BondConfig/
  VLANConfig/Layer2VIPConfig is emitted (multidoc) and no 'interfaces:' or
  empty '- interface:' block appears (legacy).
- TestSecondaryNicHelpers: render a chart that calls each new per-link
  helper against a two-NIC topology (eth0 primary with default route, eth1
  storage with static routes only) and validate every output.

Existing offline tests that asserted broken-empty LinkConfig output
(TestMultiDocCozystack_ControlPlane, TestMultiDocCozystack_LegacyFallback,
TestMultiDocGeneric_ControlPlane, TestMultiDocGeneric_Worker) move from a
no-discovery offline render to renderChartTemplateWithLookup with
simpleNicLookup so they continue to validate the populated LinkConfig path.
TestMultiDocCozystack_BondTopology and _VlanOnBondTopology stay byte-for-byte
identical.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Three corrections to the per-link helpers introduced for #125, all tested.

- routes_by_link: route resources with absent gateway/family/table/priority
  fields produced JSON entries containing the literal string '<nil>' (then
  HTML-escaped to '\u003cnil\u003e' by toJson), which downstream templates
  would render verbatim into Talos config. Wrap each optional field with
  default "" so missing fields are emitted as empty strings.
- addresses_by_link: extend the scope filter from {host} to {host, link,
  nowhere} so IPv6 link-local (fe80::/64) and other kernel-managed addresses
  no longer leak into per-link configs.
- gateway_by_link: filter by family=inet4. On dual-stack nodes the helper
  previously returned whichever default route appeared first regardless of
  family, producing a gateway/address-family mismatch when paired with
  addresses_by_link. Add a sibling for IPv6 if needed later.

TestSecondaryNicHelpers fixture extended with a route missing several
fields, an fe80:: link-local address, and an IPv6 default route placed
ordered first. Verified by reverting each fix in turn — every assertion
fails without its corresponding fix.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
- default_link_bus_by_gateway: was returning .spec.hardwareAddr (a MAC) since
  2024-05 despite the 'bus' name. Long-standing copy-paste bug from the
  address helper. PR already touched this function for the table=main filter,
  so the fix lands here. Verified by extending TestDefaultLinkByGatewayHelpers_MultiNIC
  with a bus assertion that fails against the old body.
- Renamed physical_links / configurable_links to physical_link_names /
  configurable_link_names. The original names collided too closely with the
  pre-existing physical_links_info helper (which renders a YAML comment
  block), making it unclear which one returned structured data.
- addresses_by_link: tightened scope filter to require an explicit non-empty
  scope on the address. Previously an address with no scope set passed the
  filter unchanged. Test now includes a scope-less address that must be
  rejected.
- routes_by_link: replaced 'default "" .spec.X | toString' with explicit
  kindIs "invalid" guards. The sprig 'default' helper treats numeric zero
  as the empty value, so priority: 0 silently became "" — verified by a
  new fixture entry that round-trips priority 0 as "0".
- mac_by_link / bus_by_link: added defensive 'and $link $link.spec' guard.
  Real Helm lookup returns nil for missing resources so the existing 'if
  $link' was sufficient in practice; the extra check covers test mocks (and
  any future API change) that return an empty map for unknowns. Contract
  pinned by mac_unknown / bus_unknown assertions.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Claude Code creates per-agent worktrees under .claude/worktrees/. They're
ephemeral and auto-cleaned after the agent finishes, but should never land
in version control if a cleanup is interrupted or skipped.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
.claude/worktrees/ is Claude-Code-specific scratch state and belongs
in a personal ~/.config/git/ignore, not in the project's .gitignore
where it leaks one contributor's tooling onto everyone else.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
When a link exists with a present spec but the specific field is
absent — concretely, bus_by_link "bond0" on the secondary-NIC fixture
because virtual links lack a busPath — the previous `if and $link
$link.spec` guard fell through, dereferenced the missing field, and
`nil | toString` rendered as "<nil>" in the output. The field guard
extends to the field itself so a present-spec / absent-field path
yields the documented empty string instead.

The fixture already exercises this case (bond0 has hardwareAddr but
no busPath); add explicit `bus_bond0`/`mac_bond0` template
inclusions to TestSecondaryNicHelpers so the regression is pinned
directly rather than relying on the global assertNotContains
"<nil>" sweep.

Also clarify the routes_by_link doc comment to call out that it
spans all routing tables (gateway_by_link narrows to table=main,
this helper does not) — saves a future reader a grep.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
@lexfrei lexfrei force-pushed the fix/discovery-multi-nic-helpers branch from 28f37e0 to 0bfa594 Compare April 28, 2026 12:50
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
charts/talm/templates/_helpers.tpl (1)

68-72: ⚠️ Potential issue | 🟠 Major

Default-route selection is still family-ambiguous on dual-stack nodes.

These helpers now stop at the first table=main default route, but they still don't constrain the family. If an IPv6 RA default is listed before the IPv4 DHCP default, the charts will deterministically choose the IPv6 link/address path while downstream callers still derive IPv4-oriented config from it (advertisedSubnets / validSubnets, and the legacy 0.0.0.0/0 route block). Please keep the default-route helpers aligned with gateway_by_link and select inet4 explicitly, or split them into IPv4/IPv6 variants and use the IPv4 one from the existing chart paths.

Suggested direction
-{{- if and (eq .spec.dst "") (not (eq .spec.gateway "")) (eq .spec.table "main") }}
+{{- if and (eq .spec.dst "") (not (eq .spec.gateway "")) (eq .spec.table "main") (eq (.spec.family | toString) "inet4") }}

Apply the same guard consistently across the default_*_by_gateway family, including the unchanged talm.discovered.default_gateway helper.

Also applies to: 109-112, 118-130

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/talm/templates/_helpers.tpl` around lines 68 - 72, The default-route
helper loop is family-ambiguous; constrain selection to IPv4 by adding an
explicit family check (eq .spec.family "inet4") so the helper picks the IPv4
default route (or create separate inet4/inet6 variants and use the inet4 one).
Update the conditional in the range where you set $linkName/$family (and the
other occurrences referenced) to include (eq .spec.family "inet4") instead of
leaving family unconstrained, and apply the same guard to the related helpers in
the default_*_by_gateway family and the talm.discovered.default_gateway helper
so callers that expect IPv4 get the IPv4 route.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@charts/talm/templates/_helpers.tpl`:
- Around line 137-140: The helper default_link_selector_by_gateway can emit an
empty "busPath:" because it only checks that lookup "links" returns an object,
not that that object's .spec.busPath is non-empty; update the template to only
render the busPath line when the looked-up link has a non-empty .spec.busPath
(same guard style used in bus_by_link and link_selector_by_name). Specifically,
inside default_link_selector_by_gateway where you call with (lookup "links" ""
.spec.outLinkName), add an additional conditional that checks .spec.busPath (or
use if and not (eq .spec.busPath "")) before emitting the busPath: {{
.spec.busPath }} line so the selector is omitted when busPath is empty.

In `@pkg/engine/render_test.go`:
- Around line 327-328: The legacy smoke tests still render offline via
renderChartTemplate() and assert that "interfaces:" is present, which conflicts
with the new guard logic; update each legacy test
(TestLegacyCozystack_ControlPlane, TestLegacyCozystack_Worker,
TestLegacyGeneric_ControlPlane, TestLegacyGeneric_Worker) to use
renderChartTemplateWithLookup(t, "<chart-path>", "<template-path>",
simpleNicLookup(), "v1.12") (matching the already-updated call pattern) so they
run with the simpleNicLookup() offline discovery and no longer expect the
interfaces: block to be rendered; locate and replace the
renderChartTemplate(...) calls in those test functions with
renderChartTemplateWithLookup(..., simpleNicLookup(), "v1.12") or alternatively
remove the assertion that "interfaces:" is present if you prefer keeping
renderChartTemplate.

---

Outside diff comments:
In `@charts/talm/templates/_helpers.tpl`:
- Around line 68-72: The default-route helper loop is family-ambiguous;
constrain selection to IPv4 by adding an explicit family check (eq .spec.family
"inet4") so the helper picks the IPv4 default route (or create separate
inet4/inet6 variants and use the inet4 one). Update the conditional in the range
where you set $linkName/$family (and the other occurrences referenced) to
include (eq .spec.family "inet4") instead of leaving family unconstrained, and
apply the same guard to the related helpers in the default_*_by_gateway family
and the talm.discovered.default_gateway helper so callers that expect IPv4 get
the IPv4 route.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 65694bdd-e550-4a65-acd4-392853b786f6

📥 Commits

Reviewing files that changed from the base of the PR and between 9add120 and 0bfa594.

📒 Files selected for processing (4)
  • charts/cozystack/templates/_helpers.tpl
  • charts/generic/templates/_helpers.tpl
  • charts/talm/templates/_helpers.tpl
  • pkg/engine/render_test.go

Comment on lines +137 to 140
{{- if and (eq .spec.dst "") (not (eq .spec.gateway "")) (eq .spec.table "main") }}
{{- with (lookup "links" "" .spec.outLinkName) }}
busPath: {{ .spec.busPath }}
{{- break }}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

default_link_selector_by_gateway can still emit an empty busPath: selector.

with (lookup "links" "" .spec.outLinkName) is truthy for bond/VLAN links that have a spec but no busPath, so this helper renders busPath: with an empty value instead of returning nothing. That produces an invalid selector fragment for the “default route via virtual link” case. Guard the field itself here the same way bus_by_link and link_selector_by_name already do.

Suggested fix
 {{- define "talm.discovered.default_link_selector_by_gateway" }}
 {{- range (lookup "routes" "" "").items }}
 {{- if and (eq .spec.dst "") (not (eq .spec.gateway "")) (eq .spec.table "main") }}
-{{- with (lookup "links" "" .spec.outLinkName) }}
-busPath: {{ .spec.busPath }}
-{{- break }}
-{{- end }}
+{{- $link := lookup "links" "" .spec.outLinkName }}
+{{- if and $link $link.spec $link.spec.busPath }}
+busPath: {{ $link.spec.busPath }}
+{{- end }}
+{{- break }}
 {{- end }}
 {{- end }}
 {{- end }}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
{{- if and (eq .spec.dst "") (not (eq .spec.gateway "")) (eq .spec.table "main") }}
{{- with (lookup "links" "" .spec.outLinkName) }}
busPath: {{ .spec.busPath }}
{{- break }}
{{- define "talm.discovered.default_link_selector_by_gateway" }}
{{- range (lookup "routes" "" "").items }}
{{- if and (eq .spec.dst "") (not (eq .spec.gateway "")) (eq .spec.table "main") }}
{{- $link := lookup "links" "" .spec.outLinkName }}
{{- if and $link $link.spec $link.spec.busPath }}
busPath: {{ $link.spec.busPath }}
{{- end }}
{{- break }}
{{- end }}
{{- end }}
{{- end }}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/talm/templates/_helpers.tpl` around lines 137 - 140, The helper
default_link_selector_by_gateway can emit an empty "busPath:" because it only
checks that lookup "links" returns an object, not that that object's
.spec.busPath is non-empty; update the template to only render the busPath line
when the looked-up link has a non-empty .spec.busPath (same guard style used in
bus_by_link and link_selector_by_name). Specifically, inside
default_link_selector_by_gateway where you call with (lookup "links" ""
.spec.outLinkName), add an additional conditional that checks .spec.busPath (or
use if and not (eq .spec.busPath "")) before emitting the busPath: {{
.spec.busPath }} line so the selector is omitted when busPath is empty.

Comment thread pkg/engine/render_test.go
Comment on lines +327 to 328
output := renderChartTemplateWithLookup(t, "../../charts/cozystack", "templates/controlplane.yaml", simpleNicLookup(), "v1.12")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Please migrate the legacy smoke tests too.

This switches the multi-doc smoke tests onto simpleNicLookup(), but the four legacy smoke tests above still render offline via renderChartTemplate() while asserting that interfaces: is present. After the new guards in charts/cozystack/templates/_helpers.tpl and charts/generic/templates/_helpers.tpl, offline/no-discovery renders intentionally omit interfaces:, so TestLegacyCozystack_ControlPlane, TestLegacyCozystack_Worker, TestLegacyGeneric_ControlPlane, and TestLegacyGeneric_Worker now contradict the chart behavior and will fail.

Suggested fix
-func TestLegacyCozystack_ControlPlane(t *testing.T) {
-	output := renderChartTemplate(t, "../../charts/cozystack", "templates/controlplane.yaml")
+func TestLegacyCozystack_ControlPlane(t *testing.T) {
+	output := renderChartTemplateWithLookup(t, "../../charts/cozystack", "templates/controlplane.yaml", simpleNicLookup())

Apply the same change to the other legacy smoke tests that still assert discovered interfaces are rendered.

Also applies to: 398-398, 494-495, 522-522

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/engine/render_test.go` around lines 327 - 328, The legacy smoke tests
still render offline via renderChartTemplate() and assert that "interfaces:" is
present, which conflicts with the new guard logic; update each legacy test
(TestLegacyCozystack_ControlPlane, TestLegacyCozystack_Worker,
TestLegacyGeneric_ControlPlane, TestLegacyGeneric_Worker) to use
renderChartTemplateWithLookup(t, "<chart-path>", "<template-path>",
simpleNicLookup(), "v1.12") (matching the already-updated call pattern) so they
run with the simpleNicLookup() offline discovery and no longer expect the
interfaces: block to be rendered; locate and replace the
renderChartTemplate(...) calls in those test functions with
renderChartTemplateWithLookup(..., simpleNicLookup(), "v1.12") or alternatively
remove the assertion that "interfaces:" is present if you prefer keeping
renderChartTemplate.

Copy link
Copy Markdown
Member

@kvaps kvaps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Multi-NIC fixes are well-engineered with comprehensive regression tests covering all edge cases (table filter, family filter, missing fields, priority 0, scope filtering).

@kvaps kvaps merged commit cac85a9 into main Apr 28, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Incorrect interface name and default route in the node yaml when templating Node default configuration results in invalid yaml through talm template

3 participants