feat(node-data-broker): run broker as main container with health probes by giuliocalzo · Pull Request #368 · NVIDIA/topograph

giuliocalzo · 2026-06-27T14:32:02Z

Description

Replace the node-data-broker chart's init container plus curl sleeper with a single main container running node-data-broker-initc, removing the dependency on the curlimages/curl image for this subchart.

node-data-broker runs node-data-broker-initc as the DaemonSet's main container. The binary applies node annotations once at startup, then serves /healthz on a configurable port (default 8080) until SIGTERM so the pod stays Running.
A startup probe gates liveness/readiness until /healthz serves, giving slow providers (e.g. infiniband ibnetdiscover) up to failureThreshold × periodSeconds (default 5m) to finish the initial apply.
Annotations are re-applied periodically via refreshInterval (default 5m; set to 0 to disable) so node metadata stays current without pod restarts. Failures on refresh are logged only.
The separate init container, the initc values block, the node-data-broker.initImage helper, and the tail -f /dev/null placeholder are removed; initc.extraArgs moves to top-level extraArgs.
Docs (docs/providers/infiniband.md) and helm-unittest suites/snapshots updated to match.
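
The lifecycle described above (apply once, serve /healthz, refresh on a ticker, stop on SIGTERM) can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `applyFunc`, `refreshLoop`, `run`, and `demo` are hypothetical names, the apply step is a counter instead of a real node patch, the 5-second shutdown timeout is taken from the review's sequence diagram, and exiting when the initial apply fails is an assumption. The demo stops itself after a short timeout so that it terminates; a real broker would wait for SIGTERM only.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"log"
	"net/http"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

// applyFunc is a hypothetical stand-in for the broker's annotation apply step.
type applyFunc func(ctx context.Context) error

// refreshLoop re-applies on every tick until ctx is done; errors are logged only.
func refreshLoop(ctx context.Context, interval time.Duration, apply applyFunc) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if err := apply(ctx); err != nil {
				log.Printf("annotation refresh failed: %v", err)
			}
		}
	}
}

// run applies once, then serves /healthz until ctx is cancelled.
func run(ctx context.Context, addr string, interval time.Duration, apply applyFunc) error {
	// Assumption: a failed initial apply aborts startup so the pod is restarted.
	if err := apply(ctx); err != nil {
		return fmt.Errorf("initial apply: %w", err)
	}
	if interval > 0 { // an interval of 0 disables periodic refresh
		go refreshLoop(ctx, interval, apply)
	}

	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	srv := &http.Server{Addr: addr, Handler: mux}

	// Shut the server down gracefully once the context is cancelled.
	go func() {
		<-ctx.Done()
		shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		defer cancel()
		_ = srv.Shutdown(shutdownCtx)
	}()

	if err := srv.ListenAndServe(); !errors.Is(err, http.ErrServerClosed) {
		return err
	}
	return nil
}

// demo runs the lifecycle briefly and returns how many times apply was called.
func demo() (int64, error) {
	var applies atomic.Int64
	apply := func(context.Context) error { applies.Add(1); return nil }

	// SIGTERM cancels the context; the extra timeout only lets the demo exit.
	sigCtx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM)
	defer stop()
	ctx, cancel := context.WithTimeout(sigCtx, 300*time.Millisecond)
	defer cancel()

	// Port 0 asks the OS for any free port; the chart default is 8080.
	err := run(ctx, ":0", 50*time.Millisecond, apply)
	return applies.Load(), err
}

func main() {
	n, err := demo()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("apply ran %d times\n", n)
}
```

Because /healthz only starts serving after the first apply returns, a startup probe against it stays failing for as long as the provider is still working, which is the gating behaviour the description relies on.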

Complements #363 (node-observer in-process health wait).

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
All commits are signed off per DCO (git commit -s).

Test plan

go test ./cmd/node-data-broker-initc/...
make chart-test

github-actions · 2026-06-27T14:34:25Z

🌿 Preview your docs: https://nvidia-preview-pull-request-368.docs.buildwithfern.com/topograph

greptile-apps · 2026-06-27T14:35:38Z

Greptile Summary

This PR consolidates the node-data-broker DaemonSet from an init-container + curl sleeper pattern into a single long-running main container (node-data-broker) that applies node annotations on startup, then keeps the pod Running by serving /healthz. A startup probe gates liveness/readiness until the initial apply completes, and a configurable refresh interval re-applies annotations periodically. The separate Dockerfile.ib is removed and rdma-core (which ships ibnetdiscover on Alpine) is added to the shared Alpine image.

cmd/node-data-broker: New binary that creates the k8s clientset once, applies annotations, starts a periodic-refresh goroutine using sync.Once-protected close to prevent double-panic, and serves /healthz until SIGTERM.
pkg/node_observer: StatusInformer gains a brokerFactory / brokerContainerName pair; process() now gates every topology-generation request behind allBrokerPodsReady(), retrying every defaultBrokerRetryDelay (10 s) until all DaemonSet broker pods report Ready.
Helm chart: Daemonset template replaced init-container block with main-container config including startup/liveness/readiness probes; node-observer configmap template conditionally adds a nodeDataBroker: watch block.
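
To make the readiness gate concrete, here is a small sketch of the kind of check allBrokerPodsReady() is described as performing. It is an assumption-laden simplification: `pod`, `containerStatus`, `containerReady`, and `allReady` are hypothetical stand-ins that use plain structs instead of the Kubernetes API types, and treating an empty pod list as ready is a guess at how the optional broker case could be handled.

```go
package main

import "fmt"

// containerStatus and pod are simplified, hypothetical stand-ins for the
// Kubernetes pod status types; only the fields needed here are modelled.
type containerStatus struct {
	Name  string
	Ready bool
}

type pod struct {
	Name       string
	Containers []containerStatus
}

// containerReady reports whether the named container exists in p and is Ready.
func containerReady(p pod, name string) bool {
	for _, c := range p.Containers {
		if c.Name == name {
			return c.Ready
		}
	}
	return false // container not found: treat the pod as not ready
}

// allReady returns true only when every pod's broker container is Ready.
// Assumption: an empty list (broker not deployed) counts as ready.
func allReady(pods []pod, containerName string) bool {
	for _, p := range pods {
		if !containerReady(p, containerName) {
			return false
		}
	}
	return true
}

func main() {
	pods := []pod{
		{Name: "broker-a", Containers: []containerStatus{{Name: "node-data-broker", Ready: true}}},
		{Name: "broker-b", Containers: []containerStatus{{Name: "node-data-broker", Ready: false}}},
	}
	fmt.Println(allReady(pods, "node-data-broker")) // false: broker-b is not Ready

	pods[1].Containers[0].Ready = true
	fmt.Println(allReady(pods, "node-data-broker")) // true: every broker is Ready
}
```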

Confidence Score: 5/5

Safe to merge. The refactoring is well-scoped: the binary, Helm chart, and node-observer integration all change consistently, tests cover the new refresh loop and health-server paths, and both previously flagged bugs are addressed.

Both issues raised in the prior review round are resolved: the k8s clientset is now created once in mainInternal (not per-refresh), and the refresh-loop tests use sync.Once to prevent double-close. The allBrokerPodsReady() gating logic is single-goroutine and timer-safe. The Alpine rdma-core package is confirmed to ship ibnetdiscover, so IB functionality is preserved. No correctness gaps found in the new code paths.

No files require special attention. All changed files are consistent with each other and with the broader codebase patterns.

Important Files Changed

Filename	Overview
cmd/node-data-broker/main.go	New binary replacing node-data-broker-initc: creates clientset once, applies annotations, starts periodic refresh with sync.Once-safe done channel, serves /healthz until SIGTERM. Clean structure; clientset is no longer re-created on every refresh.
cmd/node-data-broker/main_test.go	New test file with thorough coverage of getExtras, mergeNodeAnnotations, healthHandler, serveHealth shutdown, and refresh loop behavior. Uses sync.Once to prevent double-close panic in refresh loop tests, addressing the previously flagged issue.
pkg/node_observer/status_informer.go	Adds brokerFactory/brokerContainerName to StatusInformer and gates every topology request behind allBrokerPodsReady(). startBrokerInformer mirrors the API-server informer pattern. Timer handling in process() is consistent with existing retry logic.
pkg/node_observer/config.go	Adds NodeDataBroker config struct and defaults ContainerName when PodSelector is set, consistent with APIServer handling. No validation gaps relative to existing patterns.
charts/topograph/charts/node-data-broker/templates/daemonset.yaml	Replaces init-container + curl sleeper with a main container serving /healthz; startup/liveness/readiness probes correctly configured. failureThreshold and periodSeconds are templated from values.
Dockerfile	Adds rdma-core to the Alpine base image. Alpine's rdma-core package includes ibnetdiscover (confirmed via pkg.alpinelinux.org), correctly replacing the now-deleted Ubuntu Dockerfile.ib.
charts/topograph/charts/node-observer/templates/configmap.yml	Conditionally emits nodeDataBroker block in the node-observer configmap when enabled; uses release-scoped matchLabels by default or user-supplied podSelector.
.github/workflows/docker-ib.yml	Removed the separate IB Docker workflow; its functionality is now covered by the main Docker workflow since rdma-core (with ibnetdiscover) is bundled into the Alpine image.
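
The startup probe budget mentioned for daemonset.yaml is simple arithmetic, sketched below. `startupBudget` is a hypothetical helper, and the pair 30 × 10s is only one combination that produces the stated 5m default; the chart's actual default values are not given here. The calculation ignores any initialDelaySeconds.

```go
package main

import (
	"fmt"
	"time"
)

// startupBudget returns how long a startup probe tolerates consecutive
// failures before the container is restarted: failureThreshold × periodSeconds.
func startupBudget(failureThreshold, periodSeconds int32) time.Duration {
	return time.Duration(failureThreshold) * time.Duration(periodSeconds) * time.Second
}

func main() {
	// Hypothetical values: 30 failures allowed, probed every 10 seconds.
	fmt.Println(startupBudget(30, 10)) // 5m0s
}
```

A slow provider that needs longer than this can be given more room by raising either value in the chart's values.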

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant K8s as Kubernetes
    participant NDB as node-data-broker pod
    participant SP as Startup Probe
    participant NO as node-observer
    participant API as topograph API

    K8s->>NDB: Schedule DaemonSet pod
    Note over NDB: broker.apply(ctx) calls provider
    SP-->>NDB: /healthz fails (not yet serving)
    NDB->>K8s: Patch node annotations
    NDB->>NDB: Start /healthz server
    NDB->>NDB: Start refreshNodeAnnotations goroutine
    SP-->>NDB: /healthz 200 OK (startup probe passes)
    Note over K8s: Pod becomes Ready
    K8s->>NO: brokerFactory informer pod Added/Updated
    NO->>NO: sendRequest to queue
    NO->>NO: process allBrokerPodsReady true
    NO->>API: POST /v1/generate topology request
    loop Every refreshInterval default 5m
        NDB->>K8s: Re-patch node annotations
    end
    K8s->>NDB: SIGTERM
    NDB->>NDB: Graceful shutdown 5s timeout

Reviews (4): Last reviewed commit: "feat(node-data-broker): rename binary an..."

dmitsh · 2026-06-27T18:01:27Z

@giuliocalzo , there is one problem with this implementation.
In most cases, Topograph does use the curlimages/curl image for node-data-broker.
But InfiniBand-based deployments require the ghcr.io/nvidia/topograph/ib image instead.
As implemented now, this PR will break IB clusters.

giuliocalzo · 2026-06-27T19:11:55Z

@dmitsh Good catch — you're right that #368 breaks IB clusters as-is if we drop the /ib image override without replacing what it provides.

I think we can fix this by baking IB tooling into the main Alpine image instead of maintaining a separate ghcr.io/nvidia/topograph/ib:

RUN apk add --no-cache rdma-core

On Alpine, rdma-core already includes ibnetdiscover (there is no separate infiniband-diags package like on Ubuntu). That gives node-data-broker both node-data-broker-initc and the binary Topograph execs into for infiniband-k8s, so IB example values could go back to the default ghcr.io/nvidia/topograph image.

Follow-up would be: delete Dockerfile.ib + the docker-ib workflow, update the IB Helm examples/snapshots, and smoke-test ibnetdiscover in a privileged broker pod on IB hardware (same /sys/class mount as today).
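
As a first step of such a smoke test, a pod could check that the binary is on PATH at all. The sketch below uses Go's standard `exec.LookPath`; `toolPath` is a hypothetical helper name. Finding the binary does not show that ibnetdiscover can talk to the fabric, which still needs the privileged pod and host mounts on real IB hardware.

```go
package main

import (
	"fmt"
	"os/exec"
)

// toolPath returns the resolved path of a binary on PATH, or an error if it
// cannot be found or is not executable.
func toolPath(name string) (string, error) {
	return exec.LookPath(name)
}

func main() {
	path, err := toolPath("ibnetdiscover")
	if err != nil {
		fmt.Println("ibnetdiscover not found on PATH:", err)
		return
	}
	fmt.Println("ibnetdiscover found at", path)
}
```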

Does that approach work for you, or is there something the Ubuntu /ib image provides beyond ibnetdiscover that we'd lose on Alpine (e.g. ibutils)?

copy-pr-bot · 2026-06-27T19:12:12Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

giuliocalzo · 2026-06-28T13:49:48Z

@dmitsh Follow-up on the IB image concern — implemented in the latest commits.

Docker (d0214af)

Main Dockerfile now installs Alpine rdma-core (ships ibnetdiscover)
Removed Dockerfile.ib and the docker-ib workflow
IB/GCP Helm examples no longer override ghcr.io/nvidia/topograph/ib; they use the default ghcr.io/nvidia/topograph image

Docs (0ea7c82)

docs/providers/infiniband.md documents that the default image includes ibnetdiscover and that IB deployments still need privileged broker + /sys/class mount (see values.k8s.ib-example.yaml)
Stale references to the Ubuntu /ib variant removed from chart values comments and k8s docs

So node-data-broker now runs the same Topograph image for all providers, including infiniband-k8s. Remaining validation on real IB hardware: confirm ibnetdiscover works in a privileged broker pod with the existing host mounts.

Let me know if anything from the old Ubuntu /ib image (e.g. ibutils) is still required on Alpine.

dmitsh · 2026-06-28T18:36:13Z

Hi @giuliocalzo ,
I have two small comments.

1. Should we rename cmd/node-data-broker-initc to cmd/node-data-broker, since we got rid of the init container?
2. There might be a concurrency issue if Topograph reads the topology node annotations before node-data-broker applies them. Maybe we should wait until the node-data-broker pods are Ready? Keep in mind that node-data-broker is optional, so it might not be present.

IMO, the init container was a cleaner way to implement this. It also allows us to use a different image in the future if we need to support a new switch vendor.

giuliocalzo · 2026-06-28T19:28:19Z

@dmitsh Thanks for the follow-up.

1. Rename cmd/node-data-broker-initc

Agreed. With the init container gone, the -initc suffix is misleading. I'll rename it to cmd/node-data-broker (and the binary to node-data-broker) in a follow-up commit on this PR.

2. Startup race (Topograph reading annotations before broker applies them)

Good point. I'm happy to add a wait until node-data-broker pods are Ready before the first topology request, with the caveat that node-data-broker is optional — when the subchart is disabled or not deployed, Topograph should proceed as today without blocking.

3. Init container vs single main container

My preference is the single long-running container model: node-data-broker applies annotations, serves /healthz, and refreshes periodically — closer to a small operator than a one-shot init + placeholder pod. That keeps lifecycle, probes, and retries in one place and removes the curlimages/curl sleeper.

That said, I'm flexible on the shape. The init-container pattern does make it easier to swap images per vendor without touching the main runtime image. If we want to preserve that flexibility long term, we could revisit — but for now the unified image (rdma-core in the main Alpine build) plus optional image override in Helm values still covers IB without a separate /ib publish path.

I'll proceed with the rename and the broker-readiness gate unless you'd prefer to keep the init-container design for this PR.
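
The broker-readiness gate with its optional caveat could look roughly like the sketch below. This is an illustration under assumptions, not the node-observer code: `waitForBroker`, `demo`, and `demoDisabled` are hypothetical names, the readiness check is an injected function, and the demo uses a 10 ms delay in place of the 10 s retry delay mentioned in the review summary.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitForBroker blocks until ready() returns true, re-checking every delay.
// When enabled is false (broker not deployed) it returns immediately.
// It returns the number of ready() calls it made.
func waitForBroker(ctx context.Context, enabled bool, delay time.Duration, ready func() bool) (int, error) {
	if !enabled {
		return 0, nil // optional component absent: do not block
	}
	attempts := 0
	timer := time.NewTimer(0) // fire immediately for the first check
	defer timer.Stop()
	for {
		select {
		case <-ctx.Done():
			return attempts, ctx.Err()
		case <-timer.C:
			attempts++
			if ready() {
				return attempts, nil
			}
			timer.Reset(delay) // channel was just drained, so Reset is safe
		}
	}
}

// demo simulates broker pods that become Ready on the third check.
func demo() (int, error) {
	checks := 0
	ready := func() bool { checks++; return checks >= 3 }
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	return waitForBroker(ctx, true, 10*time.Millisecond, ready)
}

// demoDisabled shows the path taken when the broker subchart is not deployed.
func demoDisabled() (int, error) {
	never := func() bool { return false }
	return waitForBroker(context.Background(), false, time.Second, never)
}

func main() {
	attempts, err := demo()
	fmt.Println(attempts, err) // 3 <nil>

	attempts, err = demoDisabled()
	fmt.Println(attempts, err) // 0 <nil>
}
```

Passing the enabled flag explicitly keeps the "proceed as today" behaviour in one place instead of spreading nil checks through the request path.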

Replace the init container plus curl sleeper with a single container running node-data-broker-initc. The binary applies node annotations, serves /healthz, and re-applies annotations on a configurable refreshInterval (default 5m). Add a startup probe so slow providers can finish before liveness kicks in, move initc.extraArgs to top-level extraArgs, and update infiniband docs and helm-unittest coverage. Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>

Reuse a single in-cluster clientset for startup and periodic annotation refresh instead of constructing one on every apply. Fix a race in refresh-loop tests where a third ticker tick could close the done channel twice; gate shutdown with sync.Once and cancel the loop immediately when the test condition is met. Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>

Install rdma-core in the Alpine runtime image so node-data-broker and infiniband-k8s no longer need ghcr.io/nvidia/topograph/ib. Remove Dockerfile.ib, the docker-ib workflow, and /ib overrides from Helm examples and docs. Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>

Document that the default topograph image includes ibnetdiscover via rdma-core, fix the node-data-broker init-container wording, and remove the obsolete IB/ubuntu variant note from chart values comments. Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>

…diness Rename cmd/node-data-broker-initc to cmd/node-data-broker and teach node-observer to wait for broker DaemonSet pods to become Ready before the first topology request. The broker watch is optional via Helm when the subchart is not deployed. Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>
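
The double-close race named in the commit messages above is a common Go pitfall: closing an already closed channel panics. A minimal sketch of the sync.Once guard follows; `onceCloser` and `demo` are hypothetical names and this is not the test code from the PR.

```go
package main

import (
	"fmt"
	"sync"
)

// onceCloser closes a done channel at most once, however many callers signal.
type onceCloser struct {
	once sync.Once
	done chan struct{}
}

func newOnceCloser() *onceCloser {
	return &onceCloser{done: make(chan struct{})}
}

// Close is safe to call repeatedly; a bare close(done) would panic the
// second time with "close of closed channel".
func (c *onceCloser) Close() {
	c.once.Do(func() { close(c.done) })
}

// demo signals completion from several concurrent "ticks" and reports
// whether the done channel ended up closed.
func demo(ticks int) bool {
	c := newOnceCloser()
	var wg sync.WaitGroup
	for i := 0; i < ticks; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Close() // every tick tries to close; only the first succeeds
		}()
	}
	wg.Wait()

	select {
	case <-c.done:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(demo(3)) // true, and no panic from the extra Close calls
}
```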

giuliocalzo requested review from dmitsh and ravisoundar as code owners June 27, 2026 14:32

giuliocalzo mentioned this pull request Jun 27, 2026

feat(node-observer): wait for topograph health in-process #363

Closed

6 tasks

greptile-apps Bot reviewed Jun 27, 2026

Comment thread cmd/node-data-broker/main_test.go

Comment thread cmd/node-data-broker-initc/main.go Outdated

giuliocalzo marked this pull request as draft June 27, 2026 19:12

giuliocalzo force-pushed the feat/node-data-broker-main-container branch from 12f1918 to dbaa0fc Compare June 27, 2026 19:16

giuliocalzo force-pushed the feat/node-data-broker-main-container branch from b45abfe to 869c9f9 Compare June 27, 2026 19:27

giuliocalzo force-pushed the feat/node-data-broker-main-container branch from 869c9f9 to 57e4ff7 Compare June 27, 2026 19:29

greptile-apps Bot mentioned this pull request Jun 28, 2026

docs: add CHANGELOG and wire it into agent guidance #369

Merged

2 tasks

giuliocalzo marked this pull request as ready for review June 28, 2026 13:47

giuliocalzo added 5 commits June 29, 2026 20:52

giuliocalzo force-pushed the feat/node-data-broker-main-container branch from 0ea7c82 to 7edb997 Compare June 29, 2026 18:52

dmitsh approved these changes Jun 29, 2026

giuliocalzo merged commit 7f4d16c into NVIDIA:main Jun 30, 2026
7 checks passed
