Skip to content

feat(node-data-broker): run broker as main container with health probes#368

Merged
giuliocalzo merged 5 commits into
NVIDIA:mainfrom
giuliocalzo:feat/node-data-broker-main-container
Jun 30, 2026
Merged

feat(node-data-broker): run broker as main container with health probes#368
giuliocalzo merged 5 commits into
NVIDIA:mainfrom
giuliocalzo:feat/node-data-broker-main-container

Conversation

@giuliocalzo

Copy link
Copy Markdown
Collaborator

Description

Replace the node-data-broker chart's init container plus curl sleeper with a single main container running node-data-broker-initc, removing the dependency on the curlimages/curl image for this subchart.

  • node-data-broker runs node-data-broker-initc as the DaemonSet's main container. The binary applies node annotations once at startup, then serves /healthz on a configurable port (default 8080) until SIGTERM so the pod stays Running.
  • A startup probe gates liveness/readiness until /healthz serves, giving slow providers (e.g. infiniband ibnetdiscover) up to failureThreshold × periodSeconds (default 5m) to finish the initial apply.
  • Annotations are re-applied periodically via refreshInterval (default 5m; set to 0 to disable) so node metadata stays current without pod restarts. Failures on refresh are logged only.
  • The separate init container, the initc values block, the node-data-broker.initImage helper, and the tail -f /dev/null placeholder are removed; initc.extraArgs moves to top-level extraArgs.
  • Docs (docs/providers/infiniband.md) and helm-unittest suites/snapshots updated to match.

Complements #363 (node-observer in-process health wait).

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • All commits are signed off per DCO (git commit -s).

Test plan

  • go test ./cmd/node-data-broker-initc/...
  • make chart-test

@github-actions

Copy link
Copy Markdown
Contributor

@greptile-apps

greptile-apps Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR consolidates the node-data-broker DaemonSet from an init-container + curl sleeper pattern into a single long-running main container (node-data-broker) that applies node annotations on startup, then keeps the pod Running by serving /healthz. A startup probe gates liveness/readiness until the initial apply completes, and a configurable refresh interval re-applies annotations periodically. The separate Dockerfile.ib is removed and rdma-core (which ships ibnetdiscover on Alpine) is added to the shared Alpine image.

  • cmd/node-data-broker: New binary that creates the k8s clientset once, applies annotations, starts a periodic-refresh goroutine using sync.Once-protected close to prevent double-panic, and serves /healthz until SIGTERM.
  • pkg/node_observer: StatusInformer gains a brokerFactory / brokerContainerName pair; process() now gates every topology-generation request behind allBrokerPodsReady(), retrying every defaultBrokerRetryDelay (10 s) until all DaemonSet broker pods report Ready.
  • Helm chart: Daemonset template replaced init-container block with main-container config including startup/liveness/readiness probes; node-observer configmap template conditionally adds a nodeDataBroker: watch block.

Confidence Score: 5/5

Safe to merge. The refactoring is well-scoped: the binary, Helm chart, and node-observer integration all change consistently, tests cover the new refresh loop and health-server paths, and both previously flagged bugs are addressed.

Both issues raised in the prior review round are resolved: the k8s clientset is now created once in mainInternal (not per-refresh), and the refresh-loop tests use sync.Once to prevent double-close. The allBrokerPodsReady() gating logic is single-goroutine and timer-safe. The Alpine rdma-core package is confirmed to ship ibnetdiscover, so IB functionality is preserved. No correctness gaps found in the new code paths.

No files require special attention. All changed files are consistent with each other and with the broader codebase patterns.

Important Files Changed

Filename Overview
cmd/node-data-broker/main.go New binary replacing node-data-broker-initc: creates clientset once, applies annotations, starts periodic refresh with sync.Once-safe done channel, serves /healthz until SIGTERM. Clean structure; clientset is no longer re-created on every refresh.
cmd/node-data-broker/main_test.go New test file with thorough coverage of getExtras, mergeNodeAnnotations, healthHandler, serveHealth shutdown, and refresh loop behavior. Uses sync.Once to prevent double-close panic in refresh loop tests, addressing the previously flagged issue.
pkg/node_observer/status_informer.go Adds brokerFactory/brokerContainerName to StatusInformer and gates every topology request behind allBrokerPodsReady(). startBrokerInformer mirrors the API-server informer pattern. Timer handling in process() is consistent with existing retry logic.
pkg/node_observer/config.go Adds NodeDataBroker config struct and defaults ContainerName when PodSelector is set, consistent with APIServer handling. No validation gaps relative to existing patterns.
charts/topograph/charts/node-data-broker/templates/daemonset.yaml Replaces init-container + curl sleeper with a main container serving /healthz; startup/liveness/readiness probes correctly configured. failureThreshold and periodSeconds are templated from values.
Dockerfile Adds rdma-core to the Alpine base image. Alpine's rdma-core package includes ibnetdiscover (confirmed via pkg.alpinelinux.org), correctly replacing the now-deleted Ubuntu Dockerfile.ib.
charts/topograph/charts/node-observer/templates/configmap.yml Conditionally emits nodeDataBroker block in the node-observer configmap when enabled; uses release-scoped matchLabels by default or user-supplied podSelector.
.github/workflows/docker-ib.yml Removed the separate IB Docker workflow; its functionality is now covered by the main Docker workflow since rdma-core (with ibnetdiscover) is bundled into the Alpine image.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant K8s as Kubernetes
    participant NDB as node-data-broker pod
    participant SP as Startup Probe
    participant NO as node-observer
    participant API as topograph API

    K8s->>NDB: Schedule DaemonSet pod
    Note over NDB: broker.apply(ctx) calls provider
    SP-->>NDB: /healthz fails (not yet serving)
    NDB->>K8s: Patch node annotations
    NDB->>NDB: Start /healthz server
    NDB->>NDB: Start refreshNodeAnnotations goroutine
    SP-->>NDB: /healthz 200 OK (startup probe passes)
    Note over K8s: Pod becomes Ready
    K8s->>NO: brokerFactory informer pod Added/Updated
    NO->>NO: sendRequest to queue
    NO->>NO: process allBrokerPodsReady true
    NO->>API: POST /v1/generate topology request
    loop Every refreshInterval default 5m
        NDB->>K8s: Re-patch node annotations
    end
    K8s->>NDB: SIGTERM
    NDB->>NDB: Graceful shutdown 5s timeout
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant K8s as Kubernetes
    participant NDB as node-data-broker pod
    participant SP as Startup Probe
    participant NO as node-observer
    participant API as topograph API

    K8s->>NDB: Schedule DaemonSet pod
    Note over NDB: broker.apply(ctx) calls provider
    SP-->>NDB: /healthz fails (not yet serving)
    NDB->>K8s: Patch node annotations
    NDB->>NDB: Start /healthz server
    NDB->>NDB: Start refreshNodeAnnotations goroutine
    SP-->>NDB: /healthz 200 OK (startup probe passes)
    Note over K8s: Pod becomes Ready
    K8s->>NO: brokerFactory informer pod Added/Updated
    NO->>NO: sendRequest to queue
    NO->>NO: process allBrokerPodsReady true
    NO->>API: POST /v1/generate topology request
    loop Every refreshInterval default 5m
        NDB->>K8s: Re-patch node annotations
    end
    K8s->>NDB: SIGTERM
    NDB->>NDB: Graceful shutdown 5s timeout
Loading

Reviews (4): Last reviewed commit: "feat(node-data-broker): rename binary an..." | Re-trigger Greptile

Comment thread cmd/node-data-broker/main_test.go
Comment thread cmd/node-data-broker-initc/main.go Outdated
giuliocalzo added a commit to giuliocalzo/topograph that referenced this pull request Jun 27, 2026
Reuse a single in-cluster clientset for startup and periodic annotation
refresh instead of constructing one on every apply.

Fix a race in refresh-loop tests where a third ticker tick could close the
done channel twice; gate shutdown with sync.Once and cancel the loop
immediately when the test condition is met.

Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>
@dmitsh

dmitsh commented Jun 27, 2026

Copy link
Copy Markdown
Collaborator

@giuliocalzo , there is one problem with this implementation.
In most cases, Topograph indeed uses curlimages/curl image for node-data-broker.
But for the InfiniBand-based deployments, it requires ghcr.io/nvidia/topograph/ib image instead.
This PR (as implemented now) will break IB clusters.

@giuliocalzo

Copy link
Copy Markdown
Collaborator Author

@dmitsh Good catch — you're right that #368 breaks IB clusters as-is if we drop the /ib image override without replacing what it provides.

I think we can fix this by baking IB tooling into the main Alpine image instead of maintaining a separate ghcr.io/nvidia/topograph/ib:

RUN apk add --no-cache rdma-core

On Alpine, rdma-core already includes ibnetdiscover (there is no separate infiniband-diags package like on Ubuntu). That gives node-data-broker both node-data-broker-initc and the binary Topograph execs into for infiniband-k8s, so IB example values could go back to the default ghcr.io/nvidia/topograph image.

Follow-up would be: delete Dockerfile.ib + the docker-ib workflow, update the IB Helm examples/snapshots, and smoke-test ibnetdiscover in a privileged broker pod on IB hardware (same /sys/class mount as today).

Does that approach work for you, or is there something the Ubuntu /ib image provides beyond ibnetdiscover that we'd lose on Alpine (e.g. ibutils)?

@giuliocalzo giuliocalzo marked this pull request as draft June 27, 2026 19:12
@copy-pr-bot

copy-pr-bot Bot commented Jun 27, 2026

Copy link
Copy Markdown

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

giuliocalzo added a commit to giuliocalzo/topograph that referenced this pull request Jun 27, 2026
Reuse a single in-cluster clientset for startup and periodic annotation
refresh instead of constructing one on every apply.

Fix a race in refresh-loop tests where a third ticker tick could close the
done channel twice; gate shutdown with sync.Once and cancel the loop
immediately when the test condition is met.

Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>
@giuliocalzo giuliocalzo force-pushed the feat/node-data-broker-main-container branch from 12f1918 to dbaa0fc Compare June 27, 2026 19:16
giuliocalzo added a commit to giuliocalzo/topograph that referenced this pull request Jun 27, 2026
Reuse a single in-cluster clientset for startup and periodic annotation
refresh instead of constructing one on every apply.

Fix a race in refresh-loop tests where a third ticker tick could close the
done channel twice; gate shutdown with sync.Once and cancel the loop
immediately when the test condition is met.

Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>
@giuliocalzo giuliocalzo force-pushed the feat/node-data-broker-main-container branch from b45abfe to 869c9f9 Compare June 27, 2026 19:27
giuliocalzo added a commit to giuliocalzo/topograph that referenced this pull request Jun 27, 2026
Reuse a single in-cluster clientset for startup and periodic annotation
refresh instead of constructing one on every apply.

Fix a race in refresh-loop tests where a third ticker tick could close the
done channel twice; gate shutdown with sync.Once and cancel the loop
immediately when the test condition is met.

Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>
@giuliocalzo giuliocalzo force-pushed the feat/node-data-broker-main-container branch from 869c9f9 to 57e4ff7 Compare June 27, 2026 19:29
@giuliocalzo giuliocalzo marked this pull request as ready for review June 28, 2026 13:47
@giuliocalzo

Copy link
Copy Markdown
Collaborator Author

@dmitsh Follow-up on the IB image concern — implemented in the latest commits.

Docker (d0214af)

  • Main Dockerfile now installs Alpine rdma-core (ships ibnetdiscover)
  • Removed Dockerfile.ib and the docker-ib workflow
  • IB/GCP Helm examples no longer override ghcr.io/nvidia/topograph/ib; they use the default ghcr.io/nvidia/topograph image

Docs (0ea7c82)

  • docs/providers/infiniband.md documents that the default image includes ibnetdiscover and that IB deployments still need privileged broker + /sys/class mount (see values.k8s.ib-example.yaml)
  • Stale references to the Ubuntu /ib variant removed from chart values comments and k8s docs

So node-data-broker now runs the same Topograph image for all providers, including infiniband-k8s. Remaining validation on real IB hardware: confirm ibnetdiscover works in a privileged broker pod with the existing host mounts.

Let me know if anything from the old Ubuntu /ib image (e.g. ibutils) is still required on Alpine.

@dmitsh

dmitsh commented Jun 28, 2026

Copy link
Copy Markdown
Collaborator

Hi @giuliocalzo ,
I have two small comments.

  1. Should we rename cmd/node-data-broker-initc to cmd/node-data-broker, since we got rid of the init container?
  2. There might be some concurrency issue, when Topograph is reading topology node annotations before node-data-broker applies them. Maybe we should wait until node-data-broker's pods are ready? Just keep in mind that node-data-broker is optional, so it might be not present.

IMO, having an init container was more clean way to implement. It also allows to use different image in the future, if we need to support new switch vendor.

@giuliocalzo

giuliocalzo commented Jun 28, 2026

Copy link
Copy Markdown
Collaborator Author

@dmitsh Thanks for the follow-up.

1. Rename cmd/node-data-broker-initc

Agreed — with the init container gone, the -initc suffix is misleading. I'll rename to cmd/node-data-broker (and the binary to node-data-broker) in a follow-up commit on this PR (even in this one).

2. Startup race (Topograph reading annotations before broker applies them)

Good point. I'm happy to add a wait until node-data-broker pods are Ready before the first topology request, with the caveat that node-data-broker is optional — when the subchart is disabled or not deployed, Topograph should proceed as today without blocking.

3. Init container vs single main container

My preference is the single long-running container model: node-data-broker applies annotations, serves /healthz, and refreshes periodically — closer to a small operator than a one-shot init + placeholder pod. That keeps lifecycle, probes, and retries in one place and removes the curlimages/curl sleeper.

That said, I'm flexible on the shape. The init-container pattern does make it easier to swap images per vendor without touching the main runtime image. If we want to preserve that flexibility long term, we could revisit — but for now the unified image (rdma-core in the main Alpine build) plus optional image override in Helm values still covers IB without a separate /ib publish path.

I'll proceed with the rename and the broker-readiness gate unless you'd prefer to keep the init-container design for this PR.

Replace the init container plus curl sleeper with a single container running
node-data-broker-initc. The binary applies node annotations, serves /healthz,
and re-applies annotations on a configurable refreshInterval (default 5m).
Add a startup probe so slow providers can finish before liveness kicks in,
move initc.extraArgs to top-level extraArgs, and update infiniband docs and
helm-unittest coverage.

Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>
Reuse a single in-cluster clientset for startup and periodic annotation
refresh instead of constructing one on every apply.

Fix a race in refresh-loop tests where a third ticker tick could close the
done channel twice; gate shutdown with sync.Once and cancel the loop
immediately when the test condition is met.

Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>
Install rdma-core in the Alpine runtime image so node-data-broker and
infiniband-k8s no longer need ghcr.io/nvidia/topograph/ib. Remove
Dockerfile.ib, the docker-ib workflow, and /ib overrides from Helm
examples and docs.

Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>
Document that the default topograph image includes ibnetdiscover via
rdma-core, fix the node-data-broker init-container wording, and remove
the obsolete IB/ubuntu variant note from chart values comments.

Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>
…diness

Rename cmd/node-data-broker-initc to cmd/node-data-broker and teach
node-observer to wait for broker DaemonSet pods to become Ready before
the first topology request. The broker watch is optional via Helm when
the subchart is not deployed.

Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>
@giuliocalzo giuliocalzo force-pushed the feat/node-data-broker-main-container branch from 0ea7c82 to 7edb997 Compare June 29, 2026 18:52
@giuliocalzo giuliocalzo merged commit 7f4d16c into NVIDIA:main Jun 30, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants