feat(node-data-broker): run broker as main container with health probes #368
🌿 Preview your docs: https://nvidia-preview-pull-request-368.docs.buildwithfern.com/topograph
Greptile Summary
This PR consolidates the node-data-broker DaemonSet from an init-container + curl sleeper pattern into a single long-running main container (…
Confidence Score: 5/5

Safe to merge. The refactoring is well-scoped: the binary, Helm chart, and node-observer integration all change consistently, tests cover the new refresh loop and health-server paths, and both issues raised in the prior review round are resolved. The k8s clientset is now created once in `mainInternal` (not per refresh), and the refresh-loop tests use `sync.Once` to prevent a double close. The `allBrokerPodsReady()` gating logic is single-goroutine and timer-safe. The Alpine `rdma-core` package is confirmed to ship `ibnetdiscover`, so IB functionality is preserved. No correctness gaps were found in the new code paths. No files require special attention; all changed files are consistent with each other and with the broader codebase patterns.
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant K8s as Kubernetes
participant NDB as node-data-broker pod
participant SP as Startup Probe
participant NO as node-observer
participant API as topograph API
K8s->>NDB: Schedule DaemonSet pod
Note over NDB: broker.apply(ctx) calls provider
SP-->>NDB: /healthz fails (not yet serving)
NDB->>K8s: Patch node annotations
NDB->>NDB: Start /healthz server
NDB->>NDB: Start refreshNodeAnnotations goroutine
SP-->>NDB: /healthz 200 OK (startup probe passes)
Note over K8s: Pod becomes Ready
K8s->>NO: brokerFactory informer pod Added/Updated
NO->>NO: sendRequest to queue
NO->>NO: process allBrokerPodsReady true
NO->>API: POST /v1/generate topology request
loop Every refreshInterval default 5m
NDB->>K8s: Re-patch node annotations
end
K8s->>NDB: SIGTERM
NDB->>NDB: Graceful shutdown 5s timeout
Reviews (4): Last reviewed commit: "feat(node-data-broker): rename binary an..."
Reuse a single in-cluster clientset for startup and periodic annotation refresh instead of constructing one on every apply. Fix a race in refresh-loop tests where a third ticker tick could close the done channel twice; gate shutdown with sync.Once and cancel the loop immediately when the test condition is met. Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>
@giuliocalzo, there is one problem with this implementation.
@dmitsh Good catch: you're right that #368 breaks IB clusters as-is if we drop the … I think we can fix this by baking IB tooling into the main Alpine image instead of maintaining a separate …

`RUN apk add --no-cache rdma-core`

On Alpine, … Follow-up would be: delete … Does that approach work for you, or is there something the Ubuntu …
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
12f1918 to dbaa0fc
b45abfe to 869c9f9
869c9f9 to 57e4ff7
@dmitsh Follow-up on the IB image concern: implemented in the latest commits.
Docker (…
Docs (…
So node-data-broker now runs the same Topograph image for all providers, including … Let me know if anything from the old Ubuntu …
Hi @giuliocalzo,
IMO, an init container was the cleaner way to implement this. It also lets us use a different image in the future if we need to support a new switch vendor.
@dmitsh Thanks for the follow-up.

1. Rename
Agreed: with the init container gone, the …

2. Startup race (Topograph reading annotations before broker applies them)
Good point. I'm happy to add a wait until node-data-broker pods are Ready before the first topology request. The caveat is that node-data-broker is optional: when the subchart is disabled or not deployed, Topograph should proceed as today without blocking.

3. Init container vs single main container
My preference is the single long-running container model: node-data-broker applies annotations, serves …

That said, I'm flexible on the shape. The init-container pattern does make it easier to swap images per vendor without touching the main runtime image. If we want to preserve that flexibility long term, we could revisit this, but for now the unified image (…

I'll proceed with the rename and the broker-readiness gate unless you'd prefer to keep the init-container design for this PR.
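The readiness gate proposed in point 2 could look roughly like the check below. It borrows the name `allBrokerPodsReady` from the review summary, but the signature and the `brokerPod` type are hypothetical simplifications. A real version would inspect the `PodReady` condition of pods seen by an informer. Treating "no broker pods observed yet" as not ready is also an assumption.

```go
package main

// brokerPod is a hypothetical, trimmed-down view of a node-data-broker pod.
type brokerPod struct {
	Name  string
	Ready bool
}

// allBrokerPodsReady reports whether node-observer may send its first
// topology request.
func allBrokerPodsReady(watchEnabled bool, pods []brokerPod) bool {
	if !watchEnabled {
		return true // broker subchart not deployed: behave as before, never block
	}
	if len(pods) == 0 {
		return false // assumption: nothing observed yet counts as not ready
	}
	for _, p := range pods {
		if !p.Ready {
			return false
		}
	}
	return true
}
```

Keeping the "watch disabled" branch first makes the optional-subchart case explicit and cheap.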
Replace the init container plus curl sleeper with a single container running node-data-broker-initc. The binary applies node annotations, serves /healthz, and re-applies annotations on a configurable refreshInterval (default 5m). Add a startup probe so slow providers can finish before liveness kicks in, move initc.extraArgs to top-level extraArgs, and update infiniband docs and helm-unittest coverage. Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>
Reuse a single in-cluster clientset for startup and periodic annotation refresh instead of constructing one on every apply. Fix a race in refresh-loop tests where a third ticker tick could close the done channel twice; gate shutdown with sync.Once and cancel the loop immediately when the test condition is met. Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>
Install rdma-core in the Alpine runtime image so node-data-broker and infiniband-k8s no longer need ghcr.io/nvidia/topograph/ib. Remove Dockerfile.ib, the docker-ib workflow, and /ib overrides from Helm examples and docs. Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>
Document that the default topograph image includes ibnetdiscover via rdma-core, fix the node-data-broker init-container wording, and remove the obsolete IB/ubuntu variant note from chart values comments. Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>
…diness Rename cmd/node-data-broker-initc to cmd/node-data-broker and teach node-observer to wait for broker DaemonSet pods to become Ready before the first topology request. The broker watch is optional via Helm when the subchart is not deployed. Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>
0ea7c82 to 7edb997
Description
Replace the node-data-broker chart's init container plus curl sleeper with a single main container running `node-data-broker-initc`, removing the dependency on the `curlimages/curl` image for this subchart.

- … `node-data-broker-initc` as the DaemonSet's main container. The binary applies node annotations once at startup, then serves `/healthz` on a configurable `port` (default 8080) until SIGTERM so the pod stays Running.
- … `/healthz` serves, giving slow providers (e.g. infiniband `ibnetdiscover`) up to `failureThreshold × periodSeconds` (default 5m) to finish the initial apply.
- … `refreshInterval` (default 5m; set to `0` to disable) so node metadata stays current without pod restarts. Failures on refresh are logged only.
- … `initc` values block, the `node-data-broker.initImage` helper, and the `tail -f /dev/null` placeholder are removed; `initc.extraArgs` moves to top-level `extraArgs`.
- … `docs/providers/infiniband.md`) and helm-unittest suites/snapshots updated to match.

Complements #363 (node-observer in-process health wait).
Checklist
- … (`git commit -s`).

Test plan
- `go test ./cmd/node-data-broker-initc/...`
- `make chart-test`