Resource Listener #281
Merged
ethany-nv merged 50 commits into feature/PROJ-147-operator-redesign on Feb 2, 2026
- Introduced new message types: ResourceBody, ResourceUsageBody, and DeleteResourceBody in messages.proto.
- Added ResourceListenerStream RPC in services.proto for handling node events and resource usage.

- Implemented multiple test cases for ResourceListenerStream, covering resource messages, resource usage messages, delete resource messages, and error handling.
- Verified ACK message sending for the different message types and ensured proper error handling for EOF and context cancellation.
- Added tests for missing or empty backend name metadata to ensure such streams are rejected.
…e_listener_golang
…SMO into ethany/resource_listener_golang
xutongNV
reviewed
Jan 30, 2026
xutongNV
reviewed
Jan 30, 2026
xutongNV
reviewed
Jan 30, 2026
xutongNV
previously approved these changes
Jan 30, 2026
RyaliNvidia
previously approved these changes
Jan 30, 2026
… handling Replace manual signal handling with signal.NotifyContext() for cleaner shutdown coordination. Use channel closure instead of separate done channels to signal watcher completion, allowing channels to communicate their own state. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
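The shutdown pattern named in this commit message can be sketched as follows. This is a minimal illustration, not the PR's code: `watch` and `run` are hypothetical names, and the integer events stand in for whatever the real watchers emit. The watcher closes its own output channel when it finishes, so the consumer needs no separate done channel.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

// watch is a hypothetical watcher. It emits n events, or stops early when
// ctx is cancelled, and closes its channel on exit to signal completion.
func watch(ctx context.Context, n int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out) // channel closure is the completion signal
		for i := 0; i < n; i++ {
			select {
			case out <- i:
			case <-ctx.Done():
				return
			}
		}
	}()
	return out
}

// run drains a watcher until its channel closes and returns the event count.
func run(n int) int {
	// signal.NotifyContext cancels ctx on SIGINT or SIGTERM, replacing a
	// hand-rolled signal channel and goroutine.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	count := 0
	for range watch(ctx, n) {
		count++
	}
	return count
}

func main() {
	fmt.Println("events received:", run(3))
}
```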
…o functions that need it
fernandol-nvidia
approved these changes
Feb 2, 2026
xutongNV
approved these changes
Feb 2, 2026
a0347bd
into
feature/PROJ-147-operator-redesign
10 of 12 checks passed
Description
This re-implementation of resource listening aggregates resource usage more efficiently:
Pod resource usage is computed incrementally from informer deltas. When a pod first transitions into Running (with a node assignment), its requested resources are added to the per-node totals; when it transitions to a terminal phase or is deleted, its contribution is subtracted. Because pod requests and node placement are immutable, repeated updates for the same UID are ignored. Contributions are derived from container requests (CPU in millicores, memory and ephemeral storage in Ki, GPUs as a count) and are tracked both as overall usage and as non-workflow usage. Dirty nodes are flushed on a debounce interval, and any watch gap triggers a full rebuild from the informer cache.
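The bookkeeping described above can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: `Usage`, `PodEvent`, `NodeTotals`, and `Aggregator` are hypothetical names, and `PodEvent` is a simplified stand-in for an informer delta. The debounce timer and the rebuild from the informer cache are left out; `Flush` only shows what a debounced flush would read.

```go
package main

import "fmt"

// Usage holds summed container requests: CPU in millicores, memory and
// ephemeral storage in Ki, GPUs as a count.
type Usage struct {
	CPUMilli, MemoryKi, StorageKi, GPUs int64
}

// add applies o to u with sign +1 (pod counted) or -1 (pod removed).
func (u *Usage) add(o Usage, sign int64) {
	u.CPUMilli += sign * o.CPUMilli
	u.MemoryKi += sign * o.MemoryKi
	u.StorageKi += sign * o.StorageKi
	u.GPUs += sign * o.GPUs
}

// PodEvent is a hypothetical, simplified informer delta. Running is true
// while the pod is in the Running phase; false means terminal or deleted.
type PodEvent struct {
	UID, Node string
	Running   bool
	Workflow  bool
	Requests  Usage
}

// NodeTotals tracks overall and non-workflow usage for one node.
type NodeTotals struct {
	Total, NonWorkflow Usage
}

type contribution struct {
	node     string
	workflow bool
	req      Usage
}

// Aggregator keeps per-node totals plus the set of pods already counted.
type Aggregator struct {
	nodes map[string]*NodeTotals
	pods  map[string]contribution
	dirty map[string]struct{}
}

func NewAggregator() *Aggregator {
	return &Aggregator{
		nodes: map[string]*NodeTotals{},
		pods:  map[string]contribution{},
		dirty: map[string]struct{}{},
	}
}

// Apply adds a pod on its first Running event and subtracts it when it
// leaves. Repeated updates for an already counted UID are ignored.
func (a *Aggregator) Apply(ev PodEvent) {
	prev, counted := a.pods[ev.UID]
	switch {
	case ev.Running && ev.Node != "" && !counted:
		a.pods[ev.UID] = contribution{ev.Node, ev.Workflow, ev.Requests}
		a.update(ev.Node, ev.Workflow, ev.Requests, +1)
	case !ev.Running && counted:
		delete(a.pods, ev.UID)
		a.update(prev.node, prev.workflow, prev.req, -1)
	}
}

func (a *Aggregator) update(node string, workflow bool, req Usage, sign int64) {
	t := a.nodes[node]
	if t == nil {
		t = &NodeTotals{}
		a.nodes[node] = t
	}
	t.Total.add(req, sign)
	if !workflow {
		t.NonWorkflow.add(req, sign)
	}
	a.dirty[node] = struct{}{} // picked up by the next flush
}

// Flush returns a snapshot of the dirty nodes and clears the dirty set.
func (a *Aggregator) Flush() map[string]NodeTotals {
	out := make(map[string]NodeTotals, len(a.dirty))
	for n := range a.dirty {
		out[n] = *a.nodes[n]
	}
	a.dirty = map[string]struct{}{}
	return out
}

func main() {
	a := NewAggregator()
	a.Apply(PodEvent{UID: "p1", Node: "n1", Running: true, Requests: Usage{CPUMilli: 500}})
	a.Apply(PodEvent{UID: "p1", Node: "n1", Running: true, Requests: Usage{CPUMilli: 500}}) // ignored
	fmt.Printf("%+v\n", a.Flush())
}
```

Storing each pod's contribution at add time means the subtraction does not depend on the delete event still carrying the pod spec, which is one reason the immutability of requests and placement matters here.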
Issue #329
Checklist