Resource Listener #281

Merged
ethany-nv merged 50 commits into feature/PROJ-147-operator-redesign from ethany/resource_listener_golang on Feb 2, 2026
ethany-nv (Collaborator) commented Jan 23, 2026

Description

This re-implementation of resource listening uses a more efficient way to aggregate resource usage:

Pod resource usage is computed incrementally from informer deltas. When a pod first transitions to Running with a node assigned, its requested resources are added to that node's totals; when it transitions to a terminal phase or is deleted, its contribution is subtracted. Because a pod's requests and node placement are immutable, later updates for a UID that has already been counted are ignored. Contributions are derived from container requests (CPU in millicores, memory and ephemeral storage in Ki, GPUs as a count) and are tracked both as overall usage and as non-workflow usage. Dirty nodes are flushed on a debounce interval, and any watch gap triggers a full rebuild from the informer cache.
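
The aggregation described above could be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `Pod`, `Resources`, `Aggregator`, and all method names are hypothetical, the pod type is a trimmed stand-in for the Kubernetes object, and the debounce timer that would call `Flush` is left out.

```go
package main

import "fmt"

// Resources holds summed container requests. Units follow the description:
// CPU in millicores, memory and ephemeral storage in Ki, GPUs as a count.
type Resources struct {
	CPUMilli, MemoryKi, StorageKi, GPUs int64
}

// add applies o to r with the given sign (+1 to add, -1 to subtract).
func (r *Resources) add(o Resources, sign int64) {
	r.CPUMilli += sign * o.CPUMilli
	r.MemoryKi += sign * o.MemoryKi
	r.StorageKi += sign * o.StorageKi
	r.GPUs += sign * o.GPUs
}

// Pod is a hypothetical, trimmed-down view of a pod taken from an informer delta.
type Pod struct {
	UID, Node, Phase string
	Workflow         bool
	Requests         Resources
}

// NodeUsage tracks overall usage and the non-workflow share for one node.
type NodeUsage struct{ Total, NonWorkflow Resources }

type contribution struct {
	node     string
	workflow bool
	req      Resources
}

// Aggregator keeps per-node totals plus the set of nodes changed since the last flush.
type Aggregator struct {
	counted map[string]contribution // keyed by pod UID
	nodes   map[string]*NodeUsage
	dirty   map[string]struct{}
}

func NewAggregator() *Aggregator {
	return &Aggregator{
		counted: map[string]contribution{},
		nodes:   map[string]*NodeUsage{},
		dirty:   map[string]struct{}{},
	}
}

func (a *Aggregator) apply(c contribution, sign int64) {
	u := a.nodes[c.node]
	if u == nil {
		u = &NodeUsage{}
		a.nodes[c.node] = u
	}
	u.Total.add(c.req, sign)
	if !c.workflow {
		u.NonWorkflow.add(c.req, sign)
	}
	a.dirty[c.node] = struct{}{}
}

// OnUpdate handles an add or update delta. A UID is counted once, on its
// first Running observation with a node; repeated updates are ignored.
func (a *Aggregator) OnUpdate(p Pod) {
	_, seen := a.counted[p.UID]
	switch {
	case p.Phase == "Running" && p.Node != "" && !seen:
		c := contribution{p.Node, p.Workflow, p.Requests}
		a.counted[p.UID] = c
		a.apply(c, +1)
	case p.Phase == "Succeeded" || p.Phase == "Failed":
		a.OnDelete(p.UID)
	}
}

// OnDelete subtracts a previously counted pod; unknown UIDs are a no-op.
func (a *Aggregator) OnDelete(uid string) {
	if c, ok := a.counted[uid]; ok {
		delete(a.counted, uid)
		a.apply(c, -1)
	}
}

// Flush returns a snapshot of the dirty nodes and clears the dirty set.
func (a *Aggregator) Flush() map[string]NodeUsage {
	out := make(map[string]NodeUsage, len(a.dirty))
	for n := range a.dirty {
		out[n] = *a.nodes[n]
	}
	a.dirty = map[string]struct{}{}
	return out
}

// Rebuild recomputes everything from a full pod list, as after a watch gap.
// Previously known nodes are zeroed and marked dirty so stale totals get replaced.
func (a *Aggregator) Rebuild(pods []Pod) {
	old := a.nodes
	a.counted = map[string]contribution{}
	a.nodes = map[string]*NodeUsage{}
	for n := range old {
		a.nodes[n] = &NodeUsage{}
		a.dirty[n] = struct{}{}
	}
	for _, p := range pods {
		a.OnUpdate(p)
	}
}

func main() {
	a := NewAggregator()
	p := Pod{UID: "u1", Node: "node-a", Phase: "Running", Requests: Resources{CPUMilli: 500, MemoryKi: 1024}}
	a.OnUpdate(p)
	a.OnUpdate(p) // repeated update for the same UID: ignored
	fmt.Println(a.Flush()["node-a"].Total.CPUMilli) // 500
	p.Phase = "Succeeded"
	a.OnUpdate(p)
	fmt.Println(a.Flush()["node-a"].Total.CPUMilli) // 0
}
```

Keeping the counted contribution per UID means the subtraction on delete does not depend on the delete event carrying the pod's final state.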

Issue #329

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

- Introduced new message types: ResourceBody, ResourceUsageBody, and DeleteResourceBody in messages.proto.
- Added ResourceListenerStream RPC in services.proto for handling node events and resource usage.
- Implemented multiple test cases for ResourceListenerStream, covering scenarios such as handling resource messages, resource usage messages, delete resource messages, and error handling.
- Verified ACK message sending for different message types and ensured proper error handling for EOF and context cancellation.
- Added tests for cases with missing or empty backend name metadata to ensure correct rejection of streams.
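
The stream behaviour those tests exercise might look roughly like the sketch below. It uses only the standard library so it stays self-contained: `Message`, `Ack`, `Stream`, `Serve`, and the `backend-name` metadata key are hypothetical stand-ins for the generated protobuf types and gRPC stream, and a plain map replaces gRPC metadata. The real field names in messages.proto are not shown in this PR page.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
)

// Message stands in for ResourceBody, ResourceUsageBody and DeleteResourceBody.
// Its fields are assumptions made for this sketch.
type Message struct {
	ID   string
	Kind string // "resource", "resource_usage" or "delete_resource"
}

// Ack is a hypothetical acknowledgement sent back for each handled message.
type Ack struct{ ID string }

// Stream is a hypothetical stand-in for the generated bidirectional stream.
type Stream interface {
	Context() context.Context
	Recv() (*Message, error)
	Send(*Ack) error
}

var ErrMissingBackend = errors.New("missing or empty backend name metadata")

// Serve rejects streams without a backend name, then ACKs each message
// until the client closes the stream (io.EOF) or the context is cancelled.
func Serve(md map[string]string, s Stream) error {
	if md["backend-name"] == "" {
		return ErrMissingBackend
	}
	for {
		msg, err := s.Recv()
		if errors.Is(err, io.EOF) {
			return nil // clean client shutdown
		}
		if err != nil {
			if ctxErr := s.Context().Err(); ctxErr != nil {
				return ctxErr // cancellation takes precedence over the recv error
			}
			return fmt.Errorf("recv: %w", err)
		}
		switch msg.Kind {
		case "resource", "resource_usage", "delete_resource":
			// A real handler would update node state or usage here.
		default:
			return fmt.Errorf("unknown message kind %q", msg.Kind)
		}
		if err := s.Send(&Ack{ID: msg.ID}); err != nil {
			return fmt.Errorf("send ack: %w", err)
		}
	}
}

// fakeStream replays queued messages, then returns io.EOF; it records ACKs.
type fakeStream struct {
	ctx  context.Context
	in   []*Message
	acks []*Ack
}

func newFakeStream(msgs ...*Message) *fakeStream {
	return &fakeStream{ctx: context.Background(), in: msgs}
}

func (f *fakeStream) Context() context.Context { return f.ctx }

func (f *fakeStream) Recv() (*Message, error) {
	if len(f.in) == 0 {
		return nil, io.EOF
	}
	m := f.in[0]
	f.in = f.in[1:]
	return m, nil
}

func (f *fakeStream) Send(a *Ack) error {
	f.acks = append(f.acks, a)
	return nil
}

func main() {
	s := newFakeStream(
		&Message{ID: "1", Kind: "resource"},
		&Message{ID: "2", Kind: "delete_resource"},
	)
	err := Serve(map[string]string{"backend-name": "demo"}, s)
	fmt.Println(len(s.acks), err) // 2 <nil>
}
```

A fake stream like this is one way to cover the ACK, EOF, and rejection paths without starting a server.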
ethany-nv changed the title from "Ethany/resource listener golang" to "Resource Listener" on Jan 27, 2026
xutongNV previously approved these changes Jan 30, 2026
RyaliNvidia previously approved these changes Jan 30, 2026
ethany-nv and others added 6 commits January 30, 2026 12:00
… handling

Replace manual signal handling with signal.NotifyContext() for cleaner
shutdown coordination. Use channel closure instead of separate done channels
to signal watcher completion, allowing channels to communicate their own state.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
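
The shutdown pattern this commit message describes can be illustrated with a small sketch. `watch` and `run` are hypothetical names, the ticker stands in for a real informer watcher, and `run` calls `stop()` itself after a few events only so the demo terminates without an actual signal.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// watch emits a counter on every tick until ctx is cancelled. It closes its
// output channel on exit, so the closed channel itself tells the reader the
// watcher is finished and no separate done channel is needed.
func watch(ctx context.Context, interval time.Duration) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		t := time.NewTicker(interval)
		defer t.Stop()
		for i := 0; ; i++ {
			select {
			case <-ctx.Done():
				return
			case <-t.C:
				select {
				case out <- i:
				case <-ctx.Done():
					return
				}
			}
		}
	}()
	return out
}

// run ties SIGINT/SIGTERM to a context and drains one watcher until its
// channel closes. It returns the number of events read.
func run(limit int) int {
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	n := 0
	for range watch(ctx, time.Millisecond) {
		n++
		if n == limit {
			stop() // demo only: calling stop cancels ctx, as a signal would
		}
	}
	return n
}

func main() {
	fmt.Println("watcher finished after", run(3), "events")
}
```

Ranging over the watcher's channel means the caller returns only after the goroutine has exited, which keeps shutdown ordering explicit.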
ethany-nv enabled auto-merge (squash) February 2, 2026 20:46
ethany-nv merged commit a0347bd into feature/PROJ-147-operator-redesign Feb 2, 2026
10 of 12 checks passed
ethany-nv deleted the ethany/resource_listener_golang branch February 2, 2026 23:26
4 participants