Ray Pool Executor by lbliii · Pull Request #1415 · NVIDIA-NeMo/Curator

lbliii · 2026-01-22T17:42:09Z

No description provided.

Signed-off-by: Lawrence Lane <llane@nvidia.com>

greptile-apps · 2026-01-22T17:46:08Z

Greptile Overview

Greptile Summary

This PR updates the executor documentation to remove maturity labels and reorganize content, promoting RayActorPoolExecutor to a more prominent position. The changes improve the overall presentation by:

- Removing "Production Ready" and "Experimental" labels from the architecture diagram in release notes
- Reorganizing the execution backends guide to present RayActorPoolExecutor before RayDataExecutor
- Streamlining descriptions to focus on use cases rather than maturity status
- Removing the comparison table that highlighted experimental status

However, the documentation has an inconsistency: while the changes suggest RayActorPoolExecutor is being promoted from experimental status, the module still lives in nemo_curator.backends.experimental.ray_actor_pool throughout the codebase. Additionally, unlike the other executors, RayActorPoolExecutor lacks an import code example, which could confuse users about the correct import path.

Confidence Score: 3/5

- Documentation-only changes with potential usability issues but no runtime impact.
- The changes are technically safe (documentation only, no code changes). However, the missing import example for RayActorPoolExecutor, together with the mismatch between the documentation's presentation (suggesting stable, promoted status) and the actual code structure (still in the experimental namespace), could confuse users and should be addressed before merge.
- docs/reference/infrastructure/execution-backends.md requires attention: add the missing import example and clarify the experimental status of RayActorPoolExecutor.

Important Files Changed

| Filename | Overview |
|---|---|
| docs/about/release-notes/index.md | Removed maturity status labels from backend executors diagram; cleaner visual presentation |
| docs/reference/infrastructure/execution-backends.md | Reorganized executor documentation with improved descriptions; missing import example for `RayActorPoolExecutor` raises concerns about usability |

Sequence Diagram

sequenceDiagram
    participant User
    participant Docs as Documentation
    participant ReleaseNotes as Release Notes
    participant ExecutionBackends as Execution Backends Guide
    
    User->>Docs: Read about executors
    Docs->>ExecutionBackends: Navigate to execution backends
    
    Note over ExecutionBackends: XennaExecutor section<br/>Clear import example provided
    ExecutionBackends-->>User: Shows XennaExecutor usage
    
    Note over ExecutionBackends: RayActorPoolExecutor section<br/>No import example shown
    ExecutionBackends-->>User: Shows conceptual usage only
    
    User->>ExecutionBackends: Try to use RayActorPoolExecutor
    Note over User: Confusion: Where to import from?<br/>nemo_curator.backends.ray_actor_pool?<br/>nemo_curator.backends.experimental.ray_actor_pool?
    
    User->>ReleaseNotes: Check architecture diagram
    Note over ReleaseNotes: Diagram shows all executors<br/>without maturity labels
    ReleaseNotes-->>User: No clarity on experimental status
    
    Note over User: Import fails at runtime<br/>Needs to check source code

greptile-apps

2 files reviewed, 2 comments

docs/reference/infrastructure/execution-backends.md

greptile-apps

No files reviewed, no comments

docs/reference/infrastructure/execution-backends.md

Signed-off-by: Lawrence Lane <llane@nvidia.com>

greptile-apps

1 file reviewed, 1 comment

greptile-apps · 2026-01-26T17:25:19Z

docs/reference/infrastructure/execution-backends.md

+### `RayActorPoolExecutor`

-`RayDataExecutor` uses Ray Data, a scalable data processing library built on Ray Core. Ray Data provides a familiar DataFrame-like API for distributed data transformations. This executor is experimental and best suited for large-scale batch processing tasks that benefit from Ray Data's optimized data loading and transformation pipelines.
+`RayActorPoolExecutor` uses Ray's ActorPool for efficient distributed processing with fine-grained resource management. This executor creates pools of Ray actors per stage, enabling better load balancing and fault tolerance through Ray's native mechanisms. Deduplication workflows automatically use this executor for GPU-accelerated stages.

 **Key Features**:
- **Ray Data API**: Leverages Ray Data's optimized data processing primitives
- **Scalable transformations**: Efficient map-batch operations across distributed workers
- **Experimental status**: API and performance characteristics may change
+- **ActorPool-based execution**: Creates dedicated actor pools per stage for optimal resource utilization
+- **Load balancing**: Uses `map_unordered` for efficient work distribution across actors
+- **RAFT support**: Native integration with [RAFT](https://github.com/rapidsai/raft) (RAPIDS Analytics Framework Toolbox) for GPU-accelerated clustering and nearest-neighbor operations
+- **Head node exclusion**: Optional `ignore_head_node` parameter to reserve the Ray cluster's [head node](https://docs.ray.io/en/latest/cluster/key-concepts.html#head-node) for coordination tasks only
+
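The `map_unordered` bullet above refers to collecting results in completion order rather than submission order, so a slow batch does not hold up results from faster ones. The following is a rough standard-library sketch of that idea only. It is not Ray code and not Curator's implementation; `process_batch`, the delays, and the pool size are hypothetical stand-ins for actor work.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_batch(batch_id: int, delay_s: float) -> int:
    """Hypothetical unit of work: sleep to simulate uneven batch cost."""
    time.sleep(delay_s)
    return batch_id * 10


def map_unordered(fn, tasks, num_workers=2):
    """Yield results as workers finish, regardless of submission order.

    This mimics the *semantics* of an unordered map over a worker pool;
    a thread pool stands in for the pool of Ray actors.
    """
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(fn, *args) for args in tasks]
        for future in as_completed(futures):
            yield future.result()


# Batch 0 is deliberately slower than batches 1 and 2.
tasks = [(0, 0.05), (1, 0.01), (2, 0.01)]
results = list(map_unordered(process_batch, tasks))
print(results)  # order depends on which batches finish first
```

Because results arrive in completion order, callers that need a stable order have to carry an identifier (here, the batch id encoded in the result) and sort afterwards.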


missing import example for RayActorPoolExecutor

Unlike XennaExecutor and RayDataExecutor, there's no code example showing how to import RayActorPoolExecutor. Based on the codebase, the correct import is:

from nemo_curator.backends.experimental.ray_actor_pool import RayActorPoolExecutor

Consider adding an import example here for consistency and to help users.
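As a hedged illustration of the import-path concern, the sketch below tries the module path given in this comment first and then the shorter path a reader might guess. Only the `experimental` path comes from this review; the second path is a hypothetical fallback taken from the confusion noted in the sequence diagram, and `load_attr` is a helper invented for this example, not part of NeMo Curator.

```python
import importlib

# The first path is the one quoted in this review. The second is a guess a
# reader might make and is NOT confirmed to exist (hypothetical fallback).
CANDIDATE_MODULES = (
    "nemo_curator.backends.experimental.ray_actor_pool",
    "nemo_curator.backends.ray_actor_pool",
)


def load_attr(module_paths, attr_name):
    """Return (module_path, attribute) from the first module that provides it.

    Raises ImportError listing every failed attempt if none succeeds.
    """
    errors = []
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError as exc:
            errors.append(f"{path}: {exc}")
            continue
        if hasattr(module, attr_name):
            return path, getattr(module, attr_name)
        errors.append(f"{path}: no attribute {attr_name!r}")
    raise ImportError("; ".join(errors))


try:
    found_in, RayActorPoolExecutor = load_attr(
        CANDIDATE_MODULES, "RayActorPoolExecutor"
    )
    print(f"RayActorPoolExecutor imported from {found_in}")
except ImportError as exc:
    # Expected when NeMo Curator is not installed in this environment.
    RayActorPoolExecutor = None
    print(f"RayActorPoolExecutor is unavailable: {exc}")
```

In the documentation itself, the single-line import quoted above is likely enough; the fallback logic only matters if the module later moves out of the experimental namespace.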


updates

6ca7db4

Signed-off-by: Lawrence Lane <llane@nvidia.com>

lbliii self-assigned this Jan 22, 2026

greptile-apps bot reviewed Jan 22, 2026

docs/reference/infrastructure/execution-backends.md (outdated)

docs/reference/infrastructure/execution-backends.md (outdated)

Merge branch 'main' into llane/ray-actor-pool-executor

2553825

lbliii requested a review from sarahyurick January 23, 2026 22:27

greptile-apps bot reviewed Jan 23, 2026


sarahyurick requested changes Jan 24, 2026


feedback

3466c4e

Signed-off-by: Lawrence Lane <llane@nvidia.com>

copy-pr-bot bot temporarily deployed to test January 26, 2026 17:21 Inactive

copy-pr-bot bot had a problem deploying to nemo-ci January 26, 2026 17:21 Error

Merge branch 'main' into llane/ray-actor-pool-executor

0c23880

greptile-apps bot reviewed Jan 26, 2026


sarahyurick approved these changes Jan 26, 2026


lbliii merged commit 3974061 into NVIDIA-NeMo:main Jan 26, 2026
17 checks passed

sarahyurick mentioned this pull request Feb 11, 2026

Add relevant 26.02 docs to r1.1.0 #1493

Merged

44 tasks

Ray Pool Executor #1415
lbliii merged 4 commits into NVIDIA-NeMo:main from lbliii:llane/ray-actor-pool-executor

2 participants
