Skip to content

Conversation

@AntoineRichard
Copy link
Collaborator

Description

Adds graphed workflows as a separate pipeline for now.

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context..

Type of change

  • New feature (non-breaking change which adds functionality)

How to use:

./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py --task Isaac-Cartpole-Warp-v0 --num_envs 4096 --headless --max_iterations 300
./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py --task Isaac-Ant-Warp-v0 --num_envs 4096 --headless --max_iterations 300
./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py --task Isaac-Humanoid-Warp-v0 --num_envs 4096 --headless --max_iterations 300

Results

Provides some speed-up on the Isaac front:
For Cartpole:

  • Torch API:
    • full collection step: Mean: 3398.505688 us, Std: 462.092124 us, N: 4800
    • actuators: Mean: 458.664717 us, Std: 56.303926 us, N: 9600
    • newton step: Mean: 418.564702 us, Std: 27.599639 us, N: 9599
    • post-processing (rl-related calculations): Mean: 1727.790608 us, Std: 427.153509 us, N: 4800
    • Lab overhead: 2560us
  • Warp API:
    • full collection step: Mean: 1097.855936 us, Std: 73.003888 us, N: 4800
    • actuators: Mean: 17.609461 us, Std: 4.145735 us, N: 9599
    • newton step: Mean: 370.281310 us, Std: 29.087660 us, N: 9599
    • post-processing (rl-related calculations): Mean: 24.169094 us, Std: 8.389473 us, N: 4800
    • Lab overhead: 357us

For Ant:

  • Torch API:
    • full collection step: Mean: 7646.153628 us, Std: 1809.778802 us, N: 32000
    • actuators: 352.259001 us, Mean: 363.953334 us, Std: 29.461383 us, N: 64000
    • newton step: 2011.052002 us, Mean: 2121.656286 us, Std: 321.844311 us, N: 64000
    • post-processing (rl-related calculations): 2433.182002 us, Mean: 2435.709163 us, Std: 1679.963760 us, N: 32000
    • Lab overhead: 3404us
  • Warp API:
    • full collection step: Mean: 4603.307170 us, Std: 257.345966 us, N: 32000
    • actuators: Mean: 18.158209 us, Std: 2.621708 us, N: 64000
    • newton step: Mean: 2066.688861 us, Std: 128.275069 us, N: 63999
    • post-processing (rl-related calculations): Mean: 59.394046 us, Std: 5.979035 us, N: 32000
    • Lab overhead: 471us

For Humanoid:

  • Torch API:
    • full collection step: Mean: 17507.664525 us, Std: 2403.361261 us, N: 16000
    • actuators: Mean: 404.597538 us, Std: 137.743652 us, N: 31999
    • newton step: Mean: 6812.051563 us, Std: 331.997178 us, N: 31999
    • post-processing (rl-related calculations): Mean: 2780.663383 us, Std: 2256.250511 us, N: 16000
    • Lab overhead: 3883us
  • Warp API:
    • full collection step: Mean: 14906.382912 us, Std: 668.738434 us, N: 16000
    • actuators: Mean: 23.225196 us, Std: 3.397187 us, N: 32000
    • newton step: Mean: 7172.681914 us, Std: 355.531817 us, N: 32000
    • post-processing (rl-related calculations): Mean: 80.205396 us, Std: 6.993390 us, N: 16000
    • Lab overhead: 562us

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@ooctipus
Copy link
Collaborator

this is amazing

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR introduces a major architectural enhancement by implementing graphed workflows using Warp for GPU-accelerated RL training in Isaac Lab. The implementation adds a complete parallel pipeline alongside the existing Torch-based workflow.

Key Changes

  • New Backend Abstraction: Factory pattern in backend_utils.py enables dynamic loading of Newton vs base implementations, with backend currently hardcoded to "newton"
  • Warp-based Environment: DirectRLEnvWarp class implements CUDA graph capture for step_warp_action() and step_warp_end() to eliminate kernel launch overhead
  • Newton Assets & Actuators: Complete reimplementation of articulation and actuator systems using Warp arrays and kernels instead of Torch tensors
  • Performance Improvements: Benchmarks show significant speedups (Cartpole: 3.4ms→1.1ms, Ant: 7.6ms→4.6ms, Humanoid: 17.5ms→14.9ms per collection step)
  • Example Environments: Warp implementations for Cartpole, Ant, and Humanoid demonstrating the new workflow

Issues Found

  • Critical: Line 375 in direct_rl_env_warp.py creates new tensors each step (wp.from_torch(action)), preventing full graph capture and undermining performance benefits
  • Type inconsistency: _pre_physics_step signature expects torch.Tensor but receives wp.array
  • Code cleanup needed: Significant commented-out code for event managers and noise models throughout direct_rl_env_warp.py
  • Minor: Missing newline at EOF in __init__.py

Confidence Score: 2/5

  • This PR has significant architectural changes with critical performance-impacting issues that need resolution before merging
  • Score reflects the critical issue at line 375 where tensor conversion breaks CUDA graph capture, undermining the core performance benefits of the graphed workflow. The extensive commented-out code suggests incomplete implementation of event management and noise models. While the architecture is sound and benchmarks show promise, the tensor conversion bug needs immediate attention as it prevents achieving the advertised performance gains in production.
  • Pay special attention to source/isaaclab/isaaclab/envs/direct_rl_env_warp.py - the action tensor conversion at line 375 must be fixed to achieve full CUDA graph benefits

Important Files Changed

File Analysis

Filename Score Overview
source/isaaclab/isaaclab/envs/direct_rl_env_warp.py 3/5 New warp-based RL environment with CUDA graph capture for performance. Contains commented-out event manager and noise model code that should be cleaned up or enabled.
source/isaaclab/isaaclab/init.py 4/5 Added Backend IntEnum to distinguish Newton from PhysX. Missing newline at end of file.
source/isaaclab/isaaclab/actuators/actuator_base.py 3/5 Refactored to support backend abstraction. Removed torch-specific attributes, simplified constructor signature for backend compatibility.
source/isaaclab/isaaclab/newton/assets/articulation/articulation.py 3/5 Large new file (2500+ lines) implementing Newton-based articulation with warp kernels. Core implementation for the graphed workflow.
source/isaaclab/isaaclab/utils/backend_utils.py 5/5 Factory pattern implementation for dynamic backend loading (Newton vs base). Clean abstraction layer.
source/isaaclab_tasks/isaaclab_tasks/direct/cartpole/cartpole_warp_env.py 4/5 Warp-based Cartpole environment using kernels for observations, rewards, and resets. Good example of the new workflow.

Sequence Diagram

sequenceDiagram
    participant User
    participant DirectRLEnvWarp
    participant Scene
    participant Articulation
    participant ActuatorModel
    participant NewtonSolver
    participant WarpKernels

    Note over User,WarpKernels: Initialization Phase
    User->>DirectRLEnvWarp: __init__(cfg)
    DirectRLEnvWarp->>Scene: InteractiveScene(cfg.scene)
    DirectRLEnvWarp->>Articulation: Articulation(cfg, frontend=Frontend.WARP)
    Articulation->>ActuatorModel: Initialize actuator models with warp arrays
    
    Note over User,WarpKernels: Training Loop
    User->>DirectRLEnvWarp: step(action)
    DirectRLEnvWarp->>DirectRLEnvWarp: _pre_physics_step(wp.from_torch(action))
    
    loop decimation times
        DirectRLEnvWarp->>DirectRLEnvWarp: step_warp_action() [CUDA Graph]
        DirectRLEnvWarp->>Articulation: _apply_action()
        Articulation->>ActuatorModel: compute()
        ActuatorModel->>WarpKernels: compute_pd_actuator / clip_efforts
        DirectRLEnvWarp->>Scene: write_data_to_sim()
        Scene->>NewtonSolver: Apply computed efforts
        DirectRLEnvWarp->>NewtonSolver: sim.step(render=False)
        NewtonSolver-->>Articulation: Update physics state
        DirectRLEnvWarp->>Scene: update(dt)
    end
    
    DirectRLEnvWarp->>DirectRLEnvWarp: step_warp_end() [CUDA Graph]
    DirectRLEnvWarp->>WarpKernels: add_to_env (episode_length_buf++)
    DirectRLEnvWarp->>DirectRLEnvWarp: _get_dones()
    DirectRLEnvWarp->>WarpKernels: get_dones kernel
    DirectRLEnvWarp->>DirectRLEnvWarp: _get_rewards()
    DirectRLEnvWarp->>WarpKernels: compute_rewards kernel
    DirectRLEnvWarp->>DirectRLEnvWarp: _reset_idx(reset_buf)
    DirectRLEnvWarp->>Scene: reset(mask=reset_buf)
    DirectRLEnvWarp->>DirectRLEnvWarp: _get_observations()
    DirectRLEnvWarp->>WarpKernels: get_observations kernel
    DirectRLEnvWarp-->>User: (obs, rewards, dones, extras)
Loading

94 files reviewed, 5 comments

Edit Code Review Agent Settings | Greptile

__version__ = ISAACLAB_METADATA["package"]["version"]

class Backend(IntEnum):
NEWTON = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: missing newline at end of file

Suggested change
NEWTON = 0
PHYSX = 1

action = self._action_noise_model(action)

# process actions, #TODO pass the torch tensor directly.
self._pre_physics_step(wp.from_torch(action)) # Creates a tensor and throws it away. --> Not graphable unless the training loop is using the same pointer.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: the TODO comment indicates this creates a new tensor each step, preventing CUDA graph capture. This will significantly impact the performance benefits of the graphed workflow since the action conversion happens outside the graph

pass

@abstractmethod
def _pre_physics_step(self, actions: torch.Tensor):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

syntax: type hint says torch.Tensor but should be wp.array since this is the warp environment and line 375 converts it with wp.from_torch(action)

Comment on lines +447 to +459
# TODO We could split it out.
# post-step: step interval event
# if self.cfg.events:
# if "interval" in self.event_manager.available_modes:
# self.event_manager.apply(mode="interval", dt=self.step_dt)

# update observations
self._get_observations()

# add observation noise
# note: we apply no noise to the state space (since it is used for critic networks)
# if self.cfg.observation_noise_model:
# self.obs_buf["policy"] = self._observation_noise_model(self.obs_buf["policy"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: large blocks of commented-out code for event management and observation noise should either be removed or properly implemented with a clear plan

Comment on lines +656 to +667

# apply events such as randomization for environments that need a reset
# if self.cfg.events:
# if "reset" in self.event_manager.available_modes:
# env_step_count = self._sim_step_counter // self.cfg.decimation
# self.event_manager.apply(mode="reset", env_ids=env_ids, global_env_step_count=env_step_count)

# reset noise models
# if self.cfg.action_noise_model:
# self._action_noise_model.reset(env_ids)
# if self.cfg.observation_noise_model:
# self._observation_noise_model.reset(env_ids)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: commented-out event manager and noise model reset logic should be addressed before merging

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

Introduces a separate Newton-based graphed workflow pipeline for Isaac Lab using Warp kernels to achieve significant performance improvements. The implementation adds a complete parallel execution path with warp-based actuators, articulation management, and a new DirectRLEnvWarp environment class.

Major changes include:

  • New Newton backend actuator implementations (actuator_base.py, actuator_pd.py, actuator_net.py) using warp arrays instead of torch tensors
  • Complete articulation system rewrite for Newton (articulation.py, articulation_data.py) with dual Torch/Warp frontend support
  • Warp kernels for PD control, DC motor clipping, and effort limiting
  • New DirectRLEnvWarp environment that aims to support CUDA graph capture for performance
  • Performance improvements demonstrated: Cartpole overhead reduced from 2560us to 357us, Ant from 3404us to 471us

Key concerns:

  • The DC motor torque-speed clipping formula may not correctly handle negative velocities
  • TODOs indicate tensor conversions in the action pipeline block CUDA graph capture, limiting the full performance benefit
  • Large commented-out sections for event management and noise models suggest incomplete implementation

Confidence Score: 3/5

  • Safe to merge with caveats - the parallel pipeline is isolated but has incomplete features and one potential physics bug
  • The PR introduces a complete parallel execution path which limits risk to existing code. However, the DC motor clipping logic has a potential bug in handling negative velocities, and the CUDA graph support (a key performance feature) is blocked by tensor conversion issues noted in TODOs. The implementation is well-structured but incomplete (missing event management, noise models).
  • source/isaaclab/isaaclab/newton/actuators/kernels.py for DC motor physics verification, and source/isaaclab/isaaclab/envs/direct_rl_env_warp.py for resolving CUDA graph blocking issues

Important Files Changed

File Analysis

Filename Score Overview
source/isaaclab/isaaclab/newton/actuators/kernels.py 3/5 New warp kernels for PD control and DC motor clipping; DC motor torque-speed curve logic needs verification
source/isaaclab/isaaclab/newton/actuators/actuator_base.py 4/5 Base actuator class for Newton backend with warp array support; clean implementation of parameter parsing
source/isaaclab/isaaclab/newton/assets/articulation/articulation.py 4/5 Core articulation implementation for Newton with extensive state management and frontend conversions
source/isaaclab/isaaclab/newton/assets/articulation/articulation_data.py 4/5 Data container with dual Torch/Warp frontend support; comprehensive property accessors for articulation state
source/isaaclab/isaaclab/envs/direct_rl_env_warp.py 2/5 Warp-based RL environment with CUDA graph support; contains TODOs about tensor conversion blocking graphs

Sequence Diagram

sequenceDiagram
    participant User as Training Script
    participant Env as DirectRLEnvWarp
    participant Art as Articulation (Newton)
    participant Act as Actuator Models
    participant Sim as Newton Solver
    participant WP as Warp Kernels

    User->>Env: step(actions)
    Env->>Env: wp.from_torch(actions)
    Note over Env: TODO: Tensor conversion blocks CUDA graphs
    
    Env->>Env: _pre_physics_step(wp_actions)
    Env->>Art: set_joint_*_target()
    Art->>Art: _torch_to_warp_dual_index()
    Art->>Act: Store targets in data._actuator_target
    
    Env->>Art: write_data_to_sim()
    Art->>Act: _apply_actuator_model()
    Act->>WP: compute_pd_actuator()
    Note over WP: PD control: k_p*(target-pos) + k_d*(-vel)
    WP->>Act: computed_effort
    Act->>WP: clip_efforts_with_limits()
    WP->>Act: applied_effort
    Act->>Art: Update data._sim_bind_joint_effort
    
    Env->>Sim: step(dt)
    Sim->>Sim: Newton physics integration
    
    Env->>Art: update(dt)
    Art->>Art: _data.update(dt)
    Note over Art: Read new states from Newton
    
    Env->>Env: _get_observations()
    Env->>Env: _get_rewards()
    Env->>User: obs, rewards, dones, info
Loading

21 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

joint_vel: float,
effort: float,
):
max_effort = saturation_effort * (1.0 - joint_vel) / vel_limit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: DC motor clipping formula appears incorrect. The max_effort calculation uses (1.0 - joint_vel) but this doesn't account for the sign of velocity correctly. When joint_vel is negative (motor spinning backward), (1.0 - joint_vel) becomes larger than 1.0, which seems incorrect for a DC motor torque-speed curve. Standard DC motor model: torque = k * (V - k * omega) suggests the formula should handle forward/backward rotation symmetrically.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants