
[WIP] Port HIL SERL #644

Open · wants to merge 122 commits into main
Conversation

@AdilZouitine (Member) commented Jan 17, 2025

What this does

⚠️ This PR is not ready to be merged.

We evaluate the actor-learner architecture on ManiSkill.

  • Implements the actor-learner process:

    1. An actor machine interacts with the environment and sends data to a learner machine.
    2. The learner updates its weights using this data and sends the updated weights back to the actor.
  • Increases learning speed by 50% using a shared encoder for the ensemble critics.

    • Previously, each critic made a separate forward pass through the encoder, duplicating work.
    • Now, the observation is passed through the encoder only once, and the resulting representation is sent to the critic heads (see the sketch below).
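
A minimal sketch of the shared-encoder idea, with hypothetical class and parameter names (the PR's actual SAC modules will differ):

import torch
import torch.nn as nn

class SharedEncoderEnsembleCritic(nn.Module):
    """Encode the observation once, then run only the small critic heads per member."""

    def __init__(self, encoder: nn.Module, feature_dim: int, action_dim: int, num_critics: int = 2):
        super().__init__()
        self.encoder = encoder
        self.heads = nn.ModuleList(
            nn.Sequential(
                nn.Linear(feature_dim + action_dim, 256),
                nn.ReLU(),
                nn.Linear(256, 1),
            )
            for _ in range(num_critics)
        )

    def forward(self, obs: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
        features = self.encoder(obs)  # single encoder forward pass, shared by all heads
        x = torch.cat([features, actions], dim=-1)
        return torch.stack([head(x) for head in self.heads], dim=0)  # (num_critics, batch, 1)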

How it was tested

  • We trained an agent on ManiSkill using this actor-learner architecture.

How to check out & try it (for the reviewer) 😃

  • Install ManiSkill.

Examples:

python lerobot/scripts/server/actor_server.py policy=sac_maniskill env=maniskill_example device=cuda wandb.enable=True

python lerobot/scripts/server/learner_server.py policy=sac_maniskill env=maniskill_example device=cuda wandb.enable=True

@michel-aractingi michel-aractingi force-pushed the user/adil-zouitine/2025-1-7-port-hil-serl-new branch from b1be31a to 2211209 Compare February 3, 2025 15:11
cfg.env.wrapper.ee_action_space_params is not None
and cfg.env.wrapper.ee_action_space_params.use_gamepad
):
# env = ActionScaleWrapper(env=env, ee_action_space_params=cfg.env.wrapper.ee_action_space_params)


What's the reason for the ActionScaleWrapper being commented out?

@AdilZouitine AdilZouitine changed the title [WIP] Fix SAC and port HIL SERL [WIP] Port HIL SERL Mar 18, 2025
@AdilZouitine AdilZouitine force-pushed the user/adil-zouitine/2025-1-7-port-hil-serl-new branch from 9a68f20 to ae12807 Compare March 24, 2025 11:05
@AdilZouitine AdilZouitine changed the base branch from user/michel-aractingi/2024-11-27-port-hil-serl to main March 24, 2025 11:07
@AdilZouitine AdilZouitine force-pushed the user/adil-zouitine/2025-1-7-port-hil-serl-new branch from dd50635 to 313812d Compare March 24, 2025 13:16
ChorntonYoel and others added 17 commits March 28, 2025 17:18
Co-authored-by: Daniel Ritchie <[email protected]>
Co-authored-by: resolver101757 <[email protected]>
Co-authored-by: Jannik Grothusen <[email protected]>
Co-authored-by: Remi <[email protected]>
Co-authored-by: Michel Aractingi <[email protected]>
Co-authored-by: KeWang1017 <[email protected]>
…ing logic

- Added `num_subsample_critics`, `critic_target_update_weight`, and `utd_ratio` to SACConfig.
- Implemented target entropy calculation in SACPolicy if not provided.
- Introduced subsampling of critics to prevent overfitting during updates.
- Updated temperature loss calculation to use the new target entropy.
- Added comments for future UTD update implementation.

These changes improve the flexibility and performance of the SAC implementation.
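
For reference, a hedged sketch of what the critic subsampling could look like (REDQ-style; the function and argument names are assumptions, not the PR's API):

import torch

def subsample_min_q(q_values: torch.Tensor, num_subsample_critics: int) -> torch.Tensor:
    # q_values: (num_critics, batch, 1). Take the minimum over a random subset
    # of critics so each update only trusts part of the ensemble.
    idx = torch.randperm(q_values.shape[0])[:num_subsample_critics]
    return q_values[idx].min(dim=0).values  # (batch, 1)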
…n handling

- Updated action selection to use distribution sampling and log probabilities for better stochastic behavior.
- Enhanced standard deviation clamping to prevent extreme values, ensuring stability in policy outputs.
- Cleaned up code by removing unnecessary comments and improving readability.

These changes aim to refine the SAC implementation, enhancing its robustness and performance during training and inference.
- Updated standard deviation parameterization in SACConfig to 'softplus' with defined min and max values for improved stability.
- Modified action sampling in SACPolicy to use reparameterized sampling, ensuring better gradient flow and log probability calculations.
- Cleaned up log probability calculations in TanhMultivariateNormalDiag for clarity and efficiency.
- Increased evaluation frequency in YAML configuration to 50000 for more efficient training cycles.

These changes aim to enhance the robustness and performance of the SAC implementation during training and inference.
…d stability

- Updated SACConfig to replace standard deviation parameterization with log_std_min and log_std_max for better control over action distributions.
- Modified SACPolicy to streamline action selection and log probability calculations, enhancing stochastic behavior.
- Removed deprecated TanhMultivariateNormalDiag class to simplify the codebase and improve maintainability.

These changes aim to enhance the robustness and performance of the SAC implementation during training and inference.
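
As a reference for how clamped log-std bounds, reparameterized sampling, and the tanh correction usually fit together, here is a hedged sketch (the bounds and names are assumptions, not taken from the PR):

import torch

def sample_action(mean: torch.Tensor, log_std: torch.Tensor,
                  log_std_min: float = -5.0, log_std_max: float = 2.0):
    log_std = log_std.clamp(log_std_min, log_std_max)  # bounded std for numerical stability
    dist = torch.distributions.Normal(mean, log_std.exp())
    x = dist.rsample()      # reparameterized sample: gradients flow through it
    action = torch.tanh(x)  # squash into [-1, 1]
    # Change-of-variables correction for the tanh squashing.
    log_prob = dist.log_prob(x) - torch.log(1.0 - action.pow(2) + 1e-6)
    return action, log_prob.sum(-1, keepdim=True)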
michel-aractingi and others added 12 commits March 28, 2025 17:18
Added support for hil_serl classifier to be trained with train.py
Run classifier training with: python lerobot/scripts/train.py --policy.type=hilserl_classifier
Fixes in find_joint_limits, control_robot, and end_effector_control_utils
…rties

- Introduced `WrapperConfig` dataclass for environment wrapper configurations.
- Updated `ManiskillEnvConfig` to include a `wrapper` field for enhanced environment management.
- Modified `SACConfig` to return `None` for `observation_delta_indices` and `action_delta_indices` properties.
- Refactored `make_robot_env` function to improve readability and maintainability.
Moved HilSerl env config to configs/env/configs.py
Fixes in actor_server, modeling_sac, and configuration_sac
Added the option to ignore missing keys in env_cfg in the get_features_from_env_config function
- Implemented process-specific logging for actor and learner servers to improve traceability.
- Created a dedicated logs directory and ensured it exists before logging.
- Initialized logging with explicit log files for each process, including actor transitions, interactions, and policy.
- Updated the actor CLI to validate configuration and set up logging accordingly.
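
A minimal sketch of process-specific logging along these lines (the directory and file names are illustrative, not the PR's exact paths):

import logging
from pathlib import Path

def init_process_logging(process_name: str, log_dir: str = "logs") -> logging.Logger:
    Path(log_dir).mkdir(parents=True, exist_ok=True)  # ensure the logs directory exists
    logger = logging.getLogger(process_name)
    handler = logging.FileHandler(Path(log_dir) / f"{process_name}.log")
    handler.setFormatter(logging.Formatter("%(asctime)s [%(name)s] %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger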
- Simplified the `image_features` property to directly iterate over `input_features`.
- Removed unused imports and unnecessary code related to main execution, enhancing clarity and maintainability.
- Rearranged import statements for better readability.
- Removed unused imports and streamlined the code structure.
- Consolidated logging initialization and enhanced logging for training processes.
- Improved handling of training state loading and resume logic.
- Refactored transition and interaction message processing for better readability and maintainability.
- Added detailed comments and documentation for clarity.
- Consolidated logging initialization and enhanced logging for actor processes.
- Streamlined the handling of gRPC connections and process management.
- Improved readability by organizing core algorithm functions and communication functions.
- Added detailed comments and documentation for clarity.
- Ensured proper queue management and shutdown handling for actor processes.
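
A hedged sketch of queue draining with shutdown handling of this kind (send_transition and the other names are hypothetical):

import queue
import threading

def stream_transitions(transitions: queue.Queue, shutdown_event: threading.Event, send_transition):
    # Poll with a timeout so the loop notices shutdown_event promptly.
    while not shutdown_event.is_set():
        try:
            transition = transitions.get(timeout=0.1)
        except queue.Empty:
            continue
        send_transition(transition)  # e.g. push the transition over gRPC to the learner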
…onality

- Updated the `forward` method in `SACPolicy` to handle loss computation for actor, critic, and temperature models.
- Replaced direct calls to `compute_loss_*` methods with a unified `forward` method in `learner_server`.
- Enhanced batch processing by consolidating input parameters into a single dictionary for better readability and maintainability.
- Removed redundant code and improved documentation for clarity.
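
Presumably the unified forward dispatches on the requested loss; a hypothetical sketch of the SACPolicy method (names taken from the compute_loss_* calls mentioned above):

def forward(self, batch: dict, model: str = "critic"):
    # Single entry point for all three losses instead of separate compute_loss_* calls.
    if model == "critic":
        return self.compute_loss_critic(**batch)
    if model == "actor":
        return self.compute_loss_actor(**batch)
    if model == "temperature":
        return self.compute_loss_temperature(**batch)
    raise ValueError(f"Unknown model: {model}")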
- Enhanced type annotations for variables in the `SACPolicy` class to improve code clarity.
- Updated method calls to use keyword arguments for better readability.
- Streamlined the extraction of batch components, ensuring consistent typing across the class methods.
…f gamepad

Minor modifications in gym_manipulator to quantize the gripper actions
Clamped the observations after F.resize in the ConvertToLeRobotObservation wrapper: due to a bug in F.resize, images were returned exceeding the maximum value of 1.0
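
A minimal reproduction of the clamping fix (the tensor shape and target size are illustrative):

import torch
from torchvision.transforms import functional as F

img = torch.rand(3, 240, 320)        # normalized camera frame in [0, 1]
resized = F.resize(img, [128, 128])  # can overshoot 1.0 with some interpolation settings
resized = resized.clamp(0.0, 1.0)    # clamp back into the valid range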
@AdilZouitine AdilZouitine force-pushed the user/adil-zouitine/2025-1-7-port-hil-serl-new branch from ad51d89 to 808cf63 Compare March 28, 2025 17:20


@dataclass
class HILSerlConfig:

Contributor: Should we drop it?

from huggingface_hub import PyTorchModelHubMixin


class HILSerlPolicy(

Contributor: It seems that it could be dropped too.

current_position = robot.follower_arms["main"].read("Present_Position")
trajectory = torch.from_numpy(
np.linspace(current_position, target_position, 50)
) # NOTE: 30 is just an aribtrary number

@helper2424 (Contributor) commented Mar 28, 2025:

Nice, GH suggests a fix, so here is a small fix:

Suggested change
) # NOTE: 30 is just an aribtrary number
) # NOTE: 30 is just an arbitrary number

@@ -246,14 +263,21 @@ def control_loop(
while timestamp < control_time_s:
start_loop_t = time.perf_counter()

current_joint_positions = robot.follower_arms["main"].read("Present_Position")

@helper2424 (Contributor) commented Mar 28, 2025:

A small fix, the linter generates an error:

Suggested change
current_joint_positions = robot.follower_arms["main"].read("Present_Position")
# current_joint_positions = robot.follower_arms["main"].read("Present_Position")

@@ -0,0 +1,594 @@
#!/usr/bin/env python

Contributor: Should we drop the file?


MAX_MESSAGE_SIZE = 4 * 1024 * 1024 # 4 MB
MAX_WORKERS = 3 # Stream parameters, send transitions and interactions
STUTDOWN_TIMEOUT = 10

Suggested change
STUTDOWN_TIMEOUT = 10
SHUTDOWN_TIMEOUT = 10

Contributor: Linter fix.


shutdown_event.wait()
logging.info("[LEARNER] Stopping gRPC server...")
server.stop(learner_service.STUTDOWN_TIMEOUT)

Suggested change
server.stop(learner_service.STUTDOWN_TIMEOUT)
server.stop(learner_service.SHUTDOWN_TIMEOUT)

Contributor: Linter fix; should be merged together with the other fix for STUTDOWN_TIMEOUT in learner_service.

@@ -107,8 +106,9 @@ def validate(self):
train_dir = f"{now:%Y-%m-%d}/{now:%H-%M-%S}_{self.job_name}"
self.output_dir = Path("outputs/train") / train_dir

if isinstance(self.dataset.repo_id, list):
raise NotImplementedError("LeRobotMultiDataset is not currently implemented.")
if self.dataset is not None:

@@ -0,0 +1,412 @@
#!/usr/bin/env python

Contributor: It seems that this file could be dropped too.

)

# Initialize kinematics instance for the appropriate robot type
robot_type = getattr(env.unwrapped.robot.config, "robot_type", "so100")


Suggested change
robot_type = getattr(env.unwrapped.robot.config, "robot_type", "so100")
robot_type = getattr(env.unwrapped.robot.config, "type", "so100")

Also, I'm a bit against the getattr with a default of "so100", as it leads to silent bugs. It took me a minute to realize why my moss was constrained.

self.use_gripper = use_gripper

# Initialize kinematics instance for the appropriate robot type
robot_type = getattr(env.unwrapped.robot.config, "robot_type", "so100")


Suggested change
robot_type = getattr(env.unwrapped.robot.config, "robot_type", "so100")
robot_type = getattr(env.unwrapped.robot.config, "type", "so100")

if not self.robot.is_connected:
self.robot.connect()

self.initial_follower_position = robot.follower_arms["main"].read("Present_Position")


It seems this is not used.
