[Feature] Added RGB DP baseline and DrawTriangle/SVG task (#876)
* [Feature] Added RGBD diffusion policy implementation as well as Draw Triangle and Draw SVG task (#643)

* Added draw triangle with success condition

* parallelized progress

* fixed triangle rotation issues

* clean up and format

* rgbd diffusion policy progress

* diff policy rgbd cpu fixes

* minor diff policy fixes and finished draw triangle parallelization

* Added depth arg to diff pol rgbd + formatting

* Removed unused code

* Made requested fixes, made bugfix to frame stack wrapper

* Edited make_env and frame_stack

* Added state obs to draw triangle

* Update draw_triangle max steps

* Added DrawTriangle Docs

* Fixed naming

* draw svg progress

* fixed most issues and parallelized draw svg

* Update draw_svg.py

* added success condition and state based obs

* Added discontinuous paths for draw svg

* formatting, discontinuous state

* Updated run.py

* fixed state obs error for drawing envs

* Changed drawsvg imports

* Update draw_svg.py

* Update draw_svg.py

* Small bugfix

* Update draw_svg.py

* Update draw_svg.py

* Update draw_triangle.py

* Bugfixes, speed progress

* success condition speed up

* small fix

* Updated draw_triangle

* drawing env gpu bugfixes

* diff policy rgbd fixes

* minor changes

* fix for wandb logging

* autobuild docs for DrawTriangle and SVG

* Update utils.py to add PushCube

* adjustments for rgb

* added initial pose for drawing envs

* Update draw_triangle.py

* adjusted training arguments and shell scripts

* StackCube fix

* work

* w

* update docs

* work

* Update utils.py

* remove dead code

* Update flatten.py

* Update flatten.py

* fix flatten wrapper

* possibly better docs

* simplify cli args and remove assumption of rgb existing

* Update README.md

---------

Co-authored-by: Arnav G. <[email protected]>
StoneT2000 and arnavg115 authored Feb 25, 2025
1 parent 5fb50fd commit b75e188
Showing 35 changed files with 2,049 additions and 301 deletions.
(4 files could not be displayed.)
66 changes: 66 additions & 0 deletions docs/source/tasks/drawing/index.md
@@ -32,6 +32,22 @@ Table of all tasks/environments in this category. Task column is the environment
<td><p>❌</p></td>
<td><p>1000</p></td>
</tr>
<tr class="row-odd">
<td><p><a href="#drawsvg-v1">DrawSVG-v1</a></p></td>
<td><div style='display:flex;gap:4px;align-items:center'><img style='min-width:min(50%, 100px);max-width:100px;height:auto' src='../../_static/env_thumbnails/DrawSVG-v1_rt_thumb_first.png' alt='DrawSVG-v1'> <img style='min-width:min(50%, 100px);max-width:100px;height:auto' src='../../_static/env_thumbnails/DrawSVG-v1_rt_thumb_last.png' alt='DrawSVG-v1'></div></td>
<td><p>❌</p></td>
<td><p>✅</p></td>
<td><p>❌</p></td>
<td><p>500</p></td>
</tr>
<tr class="row-odd">
<td><p><a href="#drawtriangle-v1">DrawTriangle-v1</a></p></td>
<td><div style='display:flex;gap:4px;align-items:center'><img style='min-width:min(50%, 100px);max-width:100px;height:auto' src='../../_static/env_thumbnails/DrawTriangle-v1_rt_thumb_first.png' alt='DrawTriangle-v1'> <img style='min-width:min(50%, 100px);max-width:100px;height:auto' src='../../_static/env_thumbnails/DrawTriangle-v1_rt_thumb_last.png' alt='DrawTriangle-v1'></div></td>
<td><p>❌</p></td>
<td><p>✅</p></td>
<td><p>❌</p></td>
<td><p>300</p></td>
</tr>
</tbody>
</table>

@@ -62,3 +78,53 @@ None
<source src="https://github.com/haosulab/ManiSkill/raw/main/figures/environment_demos/TableTopFreeDraw-v1_rt.mp4" type="video/mp4">
</video>
</div>

## DrawSVG-v1

![no-dense-reward][no-dense-reward-badge]
![sparse-reward][sparse-reward-badge]
:::{dropdown} Task Card
:icon: note
:color: primary

**Task Description:**
Instantiates a table with a white canvas on it and an outlined goal SVG path. A robot with a stick is to trace the SVG path with a red line.

**Randomizations:**
- the goal SVG's position on the xy-plane is randomized
- the goal SVG's z-rotation is randomized in the range $[0, 2\pi]$

**Success Conditions:**
- the points drawn by the robot are within a Euclidean distance of 0.05 m of points on the goal SVG
:::

<div style="display: flex; justify-content: center;">
<video preload="none" controls="True" width="100%" style="max-width: min(100%, 512px);" poster="../../_static/env_thumbnails/DrawSVG-v1_rt_thumb_first.png">
<source src="https://github.com/haosulab/ManiSkill/raw/main/figures/environment_demos/DrawSVG-v1_rt.mp4" type="video/mp4">
</video>
</div>

## DrawTriangle-v1

![no-dense-reward][no-dense-reward-badge]
![sparse-reward][sparse-reward-badge]
:::{dropdown} Task Card
:icon: note
:color: primary

**Task Description:**
Instantiates a table with a white canvas on it and an outlined goal triangle. A robot with a stick is to draw the triangle with a red line.

**Randomizations:**
- the goal triangle's position on the xy-plane is randomized
- the goal triangle's z-rotation is randomized in the range $[0, 2\pi]$

**Success Conditions:**
- the points drawn by the robot are within a Euclidean distance of 0.05 m of points on the goal triangle
:::

<div style="display: flex; justify-content: center;">
<video preload="none" controls="True" width="100%" style="max-width: min(100%, 512px);" poster="../../_static/env_thumbnails/DrawTriangle-v1_rt_thumb_first.png">
<source src="https://github.com/haosulab/ManiSkill/raw/main/figures/environment_demos/DrawTriangle-v1_rt.mp4" type="video/mp4">
</video>
</div>
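Concretely, the success conditions for both drawing tasks above boil down to a point-set distance check. Below is a minimal sketch of the idea; the function name, the exact matching direction, and the point sampling are illustrative assumptions, not ManiSkill's actual implementation:

```python
import numpy as np

def drawing_success(drawn_points: np.ndarray, goal_points: np.ndarray,
                    threshold: float = 0.05) -> bool:
    """Hypothetical sketch of the drawing-task success check.

    drawn_points: (N, 2) xy positions of points drawn by the robot.
    goal_points: (M, 2) xy positions sampled along the goal outline.
    Success if every goal point has a drawn point within `threshold` meters.
    """
    # pairwise Euclidean distances between goal and drawn points: shape (M, N)
    dists = np.linalg.norm(goal_points[:, None, :] - drawn_points[None, :, :], axis=-1)
    return bool((dists.min(axis=1) < threshold).all())

# toy example: an outline traced with a small 0.01 m offset, well within 0.05 m
goal = np.array([[0.0, 0.0], [0.1, 0.0], [0.1, 0.1], [0.0, 0.1]])
drawn = goal + 0.01
print(drawing_success(drawn, goal))  # True
```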
5 changes: 4 additions & 1 deletion docs/source/user_guide/datasets/demos.md
@@ -1,6 +1,6 @@
# Demonstrations

- We provide a command line tool to download demonstrations directly from our [Hugging Face 🤗 dataset](https://huggingface.co/datasets/haosulab/ManiSkill_Demonstrations) which is done by task ID. The tool will download the demonstration files to a folder and also a few demonstration videos visualizing what the demonstrations look like. See [Tasks](../../tasks/index.md) for a list of all supported tasks.
+ We provide a command line tool to download demonstrations directly from our [Hugging Face 🤗 dataset](https://huggingface.co/datasets/haosulab/ManiSkill_Demonstrations) which is done by task ID. The tool will download the demonstration files to a folder and also a few demonstration videos visualizing what the demonstrations look like. See [Tasks](../../tasks/index.md) for a list of all supported tasks that have demonstrations.

<!-- TODO: add a table here detailing the data info in detail -->
<!-- Please see our [notes](https://docs.google.com/document/d/1bBKmsR-R_7tR9LwaT1c3J26SjIWw27tWSLdHnfBR01c/edit?usp=sharing) about the details of the demonstrations. -->
@@ -13,6 +13,9 @@ python -m mani_skill.utils.download_demo # with no args this prints all availabl
python -m mani_skill.utils.download_demo all
```

Demo datasets are typically stored in a minimal format (e.g., without observation data), storing environment states instead to keep the files small. We provide a flexible tool to replay demonstration datasets and modify them, e.g., to add visual observation data, record videos, and more; see the [trajectory replay documentation](../datasets/replay.md). If you want to generate the original compressed datasets yourself locally, we save all scripts used for dataset generation in the [data_generation](https://github.com/haosulab/ManiSkill/tree/main/scripts/data_generation) folder. For users looking to benchmark imitation learning, we strongly recommend following the instructions on the [imitation learning setup page](../learning_from_demos/setup.md), which details how to replay the compressed datasets into training datasets for benchmarking.


## Format

All demonstrations for a task are saved in the HDF5 format openable by [h5py](https://github.com/h5py/h5py). Each HDF5 dataset is named `trajectory.{obs_mode}.{control_mode}.{sim_backend}.h5`, and is associated with a JSON metadata file with the same base name. Unless otherwise specified, `trajectory.h5` is short for `trajectory.none.pd_joint_pos.physx_cpu.h5`, which contains the original demonstrations generated by the `pd_joint_pos` controller with the `none` observation mode (empty observations) in the CPU based simulation. However, there may exist demonstrations generated by other controllers. **Thus, please check the associated JSON to ensure which controller is used.**
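To make the file pairing concrete, here is a minimal, hypothetical sketch using `h5py`: it writes a toy `trajectory.{obs_mode}.{control_mode}.{sim_backend}.h5` file plus a same-named JSON metadata file, then reads them back. The group layout and metadata fields shown are illustrative assumptions, not ManiSkill's exact schema; always check the real JSON to see which controller generated the data.

```python
import json
import h5py
import numpy as np

base = "trajectory.none.pd_joint_pos.physx_cpu"  # {obs_mode}.{control_mode}.{sim_backend}

# write a toy demonstration file + metadata (illustrative layout only)
with h5py.File(f"{base}.h5", "w") as f:
    traj = f.create_group("traj_0")
    traj.create_dataset("actions", data=np.zeros((10, 8), dtype=np.float32))
with open(f"{base}.json", "w") as f:
    json.dump({"episodes": [{"episode_id": 0, "elapsed_steps": 10}],
               "env_info": {"env_id": "PickCube-v1",
                            "env_kwargs": {"control_mode": "pd_joint_pos"}}}, f)

# read it back: consult the JSON to confirm which controller was used
with open(f"{base}.json") as f:
    meta = json.load(f)
print(meta["env_info"]["env_kwargs"]["control_mode"])  # pd_joint_pos
with h5py.File(f"{base}.h5", "r") as f:
    print(f["traj_0"]["actions"].shape)  # (10, 8)
```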
10 changes: 8 additions & 2 deletions docs/source/user_guide/getting_started/quickstart.md
@@ -1,7 +1,12 @@
# {octicon}`rocket` Quickstart

<!-- TODO: add link to new sapien website eventually -->
- ManiSkill is a robotics simulator built on top of SAPIEN. It provides a standard Gym/Gymnasium interface for easy use with existing learning workflows like RL and imitation learning. Moreover, ManiSkill supports simulation on both the GPU and CPU, as well as fast parallelized rendering.
+ ManiSkill is a robotics simulator built on top of SAPIEN. It provides a standard Gym/Gymnasium interface for easy use with existing learning workflows like reinforcement learning (RL) and imitation learning (IL). Moreover, ManiSkill supports simulation on both the GPU and CPU, as well as fast parallelized rendering.

We recommend going through this document first and playing with some of the demos. Then for specific applications we recommend the following:
- To get started with RL follow the [RL setup page](../reinforcement_learning/index.md).
- To get started with IL follow the [IL setup page](../learning_from_demos/index.md).
- To learn how to build your own tasks follow the [task creation tutorial](../tutorials/custom_tasks/index.md).

## Interface

@@ -74,7 +79,7 @@ You will also notice that all data returned is a batched torch tensor. To reduce
```python
from mani_skill.utils.wrappers.gymnasium import CPUGymWrapper
env = gym.make(env_id, num_envs=1)
- env = CPUGymWrapper(env)
+ env = CPUGymWrapper(env) # this also fully implements the standard Gymnasium Env interface
obs, _ = env.reset() # obs is numpy and unbatched
```
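Conceptually, the unbatching that `CPUGymWrapper` performs is roughly the following; this is a simplified sketch of the idea, not the wrapper's actual code:

```python
import numpy as np
import torch

def unbatch_to_numpy(x):
    """Roughly what a single-env CPU wrapper does to each value:
    drop the leading batch dimension of size 1 and convert torch -> numpy."""
    if isinstance(x, dict):
        return {k: unbatch_to_numpy(v) for k, v in x.items()}
    if isinstance(x, torch.Tensor):
        return x.squeeze(0).cpu().numpy()
    return x

# toy batched observation shaped like the dicts ManiSkill returns
batched_obs = {"agent": {"qpos": torch.zeros(1, 9)},
               "extra": {"tcp_pose": torch.zeros(1, 7)}}
obs = unbatch_to_numpy(batched_obs)
print(obs["agent"]["qpos"].shape)  # (9,)
```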

@@ -86,6 +91,7 @@ See {py:class}`mani_skill.envs.sapien_env` for the full list of environment inst




## GPU Parallelized/Vectorized Tasks

ManiSkill is powered by SAPIEN, which supports GPU parallelized physics simulation and GPU parallelized rendering. This enables 200,000+ state-based simulation FPS and 30,000+ FPS with rendering on a single 4090 GPU on e.g. manipulation tasks. The FPS can be higher or lower depending on what is simulated. For full benchmarking results see [this page](../additional_resources/performance_benchmarking).
2 changes: 1 addition & 1 deletion docs/source/user_guide/learning_from_demos/index.md
@@ -1,6 +1,6 @@
# Learning from Demonstrations

- ManiSkill supports all kinds of learning from demonstration / imitation learning methods via a unified API and provides multiple ready, already tested, baselines for use/comparison. The pages below show how to [setup environments for learning from demonstrations](./setup.md) and how to use the [baselines](./baselines.md). All baseline results are published to our [public wandb page](https://wandb.ai/stonet2000/ManiSkill). On that page you can filter by algorithm used, environment type, etc. We are still in the progress of running all experiments so not all results are uploaded yet.
+ ManiSkill supports all kinds of learning from demonstration / imitation learning methods via a unified API and provides multiple ready, already tested baselines for use/comparison. The pages below show how to [setup datasets/environments for learning from demonstrations](./setup.md) and how to use the [baselines](./baselines.md). All baseline results are published to our [public wandb page](https://wandb.ai/stonet2000/ManiSkill). On that page you can filter by algorithm used, environment type, etc. We are still in the process of running all experiments, so not all results are uploaded yet.

```{toctree}
:titlesonly:
2 changes: 1 addition & 1 deletion docs/source/user_guide/learning_from_demos/setup.md
@@ -19,7 +19,7 @@ To ensure everyone has the same preprocessed/replayed dataset, make sure to run

It has fixed settings for the trajectory replay to generate observation data and set the desired action space/controller for all benchmarked tasks. All benchmarked results in the [Wandb project detailing all benchmarked training runs](https://wandb.ai/stonet2000/ManiSkill) use the data replayed by the script above

- If you need more advanced use-cases for trajectory replay (e.g. generating pointclouds, changing controller modes), see the [trajectory replay documentation](../datasets/replay.md).
+ If you need more advanced use-cases for trajectory replay (e.g. generating pointclouds, changing controller modes), see the [trajectory replay documentation](../datasets/replay.md). If you want to generate the original datasets yourself locally, we save all scripts used for dataset generation in the [data_generation](https://github.com/haosulab/ManiSkill/tree/main/scripts/data_generation) folder.


## Evaluation
6 changes: 3 additions & 3 deletions docs/source/user_guide/wrappers/flatten.md
@@ -38,17 +38,17 @@ print(env.action_space) # is a flat array now

## Flatten RGBD Observations

- This wrapper concatenates all the RGB and Depth images into a single image with combined channels, and concatenates all state data into a single array so that the observation space becomes a simple dictionary composed of a `state` key and a `rgbd` key.
+ This wrapper concatenates all RGB images into one `rgb` image and all depth images into one `depth` image (combining channels across cameras), and concatenates all state data into a single array, so that the observation space becomes a simple dictionary with `state`, `rgb`, and `depth` keys.

```python
import mani_skill.envs
from mani_skill.utils.wrappers import FlattenRGBDObservationWrapper
import gymnasium as gym

- env = gym.make("PickCube-v1", obs_mode="rgbd")
+ env = gym.make("PickCube-v1", obs_mode="rgb+depth")
print(env.observation_space) # is a complex dictionary
# Dict('agent': Dict('qpos': Box(-inf, inf, (1, 9), float32), 'qvel': Box(-inf, inf, (1, 9), float32)), 'extra': Dict('is_grasped': Box(False, True, (1,), bool), 'tcp_pose': Box(-inf, inf, (1, 7), float32), 'goal_pos': Box(-inf, inf, (1, 3), float32)), 'sensor_param': Dict('base_camera': Dict('extrinsic_cv': Box(-inf, inf, (1, 3, 4), float32), 'cam2world_gl': Box(-inf, inf, (1, 4, 4), float32), 'intrinsic_cv': Box(-inf, inf, (1, 3, 3), float32))), 'sensor_data': Dict('base_camera': Dict('rgb': Box(0, 255, (1, 128, 128, 3), uint8), 'depth': Box(-32768, 32767, (1, 128, 128, 1), int16))))
env = FlattenRGBDObservationWrapper(env)
print(env.observation_space) # is a much simpler dictionary now
- # Dict('state': Box(-inf, inf, (1, 29), float32), 'rgbd': Box(-32768, 32767, (1, 128, 128, 4), int16))
+ # Dict('state': Box(-inf, inf, (1, 29), float32), 'rgb': Box(-32768, 32767, (1, 128, 128, 3), int16), 'depth': Box(-32768, 32767, (1, 128, 128, 1), int16))
```
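The flattening described above can be sketched roughly as follows: gather all non-image leaves into one `state` vector and keep the images under `rgb` and `depth`. This is an illustrative simplification, not the wrapper's actual implementation (e.g. it ignores `sensor_param`, which the real wrapper also excludes from `state` here). On a toy observation shaped like the one printed above, it reproduces the same `(29,)` state size:

```python
import numpy as np

def flatten_rgbd_obs(obs: dict) -> dict:
    """Simplified sketch: collect numeric/bool leaves into one flat `state`
    vector; pass `rgb` and `depth` images through unchanged."""
    state_parts, images = [], {}
    def visit(node):
        for k, v in node.items():
            if isinstance(v, dict):
                visit(v)
            elif k in ("rgb", "depth"):
                images[k] = v
            elif np.issubdtype(np.asarray(v).dtype, np.number) or np.asarray(v).dtype == bool:
                state_parts.append(np.asarray(v, dtype=np.float32).reshape(-1))
    visit(obs)
    return {"state": np.concatenate(state_parts), **images}

# toy unbatched observation mirroring the structure above: 9+9+1+7+3 = 29 state dims
obs = {
    "agent": {"qpos": np.zeros(9, np.float32), "qvel": np.zeros(9, np.float32)},
    "extra": {"is_grasped": np.array([False]), "tcp_pose": np.zeros(7, np.float32),
              "goal_pos": np.zeros(3, np.float32)},
    "sensor_data": {"base_camera": {"rgb": np.zeros((128, 128, 3), np.uint8),
                                    "depth": np.zeros((128, 128, 1), np.int16)}},
}
flat = flatten_rgbd_obs(obs)
print(flat["state"].shape)  # (29,)
```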
15 changes: 14 additions & 1 deletion examples/baselines/diffusion_policy/README.md
@@ -20,7 +20,7 @@ Read through the [imitation learning setup documentation](https://maniskill.read

We provide scripts to train Diffusion Policy on demonstrations.

- Note that some demonstrations are slow (e.g. motion planning or human teleoperated) and can exceed the default max episode steps which can be an issue as imitation learning algorithms learn to solve the task at the same speed the demonstrations solve it. In this case, you can use the `--max-episode-steps` flag to set a higher value so that the policy can solve the task in time. General recommendation is to set `--max-episode-steps` to about 2x the length of the mean demonstrations length you are using for training. We have tuned baselines in the `baselines.sh` script that set a recommended `--max-episode-steps` for each task.
+ Note that some demonstrations are slow (e.g. motion planning or human teleoperated) and can exceed the default max episode steps, which can be an issue as imitation learning algorithms learn to solve the task at the same speed the demonstrations solve it. In this case, you can use the `--max-episode-steps` flag to set a higher value so that the policy can solve the task in time. A general recommendation is to set `--max-episode-steps` to about 2x the mean length of the demonstrations you are using for training. We have tuned baselines in the `baselines.sh` script that set a recommended `--max-episode-steps` for each task. Note that we have not yet tuned/tested DP with RGB+Depth inputs, only RGB or state.

Example state based training, learning from 100 demonstrations generated via motionplanning in the PickCube-v1 task

@@ -35,6 +35,19 @@ python train.py --env-id PickCube-v1 \
--track # track training on wandb
```

Example RGB based training (which currently assumes input images are 128x128), learning from 100 demonstrations generated via motionplanning in the PickCube-v1 task

```bash
seed=1
demos=100
python train_rgbd.py --env-id PickCube-v1 \
--demo-path ~/.maniskill/demos/PickCube-v1/motionplanning/trajectory.rgb.pd_ee_delta_pos.physx_cpu.h5 \
--control-mode "pd_ee_delta_pos" --sim-backend "physx_cpu" --num-demos ${demos} --max_episode_steps 100 \
--total_iters 30000 --obs-mode "rgb" \
--exp-name diffusion_policy-PickCube-v1-rgb-${demos}_motionplanning_demos-${seed} \
--track
```

## Citation

If you use this baseline please cite the following