[Feature] Support state+<visual_textures> mode #821

Merged · 2 commits · Feb 3, 2025

4 changes: 2 additions & 2 deletions docs/source/user_guide/concepts/observation.md
@@ -6,9 +6,9 @@
All ManiSkill tasks take the observation mode (`obs_mode`) as one of the input arguments of `__init__`.
In general, the observation is organized as a dictionary (with an observation space of `gym.spaces.Dict`).

-There are two raw observations modes: `state_dict` (privileged states) and `sensor_data` (raw sensor data like visual data without postprocessing). `state` is a flat version of `state_dict`. `rgb+depth`, `rgb+depth+segmentation` (or any combination of `rgb`, `depth`, `segmentation`), and `pointcloud` apply post-processing on `sensor_data` to give convenient representations of visual data.
+There are three raw observation modes: `state_dict` (privileged states), `sensor_data` (raw sensor data like visual data without postprocessing), and `state+sensor_data` for both. `state` is a flat version of `state_dict`. `rgb+depth`, `rgb+depth+segmentation` (or any combination of `rgb`, `depth`, and `segmentation`), and `pointcloud` apply post-processing on `sensor_data` to give convenient representations of visual data. A mode such as `state+rgb` returns privileged states together with visual data; you can mix and match the different modalities however you like.

-The details here show the unbatched shapes. In general there is always a batch dimension unless you are using CPU simulation. Moreover, we annotate what dtype some values are, where some have both a torch and numpy dtype depending on whether you are using GPU or CPU simulation respectively.
+The details here show the unbatched shapes. In general, returned data is given as torch tensors and always has a batch dimension unless you are using CPU simulation. Moreover, we annotate the dtypes of some values.

### state_dict

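The documentation change above describes combining privileged state with visual textures in a single `obs_mode`. A minimal sketch of using the new mode (the task name `PickCube-v1` and the exact observation keys are illustrative assumptions, not taken from this PR):

```python
import gymnasium as gym

import mani_skill.envs  # noqa: F401  (registers the ManiSkill environments)

# Request flattened privileged state plus RGB sensor data in one mode.
env = gym.make("PickCube-v1", obs_mode="state+rgb")
obs, _ = env.reset(seed=0)

# The observation dict should now carry both state entries and the
# post-processed RGB images from the sensors.
print(obs.keys())
env.close()
```
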
3 changes: 3 additions & 0 deletions mani_skill/envs/utils/observations/__init__.py
@@ -53,6 +53,9 @@ def parse_visual_obs_mode_to_struct(obs_mode: str) -> CameraObsTextures:
    # Parse obs mode into individual texture types
    textures = obs_mode.split("+")
    for texture in textures:
+       if texture == "state" or texture == "state_dict":
+           # allows fetching privileged state data in addition to visual data.
+           continue
        assert (
            texture in ALL_TEXTURES
        ), f"Invalid texture type '{texture}' requested in the obs mode '{obs_mode}'. Each individual texture must be one of {ALL_TEXTURES}"
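The added branch simply skips the `state`/`state_dict` tokens when splitting the mode string, so only the remaining tokens are validated as textures. A standalone sketch of that parsing behavior (the texture list here is an assumption for illustration; the real `ALL_TEXTURES` lives in the library):

```python
ALL_TEXTURES = ("rgb", "depth", "segmentation")  # assumed subset for illustration

def parse_textures(obs_mode: str) -> list:
    """Collect the visual texture tokens from an obs_mode string."""
    parsed = []
    for texture in obs_mode.split("+"):
        if texture in ("state", "state_dict"):
            # privileged state is handled elsewhere, not as a camera texture
            continue
        assert texture in ALL_TEXTURES, f"Invalid texture type '{texture}'"
        parsed.append(texture)
    return parsed

print(parse_textures("state+rgb+depth"))  # -> ['rgb', 'depth']
```
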
1 change: 1 addition & 0 deletions mani_skill/examples/demo_random_action.py
@@ -107,6 +107,7 @@ def main(args: Args):
    while True:
        action = env.action_space.sample() if env.action_space is not None else None
        obs, reward, terminated, truncated, info = env.step(action)
+       print(obs.keys(), obs["extra"])
        if verbose:
            print("reward", reward)
            print("terminated", terminated)