@bingogome
What this does

It converts a v3.0 dataset back to v2.1, in case you need backward compatibility.

Examples:

Title Label
Fixes #[issue] (🐛 Bug)
Adds new dataset (🗃️ Dataset)
Optimizes something (⚡️ Performance)

How it was tested

Convert a dataset and visualize it using lerobot tools to confirm data validity.

SECTION TO REMOVE BEFORE SUBMITTING YOUR PR

Note: Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR. Try to avoid tagging more than 3 people.

Note: Before submitting this PR, please read the contributor guideline.

Copilot AI review requested due to automatic review settings October 3, 2025 00:38

Copilot AI left a comment

Pull Request Overview

This PR adds functionality to convert LeRobot datasets from v3.0 format back to v2.1 format for backward compatibility. This is the reverse operation of the existing v2.1 to v3.0 conversion.

  • Adds a new conversion script that transforms the consolidated v3.0 file layout back to the legacy per-episode structure of v2.1
  • Implements reverse transformations for data files, video files, metadata, and configuration
  • Provides command-line interface for dataset conversion with options for local directories and force conversion
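
As a rough illustration of the first bullet, the sketch below groups rows from a consolidated table by episode and derives a legacy per-episode file path for each group. It is not the PR's code: `split_by_episode`, `legacy_data_path`, and the default `chunks_size` of 1000 are assumptions made for illustration, and the path template is the one quoted in the info.json fix further down this thread.

```python
from collections import defaultdict

# Legacy per-episode template, as quoted in this thread's info.json fix.
LEGACY_DATA_PATH = "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet"


def split_by_episode(rows):
    """Group consolidated rows into one list per episode, preserving row order."""
    episodes = defaultdict(list)
    for row in rows:
        episodes[row["episode_index"]].append(row)
    return dict(episodes)


def legacy_data_path(episode_index, chunks_size=1000):
    """Build the per-episode path; chunks_size=1000 is an assumed default."""
    return LEGACY_DATA_PATH.format(
        episode_chunk=episode_index // chunks_size,
        episode_index=episode_index,
    )


# Toy stand-in for a consolidated table holding two episodes.
rows = [
    {"episode_index": 0, "frame_index": 0},
    {"episode_index": 0, "frame_index": 1},
    {"episode_index": 1, "frame_index": 0},
]
for episode_index, episode_rows in split_by_episode(rows).items():
    print(legacy_data_path(episode_index), len(episode_rows))
```

A real conversion would write each group to its own Parquet file and handle videos and metadata as well; this sketch only shows the grouping and path-naming step.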

@5hadytru
5hadytru commented Oct 14, 2025

Hey y'all; thanks for the useful script! I ran into a bug (in the context of fine-tuning GR00T-N1.5-3B with the Isaac-GR00T repo) which I fixed by making the following changes to the v2.1-converted dataset's info.json. Old lines:

"data_path": "data/chunk-{chunk_index:03d}/episode_{episode_index:06d}.parquet",
"video_path": "videos/chunk-{chunk_index:03d}/{video_key}/episode_{episode_index:06d}.mp4",

New lines:

"data_path": "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet",
"video_path": "videos/chunk-{episode_chunk:03d}/{video_key}/episode_{episode_index:06d}.mp4",

And for stats.json, I removed the "count" entry for the "action" and "observation.state" entries. This now matches actual v2.1 metadata sufficiently for GR00T fine-tuning. I believe y'all need to change the "LEGACY_DATA_PATH_TEMPLATE" and "LEGACY_VIDEO_PATH_TEMPLATE" variables in y'all's script + probably something else.
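
The manual fix described above could be scripted roughly as follows. This is only a sketch of that workaround, not part of the PR: the helper names `patch_info`, `patch_stats`, and `patch_dataset` are hypothetical, and the assumption that both files live under a `meta/` directory should be checked against the converted dataset.

```python
import json
from pathlib import Path


def patch_info(info):
    """Return a copy of info with {chunk_index:...} renamed to {episode_chunk:...}."""
    patched = dict(info)
    for key in ("data_path", "video_path"):
        if key in patched:
            patched[key] = patched[key].replace("{chunk_index:", "{episode_chunk:")
    return patched


def patch_stats(stats, features=("action", "observation.state")):
    """Return a copy of stats without the "count" entry for the given features."""
    patched = dict(stats)
    for name in features:
        if name in patched:
            patched[name] = {k: v for k, v in patched[name].items() if k != "count"}
    return patched


def patch_dataset(root):
    """Rewrite info.json and stats.json in place (meta/ layout is an assumption)."""
    meta = Path(root) / "meta"
    for filename, patch in (("info.json", patch_info), ("stats.json", patch_stats)):
        path = meta / filename
        data = json.loads(path.read_text())
        path.write_text(json.dumps(patch(data), indent=4))
```

Calling `patch_dataset` on the root of a converted dataset would apply both changes; back up the metadata first, since the files are overwritten.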

@bingogome
Author

You probably also need a script to convert from v2.1 to v2.0: check it out here.

@bingogome bingogome changed the title from "feat:v3.0 to v2.1" to "feat:v3.0 to v2.1 (and v2.1 to v2.0)" Oct 14, 2025


def convert_dataset(
repo_id: str,

Thanks for the conversion script, this will be really helpful, as Nvidia Groot N1.5 currently expects the LeRobot format to be v2.1. However, can you also add support for converting datasets locally from v3.0 to v2.1, instead of only supporting datasets uploaded to Hugging Face?

@bingogome bingogome closed this Nov 13, 2025
@bingogome bingogome deleted the v3_0-to-v2_1 branch November 13, 2025 18:33
3 participants