
HorizonRobotics/RoboTransfer


RoboTransfer: Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer

🌐 Project Page 📄 arXiv 🎥 Video Chinese Introduction 机器之心 Article

RoboTransfer is a diffusion-based video generation framework for synthesizing robotic data. Unlike previous methods, RoboTransfer integrates multi-view geometry with explicit control over scene components such as the background and object attributes. By incorporating cross-view feature interactions and global depth and normal conditions, it keeps geometry consistent across views. The framework supports fine-grained control, including background edits and object swaps.
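One common way to realize cross-view feature interaction is to flatten the tokens of all camera views into a single sequence so that attention spans every view. The NumPy sketch below illustrates that generic technique only; it is not RoboTransfer's actual layer. The (V, N, C) layout and the projection matrices `w_q`, `w_k`, `w_v` are hypothetical, and the depth/normal conditioning is not modeled.

```python
import numpy as np


def cross_view_attention(feats: np.ndarray, w_q: np.ndarray,
                         w_k: np.ndarray, w_v: np.ndarray) -> np.ndarray:
    """Single-head attention whose keys and values span all camera views.

    feats: (V, N, C) tokens for V views, N tokens per view, C channels.
    w_q, w_k, w_v: (C, C) projection matrices (hypothetical, illustrative only).
    Returns an array of shape (V, N, C).
    """
    v, n, c = feats.shape
    tokens = feats.reshape(v * n, c)            # merge views into one token sequence
    q, k, val = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(c)               # (V*N, V*N) similarities across all views
    scores -= scores.max(axis=-1, keepdims=True)  # stabilize the softmax numerically
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ val                         # each token mixes features from every view
    return out.reshape(v, n, c)


# Usage: three camera views, 16 tokens each, 8 channels.
rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 16, 8))
w_q, w_k, w_v = (rng.standard_normal((8, 8)) * 0.1 for _ in range(3))
out = cross_view_attention(feats, w_q, w_k, w_v)  # out.shape == (3, 16, 8)
```

Because every token attends to all V·N tokens, the cost grows quadratically with the total token count across views.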

Overall Framework

✅ Setup Environment

We use uv to manage dependencies. To set up the environment:

git clone https://github.com/HorizonRobotics/RoboTransfer.git
cd RoboTransfer
export UV_HTTP_TIMEOUT=600
uv sync
uv pip install -e .

🚀 Inference

uv run main.py  # add --mem_efficient when running on an RTX 4090
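If you script inference on machines with different GPUs, a small helper can decide whether to pass `--mem_efficient`. This is a hypothetical convenience wrapper, not part of the repository: the 24 GiB threshold is an assumption derived from the RTX 4090 hint above, and `build_inference_cmd` / `query_gpu_mem_gib` are illustrative names. GPU memory is read through `nvidia-smi`'s `--query-gpu` interface.

```python
from __future__ import annotations

import shutil
import subprocess

# Assumption: cut-off inferred from the README's RTX 4090 (24 GB) hint,
# not a documented threshold.
MEM_EFFICIENT_MAX_GIB = 24.0


def build_inference_cmd(gpu_mem_gib: float) -> list[str]:
    """Return the uv inference command, adding --mem_efficient on GPUs with <= 24 GiB."""
    cmd = ["uv", "run", "main.py"]
    if gpu_mem_gib <= MEM_EFFICIENT_MAX_GIB:
        cmd.append("--mem_efficient")
    return cmd


def query_gpu_mem_gib() -> float | None:
    """Return total memory of the first GPU in GiB, or None if nvidia-smi is absent."""
    if shutil.which("nvidia-smi") is None:
        return None
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return float(out.splitlines()[0]) / 1024  # nvidia-smi reports MiB
```

For example, from the repository root, `subprocess.run(build_inference_cmd(mem), check=True)` launches inference once `mem = query_gpu_mem_gib()` has returned a value.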

📈 More Inference Data

Install the additional dependencies required by the data pipeline:

uv sync --extra data

⚙️ For More Simulation Data

You can obtain more simulation data from the RoboTwin CVPR Challenge.

You can then use the process_sim.sh script to convert the raw data (.pickle and .hdf5 files) into the RoboTransfer format with geometric conditioning:

script/process_sim.sh
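Before running process_sim.sh, it can help to confirm that the downloaded raw data is complete. The sketch below is a hypothetical pre-flight check, not part of the repository: it recursively counts the .pickle and .hdf5 files mentioned above (plus .pkl, an assumed alternative pickle extension) under a root directory of your choice. It makes no assumption about how the script itself locates its input.

```python
from __future__ import annotations

from collections import Counter
from pathlib import Path

# Extensions named in the README; ".pkl" is an assumed alias for pickle files.
RAW_SUFFIXES = {".pickle", ".pkl", ".hdf5"}


def inventory_raw_data(root: str | Path) -> Counter:
    """Count raw simulation files under `root`, grouped by lower-cased extension."""
    counts: Counter = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in RAW_SUFFIXES:
            counts[path.suffix.lower()] += 1
    return counts
```

Comparing these counts against the number of episodes you expected to download is a quick way to catch a partial transfer.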

🤖 For More Real Data

For real-world data collected with the ALOHA-AgileX robot system, see the RoboTransfer-RealData dataset. You can then use the process_real.sh script to convert the raw RGB images into the RoboTransfer format with geometric conditioning:

script/process_real.sh
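The geometric conditioning includes depth and surface normals, and the acknowledgements credit learned models (Video-Depth-Anything, Lotus) for this kind of geometry. To show what a normal map encodes, the sketch below uses a simpler, swapped-in technique: it derives approximate normals from a depth map with image-space finite differences. The function names are hypothetical, camera intrinsics are ignored, and the sign convention (normals pointing toward +z) is one of several in use.

```python
import numpy as np


def normals_from_depth(depth: np.ndarray) -> np.ndarray:
    """Approximate per-pixel unit normals (H, W, 3) from an (H, W) depth map.

    Treats pixel coordinates as the x/y axes (no camera intrinsics).
    """
    # np.gradient returns derivatives along rows (y) first, then columns (x).
    dz_dy, dz_dx = np.gradient(depth.astype(np.float64))
    normals = np.stack(
        [-dz_dx, -dz_dy, np.ones_like(depth, dtype=np.float64)], axis=-1
    )
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True)  # unit length
    return normals


def normals_to_rgb(normals: np.ndarray) -> np.ndarray:
    """Map unit normals from [-1, 1] to uint8 RGB, as normal maps are usually stored."""
    return ((normals + 1.0) * 0.5 * 255.0).round().astype(np.uint8)
```

A learned estimator is preferable on real RGB frames because it does not require a clean metric depth map and handles edges and textureless regions better than finite differences.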

🙌 Acknowledgement

RoboTransfer builds upon the following amazing projects and models: 🌟 Video-Depth-Anything 🌟 Lotus 🌟 GPT-4o 🌟 Grounded-SAM 🌟 IOPaint

⚖️ License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

📚 Citation

If you use RoboTransfer in your research or projects, please cite:

@misc{liu2025robotransfergeometryconsistentvideodiffusion,
      title={RoboTransfer: Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer},
      author={Liu Liu and Xiaofeng Wang and Guosheng Zhao and Keyu Li and Wenkang Qin and Jiaxiong Qiu and Zheng Zhu and Guan Huang and Zhizhong Su},
      year={2025},
      eprint={2505.23171},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.23171},
}
