RoboTransfer is a diffusion-based video generation framework for robotic data synthesis. Unlike previous methods, RoboTransfer integrates multi-view geometry with explicit control over scene components such as background and object attributes. By incorporating cross-view feature interactions and global depth/normal conditions, RoboTransfer enforces geometric consistency across views, enabling fine-grained control such as background edits and object swaps.
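To make the conditioning idea concrete, here is a minimal sketch of stacking per-view depth and normal maps channel-wise with the video inputs before denoising. The tensor shapes and the `build_condition` helper are illustrative assumptions, not the actual RoboTransfer API:

```python
# Minimal sketch of geometry conditioning (hypothetical shapes/names, not the
# actual RoboTransfer API): multi-view frames are denoised jointly, with each
# view's depth and normal maps concatenated along the channel dimension so the
# model sees consistent geometry across views.
import torch

def build_condition(rgb: torch.Tensor, depth: torch.Tensor, normal: torch.Tensor) -> torch.Tensor:
    """Concatenate geometric conditions with the (noisy) RGB input.

    rgb:    (views, frames, 3, H, W)
    depth:  (views, frames, 1, H, W)
    normal: (views, frames, 3, H, W)
    """
    return torch.cat([rgb, depth, normal], dim=2)  # -> (views, frames, 7, H, W)

views, frames, H, W = 3, 16, 64, 64
cond = build_condition(
    torch.randn(views, frames, 3, H, W),
    torch.rand(views, frames, 1, H, W),
    torch.randn(views, frames, 3, H, W),
)
print(cond.shape)  # torch.Size([3, 16, 7, 64, 64])
```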
We use uv to manage dependencies. To set up the environment:
```bash
git clone https://github.com/HorizonRobotics/RoboTransfer.git
cd RoboTransfer
export UV_HTTP_TIMEOUT=600
uv sync
uv pip install -e .
uv run main.py  # add --mem_efficient to reduce VRAM usage, e.g. on an RTX 4090
```
To install the additional dependencies for the data pipeline:
```bash
uv sync --extra data
```
You can obtain more simulation data from the RoboTwin CVPR Challenge.
You can then use the process_sim.sh script to convert the raw data (.pickle and .hdf5 files) into the RoboTransfer format with geometric conditioning.
```bash
script/process_sim.sh
```
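For orientation, the sketch below shows the kind of restructuring such a conversion performs: splitting a recorded episode into per-view streams alongside their geometric conditions. The HDF5 keys, array shapes, and output layout here are assumptions for illustration, not the script's actual contract:

```python
# Hypothetical sketch of one sim-episode conversion step. The dataset keys,
# shapes, and output layout are illustrative assumptions only.
import os
import pickle
import h5py
import numpy as np

def convert_episode(hdf5_path: str, meta_path: str, out_dir: str) -> None:
    with open(meta_path, "rb") as f:
        meta = pickle.load(f)  # assumed: per-view camera intrinsics/extrinsics
    with h5py.File(hdf5_path, "r") as f:
        rgb = np.asarray(f["observation/rgb"])      # assumed key, (T, V, H, W, 3)
        depth = np.asarray(f["observation/depth"])  # assumed key, (T, V, H, W)
    # Write one folder per camera view so downstream loading stays simple.
    for v in range(rgb.shape[1]):
        view_dir = os.path.join(out_dir, f"view_{v}")
        os.makedirs(view_dir, exist_ok=True)
        np.save(os.path.join(view_dir, "rgb.npy"), rgb[:, v])
        np.save(os.path.join(view_dir, "depth.npy"), depth[:, v])
    with open(os.path.join(out_dir, "meta.pkl"), "wb") as f:
        pickle.dump(meta, f)
```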
For real-world data collected with the ALOHA-AgileX robot system, see the RoboTransfer-RealData dataset. You can then use the process_real.sh script to convert the raw RGB images into the RoboTransfer format with geometric conditioning.
```bash
script/process_real.sh
```
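The released pipeline relies on dedicated estimators (e.g. Video-Depth-Anything for depth, Lotus for normals) to produce the geometric conditions from RGB. As a self-contained illustration of the underlying idea, one standard way to derive a normal map when only depth is available is finite differences on the depth image:

```python
# Self-contained illustration: surface normals from a depth map via finite
# differences. This is a generic technique, not the pipeline's actual estimator.
import numpy as np

def normals_from_depth(depth: np.ndarray) -> np.ndarray:
    """depth: (H, W) float array -> unit normals, shape (H, W, 3)."""
    dz_dy, dz_dx = np.gradient(depth)  # gradients along y (rows) and x (cols)
    n = np.dstack([-dz_dx, -dz_dy, np.ones_like(depth)])
    n /= np.linalg.norm(n, axis=2, keepdims=True)
    return n

# Toy depth ramp: a plane tilting along x.
depth = np.fromfunction(lambda y, x: 1.0 + 0.01 * x, (64, 64))
print(normals_from_depth(depth).shape)  # (64, 64, 3)
```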
RoboTransfer builds upon the following amazing projects and models: 🌟 Video-Depth-Anything 🌟 Lotus 🌟 GPT4o 🌟 GroundSam 🌟 IOPaint
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
If you use RoboTransfer in your research or projects, please cite:
```bibtex
@misc{liu2025robotransfergeometryconsistentvideodiffusion,
      title={RoboTransfer: Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer},
      author={Liu Liu and Xiaofeng Wang and Guosheng Zhao and Keyu Li and Wenkang Qin and Jiaxiong Qiu and Zheng Zhu and Guan Huang and Zhizhong Su},
      year={2025},
      eprint={2505.23171},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.23171},
}
```