MotionGPT3 is a bimodal motion-language framework built on a MoT architecture, designed to address the challenges of unified motion understanding and generation.
Technical details
Inspired by the mixture-of-experts paradigm, we propose MotionGPT3, a bimodal motion-language model that treats human motion as a second modality, decoupling motion modeling into separate model parameters and enabling both effective cross-modal interaction and efficient multimodal training at scale.
To preserve language intelligence, the text branch keeps the structure and weights of the pretrained language model, while a motion branch is integrated via shared attention, enabling bidirectional information flow between the two modalities. We employ a motion VAE to encode raw human motion into latent representations, and the motion branch predicts motion latents directly from intermediate hidden states using a diffusion head, bypassing discrete tokenization.
Extensive experiments show that our approach achieves competitive performance on both motion understanding and generation tasks while preserving strong language capabilities, establishing a unified bimodal motion-diffusion framework operating in an autoregressive manner.
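For intuition, the following is a minimal, self-contained sketch of the shared-attention idea described above. The class name MoTBlock, the hidden sizes, and the single-layer structure are illustrative assumptions, not the repository's actual implementation (which also handles causal masking, positional encodings, the full GPT-2 backbone, and the diffusion head):

import torch
import torch.nn as nn

class MoTBlock(nn.Module):
    # One bimodal block: each modality keeps its own projection weights,
    # while a single attention runs over the joint text+motion sequence.
    def __init__(self, d_text=768, d_motion=512, n_heads=8):
        super().__init__()
        self.text_qkv = nn.Linear(d_text, 3 * d_text)       # text-branch parameters (pretrained LM side)
        self.motion_qkv = nn.Linear(d_motion, 3 * d_text)    # motion-branch parameters, projected to the shared width
        self.attn = nn.MultiheadAttention(d_text, n_heads, batch_first=True)
        self.text_out = nn.Linear(d_text, d_text)
        self.motion_out = nn.Linear(d_text, d_motion)

    def forward(self, text_h, motion_h):
        tq, tk, tv = self.text_qkv(text_h).chunk(3, dim=-1)
        mq, mk, mv = self.motion_qkv(motion_h).chunk(3, dim=-1)
        # Shared attention over the concatenated sequence gives bidirectional
        # information flow between the two modalities.
        out, _ = self.attn(torch.cat([tq, mq], 1), torch.cat([tk, mk], 1), torch.cat([tv, mv], 1))
        n_text = text_h.shape[1]
        return self.text_out(out[:, :n_text]), self.motion_out(out[:, n_text:])

block = MoTBlock()
text_h, motion_h = torch.randn(2, 16, 768), torch.randn(2, 8, 512)
new_text_h, new_motion_h = block(text_h, motion_h)   # shapes (2, 16, 768) and (2, 8, 512)
# In the full model, a diffusion head maps the final motion hidden states to
# motion VAE latents, so no discrete motion tokenization is needed.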

- [2025/06/30] Uploaded code and initialized the project
Setup and download
conda create python=3.11 --name mgpt
conda activate mgpt
Install the packages in requirements.txt and PyTorch 2.0:
pip install -r requirements.txt
python -m spacy download en_core_web_sm
We test our code on Python 3.11.11 and PyTorch 2.0.0.
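A quick sanity check of the environment can help before downloading the remaining assets; the version numbers below are the ones reported above, and the snippet itself is just a convenience, not part of the repository:

import sys
import torch
import spacy

print("Python:", sys.version.split()[0])        # expected 3.11.x
print("PyTorch:", torch.__version__)             # expected 2.0.x
print("CUDA available:", torch.cuda.is_available())
spacy.load("en_core_web_sm")                     # fails if the spaCy model was not downloaded
print("spaCy model en_core_web_sm is available")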
Run the following scripts to download dependency materials:
bash prepare/download_smpl_model.sh
bash prepare/prepare_gpt2.sh
For Text to Motion Evaluation
bash prepare/download_t2m_evaluators.sh
Batch demo
We support txt file input; the output motions are npy files and the output texts are txt files. Please check configs/assets.yaml for path configuration, where TEST.FOLDER sets the output folder.
Then, run the following script:
python demo.py --cfg ./configs/MoT_vae_stage3.yaml --example ./demos/t2m.txt
Some parameters:
--example=./demos/t2m.txt: input file of text prompts
--task=t2m: evaluation tasks including t2m, m2t, pred, inbetween
The outputs:
npy file: the generated motions with the shape of (nframe, 22, 3)
txt file: the input text prompt or text output
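To inspect a result programmatically, a minimal sketch like the following can be used; the file name is a placeholder for any npy written to your TEST.FOLDER (if your outputs are saved as pickled objects rather than plain arrays, pass allow_pickle=True):

import numpy as np

motion = np.load("YOUR_OUTPUT_FOLDER/example_motion.npy")  # placeholder path
print(motion.shape)                  # expected (nframe, 22, 3): 22 joints in 3D per frame
print("duration in frames:", motion.shape[0])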
Training guidance
- Please refer to HumanML3D for text-to-motion dataset setup.
- Put the instruction data in prepare/instructions into the same folder as the HumanML3D dataset.
- (Optional) Refer to the MotionGPT training guidance to generate motion codes for VQ-based training:
bash prepare/download_motiongpt_pretrained_models.sh
python -m scripts.get_motion_code --cfg configs/config_motiongpt.yaml
Please first check the parameters in configs/MoT_vae_stage1_t2m.yaml, e.g. NAME, instruction_type, lm_ablation, DEBUG.
Then, run the following commands:
python gen_mot_gpt.py
python -m train --cfg configs/MoT_vae_stage1_t2m.yaml --nodebug
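If you prefer to review or override the config fields mentioned above in code rather than by hand, a sketch along these lines works for plain YAML configs; whether this repository itself loads configs with OmegaConf, and the exact nesting of the fields, are assumptions:

from omegaconf import OmegaConf

cfg = OmegaConf.load("configs/MoT_vae_stage1_t2m.yaml")
print(OmegaConf.to_yaml(cfg))               # review NAME, instruction_type, lm_ablation, DEBUG
cfg.NAME = "my_stage1_t2m_run"              # example override; adjust to your run
OmegaConf.save(cfg, "configs/MoT_vae_stage1_t2m_local.yaml")
# Then train with: python -m train --cfg configs/MoT_vae_stage1_t2m_local.yaml --nodebug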
Please update the parameters in configs/MoT_vae_stage2_instruct.yaml and configs/MoT_vae_stage2_all.yaml, e.g. NAME, instruction_type, lm_ablation, DEBUG, PRETRAINED_VAE (change to your latest ckpt model path from the previous step).
Then, run the following commands:
python -m train --cfg configs/MoT_vae_stage2_all.yaml --nodebug
python -m train --cfg configs/MoT_vae_stage2_instruct.yaml --nodebug
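PRETRAINED_VAE above, and PRETRAINED in the next stage, both expect a checkpoint path produced by the previous step. A small helper like the following can locate the newest checkpoint; the experiments/ directory layout is an assumption, so point the glob at wherever your runs actually save *.ckpt files:

import glob
import os

ckpts = glob.glob("experiments/**/*.ckpt", recursive=True)   # hypothetical save location
if ckpts:
    print("latest checkpoint:", max(ckpts, key=os.path.getmtime))
else:
    print("no .ckpt files found; check your training output directory")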
Please update the parameters in configs/MoT_vae_stage3.yaml, e.g. NAME, instruction_type, lm_ablation, DEBUG, PRETRAINED (change to your latest ckpt model path from the previous step).
Then, run the following command:
python -m train --cfg configs/MoT_vae_stage3.yaml --nodebug
Please first put the trained model checkpoint path in TEST.CHECKPOINT in the config files, e.g. configs/MoT_vae_stage3.yaml.
Then, run the following command:
python -m test --cfg configs/MoT_vae_stage3.yaml --task t2m
Some parameters:
--task: evaluation tasks including t2m (Text-to-Motion), m2t (Motion translation), pred (Motion prediction), inbetween (Motion inbetween)
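To evaluate all four tasks in one go, a small driver like the following can be used; only the --cfg and --task flags are taken from the usage above, and everything else (running from the repository root, one process per task) is an assumption:

import subprocess

for task in ["t2m", "m2t", "pred", "inbetween"]:
    # invoke the documented test entry point once per task
    subprocess.run(
        ["python", "-m", "test", "--cfg", "configs/MoT_vae_stage3.yaml", "--task", task],
        check=True,
    )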
Render SMPL
Refer to TEMOS-Rendering motions for blender setup, then install the following dependencies.
YOUR_BLENDER_PYTHON_PATH/python -m pip install -r prepare/requirements_render.txt
Run the following command using blender:
YOUR_BLENDER_PATH/blender --background --python render.py -- --cfg=./configs/render.yaml --dir=YOUR_NPY_FOLDER --mode=video
Fit SMPL meshes with:
python -m fit --dir YOUR_NPY_FOLDER --save_folder TEMP_PLY_FOLDER --cuda
This outputs:
mesh npy file: the generated SMPL vertices with the shape of (nframe, 6893, 3)
ply files: the ply mesh files for blender or meshlab
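To sanity-check the fitted meshes before rendering, a minimal sketch such as the following can be used; the file name is a placeholder for any mesh npy in your TEMP_PLY_FOLDER:

import numpy as np

verts = np.load("TEMP_PLY_FOLDER/example_mesh.npy")   # placeholder file name
print(verts.shape)   # expected (nframe, 6893, 3): per-frame SMPL vertices
# The accompanying .ply files can be opened directly in blender or meshlab.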
Run the following command to render SMPL using blender:
YOUR_BLENDER_PATH/blender --background --python render.py -- --cfg=./configs/render.yaml --dir=YOUR_NPY_FOLDER --mode=video
Optional parameters:
--mode=video: render an mp4 video
--mode=sequence: render the whole motion in a single png image
If you find our code or paper helpful, please consider citing:
@misc{zhu2025motiongpt3humanmotionsecond,
title={MotionGPT3: Human Motion as a Second Modality},
author={Bingfan Zhu and Biao Jiang and Sunyi Wang and Shixiang Tang and Tao Chen and Linjie Luo and Youyi Zheng and Xin Chen},
year={2025},
eprint={2506.24086},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2506.24086},
}
Thanks to MotionGPT, Motion-latent-diffusion, HumanML3D, and MAR; our code partially borrows from them.
This code is distributed under an MIT LICENSE.
Note that our code depends on other libraries, including SMPL, SMPL-X, PyTorch3D, and uses datasets which each have their own respective licenses that must also be followed.