ST-VLM: Kinematic Instruction Tuning for Spatio-Temporal Reasoning in Vision-Language Models

Dohwan Ko¹*, Sihyeon Kim¹*, Yumin Suh², Vijay Kumar², Minseo Yoon¹, Manmohan Chandraker²,³, Hyunwoo J. Kim⁴

¹Korea University ²NEC Labs America ³UC San Diego ⁴KAIST

Code will be available soon!

Citations

@article{ko2025st,
  title={ST-VLM: Kinematic Instruction Tuning for Spatio-Temporal Reasoning in Vision-Language Models},
  author={Ko, Dohwan and Kim, Sihyeon and Suh, Yumin and Yoon, Minseo and Chandraker, Manmohan and Kim, Hyunwoo J and others},
  journal={arXiv preprint arXiv:2503.19355},
  year={2025}
}
