ST-VLM: Kinematic Instruction Tuning for Spatio-Temporal Reasoning in Vision-Language Models

Dohwan Ko1*, Sihyeon Kim1*, Yumin Suh2, Vijay Kumar2, Minseo Yoon1, Manmohan Chandraker2,3, Hyunwoo J. Kim4

1Korea University 2NEC Labs America 3UC San Diego 4KAIST

arXiv | Dataset | Model | Project Page

Code will be available soon!

Citation

@article{ko2025st,
  title={ST-VLM: Kinematic Instruction Tuning for Spatio-Temporal Reasoning in Vision-Language Models},
  author={Ko, Dohwan and Kim, Sihyeon and Suh, Yumin and Kumar, Vijay and Yoon, Minseo and Chandraker, Manmohan and Kim, Hyunwoo J.},
  journal={arXiv preprint arXiv:2503.19355},
  year={2025}
}