To systematically study VLM capabilities in human perception for HRI, along with their performance-latency trade-offs, we introduce HRIBench, a visual question-answering (VQA) benchmark designed to evaluate VLMs across a diverse set of human perceptual tasks critical for HRI. HRIBench covers five key domains: (1) non-verbal cue understanding, (2) verbal instruction understanding, (3) human-robot-object relationship understanding, (4) social navigation, and (5) person identification. To construct HRIBench, we collected data from real-world HRI environments to curate questions for non-verbal cue understanding, and leveraged publicly available datasets for the remaining four domains. We curated 200 VQA questions for each domain, for a total of 1,000 questions.
You can find more details regarding our methods and results in our paper.
You can download HRIBench using this Google Drive link.
Currently, the following four domains are publicly available in the Google Drive folder above: (1) non-verbal cue understanding (Non_Verbal), (2) verbal instruction understanding (Ins_Follow), (3) human-robot-object relationship understanding (HROI), and (4) social navigation (Social_Nav).
Because we used frames from YouTube videos for the person identification domain (Id_Rec), we are still reviewing the relevant copyright rules before publicly releasing data for this domain.
HRIBench follows the same dataset structure as ManipBench.
For each domain, there are 200 VQA questions (demo_0 - demo_199). Each question directory includes the following files (a loading sketch follows this list):
- driod.txt: This file includes the question prompt with the options.
- correct_answer.json: This file includes the correct answer.
- wrong_answers/wrong_answers.json: This file includes the wrong answers.
- wrong_answers/processed_frame.png: This file includes the input image annotated with all the visual options for the VQA question.
- wrong_answers/processed_frame.png: Only questions in the Id_Rec domain include this additional file, which is a photo of the person the VLM needs to identify from the available options.
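As a rough illustration, here is a minimal Python sketch for loading one question under the layout above. The filenames are copied verbatim from this README, and the `HRIBench/Non_Verbal` dataset root in the usage comment is hypothetical.

```python
import json
from pathlib import Path

def load_question(demo_dir):
    """Load one HRIBench VQA question from its demo_* directory.

    Assumes the file layout described above; adapt the paths if the
    released dataset differs.
    """
    demo_dir = Path(demo_dir)
    prompt = (demo_dir / "driod.txt").read_text()  # question prompt with options
    correct = json.loads((demo_dir / "correct_answer.json").read_text())
    wrong = json.loads((demo_dir / "wrong_answers" / "wrong_answers.json").read_text())
    image_path = demo_dir / "wrong_answers" / "processed_frame.png"  # annotated input image
    return {"prompt": prompt, "correct": correct, "wrong": wrong, "image": image_path}

# Example usage (hypothetical dataset root):
# questions = [load_question(f"HRIBench/Non_Verbal/demo_{i}") for i in range(200)]
```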
Our result logs are saved as JSON files in their corresponding directories in this repo.
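As a minimal sketch, assuming a simple log schema (a JSON list of records, each with `predicted` and `correct` fields), accuracy can be computed as follows; adjust the field names to match the actual logs.

```python
import json
from pathlib import Path

def accuracy_from_log(log_path):
    """Compute accuracy from a results log.

    Assumes a hypothetical schema: a JSON list of records, each with
    'predicted' and 'correct' fields. Adjust to the actual log format.
    """
    records = json.loads(Path(log_path).read_text())
    return sum(r["predicted"] == r["correct"] for r in records) / len(records)
```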
@article{shi2025hribench,
  title={HRIBench: Benchmarking Vision-Language Models for Real-Time Human Perception in Human-Robot Interaction},
  author={Shi, Zhonghao and Zhao, Enyu and Dennler, Nathaniel and Wang, Jingzhen and Xu, Xinyang and Shrestha, Kaleen and Fu, Mengxue and Seita, Daniel and Matari{\'c}, Maja},
  journal={19th International Symposium on Experimental Robotics (ISER)},
  year={2025}
}