[Benchmark] Support SimpleVQA #1320

Bujiazi · 2025-11-20T14:11:48Z

SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models

🤗 huggingface: https://huggingface.co/datasets/m-a-p/SimpleVQA
💻 github: https://github.com/SimpleVQA/SimpleVQA
📖 arXiv: https://arxiv.org/abs/2502.13059

A VQA benchmark that uses GPT-4o as the judge. To speed up evaluation, judge requests to GPT-4o are issued from 16 parallel threads. Qwen2.5-VL-7B-Instruct was evaluated on the benchmark, and its score is slightly higher than the reported result.
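The 16-thread judging described above can be sketched with a standard-library thread pool. This is a generic illustration, not the code merged in this PR. `judge_all`, `fake_judge` and the record fields (`question`, `answer`, `prediction`) are hypothetical names. `fake_judge` is a local string-matching stand-in for a real GPT-4o request, which would send the question, gold answer and prediction to the API and parse the verdict.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def judge_all(
    records: Sequence[Dict[str, str]],
    judge_one: Callable[[Dict[str, str]], str],
    nproc: int = 16,
) -> List[str]:
    """Run `judge_one` on every record using `nproc` worker threads.

    ThreadPoolExecutor.map returns results in input order, so verdicts
    line up with `records` even though the calls finish out of order.
    """
    with ThreadPoolExecutor(max_workers=nproc) as pool:
        return list(pool.map(judge_one, records))


# Hypothetical stand-in for a GPT-4o judge call (no network access).
# It marks a prediction correct on a case-insensitive exact match.
def fake_judge(record: Dict[str, str]) -> str:
    same = record["prediction"].strip().lower() == record["answer"].strip().lower()
    return "CORRECT" if same else "INCORRECT"


if __name__ == "__main__":
    data = [
        {"question": "What city is shown?", "answer": "Paris", "prediction": "paris"},
        {"question": "Who painted this?", "answer": "Monet", "prediction": "Renoir"},
    ]
    print(judge_all(data, fake_judge))
```

Threads are a reasonable fit here because each judge call spends most of its time waiting on the network, and Python releases the GIL during that I/O. In practice the thread count would also be bounded by the API's rate limits.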

The benchmark was implemented and tested in the default VLMEvalKit environment for Qwen2.5-VL-7B-Instruct.

Bujiazi added 2 commits November 20, 2025 21:00

support simplevqa

d441633

fix flake8

f0d1ea3

FangXinyu-0913 merged commit 8dc65e0 into open-compass:main Nov 21, 2025
8 checks passed
