@Bujiazi commented Nov 20, 2025

SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models

🤗 Hugging Face: https://huggingface.co/datasets/m-a-p/SimpleVQA
💻 GitHub: https://github.com/SimpleVQA/SimpleVQA
📖 arXiv: https://arxiv.org/abs/2502.13059

This PR adds SimpleVQA, a VQA benchmark that uses GPT-4o as the judge. Judge requests to GPT-4o are issued from 16 parallel threads to speed up evaluation. Qwen2.5-VL-7B-Instruct was evaluated on the benchmark, and its score is slightly higher than the reported result.
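To illustrate how a multi-threaded judge speeds up evaluation, here is a minimal sketch; it is not the code merged in this PR. The names `judge_one`, `judge_all`, and `accuracy` and the item fields (`question`, `answer`, `prediction`) are hypothetical. `judge_one` uses case-insensitive exact match as an offline stand-in for the GPT-4o call; a real judge would send the question, reference answer, and prediction to GPT-4o and parse its verdict. Only the thread count (16) comes from the PR.

```python
"""Minimal sketch of 16-thread parallel LLM-as-judge scoring.

Assumption: judge_one is a hypothetical stand-in for a GPT-4o request;
this is NOT the VLMEvalKit implementation.
"""
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

NUM_THREADS = 16  # thread count described in the PR


def judge_one(item: Dict[str, str]) -> str:
    """Hypothetical judge. A real version would call GPT-4o with the
    question, reference answer, and prediction, then parse its verdict.
    Here, case-insensitive exact match keeps the sketch runnable offline."""
    pred = item["prediction"].strip().lower()
    gold = item["answer"].strip().lower()
    return "CORRECT" if pred == gold else "INCORRECT"


def judge_all(items: List[Dict[str, str]],
              judge: Callable[[Dict[str, str]], str] = judge_one,
              num_threads: int = NUM_THREADS) -> List[str]:
    """Run the judge over all items concurrently.

    Judge calls are I/O-bound (network latency), so threads overlap the
    waiting time. executor.map returns results in input order, so each
    verdict stays aligned with its sample."""
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        return list(executor.map(judge, items))


def accuracy(verdicts: List[str]) -> float:
    """Fraction of samples the judge marked CORRECT (0.0 for no samples)."""
    if not verdicts:
        return 0.0
    return verdicts.count("CORRECT") / len(verdicts)


if __name__ == "__main__":
    data = [
        {"question": "Which city is shown?", "answer": "Paris", "prediction": "paris"},
        {"question": "What animal is this?", "answer": "Cat", "prediction": "Dog"},
    ]
    verdicts = judge_all(data)
    print(verdicts)            # ['CORRECT', 'INCORRECT']
    print(accuracy(verdicts))  # 0.5
```

Threads fit this workload because each judge call spends most of its time waiting on the network rather than using the CPU; swapping in a real GPT-4o client only requires passing a different `judge` callable.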

The benchmark was implemented and tested in the default VLMEvalKit environment for Qwen2.5-VL-7B-Instruct.

@FangXinyu-0913 merged commit 8dc65e0 into open-compass:main on Nov 21, 2025
8 checks passed