Qwen2-VL-7B MCQ accuracy on perception cannot be reproduced.

I evaluated the perception MCQs with Qwen2-VL-7B-Instruct. The model outputs "going ahead" for most questions, which gives an accuracy of ~50%, whereas the paper reports 59%. Did you modify the system prompt or the user prompts for evaluation?
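
For reference, here is a minimal sketch of how the collapse onto a single answer can be checked. It assumes predictions and ground-truth answers are available as lists of strings and are compared by case-insensitive exact match. The function name `summarize_predictions` and the toy data are hypothetical, made up for illustration; they are not the repository's evaluation script or my actual results.

```python
from collections import Counter


def summarize_predictions(predictions, answers):
    """Return accuracy plus the most frequent predicted answer and its share.

    Assumption: exact string match after lowercasing and stripping whitespace.
    The benchmark's official scoring may differ.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")

    norm_preds = [p.strip().lower() for p in predictions]
    norm_answers = [a.strip().lower() for a in answers]

    total = len(norm_answers)
    correct = sum(p == a for p, a in zip(norm_preds, norm_answers))
    accuracy = correct / total if total else 0.0

    # If one answer dominates, accuracy is roughly that answer's share
    # of the ground truth, whatever the image shows.
    counts = Counter(norm_preds)
    top_label, top_count = counts.most_common(1)[0] if counts else (None, 0)
    top_share = top_count / total if total else 0.0

    return {"accuracy": accuracy, "top_label": top_label, "top_share": top_share}


# Toy data for illustration only (not real evaluation output).
preds = ["going ahead", "going ahead", "going ahead", "turning left"]
gold = ["going ahead", "turning left", "going ahead", "turning left"]
summary = summarize_predictions(preds, gold)
print(summary)
```

A high `top_share` alongside an accuracy close to that label's frequency in the ground truth would suggest the model is ignoring the question, which is why I suspect a prompt difference.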