I used Qwen2-VL-7B-Instruct to evaluate the perception MCQs. For most questions the model outputs "going ahead", which brings the accuracy down to ~50%, whereas the paper reports 59%. Did you modify the system prompt or the user prompts when evaluating?