wrong results on SuperCLEVER test #100
Comments
I get the same result, 42.5%. What accuracy do you get after GRPO training?
Maybe the reason is that sampling is used during generation. I can't get 48% either.
I followed the instructions in README.md and ran its script to evaluate on the SuperCLEVER dataset. But even with the base Qwen2-VL-2B-Instruct model, the accuracy is 42.5%, not the reported 48%. I have no idea why this happens.
Below is part of the log file:
{
"accuracy": 42.5,
"results": [
{
"question": {
"image_path": "./images/superCLEVR_new_025000.png",
"question": "How many different items are there in the image?",
"ground_truth": 4
},
"ground_truth": 4,
"model_output": "\n There are four distinct objects in the image: a cyan fighter, a red minivan, a cyan mountain bike, and a green mountain bike.\n \n\n 4\n ",
"extracted_answer": 4
},
{
"question": {
"image_path": "./images/superCLEVR_new_025001.png",
"question": "How many different items are there in the image?",
"ground_truth": 9
},
"ground_truth": 9,
"model_output": "\nA. 10\n\n\n10\n",
"extracted_answer": 10
},
{
"question": {
"image_path": "./images/superCLEVR_new_025002.png",
"question": "How many different items are there in the image?",
"ground_truth": 10
},
"ground_truth": 10,
"model_output": "\nA. 10\n\n\n10\n",
"extracted_answer": 10
},
{
"question": {
"image_path": "./images/superCLEVR_new_025003.png",
"question": "How many different items are there in the image?",
"ground_truth": 4
},
"ground_truth": 4,
"model_output": "\nThe image contains four distinct objects: a purple dirt bike, a blue mountain bike, a brown bicycle, and a red tandem bicycle. There are no other objects in the image.\n\n\n\n0\n",
"extracted_answer": 0
},
{
"question": {
"image_path": "./images/superCLEVR_new_025004.png",
"question": "How many different items are there in the image?",
"ground_truth": 3
},
"ground_truth": 3,
"model_output": "\nTo determine the number of different items in the image, let's analyze each object:\n\n1. There is a yellow car.\n2. There is a blue car.\n3. There is a gray motorcycle.\n\nSo, there are three different items in the image.\n\n\n3",
"extracted_answer": 3
},
...
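One way to narrow this down is to recompute the accuracy from the log and list the entries where `extracted_answer` differs from `ground_truth`. Below is a minimal sketch, assuming the log keeps the structure shown above. The names `summarize` and `excerpt` are hypothetical and not part of the repository, and `excerpt` only mirrors the five entries pasted in this issue.

```python
# Hypothetical excerpt: the five entries shown in the log above, reduced to
# the fields needed for scoring. For a real check, load the complete log
# with json.load() and pass its "results" list to summarize().
excerpt = [
    {"question": {"image_path": "./images/superCLEVR_new_025000.png"},
     "ground_truth": 4, "extracted_answer": 4},
    {"question": {"image_path": "./images/superCLEVR_new_025001.png"},
     "ground_truth": 9, "extracted_answer": 10},
    {"question": {"image_path": "./images/superCLEVR_new_025002.png"},
     "ground_truth": 10, "extracted_answer": 10},
    {"question": {"image_path": "./images/superCLEVR_new_025003.png"},
     "ground_truth": 4, "extracted_answer": 0},
    {"question": {"image_path": "./images/superCLEVR_new_025004.png"},
     "ground_truth": 3, "extracted_answer": 3},
]


def summarize(results):
    """Return (accuracy in percent, entries whose extracted answer is wrong)."""
    mismatches = [
        r for r in results if r["extracted_answer"] != r["ground_truth"]
    ]
    correct = len(results) - len(mismatches)
    accuracy = 100.0 * correct / len(results) if results else 0.0
    return accuracy, mismatches


if __name__ == "__main__":
    accuracy, mismatches = summarize(excerpt)
    print(f"accuracy over {len(excerpt)} entries: {accuracy:.1f}%")
    for r in mismatches:
        print(r["question"]["image_path"],
              "expected", r["ground_truth"],
              "got", r["extracted_answer"])
```

The figure this prints covers only the five pasted entries, so it is not comparable to the 42.5% over the full set. The mismatch list is the more useful part: in the entry for `superCLEVR_new_025003.png`, the model's text describes four objects but the final answer is 0, which suggests that some errors come from the final answer rather than from the reasoning.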