Results from the FashionIQ dataset #10

Y111555 · 2025-04-16T04:19:17Z

I used OpenAI's CLIP ViT-B/32 to evaluate on the FashionIQ validation set. The results I obtained are very different from those reported in the paper. Are there any tricks I am missing? Were the results in the paper produced with OpenAI's CLIP ViT-B/32? They seem closer to what I get with OpenCLIP.

Results with OpenAI CLIP:
dress_Recall@1 = 3.47
dress_Recall@5 = 9.87
dress_Recall@10 = 14.53
dress_Recall@50 = 33.22

Results with OpenCLIP:
dress_Recall@1 = 7.44
dress_Recall@5 = 18.74
dress_Recall@10 = 25.33
dress_Recall@50 = 46.50
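
For anyone comparing numbers like the ones above, here is a minimal sketch of how Recall@K is typically computed as a percentage over queries. It is not taken from the paper's evaluation code: the function name `recall_at_k` and the input format (one ranked list of gallery ids per query, plus one target id per query) are assumptions made for illustration.

```python
def recall_at_k(ranked_ids, target_ids, ks=(1, 5, 10, 50)):
    """Percentage of queries whose target id appears in the top-k results.

    ranked_ids: list of lists; ranked_ids[i] holds gallery ids for query i,
                sorted by descending similarity (assumed input format).
    target_ids: list; target_ids[i] is the ground-truth id for query i.
    """
    if len(ranked_ids) != len(target_ids):
        raise ValueError("need exactly one ranking per target")
    if not target_ids:
        raise ValueError("need at least one query")

    n = len(target_ids)
    results = {}
    for k in ks:
        # A query counts as a hit if its target is among the first k ids.
        hits = sum(
            1
            for ranking, target in zip(ranked_ids, target_ids)
            if target in ranking[:k]
        )
        results[k] = 100.0 * hits / n
    return results


# Toy data: four queries over a three-item gallery (purely illustrative).
rankings = [
    ["a", "b", "c"],
    ["c", "a", "b"],
    ["b", "c", "a"],
    ["c", "b", "a"],
]
targets = ["a", "a", "a", "b"]
scores = recall_at_k(rankings, targets, ks=(1, 2, 3))
```

Running the same function over rankings produced by two different checkpoints keeps the metric fixed, so any gap in the scores comes from the models rather than from the evaluation.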

sgk98 · 2025-04-17T12:04:33Z

Hello, thanks for your interest in our work!
I just went through regenerating the results on different benchmarks and realized that, for FashionIQ, the numbers in the paper do seem to be based on the OpenCLIP series of models (your results look very similar to what I am getting with the older gpt3.5-turbo generated captions). I'm sorry for the confusion, and I hope this helps.

Y111555 · 2025-04-17T12:13:03Z

OK, thank you for the clarification.
