evaluation on partial test set #23
Comments
Hi @rahman-mdatiqur, I am happy to clarify this issue. TL;DR: The original source of [...]. Details: [...]
Hello @HYPJUDY, thanks again for your quick and wonderful response. It leads me to raise the following concern. Since you do not mention in your paper that you are evaluating on 210 videos instead of 213, how fair is it to compare your method in Table 2 against other SOTA methods that report results on all 213 test videos? In other words, does leaving those test videos out of the evaluation give you any advantage over the other SOTA methods in terms of mAP? I know that you are not removing the corresponding annotations from the ground-truth annotations located in https://github.com/HYPJUDY/Decouple-SSAD/tree/master/EvalKit/THUMOS14_evalkit_20150930/annotation, but I have not checked the evaluation script to see whether leaving some videos out of the evaluation set would be advantageous or disadvantageous. Can you please comment on this? Thanks in advance.
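To make the question concrete, here is a toy sketch (Python, not the official THUMOS'14 evaluation code) of how average precision reacts to the three unscored videos under two possible conventions: keeping their ground-truth segments as permanent misses, or scoring only the videos that appear in both the ground truth and the results. All detections and counts below are invented for illustration.

```python
# Toy sketch, NOT the official THUMOS'14 evaluation code: how average precision
# (AP) reacts to ground-truth segments of unscored videos under two conventions.
# All detections and GT counts below are invented purely for illustration.

def average_precision(detections, num_gt):
    """detections: list of (confidence, is_true_positive); num_gt: total GT segments."""
    detections = sorted(detections, key=lambda d: -d[0])
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for _, is_tp in detections:
        tp += int(is_tp)
        fp += int(not is_tp)
        precision = tp / (tp + fp)
        recall = tp / num_gt
        ap += precision * (recall - prev_recall)  # area under the P-R curve
        prev_recall = recall
    return ap

# Detections produced for the 210 evaluated videos (confidence, matched a GT segment?)
dets = [(0.9, True), (0.8, True), (0.6, False), (0.5, True)]

# Convention 1: the 3 missing videos keep their GT (say 2 segments) as permanent misses.
print("GT kept as misses  :", round(average_precision(dets, num_gt=3 + 2), 3))  # lower AP
# Convention 2: only videos present in both GT and results are scored.
print("GT intersected away:", round(average_precision(dets, num_gt=3), 3))      # higher AP
```

Under the first convention, omitting videos can only lower the score; under the second, the effect depends on how well the model would have done on them, which is exactly the ablation suggested in the reply below.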
Hi @rahman-mdatiqur, thanks for your good question.
It seems that the code only evaluates the videos that appear in both the ground truth and the detected results. So if the model produces good (bad) results for these three videos, then incorporating their results should make the mAP better (worse). If your code is ready, you can quickly validate this with ablation experiments (a minimal sketch follows this comment).
I think if the annotations of some videos are obviously wrong, then we should exclude them; otherwise the overall result is not correct, and the evaluation on these wrongly annotated videos is meaningless. I should have clarified the video number (210) in the paper. Thanks for the reminder.
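A minimal sketch of that ablation, assuming you have generated detections for all 213 test videos and that the detection file is plain text with the video name as its first whitespace-separated column (adjust to your actual output format); `run_thumos14_eval` below is only a placeholder for however you invoke the unmodified evaluation kit:

```python
# Minimal ablation sketch: score the same detections twice, once with and once
# without the three videos, and compare the reported mAP. The file names, the
# detection-file format, and run_thumos14_eval() are assumptions/placeholders.

EXCLUDED = {"video_test_0000270", "video_test_0001292", "video_test_0001496"}

def filter_detections(src_path, dst_path, drop_videos):
    """Copy detection lines, dropping those whose video name is in drop_videos."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            fields = line.split()
            if fields and fields[0] not in drop_videos:
                dst.write(line)

filter_detections("detections_all213.txt", "detections_210.txt", EXCLUDED)

# Then run the (unmodified) evaluation on both files and compare:
#   map_213 = run_thumos14_eval("detections_all213.txt")  # placeholder call
#   map_210 = run_thumos14_eval("detections_210.txt")     # placeholder call
#   print(map_213 - map_210)  # positive -> the three videos helped; negative -> hurt
```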
Thanks @HYPJUDY for suggesting ways to evaluate the effect of excluding videos from the predictions list. As you said, since doing well (poorly) on these videos may improve (degrade) the final mAP, and since the SOTA methods report results on all 213 videos without making any modifications to the ground-truth annotations, I believe new methods should follow the same protocol when comparing with SOTA methods, or state the number of evaluated videos when comparing. Thanks much for all the thoughts and helpful feedback.
You are welcome!
Hello @HYPJUDY,
It seems that you are not evaluating on the full THUMOS'14 test set. As you report in your paper, the THUMOS'14 detection task is evaluated on 213 test videos. However, your test window_info.log file is missing window info for the following 3 test videos, as your thumos14_test_annotation.csv is missing annotations for them. As a result, you are effectively evaluating your model on 210 test videos instead of 213 (see the short check at the end of this post):
video_test_0000270
video_test_0001292
video_test_0001496
Can you please comment on why this is the case?
Thanks much.
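P.S. The missing videos can be confirmed with a short script. It only assumes a text file listing the 213 test video names, one per line (`test_video_list.txt` below is a hypothetical name); it then scans the repository files for any mention of each name:

```python
# Quick check: which THUMOS'14 test videos never appear in the repo's files?
# "test_video_list.txt" (one video name per line, 213 entries) is a hypothetical
# input; substitute whatever list of test video names you use.
import re

def videos_in_file(path):
    """Return the set of video_test_* names mentioned anywhere in the file."""
    with open(path) as f:
        return set(re.findall(r"video_test_\d{7}", f.read()))

with open("test_video_list.txt") as f:
    all_test_videos = {line.strip() for line in f if line.strip()}

for path in ("thumos14_test_annotation.csv", "window_info.log"):
    missing = sorted(all_test_videos - videos_in_file(path))
    print(f"{path}: {len(missing)} missing -> {missing}")
```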