Commit 047b989

Changed the new implementation of the "old" eval to be a flag.
As discussed in the pull request, the Market-1501-style evaluation (previously the default behaviour) is now enabled by an optional flag. This also adds a comment about why we use mergesort and updates the README accordingly.
1 parent 315ba4c · commit 047b989

2 files changed: +20 -4 lines changed

README.md

+2 -2
@@ -273,8 +273,8 @@ The evaluation code in this repository simply uses the scikit-learn code, and th
 Unfortunately, almost no paper mentions which code-base they used and how they computed `mAP` scores, so comparison is difficult.
 Other frameworks have [the same problem](https://github.com/Cysu/open-reid/issues/50), but we expect many not to be aware of this.
 
-To make the evaluating code independent of the sklearn version we have implemented our own version of the average precision computation.
-This now follows the official Market1501 code and results in values directly comparable.
+We provide evaluation code that computes the mAP as done by the Market-1501 MATLAB evaluation script, independent of the scikit-learn version.
+This can be used by providing the `--use_market_ap` flag when running `evaluate.py`.
 
 # Independent re-implementations
 
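As a side note on the new README text above: `--use_market_ap` is a plain argparse `store_true` flag, so providing it switches `evaluate.py` to the Market-1501-style AP and omitting it keeps the scikit-learn default. A minimal, self-contained Python sketch of that behaviour follows; it only includes the two flags visible in this commit (the real script defines more arguments) and uses a plain `int` instead of the repository's `common.positive_int`:

import argparse

# Trimmed-down parser: only the arguments visible in this commit's diff.
parser = argparse.ArgumentParser()
parser.add_argument('--batch_size', default=256, type=int)
parser.add_argument('--use_market_ap', action='store_true', default=False)

# Passing the flag selects the Market-1501-style average precision.
args = parser.parse_args(['--use_market_ap'])
print(args.use_market_ap)  # True

# Omitting it keeps the default scikit-learn implementation.
args = parser.parse_args([])
print(args.use_market_ap)  # False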
evaluate.py

+18 -2
@@ -7,6 +7,7 @@
 import h5py
 import json
 import numpy as np
+from sklearn.metrics import average_precision_score
 import tensorflow as tf
 
 import common
@@ -50,8 +51,15 @@
     '--batch_size', default=256, type=common.positive_int,
     help='Batch size used during evaluation, adapt based on your memory usage.')
 
+parser.add_argument(
+    '--use_market_ap', action='store_true', default=False,
+    help='When this flag is provided, the average precision is computed exactly'
+         ' as done by the Market-1501 evaluation script, rather than the '
+         'default scikit-learn implementation that gives slightly different '
+         'scores.')
+
 
-def average_precision_score(y_true, y_score):
+def average_precision_score_market(y_true, y_score):
     """ Compute average precision (AP) from prediction scores.
 
     This is a replacement for the scikit-learn version which, while likely more
@@ -75,6 +83,8 @@ def average_precision_score(y_true, y_score):
             'got lengths y_true:{} and y_score:{}'.format(
                 len(y_true), len(y_score)))
 
+    # Mergesort is used since it is a stable sorting algorithm. This is
+    # important to compute consistent and correct scores.
    y_true_sorted = y_true[np.argsort(-y_score, kind='mergesort')]
 
     tp = np.cumsum(y_true_sorted)
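The comment added above is the change the commit message refers to: `np.argsort` with `kind='mergesort'` is a stable sort, so entries with tied scores keep a fixed order and the per-rank true-positive counts are reproducible. A small toy illustration (made-up data, not code from this repository):

import numpy as np

# Toy scores with a tied block: a stable sort keeps equal scores in their
# original order, so the ranking that feeds the AP is the same on every run.
y_score = np.array([0.9, 0.5, 0.5, 0.5, 0.1])
y_true = np.array([0, 1, 0, 1, 0])

order = np.argsort(-y_score, kind='mergesort')
print(order)                     # [0 1 2 3 4] -- ties keep their input order
print(np.cumsum(y_true[order]))  # deterministic cumulative true positives

# With a non-stable sort (numpy's default introsort) the order inside the
# tied block is not guaranteed, which can shuffle y_true[order] and change
# the per-rank precision values whenever scores contain ties.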
@@ -119,6 +129,12 @@ def main():
 
     batch_distances = loss.cdist(batch_embs, gallery_embs, metric=args.metric)
 
+    # Check if we should use Market-1501 specific average precision computation.
+    if args.use_market_ap:
+        average_precision = average_precision_score_market
+    else:
+        average_precision = average_precision_score
+
     # Loop over the query embeddings and compute their APs and the CMC curve.
     aps = []
     cmc = np.zeros(len(gallery_pids), dtype=np.int32)
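For context on why this dispatch exists: both implementations rank by score, but they compute average precision with different conventions, so the resulting numbers differ. The helper below is only a sketch in the spirit of the Market-1501 MATLAB evaluation (trapezoidal interpolation of the precision-recall curve, starting from precision 1 at recall 0); it is not copied from this repository's `average_precision_score_market`:

import numpy as np
from sklearn.metrics import average_precision_score

def market_style_ap(y_true, y_score):
    # Sketch only: trapezoidal interpolation of the precision-recall curve,
    # in the spirit of the Market-1501 MATLAB script; not this repository's code.
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(y_score), kind='mergesort')  # stable sort
    y_true_sorted = y_true[order]
    tp = np.cumsum(y_true_sorted)
    recall = tp / y_true.sum()
    precision = tp / np.arange(1, len(y_true) + 1)
    prev_recall = np.concatenate(([0.0], recall[:-1]))
    prev_precision = np.concatenate(([1.0], precision[:-1]))
    return np.sum((recall - prev_recall) * (precision + prev_precision) / 2)

y_true = np.array([0, 1, 0, 1, 0, 0, 1])
y_score = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3])
print(average_precision_score(y_true, y_score))  # scikit-learn's step-wise AP
print(market_style_ap(y_true, y_score))          # trapezoidal AP, a different value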
@@ -153,7 +169,7 @@ def main():
     # it won't change anything.
     scores = 1 / (1 + distances)
     for i in range(len(distances)):
-        ap = average_precision_score(pid_matches[i], scores[i])
+        ap = average_precision(pid_matches[i], scores[i])
 
         if np.isnan(ap):
             print()
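A brief note on the `scores = 1 / (1 + distances)` context line kept above: the mapping is strictly decreasing in the distance, so ranking by descending score is identical to ranking by ascending distance, and the AP depends only on that ordering, which is presumably what the "it won't change anything" comment refers to. A quick toy check (not code from the repository):

import numpy as np

# 1 / (1 + d) is strictly decreasing in d, so sorting by descending score
# reproduces the ascending-distance ranking that the AP is computed over.
distances = np.array([0.2, 1.5, 0.7, 3.0])
scores = 1 / (1 + distances)
assert np.array_equal(np.argsort(distances), np.argsort(-scores))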
