
Conversation

@vigyasharma (Collaborator) commented Aug 8, 2025

Adds support for evaluating kNN search ranking quality using Normalized Discounted Cumulative Gain (NDCG), reported at 10 and at k for the configured topK. This is useful for measuring the impact of reranking quantized search results with full-precision vectors.

Also adds support for reranking with full-precision vectors, using a DoubleValuesSourceRescorer over a FullPrecisionFloatVectorSimilarityValuesSource.
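For intuition, here is a minimal sketch of what full-precision reranking does conceptually (plain Python, not luceneutil code; the function name and data shapes are made up for illustration): rescore the quantized-search candidates with the exact similarity and re-sort.

```python
def rerank_full_precision(query, vectors, candidate_ids, top_k):
    """Rescore approximate (quantized) search candidates with the exact
    full-precision dot-product similarity; return top_k ids, best first.
    `vectors` maps doc id -> full-precision float vector."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    scored = [(dot(query, vectors[d]), d) for d in candidate_ids]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [d for _, d in scored[:top_k]]
```

In the PR itself this role is played by the DoubleValuesSourceRescorer re-scoring the quantized hits with full-precision similarity values.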

Addresses #401

+++

Benchmark Results

I see an 8-14% improvement in NDCG@10 and a 6-8% improvement in NDCG@K for 4-bit quantized kNN search with full-precision reranking. The improvement increases with index size. The latency impact doesn't seem significant.

recall  ndcg@10  ndcg@K  rerank  latency(ms)  netCPU  avgCpuCount      nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.524    0.920   0.600   false        2.235   2.216        0.991    100000   100      20       32         50     4 bits     16.96       5896.23             4          333.95       329.971       37.003       HNSW
 0.524    0.999   0.637    true        2.279   2.261        0.992    100000   100      20       32         50     4 bits      0.00      Infinity             4          333.95       329.971       37.003       HNSW
 0.490    0.901   0.568   false        3.199   3.183        0.995    500000   100      20       32         50     4 bits     70.18       7124.23             3         1670.44      1649.857      185.013       HNSW
 0.490    0.999   0.607    true        3.201   3.181        0.994    500000   100      20       32         50     4 bits      0.00      Infinity             3         1670.44      1649.857      185.013       HNSW
 0.480    0.891   0.557   false        4.774   4.756        0.996   1000000   100      20       32         50     4 bits    134.12       7456.07             6         3341.86      3299.713      370.026       HNSW
 0.480    0.999   0.598    true        4.697   4.675        0.995   1000000   100      20       32         50     4 bits      0.00      Infinity             6         3341.86      3299.713      370.026       HNSW
 0.462    0.883   0.541   false        5.081   5.035        0.991   2000000   100      20       32         50     4 bits    465.46       4296.82             7         6688.32      6599.426      740.051       HNSW
 0.462    0.998   0.583    true        4.885   4.863        0.995   2000000   100      20       32         50     4 bits      0.00      Infinity             7         6688.32      6599.426      740.051       HNSW
 0.447    0.871   0.526   false       11.633  11.495        0.988  10000000   100      20       32         50     4 bits   2110.54       4738.12            14        33489.11     32997.131     3700.256       HNSW
 0.447    0.998   0.569    true       11.037  10.984        0.995  10000000   100      20       32         50     4 bits      0.00      Infinity            14        33489.11     32997.131     3700.256       HNSW

@mikemccand (Owner)

Awesome! Now we can benchmark with any re-ranker that's implemented as a DoubleValuesSource!

Curious, the latency seems a wee bit faster with re-rank? Is it possible the first run was somehow cold? I wonder if you swap the order (so rerank=true goes first) if that alters the results?

Thanks @vigyasharma.

@vigyasharma (Collaborator, Author)

> Curious, the latency seems a wee bit faster with re-rank? Is it possible the first run was somehow cold? I wonder if you swap the order (so rerank=true goes first) if that alters the results?

Maybe. Running another test with rerank=true first to confirm.

@vigyasharma (Collaborator, Author) commented Aug 11, 2025

Ran some more benchmarks...

For 4-bit quantized vectors, I see an ~8% improvement in NDCG@10 and NDCG@K (with k=100) when reranking with full-precision vectors. Not surprisingly, the improvement is much larger with 5x oversampling: ~51% for NDCG@K.

With 1-bit quantized vectors, there was a ~35% improvement in NDCG@K with 5x oversampling plus full-precision reranking. Reranking did not show any visible change in recall. For these runs, I ran rerank=true/false separately and merged the benchmark results manually afterwards for easier comparison.
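As a sketch of how oversampling and reranking combine (hypothetical helper names, plain Python, not luceneutil code): the quantized index is asked for oversample * k candidates, and the full-precision rescorer keeps the best k.

```python
def search_with_rerank(approx_search, exact_score, query, k, oversample):
    """Fetch k * oversample candidates from the quantized (approximate)
    index, rescore them with exact full-precision similarity, keep top k.
    A larger oversample gives the reranker more true neighbors to recover,
    at the cost of more full-precision distance computations."""
    candidates = approx_search(query, k * oversample)
    return sorted(candidates, key=lambda d: exact_score(query, d),
                  reverse=True)[:k]
```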

I found it interesting that kNN search results from 1-bit quantized vectors were closer to the full-precision relevance order than 4-bit quantized ones. Nice @benwtrent!

# 4 bit quantized
recall  ndcg@10  ndcg@K   rerank  overSample  latency(ms)  netCPU  avgCpuCount     nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  overSample
 0.524    0.921   0.600    false           1        2.231   2.221        0.996   100000   100      20       32         50     4 bits     15.51       6446.21             4          333.98       1.000
 0.523    0.999   0.636     true           1        2.254   2.239        0.993   100000   100      20       32         50     4 bits     17.11       5844.19             4          333.98       1.000
 0.883    0.921   0.608    false           5        5.159   5.146        0.997   100000   100      20       32         50     4 bits      0.00      Infinity             4          333.98       5.000
 0.884    1.000   0.916     true           5        6.181   6.146        0.994   100000   100      20       32         50     4 bits      0.00      Infinity             4          333.98       5.000

 0.508    0.911   0.584    false           1        2.710   2.691        0.993   200000   100      20       32         50     4 bits     24.48       8170.94             3          668.09       1.000
 0.507    0.999   0.622     true           1        2.862   2.849        0.995   200000   100      20       32         50     4 bits     24.32       8224.36             3          668.08       1.000
 0.865    0.910   0.591    false           5        6.861   6.845        0.998   200000   100      20       32         50     4 bits      0.00      Infinity             3          668.09       5.000
 0.863    1.000   0.900     true           5        7.353   7.320        0.996   200000   100      20       32         50     4 bits      0.00      Infinity             3          668.08       5.000

 0.493    0.901   0.571    false           1        3.535   3.521        0.996   500000   100      20       32         50     4 bits     58.75       8510.93             4         1670.33       1.000
 0.493    0.999   0.610     true           1        4.076   4.064        0.997   500000   100      20       32         50     4 bits     53.14       9409.99             5         1670.25       1.000
 0.844    0.904   0.580    false           5        8.285   8.268        0.998   500000   100      20       32         50     4 bits      0.00      Infinity             4         1670.33       5.000
 0.844    1.000   0.886     true           5        9.586   9.553        0.997   500000   100      20       32         50     4 bits      0.00      Infinity             5         1670.25       5.000

 0.480    0.893   0.558    false           1        5.544   5.533        0.998  1000000   100      20       32         50     4 bits    125.60       7961.66             9         3341.60       1.000
 0.480    0.999   0.599     true           1        5.500   5.489        0.998  1000000   100      20       32         50     4 bits    125.75       7952.35             8         3341.72       1.000
 0.829    0.896   0.568    false           5       10.911  10.893        0.998  1000000   100      20       32         50     4 bits      0.00      Infinity             9         3341.60       5.000
 0.827    1.000   0.873     true           5       11.528  11.493        0.997  1000000   100      20       32         50     4 bits      0.00      Infinity             8         3341.72       5.000

 0.468    0.880   0.546    false           1        7.150   7.125        0.997  2000000   100      20       32         50     4 bits    260.16       7687.70            10         6685.94       1.000
 0.467    0.999   0.588     true           1        5.928   5.894        0.994  2000000   100      20       32         50     4 bits    306.61       6523.01             6         6687.17       1.000
 0.812    0.884   0.557    false           5       13.813  13.791        0.998  2000000   100      20       32         50     4 bits      0.00      Infinity            10         6685.94       5.000
 0.809    1.000   0.859     true           5       12.784  12.752        0.997  2000000   100      20       32         50     4 bits      0.00      Infinity             6         6687.17       5.000
# 1 bit quantized
recall  ndcg@10  ndcg@K   rerank  overSample  latency(ms)  netCPU  avgCpuCount     nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  overSample
 0.623    0.978   0.696    false           1        1.197   1.186        0.991   100000   100      20       32         50     1 bits      0.00      Infinity             3          307.49       1.000
 0.622    0.999   0.717     true           1        1.523   1.508        0.990   100000   100      20       32         50     1 bits     22.72       4400.63             3          307.51       1.000
 0.947    0.979   0.713    false           5        2.817   2.801        0.994   100000   100      20       32         50     1 bits      0.00      Infinity             3          307.49       5.000
 0.947    1.000   0.962     true           5        3.882   3.846        0.991   100000   100      20       32         50     1 bits      0.00      Infinity             3          307.51       5.000
 
 0.615    0.975   0.689    false           1        1.653   1.643        0.994   200000   100      20       32         50     1 bits     35.65       5609.94             3          615.08       1.000
 0.616    1.000   0.712     true           1        1.880   1.860        0.989   200000   100      20       32         50     1 bits     34.50       5796.93             3          615.08       1.000
 0.940    0.976   0.702    false           5        3.726   3.714        0.997   200000   100      20       32         50     1 bits      0.00      Infinity             3          615.08       5.000
 0.941    1.000   0.957     true           5        4.851   4.816        0.993   200000   100      20       32         50     1 bits      0.00      Infinity             3          615.08       5.000
 
 0.603    0.968   0.677    false           1        3.081   3.071        0.997   500000   100      20       32         50     1 bits     56.14       8906.78             7         1537.86       1.000
 0.601    0.999   0.700     true           1        3.474   3.453        0.994   500000   100      20       32         50     1 bits     57.48       8698.38             7         1537.83       1.000
 0.932    0.971   0.691    false           5        6.615   6.598        0.997   500000   100      20       32         50     1 bits      0.00      Infinity             7         1537.86       5.000
 0.930    1.000   0.950     true           5        7.312   7.278        0.995   500000   100      20       32         50     1 bits      0.00      Infinity             7         1537.83       5.000
 
 0.589    0.966   0.664    false           1        4.440   4.424        0.996  1000000   100      20       32         50     1 bits    112.66       8875.95             9         3076.38       1.000
 0.588    0.999   0.689     true           1        4.536   4.520        0.996  1000000   100      20       32         50     1 bits    127.19       7862.25             9         3076.33       1.000
 0.920    0.968   0.679    false           5        7.879   7.862        0.998  1000000   100      20       32         50     1 bits      0.00      Infinity             9         3076.38       5.000
 0.919    1.000   0.941     true           5        9.132   9.097        0.996  1000000   100      20       32         50     1 bits      0.00      Infinity             9         3076.33       5.000
 
 0.570    0.959   0.647    false           1        3.626   3.596        0.992  2000000   100      20       32         50     1 bits    411.33       4862.32             6         6157.14       1.000
 0.573    0.999   0.676     true           1        4.102   4.084        0.996  2000000   100      20       32         50     1 bits    420.06       4761.25             6         6158.07       1.000
 0.897    0.963   0.669    false           5        7.894   7.860        0.996  2000000   100      20       32         50     1 bits      0.00      Infinity             6         6157.14       5.000
 0.901    1.000   0.928     true           5        8.835   8.801        0.996  2000000   100      20       32         50     1 bits      0.00      Infinity             6         6158.07       5.000

@benwtrent (Collaborator) left a comment

I don't understand how this NDCG calculation (brute-force rank vs. approximate rank) gives much value.

It seems to me that we could only do NDCG if the query vectors actually had labels (not from brute force, but from some relevancy data set).

We would then calculate NDCG@10 and compare with what the model claims to provide for a given data set.

EDIT: Ah, yeah, I see what it's doing, I think...

As for 1 bit being better than 4 bit, I think 4 bit can be made WAY better if we switched the codec over to use the new quantization scheme...let me open a Lucene issue :D.

@vigyasharma (Collaborator, Author)

It gives a sense of ranking quality when using vector similarity as the relevance score. We use full-precision vector similarity scores from brute-force search results as the "ideal relevance" order, and compare it against the ranking order from ANN results, which can differ under quantization.
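Concretely, a minimal sketch of the metric (plain Python, not the benchmark code; names are illustrative): treat the full-precision brute-force similarities as graded relevance, and score the ANN ranking against the ideal (brute-force) ordering.

```python
import math

def dcg(gains):
    # Discounted cumulative gain over a ranked list of relevance gains.
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg_at_n(ann_ids, exact_scores, n):
    """exact_scores: doc id -> full-precision similarity, acting as the
    'true' relevance. The ideal order is the brute-force ranking by
    exact score; NDCG is the ANN ranking's DCG normalized by that."""
    gains = [exact_scores.get(d, 0.0) for d in ann_ids[:n]]
    ideal = sorted(exact_scores.values(), reverse=True)[:n]
    return dcg(gains) / dcg(ideal)
```

An ANN ranking that matches the brute-force order scores 1.0; quantization-induced reordering pulls it below 1.0, which is the gap reranking recovers.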

A relevancy data set with established query-relevance judgments (like MS MARCO, maybe?) would be ideal. We need something like that to test improvements from reranking with late-interaction multi-vector models. I'm looking to add that support; this is a baby step in that direction. I was also just having this conversation on a different thread with @atris.

@benwtrent (Collaborator)

Cool, I like the "baby steps" approach.

@mikemccand (Owner) left a comment

This looks awesome! Except I'm worried it'll break nightly KNN benchy...

```diff
  System.out.printf(
      Locale.ROOT,
-     "SUMMARY: %5.3f\t%5.3f\t%5.3f\t%5.3f\t%d\t%d\t%d\t%d\t%d\t%s\t%d\t%.2f\t%.2f\t%.2f\t%d\t%.2f\t%.2f\t%s\t%5.3f\t%5.3f\t%5.3f\t%s\n",
+     "SUMMARY: %5.3f\t%5.3f\t%5.3f\t%s\t%5.3f\t%5.3f\t%5.3f\t%d\t%d\t%d\t%d\t%d\t%s\t%d\t%.2f\t%.2f\t%.2f\t%d\t%.2f\t%.2f\t%s\t%5.3f\t%5.3f\t%5.3f\t%s\n",
```
@mikemccand (Owner)

Uh oh -- nightly benchy will be angry -- could you fix it to expect these new inserted columns? Don't worry about testing it ... I can do that when we merge this.

@vigyasharma (Collaborator, Author)

How do I do that Mike? Do I need to declare these columns somewhere for nightly benchmarks?

@mikemccand (Owner)

> As for 1 bit being better than 4 bit, I think 4 bit can be made WAY better if we switched the codec over to use the new quantization scheme...let me open a Lucene issue :D.

Whoa, which new quantization scheme is this! Sounds exciting...

@rohithreddynedhunuri-cmyk

> As for 1 bit being better than 4 bit, I think 4 bit can be made WAY better if we switched the codec over to use the new quantization scheme...let me open a Lucene issue :D.
>
> Whoa, which new quantization scheme is this! Sounds exciting...

+1

@vigyasharma (Collaborator, Author)

> As for 1 bit being better than 4 bit, I think 4 bit can be made WAY better if we switched the codec over to use the new quantization scheme...let me open a Lucene issue :D.
>
> Whoa, which new quantization scheme is this! Sounds exciting...

Ben opened an issue for this here: apache/lucene#15064


github-actions bot commented Sep 2, 2025

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

@github-actions bot added the Stale label Sep 2, 2025