
Conversation

@vigyasharma (Collaborator) commented Aug 8, 2025

Adds support for evaluating kNN search ranking quality using Normalized Discounted Cumulative Gain (NDCG), reported at 10 and at k for the configured topK. This is useful for measuring the impact of reranking quantized search results with full-precision vectors.

Also adds support for reranking with full-precision vectors, using a DoubleValuesSourceRescorer over a FullPrecisionFloatVectorSimilarityValuesSource.
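For intuition, here is a minimal sketch of what full-precision reranking does conceptually (plain Python, not luceneutil code; the function name and data shapes are made up for illustration): rescore the quantized-search candidates with the exact similarity and re-sort.

```python
def rerank_full_precision(query, vectors, candidate_ids, top_k):
    """Rescore approximate (quantized) search candidates with the exact
    full-precision dot-product similarity; return top_k ids, best first.
    `vectors` maps doc id -> full-precision float vector."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    scored = [(dot(query, vectors[d]), d) for d in candidate_ids]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [d for _, d in scored[:top_k]]
```

In the PR itself this role is played by the DoubleValuesSourceRescorer re-scoring the quantized hits with full-precision similarity values.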

Addresses #401

+++

Benchmark Results

I see an 8-14% improvement in NDCG@10 and a 6-8% improvement in NDCG@K for 4-bit quantized kNN search with full-precision reranking. The improvement increases with index size. The latency impact doesn't seem significant.

recall  ndcg@10  ndcg@K  rerank  latency(ms)  netCPU  avgCpuCount      nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.524    0.920   0.600   false        2.235   2.216        0.991    100000   100      20       32         50     4 bits     16.96       5896.23             4          333.95       329.971       37.003       HNSW
 0.524    0.999   0.637    true        2.279   2.261        0.992    100000   100      20       32         50     4 bits      0.00      Infinity             4          333.95       329.971       37.003       HNSW
 0.490    0.901   0.568   false        3.199   3.183        0.995    500000   100      20       32         50     4 bits     70.18       7124.23             3         1670.44      1649.857      185.013       HNSW
 0.490    0.999   0.607    true        3.201   3.181        0.994    500000   100      20       32         50     4 bits      0.00      Infinity             3         1670.44      1649.857      185.013       HNSW
 0.480    0.891   0.557   false        4.774   4.756        0.996   1000000   100      20       32         50     4 bits    134.12       7456.07             6         3341.86      3299.713      370.026       HNSW
 0.480    0.999   0.598    true        4.697   4.675        0.995   1000000   100      20       32         50     4 bits      0.00      Infinity             6         3341.86      3299.713      370.026       HNSW
 0.462    0.883   0.541   false        5.081   5.035        0.991   2000000   100      20       32         50     4 bits    465.46       4296.82             7         6688.32      6599.426      740.051       HNSW
 0.462    0.998   0.583    true        4.885   4.863        0.995   2000000   100      20       32         50     4 bits      0.00      Infinity             7         6688.32      6599.426      740.051       HNSW
 0.447    0.871   0.526   false       11.633  11.495        0.988  10000000   100      20       32         50     4 bits   2110.54       4738.12            14        33489.11     32997.131     3700.256       HNSW
 0.447    0.998   0.569    true       11.037  10.984        0.995  10000000   100      20       32         50     4 bits      0.00      Infinity            14        33489.11     32997.131     3700.256       HNSW

@mikemccand (Owner)

Awesome! Now we can benchmark with any re-ranker that's implemented as a DoubleValuesSource!

Curious, the latency seems a wee bit faster with re-rank? Is it possible the first run was somehow cold? I wonder if you swap the order (so rerank=true goes first) if that alters the results?

Thanks @vigyasharma.

@vigyasharma (Collaborator, Author)

> Curious, the latency seems a wee bit faster with re-rank? Is it possible the first run was somehow cold? I wonder if you swap the order (so rerank=true goes first) if that alters the results?

Maybe. Running another test with rerank=true first to confirm.

@vigyasharma (Collaborator, Author) commented Aug 11, 2025

Ran some more benchmarks...

For 4-bit quantized vectors, I see an ~8% improvement in NDCG@10 and NDCG@K (with k=100) when reranking with full-precision vectors. Not surprisingly, the improvement is much larger with 5x oversampling: ~51% for NDCG@K.

With 1-bit quantized vectors, there was a ~35% improvement in NDCG@K with 5x oversampling plus full-precision reranking. Reranking did not show any visible change in recall. For these runs, I ran rerank=true/false separately and merged the benchmark results manually afterwards for easier comparison.
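As a sketch of how oversampling and reranking combine (hypothetical helper names, plain Python, not luceneutil code): the quantized index is asked for oversample * k candidates, and the full-precision rescorer keeps the best k.

```python
def search_with_rerank(approx_search, exact_score, query, k, oversample):
    """Fetch k * oversample candidates from the quantized (approximate)
    index, rescore them with exact full-precision similarity, keep top k.
    A larger oversample gives the reranker more true neighbors to recover,
    at the cost of more full-precision distance computations."""
    candidates = approx_search(query, k * oversample)
    return sorted(candidates, key=lambda d: exact_score(query, d),
                  reverse=True)[:k]
```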

I found it interesting that kNN search results from 1-bit quantized vectors were closer to the full-precision relevance order than 4-bit quantized ones. Nice @benwtrent!

# 4 bit quantized
recall  ndcg@10  ndcg@K   rerank  overSample  latency(ms)  netCPU  avgCpuCount     nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  overSample
 0.524    0.921   0.600    false           1        2.231   2.221        0.996   100000   100      20       32         50     4 bits     15.51       6446.21             4          333.98       1.000
 0.523    0.999   0.636     true           1        2.254   2.239        0.993   100000   100      20       32         50     4 bits     17.11       5844.19             4          333.98       1.000
 0.883    0.921   0.608    false           5        5.159   5.146        0.997   100000   100      20       32         50     4 bits      0.00      Infinity             4          333.98       5.000
 0.884    1.000   0.916     true           5        6.181   6.146        0.994   100000   100      20       32         50     4 bits      0.00      Infinity             4          333.98       5.000

 0.508    0.911   0.584    false           1        2.710   2.691        0.993   200000   100      20       32         50     4 bits     24.48       8170.94             3          668.09       1.000
 0.507    0.999   0.622     true           1        2.862   2.849        0.995   200000   100      20       32         50     4 bits     24.32       8224.36             3          668.08       1.000
 0.865    0.910   0.591    false           5        6.861   6.845        0.998   200000   100      20       32         50     4 bits      0.00      Infinity             3          668.09       5.000
 0.863    1.000   0.900     true           5        7.353   7.320        0.996   200000   100      20       32         50     4 bits      0.00      Infinity             3          668.08       5.000

 0.493    0.901   0.571    false           1        3.535   3.521        0.996   500000   100      20       32         50     4 bits     58.75       8510.93             4         1670.33       1.000
 0.493    0.999   0.610     true           1        4.076   4.064        0.997   500000   100      20       32         50     4 bits     53.14       9409.99             5         1670.25       1.000
 0.844    0.904   0.580    false           5        8.285   8.268        0.998   500000   100      20       32         50     4 bits      0.00      Infinity             4         1670.33       5.000
 0.844    1.000   0.886     true           5        9.586   9.553        0.997   500000   100      20       32         50     4 bits      0.00      Infinity             5         1670.25       5.000

 0.480    0.893   0.558    false           1        5.544   5.533        0.998  1000000   100      20       32         50     4 bits    125.60       7961.66             9         3341.60       1.000
 0.480    0.999   0.599     true           1        5.500   5.489        0.998  1000000   100      20       32         50     4 bits    125.75       7952.35             8         3341.72       1.000
 0.829    0.896   0.568    false           5       10.911  10.893        0.998  1000000   100      20       32         50     4 bits      0.00      Infinity             9         3341.60       5.000
 0.827    1.000   0.873     true           5       11.528  11.493        0.997  1000000   100      20       32         50     4 bits      0.00      Infinity             8         3341.72       5.000

 0.468    0.880   0.546    false           1        7.150   7.125        0.997  2000000   100      20       32         50     4 bits    260.16       7687.70            10         6685.94       1.000
 0.467    0.999   0.588     true           1        5.928   5.894        0.994  2000000   100      20       32         50     4 bits    306.61       6523.01             6         6687.17       1.000
 0.812    0.884   0.557    false           5       13.813  13.791        0.998  2000000   100      20       32         50     4 bits      0.00      Infinity            10         6685.94       5.000
 0.809    1.000   0.859     true           5       12.784  12.752        0.997  2000000   100      20       32         50     4 bits      0.00      Infinity             6         6687.17       5.000
# 1 bit quantized
recall  ndcg@10  ndcg@K   rerank  overSample  latency(ms)  netCPU  avgCpuCount     nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  overSample
 0.623    0.978   0.696    false           1        1.197   1.186        0.991   100000   100      20       32         50     1 bits      0.00      Infinity             3          307.49       1.000
 0.622    0.999   0.717     true           1        1.523   1.508        0.990   100000   100      20       32         50     1 bits     22.72       4400.63             3          307.51       1.000
 0.947    0.979   0.713    false           5        2.817   2.801        0.994   100000   100      20       32         50     1 bits      0.00      Infinity             3          307.49       5.000
 0.947    1.000   0.962     true           5        3.882   3.846        0.991   100000   100      20       32         50     1 bits      0.00      Infinity             3          307.51       5.000
 
 0.615    0.975   0.689    false           1        1.653   1.643        0.994   200000   100      20       32         50     1 bits     35.65       5609.94             3          615.08       1.000
 0.616    1.000   0.712     true           1        1.880   1.860        0.989   200000   100      20       32         50     1 bits     34.50       5796.93             3          615.08       1.000
 0.940    0.976   0.702    false           5        3.726   3.714        0.997   200000   100      20       32         50     1 bits      0.00      Infinity             3          615.08       5.000
 0.941    1.000   0.957     true           5        4.851   4.816        0.993   200000   100      20       32         50     1 bits      0.00      Infinity             3          615.08       5.000
 
 0.603    0.968   0.677    false           1        3.081   3.071        0.997   500000   100      20       32         50     1 bits     56.14       8906.78             7         1537.86       1.000
 0.601    0.999   0.700     true           1        3.474   3.453        0.994   500000   100      20       32         50     1 bits     57.48       8698.38             7         1537.83       1.000
 0.932    0.971   0.691    false           5        6.615   6.598        0.997   500000   100      20       32         50     1 bits      0.00      Infinity             7         1537.86       5.000
 0.930    1.000   0.950     true           5        7.312   7.278        0.995   500000   100      20       32         50     1 bits      0.00      Infinity             7         1537.83       5.000
 
 0.589    0.966   0.664    false           1        4.440   4.424        0.996  1000000   100      20       32         50     1 bits    112.66       8875.95             9         3076.38       1.000
 0.588    0.999   0.689     true           1        4.536   4.520        0.996  1000000   100      20       32         50     1 bits    127.19       7862.25             9         3076.33       1.000
 0.920    0.968   0.679    false           5        7.879   7.862        0.998  1000000   100      20       32         50     1 bits      0.00      Infinity             9         3076.38       5.000
 0.919    1.000   0.941     true           5        9.132   9.097        0.996  1000000   100      20       32         50     1 bits      0.00      Infinity             9         3076.33       5.000
 
 0.570    0.959   0.647    false           1        3.626   3.596        0.992  2000000   100      20       32         50     1 bits    411.33       4862.32             6         6157.14       1.000
 0.573    0.999   0.676     true           1        4.102   4.084        0.996  2000000   100      20       32         50     1 bits    420.06       4761.25             6         6158.07       1.000
 0.897    0.963   0.669    false           5        7.894   7.860        0.996  2000000   100      20       32         50     1 bits      0.00      Infinity             6         6157.14       5.000
 0.901    1.000   0.928     true           5        8.835   8.801        0.996  2000000   100      20       32         50     1 bits      0.00      Infinity             6         6158.07       5.000

@benwtrent (Collaborator) left a comment

I don't understand how this NDCG calculation (brute-force rank vs. approximate rank) gives much value.

It seems to me that we could only do NDCG if the query vectors actually had labels (not from brute force, but from some relevancy data set).

We would then calculate NDCG@10 and compare with what the model claims to provide for a given data set.

EDIT: Ah, yeah, I see what it's doing, I think...

As for 1 bit being better than 4 bit, I think 4 bit can be made WAY better if we switched the codec over to use the new quantization scheme...let me open a Lucene issue :D.

@vigyasharma (Collaborator, Author)

It gives a sense of ranking quality when using vector similarity as the relevance score. We use full-precision vector similarity scores from brute-force search results as the "ideal relevance" order, and compare it against the ranking order from ANN results, which can differ under quantization.
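Concretely, a minimal sketch of the metric (plain Python, not the benchmark code; names are illustrative): treat the full-precision brute-force similarities as graded relevance, and score the ANN ranking against the ideal (brute-force) ordering.

```python
import math

def dcg(gains):
    # Discounted cumulative gain over a ranked list of relevance gains.
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg_at_n(ann_ids, exact_scores, n):
    """exact_scores: doc id -> full-precision similarity, acting as the
    'true' relevance. The ideal order is the brute-force ranking by
    exact score; NDCG is the ANN ranking's DCG normalized by that."""
    gains = [exact_scores.get(d, 0.0) for d in ann_ids[:n]]
    ideal = sorted(exact_scores.values(), reverse=True)[:n]
    return dcg(gains) / dcg(ideal)
```

An ANN ranking that matches the brute-force order scores 1.0; quantization-induced reordering pulls it below 1.0, which is the gap reranking recovers.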

A relevancy data set with established query-relevance judgments (like MS MARCO, maybe?) would be ideal. We need something like that to test improvements from reranking with late-interaction multi-vector models. I'm looking to add that support; this is a baby step in that direction. I was also just having this conversation on a different thread with @atris.

@benwtrent (Collaborator)

Cool, I like the "baby steps" approach.

@mikemccand (Owner) left a comment

This looks awesome! Except I'm worried it'll break nightly KNN benchy...

```diff
  System.out.printf(
      Locale.ROOT,
-     "SUMMARY: %5.3f\t%5.3f\t%5.3f\t%5.3f\t%d\t%d\t%d\t%d\t%d\t%s\t%d\t%.2f\t%.2f\t%.2f\t%d\t%.2f\t%.2f\t%s\t%5.3f\t%5.3f\t%5.3f\t%s\n",
+     "SUMMARY: %5.3f\t%5.3f\t%5.3f\t%s\t%5.3f\t%5.3f\t%5.3f\t%d\t%d\t%d\t%d\t%d\t%s\t%d\t%.2f\t%.2f\t%.2f\t%d\t%.2f\t%.2f\t%s\t%5.3f\t%5.3f\t%5.3f\t%s\n",
```
@mikemccand (Owner)

Uh oh -- nightly benchy will be angry -- could you fix it to expect these new inserted columns? Don't worry about testing it ... I can do that when we merge this.

@vigyasharma (Collaborator, Author)

How do I do that Mike? Do I need to declare these columns somewhere for nightly benchmarks?

@mikemccand (Owner)

> As for 1 bit being better than 4 bit, I think 4 bit can be made WAY better if we switched the codec over to use the new quantization scheme...let me open a Lucene issue :D.

Whoa, which new quantization scheme is this! Sounds exciting...

@rohithreddynedhunuri-cmyk

> As for 1 bit being better than 4 bit, I think 4 bit can be made WAY better if we switched the codec over to use the new quantization scheme...let me open a Lucene issue :D.
>
> Whoa, which new quantization scheme is this! Sounds exciting...

+1

@vigyasharma (Collaborator, Author)

> As for 1 bit being better than 4 bit, I think 4 bit can be made WAY better if we switched the codec over to use the new quantization scheme...let me open a Lucene issue :D.
>
> Whoa, which new quantization scheme is this! Sounds exciting...

Ben opened an issue for this here: apache/lucene#15064


github-actions bot commented Sep 2, 2025

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

@github-actions bot added the Stale label Sep 2, 2025