
Conversation

@vigyasharma
Contributor

A DoubleValuesSource that scores on full-precision vectors can be used on top of quantized kNN vector queries to rerank hits with full-precision similarity scores.

This change adds a 'full precision' similarity mode to the existing VectorSimilarityValuesSources.
Supports #14009
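
For orientation, here is a minimal sketch (not code from this PR) of the two-phase flow this enables: run a quantized kNN query, then overwrite the scores of the returned hits using a full-precision DoubleValuesSource. The field name, k, and the `fullPrecisionSource` argument are illustrative assumptions.

```java
import java.io.IOException;
import java.util.List;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.ReaderUtil;
import org.apache.lucene.search.DoubleValues;
import org.apache.lucene.search.DoubleValuesSource;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.KnnFloatVectorQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

class RerankSketch {
  // First pass: approximate kNN search over (possibly quantized) vectors.
  // Second pass: replace each hit's score with a full-precision similarity value.
  static void rerank(IndexSearcher searcher, float[] queryVector,
      DoubleValuesSource fullPrecisionSource) throws IOException {
    TopDocs approx = searcher.search(new KnnFloatVectorQuery("vector", queryVector, 100), 100);
    List<LeafReaderContext> leaves = searcher.getIndexReader().leaves();
    for (ScoreDoc sd : approx.scoreDocs) {
      // Resolve the segment holding this hit and rescore it in place.
      LeafReaderContext leaf = leaves.get(ReaderUtil.subIndex(sd.doc, leaves));
      DoubleValues values = fullPrecisionSource.getValues(leaf, null);
      if (values.advanceExact(sd.doc - leaf.docBase)) {
        sd.score = (float) values.doubleValue();
      }
    }
    // A real implementation would re-sort scoreDocs by the new scores.
  }
}
```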

github-actions bot

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.

@msokolov
Contributor

msokolov commented May 28, 2025

+1 to adding support for full precision re-ranking. Have you considered writing a FullPrecisionVectorSimilaritySource as a separate class? We like to avoid conditional logic on boolean parameters where possible. I don't know if there is really a need for a byte-flavored version either. For the Query support, we can expect the query to be supplied with high precision.

During indexing, wouldn't we want to re-rank using non-quantized query vectors as well as full-precision document vectors? I'm not sure that can be solved using a DoubleValuesSource; we would probably need to bake it into the codec / hnsw searcher etc. I'm not sure that is envisioned anyway, and it makes sense to start with the Query side.

@benwtrent
Member

> Have you considered writing a FullPrecisionVectorSimilaritySource as a separate class?

A separate class would allow users to provide a custom vector comparator, which might be beneficial.

But I agree, a new similarity source is a good idea here!

@vigyasharma
Contributor Author

Thanks for the review folks! I like the idea of a separate class and a custom vector comparator, will make these changes.

@vigyasharma
Contributor Author

I'm not sure about the byte vector case myself. Do we see a viable need for it in FullPrecisionVectorSimilaritySource?

@benwtrent
Member

> I'm not sure about the byte vector case myself. Do we see a viable need for it in FullPrecisionVectorSimilaritySource?

I am not sure. As of right now, none of the quantization schemes support vectors that are already bytes.

But, for future-proofing, I think the name of the new similarity source should indicate the appropriate vector type, right? Maybe we support byte in the future?

```java
@Override
public float score() throws IOException {
  return vectorSimilarityFunction.compare(
      queryVector, vectorValues.vectorValue(iterator.index()));
}
```
Contributor

One possible concern with reranking is that calling this will page in the full-precision vector data (or whatever the iterator reads), and it can compete for RAM with the normal vectors used in HNSW graph search. Since HNSW performance suffers when its vectors are not in RAM, I'm wondering if we can restrict the memory used by the reranking phase. Alternatively, we could use mlock (which could be tricky in Java) to pin the pages holding the normal vectors.

Contributor

This can be done separately too. I'm just curious what you think about this.

Member

The answer here is directIO for ranking. mlock feels tricky to implement well in Java land.

But I don't think this concern should block this PR.
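
For illustration, a rough sketch of the direct I/O idea using DirectIODirectory from lucene-misc. By default that directory applies direct I/O during merges; extending it to reads of raw vector data files (the `.vec` suffix check below) is an assumption of this sketch, not something this PR implements.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.OptionalLong;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.misc.store.DirectIODirectory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.IOContext;

class DirectIOSketch {
  static DirectoryReader openReader(Path indexPath) throws IOException {
    // Read raw vector data (.vec) with direct I/O, so full-precision rerank
    // reads bypass the page cache instead of evicting pages that HNSW graph
    // search relies on.
    DirectIODirectory dir =
        new DirectIODirectory(FSDirectory.open(indexPath)) {
          @Override
          protected boolean useDirectIO(String name, IOContext context, OptionalLong fileLength) {
            return name.endsWith(".vec") || super.useDirectIO(name, context, fileLength);
          }
        };
    return DirectoryReader.open(dir);
  }
}
```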

Contributor Author

It's a valid concern for setups with limited memory.

> Since HNSW performance suffers when its vectors are not in RAM, I'm wondering if we can restrict the memory used by the reranking phase.

Maybe. I wonder how we would decide that the pages used for HNSW search are more important than the pages used for full-precision reranking. For an application that does kNN search and then reranks with full-precision vectors, a query doesn't really complete until both phases are done, so thrashing during reranking would add to overall query latency. Maybe this is okay if you rerank only a subset of queries and the vast majority is HNSW search alone, but that seems very use-case specific. It might be best to let the OS page cache handle this?

Anyway, I think this deserves its own discussion, perhaps in a separate issue? And as you and others have already mentioned, it can be handled independently of this PR.

Contributor Author

I do remember reading some results on the DiskANN issue where benchmarks indicated that having the vectors needed for ANN graph search in memory (the quantized vectors, in this case) does lead to better performance. So maybe an option to use only DIRECT_IO for this makes sense.

Contributor

I've opened #14746 for further discussion on this topic.


github-actions bot added this to the 10.3.0 milestone Jun 2, 2025
@vigyasharma
Contributor Author

Moved the full-precision scoring logic to a separate FullPrecisionFloatVectorSimilarityValuesSource that can take a custom vector similarity function.
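
As a usage illustration, a short sketch assuming the new class lives alongside the existing VectorSimilarityValuesSources in org.apache.lucene.queries.function.valuesource and offers constructors with and without a similarity-function override; the exact signatures here are assumptions based on this thread, not quoted from the final patch.

```java
import org.apache.lucene.index.VectorSimilarityFunction;
// Assumed location, alongside the existing VectorSimilarityValuesSources:
import org.apache.lucene.queries.function.valuesource.FullPrecisionFloatVectorSimilarityValuesSource;
import org.apache.lucene.search.DoubleValuesSource;

class UsageSketch {
  // Rerank with the field's configured similarity function.
  static DoubleValuesSource defaultSimilarity(float[] queryVector) {
    return new FullPrecisionFloatVectorSimilarityValuesSource(queryVector, "vector");
  }

  // Or supply a custom similarity function, as suggested in review.
  static DoubleValuesSource customSimilarity(float[] queryVector) {
    return new FullPrecisionFloatVectorSimilarityValuesSource(
        queryVector, "vector", VectorSimilarityFunction.COSINE);
  }
}
```

A source like this is what the reranking sketch near the top of the thread would take as its `fullPrecisionSource` argument.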

@dungba88 (Contributor) left a comment

LGTM, thank you

@vigyasharma
Contributor Author

Thanks all for reviewing this PR. I've addressed the comments above. Will merge this early next week, unless there is any new feedback.

vigyasharma merged commit 72a655c into apache:main Jun 17, 2025
7 checks passed
@vigyasharma
Contributor Author

I added some benchmarks to measure NDCG for kNN search with 4-bit quantized vectors, reranked with full-precision vector similarity scores, here: mikemccand/luceneutil#435

Pasting results below as well.


Benchmark Results

I see an 8% to 14% improvement in NDCG@10 and a 6% to 8% improvement in NDCG@K for 4-bit quantized kNN search with full-precision reranking. The improvement increases with index size. Latency impact doesn't seem significant.

recall  ndcg@10  ndcg@K  rerank  latency(ms)  netCPU  avgCpuCount      nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.524    0.920   0.600   false        2.235   2.216        0.991    100000   100      20       32         50     4 bits     16.96       5896.23             4          333.95       329.971       37.003       HNSW
 0.524    0.999   0.637    true        2.279   2.261        0.992    100000   100      20       32         50     4 bits      0.00      Infinity             4          333.95       329.971       37.003       HNSW
 0.490    0.901   0.568   false        3.199   3.183        0.995    500000   100      20       32         50     4 bits     70.18       7124.23             3         1670.44      1649.857      185.013       HNSW
 0.490    0.999   0.607    true        3.201   3.181        0.994    500000   100      20       32         50     4 bits      0.00      Infinity             3         1670.44      1649.857      185.013       HNSW
 0.480    0.891   0.557   false        4.774   4.756        0.996   1000000   100      20       32         50     4 bits    134.12       7456.07             6         3341.86      3299.713      370.026       HNSW
 0.480    0.999   0.598    true        4.697   4.675        0.995   1000000   100      20       32         50     4 bits      0.00      Infinity             6         3341.86      3299.713      370.026       HNSW
 0.462    0.883   0.541   false        5.081   5.035        0.991   2000000   100      20       32         50     4 bits    465.46       4296.82             7         6688.32      6599.426      740.051       HNSW
 0.462    0.998   0.583    true        4.885   4.863        0.995   2000000   100      20       32         50     4 bits      0.00      Infinity             7         6688.32      6599.426      740.051       HNSW
 0.447    0.871   0.526   false       11.633  11.495        0.988  10000000   100      20       32         50     4 bits   2110.54       4738.12            14        33489.11     32997.131     3700.256       HNSW
 0.447    0.998   0.569    true       11.037  10.984        0.995  10000000   100      20       32         50     4 bits      0.00      Infinity            14        33489.11     32997.131     3700.256       HNSW
