
Conversation

@vigyasharma
Contributor

A DoubleValuesSource that scores on full-precision vectors can be used on top of quantized kNN vector queries to rerank hits with full-precision similarity scores.

This change adds a 'full precision' similarity mode to the existing VectorSimilarityValuesSources.
Supports #14009
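
For orientation, here is a minimal sketch (not code from this PR) of the two-phase flow this enables: run a quantized kNN query, then overwrite the scores of the returned hits using a full-precision DoubleValuesSource. The field name, k, and the `fullPrecisionSource` argument are illustrative assumptions.

```java
import java.io.IOException;
import java.util.List;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.ReaderUtil;
import org.apache.lucene.search.DoubleValues;
import org.apache.lucene.search.DoubleValuesSource;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.KnnFloatVectorQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

class RerankSketch {
  // First pass: approximate kNN search over (possibly quantized) vectors.
  // Second pass: replace each hit's score with a full-precision similarity value.
  static void rerank(IndexSearcher searcher, float[] queryVector,
      DoubleValuesSource fullPrecisionSource) throws IOException {
    TopDocs approx = searcher.search(new KnnFloatVectorQuery("vector", queryVector, 100), 100);
    List<LeafReaderContext> leaves = searcher.getIndexReader().leaves();
    for (ScoreDoc sd : approx.scoreDocs) {
      // Resolve the segment holding this hit and rescore it in place.
      LeafReaderContext leaf = leaves.get(ReaderUtil.subIndex(sd.doc, leaves));
      DoubleValues values = fullPrecisionSource.getValues(leaf, null);
      if (values.advanceExact(sd.doc - leaf.docBase)) {
        sd.score = (float) values.doubleValue();
      }
    }
    // A real implementation would re-sort scoreDocs by the new scores.
  }
}
```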

github-actions bot

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.

@msokolov
Contributor

msokolov commented May 28, 2025

+1 to adding support for full precision re-ranking. Have you considered writing a FullPrecisionVectorSimilaritySource as a separate class? We like to avoid conditional logic on boolean parameters where possible. I don't know if there is really a need for a byte-flavored version either. For the Query support, we can expect the query to be supplied with high precision.

During indexing, wouldn't we want to re-rank using non-quantized query vectors as well as full-precision document vectors? I'm not sure that can be solved using a DoubleValuesSource; we would probably need to bake it into the codec / hnsw searcher etc. I'm not sure that is envisioned anyway, and it makes sense to start with the Query side.

@benwtrent
Member

> Have you considered writing a FullPrecisionVectorSimilaritySource as a separate class?

A separate class would allow users to provide a custom vector comparator, which might be beneficial.

But I agree, a new similarity source is a good idea here!

@vigyasharma
Contributor Author

Thanks for the review folks! I like the idea of a separate class and a custom vector comparator, will make these changes.

@vigyasharma
Contributor Author

I'm not sure about the byte vector case myself. Do we see a viable need for it in FullPrecisionVectorSimilaritySource?

@benwtrent
Member

> I'm not sure about the byte vector case myself. Do we see a viable need for it in FullPrecisionVectorSimilaritySource?

I am not sure. As of right now, none of the quantization schemes support vectors that are already bytes.

But, for future-proofing, I think the name of the new similarity source should indicate the appropriate vector type, right? Maybe we support byte in the future?

```java
@Override
public float score() throws IOException {
  return vectorSimilarityFunction.compare(
      queryVector, vectorValues.vectorValue(iterator.index()));
}
```
Contributor

One possible concern with reranking is that calling this will page in the full-precision vector data (or whatever the iterator reads), and it can compete for RAM with the normal vectors used in HNSW graph search. Since HNSW performance suffers when its vectors are not in RAM, I'm wondering if we can restrict the memory used by the reranking phase. Alternatively, we could use mlock (which could be tricky in Java) to pin the pages holding the normal vectors.

Contributor

This can be done separately too. I'm just curious what you think about this.

Member

The answer here is directIO for ranking. mlock feels tricky to implement well in Java land.

But I don't think this concern should block this PR.
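
For illustration, a rough sketch of the direct I/O idea using DirectIODirectory from lucene-misc. By default that directory applies direct I/O during merges; extending it to reads of raw vector data files (the `.vec` suffix check below) is an assumption of this sketch, not something this PR implements.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.OptionalLong;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.misc.store.DirectIODirectory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.IOContext;

class DirectIOSketch {
  static DirectoryReader openReader(Path indexPath) throws IOException {
    // Read raw vector data (.vec) with direct I/O, so full-precision rerank
    // reads bypass the page cache instead of evicting pages that HNSW graph
    // search relies on.
    DirectIODirectory dir =
        new DirectIODirectory(FSDirectory.open(indexPath)) {
          @Override
          protected boolean useDirectIO(String name, IOContext context, OptionalLong fileLength) {
            return name.endsWith(".vec") || super.useDirectIO(name, context, fileLength);
          }
        };
    return DirectoryReader.open(dir);
  }
}
```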

Contributor Author

It's a valid concern for setups with limited memory.

> Since HNSW performance suffers when its vectors are not in RAM, I'm wondering if we can restrict the memory used by the reranking phase.

Maybe. I wonder how we would decide that the pages used for HNSW search are more important than the pages used for full-precision reranking. For an application that does kNN search and then reranks with full-precision vectors, a query doesn't really complete until both phases are done, so thrashing during reranking would add to overall query latency. Maybe this is okay if you rerank only a subset of queries and the vast majority is HNSW search alone, but that seems very use-case specific. It might be best to let the OS page cache handle this?

Anyway, I think this deserves its own discussion, perhaps in a separate issue? And as you and others have already mentioned, it can be handled independently of this PR.

Contributor Author

I do remember reading some results on the DiskANN issue where benchmarks indicated that having the vectors needed for ANN graph search in memory (the quantized vectors, in this case) does lead to better performance. So maybe an option to use only DIRECT_IO for this makes sense.

Contributor

I've opened #14746 for further discussion on this topic.


github-actions bot added this to the 10.3.0 milestone Jun 2, 2025
@vigyasharma
Contributor Author

Moved the full-precision scoring logic to a separate FullPrecisionFloatVectorSimilarityValuesSource that can take a custom vector similarity function.
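
As a usage illustration, a short sketch assuming the new class lives alongside the existing VectorSimilarityValuesSources in org.apache.lucene.queries.function.valuesource and offers constructors with and without a similarity-function override; the exact signatures here are assumptions based on this thread, not quoted from the final patch.

```java
import org.apache.lucene.index.VectorSimilarityFunction;
// Assumed location, alongside the existing VectorSimilarityValuesSources:
import org.apache.lucene.queries.function.valuesource.FullPrecisionFloatVectorSimilarityValuesSource;
import org.apache.lucene.search.DoubleValuesSource;

class UsageSketch {
  // Rerank with the field's configured similarity function.
  static DoubleValuesSource defaultSimilarity(float[] queryVector) {
    return new FullPrecisionFloatVectorSimilarityValuesSource(queryVector, "vector");
  }

  // Or supply a custom similarity function, as suggested in review.
  static DoubleValuesSource customSimilarity(float[] queryVector) {
    return new FullPrecisionFloatVectorSimilarityValuesSource(
        queryVector, "vector", VectorSimilarityFunction.COSINE);
  }
}
```

A source like this is what the reranking sketch near the top of the thread would take as its `fullPrecisionSource` argument.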

@dungba88 (Contributor) left a comment

LGTM, thank you

@vigyasharma
Contributor Author

Thanks all for reviewing this PR. I've addressed the comments above. Will merge this early next week, unless there is any new feedback.

vigyasharma merged commit 72a655c into apache:main Jun 17, 2025
7 checks passed
@vigyasharma
Contributor Author

I added some benchmarks to measure NDCG for kNN search with 4-bit quantized vectors, reranked with full-precision vector similarity scores, here: mikemccand/luceneutil#435

Pasting results below as well.


Benchmark Results

I see an 8% to 14% improvement in NDCG@10 and a 6% to 8% improvement in NDCG@K for 4-bit quantized kNN search with full-precision reranking. The improvement increases with index size. Latency impact doesn't seem significant.

recall  ndcg@10  ndcg@K  rerank  latency(ms)  netCPU  avgCpuCount      nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.524    0.920   0.600   false        2.235   2.216        0.991    100000   100      20       32         50     4 bits     16.96       5896.23             4          333.95       329.971       37.003       HNSW
 0.524    0.999   0.637    true        2.279   2.261        0.992    100000   100      20       32         50     4 bits      0.00      Infinity             4          333.95       329.971       37.003       HNSW
 0.490    0.901   0.568   false        3.199   3.183        0.995    500000   100      20       32         50     4 bits     70.18       7124.23             3         1670.44      1649.857      185.013       HNSW
 0.490    0.999   0.607    true        3.201   3.181        0.994    500000   100      20       32         50     4 bits      0.00      Infinity             3         1670.44      1649.857      185.013       HNSW
 0.480    0.891   0.557   false        4.774   4.756        0.996   1000000   100      20       32         50     4 bits    134.12       7456.07             6         3341.86      3299.713      370.026       HNSW
 0.480    0.999   0.598    true        4.697   4.675        0.995   1000000   100      20       32         50     4 bits      0.00      Infinity             6         3341.86      3299.713      370.026       HNSW
 0.462    0.883   0.541   false        5.081   5.035        0.991   2000000   100      20       32         50     4 bits    465.46       4296.82             7         6688.32      6599.426      740.051       HNSW
 0.462    0.998   0.583    true        4.885   4.863        0.995   2000000   100      20       32         50     4 bits      0.00      Infinity             7         6688.32      6599.426      740.051       HNSW
 0.447    0.871   0.526   false       11.633  11.495        0.988  10000000   100      20       32         50     4 bits   2110.54       4738.12            14        33489.11     32997.131     3700.256       HNSW
 0.447    0.998   0.569    true       11.037  10.984        0.995  10000000   100      20       32         50     4 bits      0.00      Infinity            14        33489.11     32997.131     3700.256       HNSW
