Add NDCG and full precision reranking to knn benchmarks #435
Conversation
Awesome! Now we can benchmark with any re-ranker that's implemented as a … Curious, the latency seems a wee bit faster with re-rank? Is it possible the first run was somehow cold? I wonder if you swap the order (so …). Thanks @vigyasharma.

Maybe. Running another test with …
Ran some more benchmarks... For 4 bit quantized vectors, I see an ~8% improvement in NDCG@10 and NDCG@K (with k=100) on reranking with full precision vectors. The improvement, not surprisingly, is much larger with 5x oversampling: ~51% for NDCG@K. With 1 bit quantized vectors, there was a ~35% improvement in NDCG@K with 5x oversampling + reranking with full precision vectors. Reranking did not show any visible change in recall. For these runs, I ran …

I found it interesting that knn search results from 1 bit quantized vectors were closer to full precision relevance order than 4 bit quantized ones. Nice @benwtrent !

# 4 bit quantized
recall ndcg@10 ndcg@K rerank overSample latency(ms) netCPU avgCpuCount nDoc topK fanout maxConn beamWidth quantized index(s) index_docs/s num_segments index_size(MB) overSample
0.524 0.921 0.600 false 1 2.231 2.221 0.996 100000 100 20 32 50 4 bits 15.51 6446.21 4 333.98 1.000
0.523 0.999 0.636 true 1 2.254 2.239 0.993 100000 100 20 32 50 4 bits 17.11 5844.19 4 333.98 1.000
0.883 0.921 0.608 false 5 5.159 5.146 0.997 100000 100 20 32 50 4 bits 0.00 Infinity 4 333.98 5.000
0.884 1.000 0.916 true 5 6.181 6.146 0.994 100000 100 20 32 50 4 bits 0.00 Infinity 4 333.98 5.000
0.508 0.911 0.584 false 1 2.710 2.691 0.993 200000 100 20 32 50 4 bits 24.48 8170.94 3 668.09 1.000
0.507 0.999 0.622 true 1 2.862 2.849 0.995 200000 100 20 32 50 4 bits 24.32 8224.36 3 668.08 1.000
0.865 0.910 0.591 false 5 6.861 6.845 0.998 200000 100 20 32 50 4 bits 0.00 Infinity 3 668.09 5.000
0.863 1.000 0.900 true 5 7.353 7.320 0.996 200000 100 20 32 50 4 bits 0.00 Infinity 3 668.08 5.000
0.493 0.901 0.571 false 1 3.535 3.521 0.996 500000 100 20 32 50 4 bits 58.75 8510.93 4 1670.33 1.000
0.493 0.999 0.610 true 1 4.076 4.064 0.997 500000 100 20 32 50 4 bits 53.14 9409.99 5 1670.25 1.000
0.844 0.904 0.580 false 5 8.285 8.268 0.998 500000 100 20 32 50 4 bits 0.00 Infinity 4 1670.33 5.000
0.844 1.000 0.886 true 5 9.586 9.553 0.997 500000 100 20 32 50 4 bits 0.00 Infinity 5 1670.25 5.000
0.480 0.893 0.558 false 1 5.544 5.533 0.998 1000000 100 20 32 50 4 bits 125.60 7961.66 9 3341.60 1.000
0.480 0.999 0.599 true 1 5.500 5.489 0.998 1000000 100 20 32 50 4 bits 125.75 7952.35 8 3341.72 1.000
0.829 0.896 0.568 false 5 10.911 10.893 0.998 1000000 100 20 32 50 4 bits 0.00 Infinity 9 3341.60 5.000
0.827 1.000 0.873 true 5 11.528 11.493 0.997 1000000 100 20 32 50 4 bits 0.00 Infinity 8 3341.72 5.000
0.468 0.880 0.546 false 1 7.150 7.125 0.997 2000000 100 20 32 50 4 bits 260.16 7687.70 10 6685.94 1.000
0.467 0.999 0.588 true 1 5.928 5.894 0.994 2000000 100 20 32 50 4 bits 306.61 6523.01 6 6687.17 1.000
0.812 0.884 0.557 false 5 13.813 13.791 0.998 2000000 100 20 32 50 4 bits 0.00 Infinity 10 6685.94 5.000
0.809 1.000 0.859 true 5 12.784 12.752 0.997 2000000 100 20 32 50 4 bits 0.00 Infinity 6 6687.17 5.000

# 1 bit quantized
recall ndcg@10 ndcg@K rerank overSample latency(ms) netCPU avgCpuCount nDoc topK fanout maxConn beamWidth quantized index(s) index_docs/s num_segments index_size(MB) overSample
0.623 0.978 0.696 false 1 1.197 1.186 0.991 100000 100 20 32 50 1 bits 0.00 Infinity 3 307.49 1.000
0.622 0.999 0.717 true 1 1.523 1.508 0.990 100000 100 20 32 50 1 bits 22.72 4400.63 3 307.51 1.000
0.947 0.979 0.713 false 5 2.817 2.801 0.994 100000 100 20 32 50 1 bits 0.00 Infinity 3 307.49 5.000
0.947 1.000 0.962 true 5 3.882 3.846 0.991 100000 100 20 32 50 1 bits 0.00 Infinity 3 307.51 5.000
0.615 0.975 0.689 false 1 1.653 1.643 0.994 200000 100 20 32 50 1 bits 35.65 5609.94 3 615.08 1.000
0.616 1.000 0.712 true 1 1.880 1.860 0.989 200000 100 20 32 50 1 bits 34.50 5796.93 3 615.08 1.000
0.940 0.976 0.702 false 5 3.726 3.714 0.997 200000 100 20 32 50 1 bits 0.00 Infinity 3 615.08 5.000
0.941 1.000 0.957 true 5 4.851 4.816 0.993 200000 100 20 32 50 1 bits 0.00 Infinity 3 615.08 5.000
0.603 0.968 0.677 false 1 3.081 3.071 0.997 500000 100 20 32 50 1 bits 56.14 8906.78 7 1537.86 1.000
0.601 0.999 0.700 true 1 3.474 3.453 0.994 500000 100 20 32 50 1 bits 57.48 8698.38 7 1537.83 1.000
0.932 0.971 0.691 false 5 6.615 6.598 0.997 500000 100 20 32 50 1 bits 0.00 Infinity 7 1537.86 5.000
0.930 1.000 0.950 true 5 7.312 7.278 0.995 500000 100 20 32 50 1 bits 0.00 Infinity 7 1537.83 5.000
0.589 0.966 0.664 false 1 4.440 4.424 0.996 1000000 100 20 32 50 1 bits 112.66 8875.95 9 3076.38 1.000
0.588 0.999 0.689 true 1 4.536 4.520 0.996 1000000 100 20 32 50 1 bits 127.19 7862.25 9 3076.33 1.000
0.920 0.968 0.679 false 5 7.879 7.862 0.998 1000000 100 20 32 50 1 bits 0.00 Infinity 9 3076.38 5.000
0.919 1.000 0.941 true 5 9.132 9.097 0.996 1000000 100 20 32 50 1 bits 0.00 Infinity 9 3076.33 5.000
0.570 0.959 0.647 false 1 3.626 3.596 0.992 2000000 100 20 32 50 1 bits 411.33 4862.32 6 6157.14 1.000
0.573 0.999 0.676 true 1 4.102 4.084 0.996 2000000 100 20 32 50 1 bits 420.06 4761.25 6 6158.07 1.000
0.897 0.963 0.669 false 5 7.894 7.860 0.996 2000000 100 20 32 50 1 bits 0.00 Infinity 6 6157.14 5.000
0.901 1.000 0.928 true 5 8.835 8.801 0.996 2000000 100 20 32 50 1 bits 0.00 Infinity 6 6158.07 5.000
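The oversample + rerank flow measured in the rows above can be sketched as follows. This is a toy illustration of the two-phase search pattern, not the benchmark's actual code; `quantized_dot` is a hypothetical stand-in that coarsely buckets scores to simulate quantization loss:

```python
def dot(a, b):
    # Full precision similarity (inner product, for simplicity).
    return sum(x * y for x, y in zip(a, b))

def quantized_dot(a, b):
    # Stand-in for a lossy quantized score: bucket the true score coarsely,
    # so nearby full precision scores can collapse into the same bucket.
    return (dot(a, b) // 10) * 10

def search(query, docs, top_k, oversample, rerank):
    """Two-phase search: quantized first pass, optional full precision rerank.

    docs: dict of doc id -> full precision vector.
    """
    # First pass: collect top_k * oversample candidates by quantized score.
    n = top_k * oversample
    candidates = sorted(
        docs, key=lambda d: quantized_dot(query, docs[d]), reverse=True
    )[:n]
    if rerank:
        # Second pass: reorder the candidate pool by full precision similarity.
        candidates.sort(key=lambda d: dot(query, docs[d]), reverse=True)
    return candidates[:top_k]

# Toy index where quantization collapses two close scores ("a" and "c") into
# one bucket, so the quantized-only pass can rank them in the wrong order.
docs = {"c": [62], "a": [69], "b": [70]}
baseline = search([1], docs, top_k=2, oversample=1, rerank=False)
improved = search([1], docs, top_k=2, oversample=2, rerank=True)
```

With `oversample=1` and no rerank, the quantized tie between "a" and "c" lets the wrong doc into the top 2; widening the pool and reranking with full precision scores recovers the true order, which mirrors why NDCG@K improves most in the 5x oversampling rows.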
I don't understand how this NDCG calculation (brute-force rank vs. approximate rank) gives much value.
It seems to me that we could only do NDCG if the query vectors actually had labels (not from brute-force, but from some relevancy data set).
We would then calculate NDCG@10 and compare with what the model claims to provide for a given data set.
EDIT: Ah, yeah, I see what it's doing, I think...
As for 1 bit being better than 4 bit, I think 4 bit can be made WAY better if we switched the codec over to use the new quantization scheme...let me open a Lucene issue :D.
It gives a sense of ranking quality when using vector similarity as relevance scores. We use full precision vector similarity scores from brute-force search results as the "ideal relevance" order, and compare it against the ranking order from ann results, which can differ under quantization. Having a relevancy data set with established query-relevance judgments (like MS MARCO, maybe?) would be ideal. We need something like that to test improvements from reranking with late interaction model multi-vectors. I'm looking to add that support; created this as a baby step in that direction. Was also just having this conversation on a different thread with @atris
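The metric described here can be sketched in a few lines: treat the full precision brute-force scores as graded relevance labels, then compute NDCG@k over the order the ANN search actually returned. This is a minimal illustration of the idea, not the benchmark's implementation; the function and variable names are made up for the example:

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: each relevance is discounted by the
    # log2 of its (1-based) rank position.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg_at_k(ann_ids, true_scores, k):
    """NDCG@k using full precision brute-force scores as relevance labels.

    ann_ids: doc ids in the order returned by ANN search.
    true_scores: dict of doc id -> full precision similarity (the "ideal"
    relevance order comes from sorting these descending).
    """
    gains = [true_scores.get(doc, 0.0) for doc in ann_ids[:k]]
    ideal = sorted(true_scores.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Toy example: the ANN result with the top two docs swapped scores below 1.0.
scores = {"d1": 0.9, "d2": 0.8, "d3": 0.7, "d4": 0.1}
perfect = ndcg_at_k(["d1", "d2", "d3"], scores, k=3)
swapped = ndcg_at_k(["d2", "d1", "d3"], scores, k=3)
```

An ANN result in exact full precision order gets NDCG 1.0; any reordering introduced by quantization drops it below 1.0, which is what the ndcg@10 and ndcg@K columns in the tables measure.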
Cool, I like the "baby steps" approach.
This looks awesome! Except I'm worried it'll break nightly KNN benchy...
System.out.printf(
    Locale.ROOT,
-   "SUMMARY: %5.3f\t%5.3f\t%5.3f\t%5.3f\t%d\t%d\t%d\t%d\t%d\t%s\t%d\t%.2f\t%.2f\t%.2f\t%d\t%.2f\t%.2f\t%s\t%5.3f\t%5.3f\t%5.3f\t%s\n",
+   "SUMMARY: %5.3f\t%5.3f\t%5.3f\t%s\t%5.3f\t%5.3f\t%5.3f\t%d\t%d\t%d\t%d\t%d\t%s\t%d\t%.2f\t%.2f\t%.2f\t%d\t%.2f\t%.2f\t%s\t%5.3f\t%5.3f\t%5.3f\t%s\n",
Uh oh -- nightly benchy will be angry -- could you fix it to expect these new inserted columns? Don't worry about testing it ... I can do that when we merge this.
How do I do that Mike? Do I need to declare these columns somewhere for nightly benchmarks?
Whoa, which new quantization scheme is this! Sounds exciting...

+1

Ben opened an issue for this here: apache/lucene#15064
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!
Adds support to evaluate improvements in knn search ranking quality using Normalized Discounted Cumulative Gain (NDCG) at 10 and at k for the configured topK. This is useful for measuring the impact of reranking quantized search results with full precision vectors.

Also adds support to rerank with full precision vectors using a DoubleValuesSourceRescorer on FullPrecisionFloatVectorSimilarityValuesSource.

Addresses #401
Benchmark Results
I see an 8% - 14% improvement in NDCG@10 and a 6% - 8% improvement in NDCG@K for 4 bit quantized knn search with full precision reranking. The improvement increases with index size. Latency impact doesn't seem significant?