Global HSNW vector index #6103
| this->filter = make_unique<AstNode>(std::move(filter)); | ||
| } | ||
| bool AstKnnNode::PreFilter() const { |
PreFilter is a bad name, imho. Maybe HasPreFilter, and then invert the condition to filter != nullptr.
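A minimal sketch of the rename suggested above. AstNode and the SetFilter helper are stand-ins invented for illustration; only the filter member and the HasPreFilter name come from this thread.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Stand-in for the real AST node type (assumption, not the project's definition).
struct AstNode {};

struct AstKnnNode {
  std::unique_ptr<AstNode> filter;

  // Hypothetical setter, mirroring the make_unique call shown in the diff.
  void SetFilter(AstNode f) {
    filter = std::make_unique<AstNode>(std::move(f));
  }

  // Reads as a question and is true exactly when a filter is attached.
  bool HasPreFilter() const {
    return filter != nullptr;
  }
};
```

With this shape, call sites read as if (knn.HasPreFilter()) without a negation.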
| // Queries should be done directly on subclasses with their distinc | ||
| // query functions. All results for all index types should be sorted. | ||
| struct BaseIndex { | ||
| template <typename T> struct BaseIndex { |
If you do all of this because you want to use the hnsw index with different types, it seems like it would be easier to branch off somewhere there. Maybe create a templated "base hnsw index" type and a subtype that implements BaseIndex (if we need it at all). Either way, it looks simpler than changing everything up the chain.
We have BaseIndex, which has these virtual functions:
// Returns true if the document was added / indexed
virtual bool Add(T id, const DocumentAccessor& doc, std::string_view field) = 0;
virtual void Remove(T id, const DocumentAccessor& doc, std::string_view field) = 0;
// Returns documents that have non-null values for this field (used for @field:* queries)
// Result must be sorted
virtual std::vector<T> GetAllDocsWithNonNullValues() const = 0;
BaseVectorIndex is a derived class and implements Add as a final function.
There are 2 classes derived from BaseVectorIndex: FlatVectorIndex, which uses DocId, and HnswVectorIndex, which uses GlobalDocId.
FlatVectorIndex lives in a shard, so it is stored in indices through its BaseIndex base class. We don't do this for HnswVectorIndex.
One option is to write HnswVectorIndex without any base class and implement only the functions that are actually used. wdyt? @dranikpg
BaseVectorIndex is just a helper class that provides:
- info lookup about dimension etc
- a conversion function from document to vector
You can move it out of the inheritance chain, so FlatIndex inherits both BaseVectorIndex and DocIndex, whereas HNSW index will inherit only BaseVectorIndex. There are many ways to re-use the code
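A rough sketch of that layout, under several assumptions: DocId and GlobalDocId are plain integer aliases here, DocumentAccessor is reduced to a struct holding the vector, ToVector is a hypothetical name for the conversion helper, and an ordered map stands in for both the flat storage and the hnswlib graph. It only illustrates the inheritance shape, not the actual implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string_view>
#include <utility>
#include <vector>

using DocId = uint32_t;        // assumption: shard-local id
using GlobalDocId = uint64_t;  // assumption: id unique across shards

struct DocumentAccessor { std::vector<float> values; };  // simplified stand-in

// Per-shard index interface (the "DocIndex" mentioned above).
struct DocIndex {
  virtual ~DocIndex() = default;
  virtual bool Add(DocId id, const DocumentAccessor& doc, std::string_view field) = 0;
  virtual void Remove(DocId id, const DocumentAccessor& doc, std::string_view field) = 0;
  virtual std::vector<DocId> GetAllDocsWithNonNullValues() const = 0;
};

// Helper outside the index hierarchy: dimension info + document-to-vector conversion.
class BaseVectorIndex {
 public:
  explicit BaseVectorIndex(std::size_t dim) : dim_(dim) {}
  std::size_t Dim() const { return dim_; }

 protected:
  // Hypothetical conversion: rejects documents whose vector has the wrong dimension.
  std::optional<std::vector<float>> ToVector(const DocumentAccessor& doc) const {
    if (doc.values.size() != dim_) return std::nullopt;
    return doc.values;
  }

 private:
  std::size_t dim_;
};

// Shard-local index: implements DocIndex and reuses the helper.
class FlatVectorIndex : public DocIndex, public BaseVectorIndex {
 public:
  using BaseVectorIndex::BaseVectorIndex;

  bool Add(DocId id, const DocumentAccessor& doc, std::string_view) override {
    auto vec = ToVector(doc);
    if (!vec) return false;
    entries_[id] = std::move(*vec);
    return true;
  }
  void Remove(DocId id, const DocumentAccessor&, std::string_view) override {
    entries_.erase(id);
  }
  std::vector<DocId> GetAllDocsWithNonNullValues() const override {
    std::vector<DocId> ids;  // std::map iterates in key order, so the result is sorted
    for (const auto& entry : entries_) ids.push_back(entry.first);
    return ids;
  }

 private:
  std::map<DocId, std::vector<float>> entries_;
};

// Global index: only reuses the helper and has no virtual interface.
class HnswVectorIndex : public BaseVectorIndex {
 public:
  using BaseVectorIndex::BaseVectorIndex;

  bool Add(GlobalDocId id, const DocumentAccessor& doc) {
    auto vec = ToVector(doc);
    if (!vec) return false;
    entries_[id] = std::move(*vec);  // stand-in for inserting into the HNSW graph
    return true;
  }
  void Remove(GlobalDocId id) { entries_.erase(id); }
  std::size_t Size() const { return entries_.size(); }

 private:
  std::map<GlobalDocId, std::vector<float>> entries_;
};
```

Nothing in BaseIndex needs to become a template in this arrangement, since the only class keyed by GlobalDocId never enters the per-shard indices container.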
| SpaceUnion space_; | ||
| hnswlib::HierarchicalNSW<float> world_; | ||
| absl::Mutex resize_mutex_; |
Why is resize_mutex_ needed? I thought HierarchicalNSW is thread-safe.
| search::HnswVectorIndex* index, const search::AstKnnNode* knn, | ||
| const std::optional<std::vector<search::GlobalDocId>>& allowed_docs); | ||
| std::unique_ptr<search::BaseVectorIndex<search::GlobalDocId>> vector_index_; |
This probably follows from @dranikpg's question, but what is the added value of declaring BaseVectorIndex here instead of specifying search::HnswVectorIndex explicitly? Moreover, what is the value of having GlobalHnswVectorIndex wrapping search::HnswVectorIndex? Why not have only one class?
| shard_set->PreShutdown(); | ||
| shard_set->Shutdown(); | ||
Let's introduce SearchFamily::Shutdown() and move it there; there is no need to leak the implementation into main_service.cc.
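Something along these lines, as a sketch only. The registry of global indices, RegisterGlobalIndex, GlobalIndexCount, and ServiceShutdown are all hypothetical names used to show the shape; the real cleanup is whatever the PR currently does inline in main_service.cc.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

struct HnswVectorIndex {};  // stand-in for the real index type

class SearchFamily {
 public:
  // Hypothetical registration hook, used here only to have state to tear down.
  static void RegisterGlobalIndex(std::string name) {
    Indices()[std::move(name)] = std::make_unique<HnswVectorIndex>();
  }

  static std::size_t GlobalIndexCount() { return Indices().size(); }

  // All search-specific teardown lives behind this single entry point.
  static void Shutdown() { Indices().clear(); }

 private:
  using IndexMap = std::unordered_map<std::string, std::unique_ptr<HnswVectorIndex>>;

  static IndexMap& Indices() {
    static IndexMap indices;  // assumption: process-wide registry of global indices
    return indices;
  }
};

// What the main_service.cc side would reduce to: one call, no knowledge of internals.
void ServiceShutdown() {
  SearchFamily::Shutdown();
}
```

The service then only depends on the existence of SearchFamily::Shutdown(), and the index bookkeeping can change without touching main_service.cc.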
| #include "absl/container/flat_hash_set.h" | ||
| #include "base/pmr/memory_resource.h" | ||
| #include "core/string_map.h" | ||
| #include "server/tx_base.h" |
That's not good. We cannot include server headers in the core library.
Just redefine using ShardId = uint16_t; under GlobalDocId.
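For illustration, a self-contained sketch of a core-side header that defines the alias locally. Only the ShardId alias is taken from the comment above; the 64-bit GlobalDocId layout and the helper names (MakeGlobalDocId, ShardOf, LocalDocOf) are assumptions made up for this example.

```cpp
#include <cstdint>

namespace search {

using DocId = uint32_t;        // assumption: shard-local document id
using GlobalDocId = uint64_t;  // assumption: shard id in the high bits, DocId in the low 32

// Redefined locally so the core library does not need server/tx_base.h.
using ShardId = uint16_t;

// Hypothetical helpers packing and unpacking a GlobalDocId.
constexpr GlobalDocId MakeGlobalDocId(ShardId shard, DocId doc) {
  return (static_cast<GlobalDocId>(shard) << 32) | doc;
}

constexpr ShardId ShardOf(GlobalDocId id) {
  return static_cast<ShardId>(id >> 32);
}

constexpr DocId LocalDocOf(GlobalDocId id) {
  return static_cast<DocId>(id & 0xFFFFFFFFu);
}

// Compile-time round-trip check.
static_assert(ShardOf(MakeGlobalDocId(3, 42)) == 3, "shard id must round-trip");
static_assert(LocalDocOf(MakeGlobalDocId(3, 42)) == 42, "doc id must round-trip");

}  // namespace search
```

Because a using alias may be redeclared with the same type, this does not conflict with the server-side definition when both headers end up in one translation unit, provided both declare it in the same namespace.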
No description provided.