Skip to content

Conversation

@mkaruza
Copy link
Contributor

@mkaruza mkaruza commented Nov 24, 2025

No description provided.

@mkaruza mkaruza marked this pull request as draft November 24, 2025 12:13
this->filter = make_unique<AstNode>(std::move(filter));
}

bool AstKnnNode::PreFilter() const {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PreFilter is a bad name, imho. Maybe HasPreFilter and then revert the condition filter != nullptr

// Queries should be done directly on subclasses with their distinc
// query functions. All results for all index types should be sorted.
struct BaseIndex {
template <typename T> struct BaseIndex {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you do all of this because you want to use the hnsw index with different types, it seems like it would be easier to branch off somewhere there. Maybe create a templated "base hnsw index" type and a subtype that implements BaseIndex (if we need it at all). Either way it looks simpler than changing everythign up the chain

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have BaseIndex that has virtual functions

  // Returns true if the document was added / indexed
  virtual bool Add(T id, const DocumentAccessor& doc, std::string_view field) = 0;
  virtual void Remove(T id, const DocumentAccessor& doc, std::string_view field) = 0;

  // Returns documents that have non-null values for this field (used for @field:* queries)
  // Result must be sorted
  virtual std::vector<T> GetAllDocsWithNonNullValues() const = 0;

BaseVectorIndex is derived class and implements Add as final function.

There are 2 derived classes from BaseVectorIndex. FlatVectorIndex that uses DocId and HnswVectorIndex uses GlobalDocId.

FlatVectorIndex is stored in shard so it is stored in indices as base class of BaseIndex. We don't do this for HnswVectorIndex.

What could be possible is to just write HnswVectorIndex without any base class and implement all function that are used - wdyt ? @dranikpg

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BaseVectorIndex is just a helper class that provides:

  • info lookup about dimension etc
  • a conversion function from document to vector

You can move it out of the inheritance chain, so FlatIndex inherits both BaseVectorIndex and DocIndex, whereas HNSW index will inherit only BaseVectorIndex. There are many ways to re-use the code


SpaceUnion space_;
hnswlib::HierarchicalNSW<float> world_;
absl::Mutex resize_mutex_;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why resize_mutex_ is needed? I thought HierarchicalNSW is thread-safe

search::HnswVectorIndex* index, const search::AstKnnNode* knn,
const std::optional<std::vector<search::GlobalDocId>>& allowed_docs);

std::unique_ptr<search::BaseVectorIndex<search::GlobalDocId>> vector_index_;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably it follows @dranikpg question but what is the added value of declaring here BaseVectorIndex instead of specifying search::HnswVectorIndex explicitly. Moreover, what is the value of having GlobalHnswVectorIndex wrappin search::HnswVectorIndex ? why not have only class?


shard_set->PreShutdown();
shard_set->Shutdown();

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets introduce SearchFamily::Shutdown() and move it there -no need to leak the implementation into main_service.cc

#include "absl/container/flat_hash_set.h"
#include "base/pmr/memory_resource.h"
#include "core/string_map.h"
#include "server/tx_base.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's not good. we can not include server headers into core library.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just redefine using ShardId = uint16_t; under GlobalDocId

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants