Create PZ Index Class which Can Be Used by Retrieve #137

Open
mdr223 opened this issue Feb 20, 2025 · 2 comments

Collaborator

mdr223 commented Feb 20, 2025

Right now we do not have a strong abstraction in place for indices which are used by the Retrieve operator.

As a result, we require the user to do some heavy lifting in writing a search_func() which uses their index and returns results to the RetrieveOp.

In an effort to mitigate this heavy lifting -- and to make it easier for us to program against a standardized class -- this issue aims to add a BaseIndex class, along with sub-classes for ChromaIndex and RagatouilleIndex.

These indices will expose:

  • a __str__ function which can replace the need for index_helpers.py
  • a query function, which takes a query: str | list[str] and a results_per_query: int and returns a list | list[list] with the top results_per_query results for each query.
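A minimal sketch of what such a base class could look like, assuming only the two members listed above. The `name` attribute, the `_search` hook, and the toy `KeywordIndex` subclass are hypothetical and exist only to make the example runnable; they are not the actual PZ implementation, and the real ChromaIndex and RagatouilleIndex would wrap their respective libraries instead.

```python
from abc import ABC, abstractmethod
from typing import Union


class BaseIndex(ABC):
    """Sketch of the proposed base class; member names follow the issue text."""

    def __init__(self, name: str):
        # Hypothetical attribute, used here only to build the string form.
        self.name = name

    def __str__(self) -> str:
        # A stable string form, e.g. for identifying the index in plans or logs.
        return f"{self.__class__.__name__}({self.name})"

    @abstractmethod
    def _search(self, query: str, k: int) -> list[str]:
        """Hypothetical hook: return the top-k results for a single query."""

    def query(
        self, query: Union[str, list[str]], results_per_query: int = 1
    ) -> Union[list[str], list[list[str]]]:
        # A single string yields a flat list; a list of strings yields one
        # sub-list of results per query.
        if isinstance(query, str):
            return self._search(query, results_per_query)
        return [self._search(q, results_per_query) for q in query]


class KeywordIndex(BaseIndex):
    """Toy in-memory index (hypothetical): ranks docs by shared word count."""

    def __init__(self, name: str, docs: list[str]):
        super().__init__(name)
        self.docs = docs

    def _search(self, query: str, k: int) -> list[str]:
        terms = set(query.lower().split())
        ranked = sorted(
            self.docs,
            key=lambda d: len(terms & set(d.lower().split())),
            reverse=True,
        )
        return ranked[:k]


idx = KeywordIndex("toy", ["red apple pie", "green apple", "blue sky"])
print(idx)
print(idx.query("apple pie", results_per_query=1))
print(idx.query(["blue sky", "green"], results_per_query=1))
```

Keeping the str-versus-list dispatch in the base class means each subclass only has to implement the single-query case.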

This issue will also standardize the semantics of the search_func. If the user's search function returns a list[str], then RetrieveOp will take the top-k elements from that list. If the user's search function returns a list[list[str]] then RetrieveOp will take the top-k elements from each sub-list.
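The top-k rule described above could be sketched as follows. The helper name `take_top_k` is hypothetical, and the sketch assumes the search function returns results already ranked best-first; it is an illustration of the semantics, not the RetrieveOp code itself.

```python
from typing import Union


def take_top_k(
    results: Union[list[str], list[list[str]]], k: int
) -> Union[list[str], list[list[str]]]:
    """Apply the top-k rule to the output of a user-supplied search function.

    Assumes results are ordered best-first (an assumption, not stated in the issue).
    """
    # Nested case: list[list[str]] -> keep the top-k of each sub-list.
    if results and isinstance(results[0], list):
        return [sub[:k] for sub in results]
    # Flat case: list[str] -> keep the top-k of the whole list.
    return results[:k]


flat = take_top_k(["a", "b", "c"], k=2)
nested = take_top_k([["a", "b", "c"], ["d"]], k=2)
print(flat, nested)
```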

@mdr223 mdr223 added the enhancement New feature or request label Feb 20, 2025
@mdr223 mdr223 self-assigned this Feb 20, 2025
Collaborator Author

mdr223 commented Feb 20, 2025

Based on conversation w/ @sivaprasadsudhir, we will rethink the interface for Retrieve for the longer-term -- but for now we have a short-term solution.

Collaborator Author

mdr223 commented Feb 20, 2025

A couple of notes:

  • RAGPretrainedModel can only perform search w/ string input (not embedding(s))
  • chromadb.Collection may not have the correct model name if the user creates the index w/out specifying the embedding_function (which you technically don't need to do if you are only querying the index)
