Vector stores

Investigation of alternative vector stores for image model embeddings.

ChromaDB

  • "Simplest useful thing", default in the LangChain examples for LLM rapid prototyping
  • Idiosyncratic, not standards-oriented
  • Evolving quickly (a couple of backward-incompatible API changes since starting with it)

SQLite-vec

  • Lightweight, with helpful examples; quick to start with?
  • Single process
  • "expect breaking changes!"

https://til.simonwillison.net/sqlite/sqlite-vec

https://github.com/asg017/sqlite-vec

https://github.com/asg017/sqlite-vec/releases

pip install sqlite-utils
sqlite-utils install sqlite-utils-sqlite-vec

The main consumer is the Streamlit app, which is tightly coupled to the internal logic of ChromaDB :/

The queries it needs are:

  • get all identifiers (need LIMIT for large collection) - URLs were used directly as IDs
  • get embeddings vector for one ID
  • get the N closest results to one embedding vector, by cosine similarity
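
To make the three queries concrete, here is a minimal sketch of a store-agnostic interface the app could call instead of talking to a particular backend directly. It is only an illustration under assumptions: the `EmbeddingStore` class, its method names, and the table layout are hypothetical, and it uses the standard-library `sqlite3` module with a brute-force cosine similarity in Python rather than the ChromaDB or sqlite-vec APIs, which a real backend would swap in.

```python
import json
import math
import sqlite3


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class EmbeddingStore:
    """Hypothetical wrapper exposing only the three queries listed above.

    Vectors are stored as JSON text in plain SQLite; this is a stand-in
    for a real vector index, not how sqlite-vec or ChromaDB store data.
    """

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS embeddings ("
            "id TEXT PRIMARY KEY, vector TEXT NOT NULL)"
        )

    def add(self, id_, vector):
        # URLs can be used directly as IDs, as in the notes above.
        self.db.execute(
            "INSERT OR REPLACE INTO embeddings (id, vector) VALUES (?, ?)",
            (id_, json.dumps(list(vector))),
        )
        self.db.commit()

    def ids(self, limit=100):
        # Query 1: all identifiers, capped with LIMIT for large collections.
        rows = self.db.execute(
            "SELECT id FROM embeddings ORDER BY id LIMIT ?", (limit,)
        )
        return [row[0] for row in rows]

    def get(self, id_):
        # Query 2: the embedding vector for one ID, or None if unknown.
        row = self.db.execute(
            "SELECT vector FROM embeddings WHERE id = ?", (id_,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def closest(self, vector, n=5):
        # Query 3: the N most similar entries by cosine similarity.
        # Brute-force scan over every row; fine for a sketch, slow at scale.
        rows = self.db.execute("SELECT id, vector FROM embeddings").fetchall()
        scored = [
            (id_, cosine_similarity(vector, json.loads(stored)))
            for id_, stored in rows
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:n]
```

With something like this in place, the app would call `store.ids(limit=...)`, `store.get(url)` and `store.closest(vector, n=...)`, and changing vector store would mean rewriting only the bodies of those three methods.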