Investigation of alternative vector stores for image model embeddings.
chromadb (current store):
- "Simplest useful thing"; the default in the LangChain examples for rapid LLM prototyping
- Idiosyncratic, not standards-oriented
- Evolving quickly (a couple of backward-incompatible API changes since I started using it)
sqlite-vec:
- Lightweight, with helpful examples; quick to get started with?
- Single process
- "expect breaking changes!"
https://til.simonwillison.net/sqlite/sqlite-vec
https://github.com/asg017/sqlite-vec
https://github.com/asg017/sqlite-vec/releases
pip install sqlite-utils
sqlite-utils install sqlite-utils-sqlite-vec
The main use is in the Streamlit app, which is tightly coupled to chromadb's internal logic :/
The queries it needs are:
- get all identifiers (needs a LIMIT for a large collection); URLs were used directly as IDs
- get the embedding vector for one ID
- get the N closest results to one embedding vector by cosine similarity
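A minimal sketch of those three queries, to show how small the surface is that an alternative store would have to provide. It deliberately uses only the standard-library `sqlite3` module, storing vectors as float32 BLOBs and doing a brute-force cosine-similarity scan in Python, so it is a stand-in for sqlite-vec rather than an example of it. `EmbeddingStore`, the `embeddings` table and all method names are hypothetical. With sqlite-vec loaded, the scan in `nearest()` would presumably move into SQL (e.g. via a `vec0` virtual table); that variant is not shown here.

```python
import math
import sqlite3
import struct


def pack_f32(vec):
    # Encode a list of floats as a little-endian float32 BLOB.
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack_f32(blob):
    # Decode a BLOB written by pack_f32 (4 bytes per float32).
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|); returns 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class EmbeddingStore:
    """Hypothetical store: one row per image, keyed by its URL (URLs as IDs)."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS embeddings ("
            "url TEXT PRIMARY KEY, vec BLOB NOT NULL)"
        )

    def add(self, url, vec):
        self.db.execute(
            "INSERT OR REPLACE INTO embeddings (url, vec) VALUES (?, ?)",
            (url, pack_f32(vec)),
        )
        self.db.commit()

    def ids(self, limit=100, offset=0):
        # Query 1: page through the IDs; LIMIT/OFFSET keeps big collections manageable.
        rows = self.db.execute(
            "SELECT url FROM embeddings ORDER BY url LIMIT ? OFFSET ?",
            (limit, offset),
        )
        return [url for (url,) in rows]

    def embedding(self, url):
        # Query 2: the stored vector for one ID, or None if the ID is unknown.
        row = self.db.execute(
            "SELECT vec FROM embeddings WHERE url = ?", (url,)
        ).fetchone()
        return unpack_f32(row[0]) if row else None

    def nearest(self, query, n=5):
        # Query 3: score every stored vector, return the n most similar (url, score).
        scored = [
            (url, cosine_similarity(query, unpack_f32(blob)))
            for url, blob in self.db.execute("SELECT url, vec FROM embeddings")
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:n]


if __name__ == "__main__":
    store = EmbeddingStore()
    store.add("https://example.com/a.jpg", [1.0, 0.0])
    store.add("https://example.com/b.jpg", [0.0, 1.0])
    store.add("https://example.com/c.jpg", [0.9, 0.1])
    print(store.ids(limit=2))  # ['https://example.com/a.jpg', 'https://example.com/b.jpg']
    print(store.nearest([1.0, 0.0], n=2))  # a.jpg first, then c.jpg
```

The linear scan in `nearest()` touches every stored vector on each query, which is exactly the part a vector extension is meant to replace; the other two queries are plain SQL in any backend.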