Sapien is a lightweight temporal knowledge graph library that lets you:
- Persist chat messages (or any event) in MongoDB.
- Compute dense embeddings with sentence‑transformers.
- Store those vectors in Qdrant for fast semantic search.
- Retrieve the most relevant historical context for a given query.
It suits LLM‑powered assistants that need to recall past conversations or other stored knowledge while keeping all data in open‑source databases.
⚠️ The library is still a work‑in‑progress. It has minimal tests and expects MongoDB + Qdrant to be running locally (or reachable from your environment).
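The persist, embed, store, retrieve flow listed above can be pictured with a small in-memory sketch. Nothing here is Sapien's actual code: `toy_embed`, `InMemoryStore`, and the word-count "embedding" are hypothetical stand-ins for sentence-transformers, MongoDB, and Qdrant, and scoping the search to one session is an assumption based on the `get_context(session_id, query, k)` signature.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> dict[str, float]:
    """Stand-in for a sentence-transformers model: an L2-normalised word-count vector."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(n * n for n in counts.values())) or 1.0
    return {word: n / norm for word, n in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(value * b.get(word, 0.0) for word, value in a.items())


class InMemoryStore:
    """Hypothetical stand-in: `docs` plays MongoDB, `vectors` plays Qdrant."""

    def __init__(self) -> None:
        self.docs: dict[int, dict] = {}
        self.vectors: dict[int, dict[str, float]] = {}

    def add_message(self, session_id: str, role: str, content: str) -> int:
        doc_id = len(self.docs) + 1
        self.docs[doc_id] = {
            "_id": doc_id,
            "session_id": session_id,
            "role": role,
            "content": content,
        }
        self.vectors[doc_id] = toy_embed(content)  # embed, then index by the same id
        return doc_id

    def get_context(self, session_id: str, query: str, k: int = 10) -> list[dict]:
        query_vec = toy_embed(query)
        scored = [
            (cosine(query_vec, vec), doc_id)
            for doc_id, vec in self.vectors.items()
            if self.docs[doc_id]["session_id"] == session_id  # assumed session filter
        ]
        scored.sort(reverse=True)
        # Vector hits only carry ids; the full documents come from the document store.
        return [self.docs[doc_id] for _, doc_id in scored[:k]]


store = InMemoryStore()
store.add_message("chat_42", "user", "I need a laptop for gaming.")
store.add_message("chat_42", "user", "What is the weather today?")
store.add_message("chat_7", "user", "Any laptop deals?")

top = store.get_context("chat_42", "laptop", k=1)
print(top[0]["content"])  # prints: I need a laptop for gaming.
```

The point of the split is that the vector index only needs ids and vectors, while the document store remains the source of truth for message content.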
Feature | Status |
---|---|
MongoDB persistence | ✅ |
Qdrant vector search | ✅ |
Sentence‑transformers embeddings | ✅ |
Convenient async API | ✅ |
Zero‑configuration defaults (except for services) | ✅ |
Type‑hinted, testable code | ✅ |
# 1️⃣ Install the package + dev deps
poetry install
# 2️⃣ Start MongoDB and Qdrant locally
# (use Docker Compose – see docker-compose.yml)
docker compose up -d
# 3️⃣ Run a short demo script
python demo.py
demo.py
import asyncio
from datetime import datetime, timezone

from sapien import SapienClient, SapienConfig, CollectionNames


async def main():
    cfg = SapienConfig(
        mongo_uri="mongodb://localhost:27017",
        db_name="sapien",
        qdrant_url="http://localhost:6333",
        collections=CollectionNames(),
    )
    async with SapienClient(cfg) as db:
        await db.init_indexes()

        # Add a new message
        msg_id = await db.add_message(
            session_id="chat_42",
            role="user",
            content="I need a laptop for gaming.",
            # datetime.utcnow() is deprecated as of Python 3.12
            timestamp=datetime.now(timezone.utc),
        )

        # Retrieve the stored messages most relevant to the query "laptop"
        ctx = await db.get_context("chat_42", "laptop")
        print(f"Context ({len(ctx)} docs):")
        for doc in ctx:
            print("-", doc["content"])


if __name__ == "__main__":
    asyncio.run(main())
Prerequisites:
- Python 3.12+
- MongoDB server (>=4.0)
- Qdrant server (>=1.0)
# Install via Poetry
poetry add sapien
No extras are needed for the full stack: sentence-transformers and the Qdrant client are regular dependencies and are pulled in automatically when you install the package.
All configuration is done through a single dataclass:
from sapien import SapienConfig, CollectionNames

cfg = SapienConfig(
    mongo_uri="mongodb://localhost:27017",
    db_name="sapien",               # database name
    qdrant_url="http://localhost:6333",
    collections=CollectionNames(),  # optional custom names
)
Tip – The default collection names are prefixed with sapien_ (sessions, messages, etc.) to keep them isolated in the sapien database.
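To illustrate how a config dataclass with prefixed default collection names might be laid out, here is a minimal sketch. The classes `CollectionNamesSketch` and `ConfigSketch` are hypothetical mirrors, not Sapien's real definitions: the README does not show the actual fields of `CollectionNames`, so `sessions` and `messages` (and every default value) are assumptions taken from the names mentioned in the tip above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CollectionNamesSketch:
    # Hypothetical fields; the real CollectionNames may define different ones.
    sessions: str = "sapien_sessions"
    messages: str = "sapien_messages"


@dataclass(frozen=True)
class ConfigSketch:
    # Field names follow the README; the defaults are assumptions.
    mongo_uri: str = "mongodb://localhost:27017"
    db_name: str = "sapien"
    qdrant_url: str = "http://localhost:6333"
    collections: CollectionNamesSketch = field(default_factory=CollectionNamesSketch)


default_cfg = ConfigSketch()
custom_cfg = ConfigSketch(
    collections=CollectionNamesSketch(messages="support_messages"),
)

print(default_cfg.collections.messages)  # prints: sapien_messages
print(custom_cfg.collections.messages)   # prints: support_messages
print(custom_cfg.collections.sessions)   # prints: sapien_sessions
```

Overriding a single name leaves the other defaults in place, which is the usual reason to group collection names in their own dataclass.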
Method | Description |
---|---|
SapienClient.__aenter__ / __aexit__ | Async context manager that ensures collection creation. |
add_message(session_id, role, content, timestamp=None) | Persist a message and fire‑and‑forget its embedding & Qdrant upsert. Returns the Mongo _id. |
get_context(session_id, query, k=10) | Vector search in Qdrant → return full Mongo docs for the top k matches. |
init_indexes() | Idempotently create indexes (sessions.session_id, messages.timestamp, etc.). |
The project ships with a small async test‑suite that expects MongoDB and Qdrant to be running locally.
# Run tests
poetry run pytest -vv
If the services are not reachable, the integration tests will be skipped automatically.
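One common way to get this kind of automatic skipping is a TCP reachability probe evaluated before the tests run. The sketch below is an assumption about how such a check could look, not the project's actual test code: `is_reachable` is a hypothetical helper, and the ports are the ones from the connection URLs used elsewhere in this README.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


# 27017 is MongoDB's port and 6333 is Qdrant's port in the example config.
SERVICES_UP = is_reachable("localhost", 27017) and is_reachable("localhost", 6333)
print("integration tests would run" if SERVICES_UP else "integration tests would be skipped")
```

In a pytest module, a flag like this can feed `pytest.mark.skipif(not SERVICES_UP, reason="MongoDB/Qdrant not reachable")`, so the suite still passes on machines without the services.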
- Clone & install
  git clone https://github.com/yourname/sapien.git
  cd sapien
  poetry install
- Run linters / formatters
  poetry run ruff check .
  poetry run black src tests
- Run the demo
  python demo.py
- Add a new feature – remember to update pyproject.toml, write tests, and add documentation.
Pull requests are welcome!
Please:
- Fork the repo.
- Create a feature branch (feature/your-feature).
- Write or update tests.
- Run poetry run pytest.
- Submit a PR.
For major changes, open an issue first to discuss the scope.
MIT © 2025 – feel free to use it however you like.