Bucket-native, multi-tenant Postgres alternative.
RAM-per-connection is ~30× cheaper than Postgres. Projects are S3 prefixes,
not databases — operator cost is O(bytes active), not O(projects
provisioned). One binary, pgwire on the front, Vortex-compressed columnar
files on any S3-compatible bucket on the back.
Pre-alpha — public eval. Basin is being built in the open. Use it today to evaluate cost economics, prototype multi-tenant patterns, or contribute. The hot-tier UPDATE/DELETE fast paths are on by default as of Phase 5.14 closure (
bed431c). On the fresh integrity cards (2026-06-11, single idle-box session; the 1M card run solo, no index either side): point query p50 is sub-ms at every scale — 0.06ms at 10k / 0.12ms at 100k / 0.50ms at 1M — single-row UPDATE p50 lands at 0.33ms at 10k / 0.64ms at 100k / 1.24ms at 1M, and bulk INSERT now beats Postgres at every scale (33.6ms vs 117ms at 10k; 216ms vs 839ms at 100k; 2.08s vs 8.10s at 1M — see the durability note under the table below). The OLTP latency profile and per-shape table are inRESULTS_localfs.md; the kill-switch notes are in ADR 0016 if you need to roll the fastpaths back without a redeploy (BASIN_HOTTIER_FASTPATH_DISABLE=1).See
docs/V0_1_SCOPE.mdfor the v0.1 cut-off — required items, what's shipped, and what's parked.
Two things are structurally true regardless of workload, and they're the load-bearing reason Basin exists:
- Pure-Rust async server, ~310 KiB RAM per held-open connection vs Postgres's ~8.1 MiB. That's ~27× per the LocalFS lifecycle card (measured locally). Under a 1,000-connection flood, Basin holds 1,000 / refuses 0; Postgres holds 100 / refuses 900. This isn't a tuning result — it's the difference between a from-scratch tokio server and a forking daemon. (
server_lifecyclecard) - A new project is an S3 bucket prefix, not a provisioned database. Idle projects cost only their bytes — measured at ~2 KiB of RAM (2.16 KiB/project over 1,000 idle projects) and $0.10/month/project at typical SaaS-tail sizes. Spin up one project for a side app, or ten thousand for a SaaS — same architecture, same binary. (
idle-project cost curve)
Everything else — Vortex columnar storage, native vector search, basin-auth + basin-rest, time travel via Iceberg snapshots, pg_cron/pg_net/pg_trgm/PostGIS subsets as native crates, real pgwire v3 with TLS + COPY + extended-query + native JSONB/UUID/BYTEA binding — exists to make those two structural wins addressable from real applications. Decision log lives in docs/decisions/.
Basin publishes all of its head-to-head numbers, wins and losses, regenerated from integration tests on every push. Some shapes Basin beats Postgres; many it does not. The picture is workload-dependent — the table below is the 1M-row SaaS+OLAP suite, LocalFS, no index either side (the most apples-to-apples wedge). For 10k / 100k / real-S3 numbers, see the benchmark results.
Numbers below are the live 1M-row Vortex card from the 2026-06-11 integrity run, regenerated by benchmark/run/run_all.sh and measured locally in a single idle-box session with the 1M card run solo (Postgres-compare cards run with Basin's default config — the HTAP fastpaths and fast bulk-INSERT encoding are always-on, no non-default flags, as disclosed on the dashboard). Per-shape numbers reduce through a robust estimator (median at n ≥ 5 samples, min-of-K below; 1M shapes sample at least 3×) applied identically to both sides. Write-shape durability note: Basin's default acknowledges INSERTs before fsync (bounded loss window, ≤200 ms) while the Postgres numbers are fsync-durable per commit; SET basin.synchronous_commit = on provides group-committed fsync durability per statement — measured: it adds ~2% to the 10k bulk INSERT in the probe harness (group commit amortizes one fsync per statement group). The per-scale matrix in RESULTS_localfs.md and the real-S3 card below are the fuller picture.
| Workload (1M LocalFS, no index either side) | Basin (Vortex) | Postgres 18 | Verdict |
|---|---|---|---|
| Structural | |||
| RAM per held-open connection | 310 KiB | 8,257 KiB | ~27× less |
| Connections held under 1,000-conn flood | 1,000 held / 0 refused | 100 held / 900 refused | structural |
| On-disk bytes (users + events, 1M rows) | 321 MB | 306 MB | ~5% larger on this card — honest flip; Basin is still 1.9× smaller at 100k and 102× smaller on real S3 |
| Analytics / read — wins | |||
| LATERAL JOIN (correlated derived table) | 6.7 ms | 3,080 ms | 462× faster |
| Star join (events ⋈ users ⋈ categories) | 11.6 ms | 3,040 ms | 261× faster |
| Correlated subquery in SELECT p50 | 49 ms | 5,510 ms | 113× faster |
| Range scan p50 (~1k rows) | 0.40 ms | 32 ms | 81× faster (p99 0.70 vs 72 — 101×) |
| WHERE col = ANY(int[]) | 0.60 ms | 38 ms | 63× faster |
| Selective low-card COUNT (status filter) | 0.68 ms | 34 ms | 51× faster |
| COUNT(DISTINCT user_id) per status p50 | 31 ms | 913 ms | 30× faster |
| 2-table JOIN GROUP BY p50 | 28 ms | 791 ms | 28× faster |
| Large result stream (100k-row drain) p50 | 0.44 ms | 11.0 ms | 25× faster |
| PERCENTILE_CONT(0.5) per status p50 | 23 ms | 442 ms | 19× faster |
| UNION (dedup) | 22 ms | 246 ms | 11× faster |
| Aggregate GROUP BY user_id p50 | 13.6 ms | 137 ms | 10× faster |
| Window frame SUM (5 PRECEDING) p50 | 9.8 ms | 91 ms | 9.3× faster |
ILIKE '%@gmail.com' p50 |
4.6 ms | 25 ms | 5.3× faster |
| Bulk / concurrent INSERT — wins | |||
| Bulk INSERT 1,000,000 rows | 2,080 ms | 8,100 ms | 3.9× faster† |
| Concurrent INSERT 8×1000 rows | 3.9 ms | 20 ms | 5.2× faster† |
| Prepared INSERT 100× bind/execute | 4.6 ms | 10.2 ms | 2.2× faster† |
| Selective single-row / OLTP — losses (PG's PK btree wins) | |||
| Point query p50 (unindexed PK) | 0.50 ms | 0.002 ms | slower — sub-ms, vs PG's µs-class PK btree |
| UPSERT (INSERT ON CONFLICT DO UPDATE) | 0.26 ms | 0.02 ms | slower — both sub-ms |
| Single-row UPDATE p50 | 1.24 ms | 0.012 ms | slower — ms-class vs µs-class (0.33 ms on the 10k card) |
| DELETE WHERE id IN (10 rows) | 1.16 ms | 0.41 ms | slower (wins at 10k/100k: 0.16/0.22 ms vs 0.24/2.9 ms) |
| Conditional UPDATE (SET = CASE WHEN) | 4.9 ms | 1.9 ms | slower — delta-update overlay, near-parity class |
Keyset pagination (WHERE id > … LIMIT) |
23.5 ms | 0.01 ms | slower — measures 0.07 ms on the 10k card; see note below |
| LIMIT without ORDER BY (early-exit scan) | 52 ms | 0.03 ms | slower — 0.06 ms on the 10k card; see note below |
| Deep top-K sort (ORDER BY amount LIMIT 1000) | 161 ms | 53 ms | slower — whole-table wide-decode floor |
| DISTINCT ON first-row per group p50 | 155 ms | 92 ms | slower (wins at 10k: 1.9 vs 2.2 ms) |
| ARRAY_AGG + ORDER BY in aggregate | 516 ms | 90 ms | slower (parity at 10k: 3.2 vs 3.4 ms) |
| COUNT(*) full table p50 | 95 ms | 29 ms | slower on this card |
JSONB ->> get-text p50 (steady, promoted) |
1.97 ms | 0.11 ms | slower — promoted-column read; PG's btree wins selective JSONB |
JSONB @> / ? / filter+agg |
240–1,234 ms | 53–108 ms | slower — GIN effectiveness 1.12× at 1M (4.3× at 10k); see note below |
| MERGE INTO (matched/not-matched upsert) | unsupported | 0.08 ms | GAP — honest SQLSTATE reject, not a silent no-op |
| INSERT … SELECT (10k-row pipeline) | unsupported | 15.5 ms | GAP |
| Write / concurrency — losses | |||
| Bulk UPDATE (~1/3 rows) | 9.4 s | 3.4 s | 2.8× slower |
| Mixed read-write concurrency 8R+4W (600 ops) | 202 ms | 12 ms | slower — newly measured shape |
| Concurrent SELECT (16 sessions, mixed) | 19 ms | 2.3 ms | slower |
| Read-modify-write contention (8 sessions) | 8.4 ms | 4.7 ms | slower |
† Durability disclosure: Basin's default acks before fsync (≤200 ms bounded loss window) while the PG numbers are fsync-durable per commit. SET basin.synchronous_commit = on provides group-committed fsync durability — measured at ~2% added latency on the 10k bulk INSERT in the probe harness (group commit amortizes one fsync per statement group).
The honest tally: across the ~100 ms-shapes on this card Basin is faster on 51, Postgres on 54. Several OLTP fast paths measure at full speed on the 10k card (keyset 0.07ms, LIMIT-no-ORDER 0.06ms, ARRAY_AGG/DISTINCT-ON at-or-better than PG) but decline on the 1M card while hot-tier overlays from earlier card shapes are live — under investigation as a card-order interaction; the engine stays correctness-first by design.
At 1M rows on LocalFS the picture is read-win, point-write-loss: Basin wins the shapes where columnar scans and pushdown dominate — LATERAL JOIN 462×, star join 261×, correlated subquery 113×, range scan 81×, large-result streaming 25× — and now wins bulk INSERT outright at every scale (2.08 s vs 8.10 s at 1M, with the durability footnote). It loses PG's home turf: selective single-row lookups and point mutations (point query, UPSERT, single-row UPDATE, JSONB ->>) where PG's PK btree is µs-class and Basin is sub-ms-to-ms-class, plus bulk UPDATE and read/write concurrency (mixed 8R+4W 202 ms vs 12 ms — a newly measured shape and the largest open concurrency gap). The structural wins (RAM/conn, idle cost) and the real-S3 card are the load-bearing comparison. At 10k the OLTP fast paths run at full speed (keyset 0.07 ms, point 0.06 ms, UPDATE 0.33 ms) and bulk INSERT already wins 3.5×; at 100k Basin wins most analytical shapes and bulk INSERT 3.9×. On real S3 (SeaweedFS, 100k rows) Basin wins outright — 102× smaller on disk (99 KB vs 10.1 MB), point query 4.1× faster (4.6ms vs 18.5ms), 100k-row INSERT 20× faster (96ms vs 1.9s) — because object-storage economics and warm-cache reads beat PG's block-storage heap once data leaves a single local SSD; on the real-S3 scale-up card point-query p50 stays bounded as data grows (0.89 ms at 10k → 1.8 ms at 1M, 2.0× growth over 100× data). Full per-scale matrix:
benchmark/RESULTS_localfs.md— 10k / 100k / 1M cards, Vortex + Parquet, every shapebenchmark/RESULTS_seaweedfs.md— local S3-compatible (SeaweedFS) results, same shape batterybenchmark/RESULTS_real.md— real-cloud S3 resultsbenchmark/BENCHMARKS.md— methodology, caveats, reproduction- Live dashboards:
benchmark/index_localfs.html,benchmark/index_real.html
Every card is generated by an integration test in tests/integration/tests/ — read the source and tell us where we're wrong. File a Benchmark Methodology issue and we'll either fix the methodology, soften the headline claim, or explain why we think the test is fair. Decisions are logged in docs/decisions/.
- Append-heavy multi-tenant SaaS — many isolated projects, mostly-reads, occasional point UPDATE. The RAM-per-conn + per-project-prefix economics dominate; the OLTP point-write tax stays bounded as long as writes are append-shaped.
- Audit logs, event streams, IoT, activity feeds — write-once-read-many. Vortex columnar compression shrinks bytes-at-rest 1.9× vs PG heap on the mixed 100k SaaS shape (users + events; roughly at parity on the latest 1M card), and far more on append-only event data: 102× smaller on real S3 at 100k on the audit-log shape, and ~78× vs raw CSV. Object-storage $/GB compounds.
- AI / RAG with mixed tabular + vector data — native
vector(N)+ HNSW alongside transactional rows in the same database, nopg_vectorinstall. Useful when "store the document + the embedding + the audit row in one place" is the requirement. - Cheap-idle multi-environment apps — dev / staging / per-region / per-customer all live as cheap project prefixes on one cluster. Per-project cost is O(bytes), so 10k mostly-idle projects stays cheap. See
docs/multi-project.md.
- Drop-in OLTP replacement at 1M+ scale with microsecond point-mutation latency requirements. Phase 5.14 closure (
bed431c) flipped the hot-tier UPDATE/DELETE fast paths on by default, and three work waves of OLTP scale fixes cut single-row UPDATE at 1M to 1.24ms (vs PG 0.012ms) — a huge improvement over the prior ~162ms, and ms-class in absolute terms, but PG's index-and-heap is still structurally faster on point mutations. For most multi-tenant SaaS workloads this gap doesn't matter (point mutations are <1% of traffic); for hot-write OLTP front ends it might. The kill-switchBASIN_HOTTIER_FASTPATH_DISABLE=1rolls the fastpaths back without a redeploy if needed. - Index-heavy workloads on non-PK columns — Phase 5.7.B1 secondary B-tree indexes shipped (
CREATE INDEX … USING btree), soWHERE non_pk_col = Xcan now probe the index instead of full-file scanning. Bloom filters onbasin.sort_by+ per-filecolumn_statsstill prune at the file level for indexes that aren't declared. The GIN-on-tsvector (@@, Phase 5.20.E) and GIN-on-jsonb (@>) read paths are wired with file/row-group pruning; the remaining open perf work there is keeping the in-RAM posting lists warm across compaction and restart (a cold or incomplete registry degrades to a full scan — correct, just unpruned). - Pure analytical workhorse — Snowflake / DuckDB / ClickHouse will out-run Basin on heavy GROUP BY / window / recursive-CTE shapes. Basin trades some bench wins for PG-compat, and inherits DataFusion's upstream limits on a handful of shapes (recursive CTE, exact COUNT(DISTINCT)).
- High-frequency single-DB OLTP → Postgres / Aurora / Neon. Edge / local-first → Turso / libSQL. Geospatial primary store → PostGIS. Embedded SQLite-class → SQLite. Globally strongly-consistent multi-region writes → Spanner / Cockroach.
Basin is one binary that ships the pieces you'd otherwise wire up across five vendors. The wedge isn't "a cheaper Postgres" — it's "the same stack, but every project is a bucket prefix and the marginal cost is near zero."
| Piece | Status | Honest caveat |
|---|---|---|
| Auth (signup, signin, magic-link, JWT, refresh, OAuth) | ✅ Shipped | Per-project schema; auth.uid() / auth.role() work in RLS policies. |
| REST API (PostgREST-compatible) | ✅ Shipped | GET/POST/PATCH/DELETE on /rest/v1/<table>; RPC mount for SQL + Wasm functions. |
| Realtime (SSE + WebSocket + presence) | ✅ Shipped, harness-gated | Implementation complete; some integration harness slices #[ignore]-gated pending un-gate. |
Blob storage (basin-blob) |
✅ v1 | Catalog-backed object storage (ADR 0021). Full /storage/v1/ REST surface in basin-rest: bucket CRUD, object upload/download/list/bulk-delete, public fast path, signed-URL mint + verify (HMAC-SHA256, expiry, key rotation), per-object RLS, and per-project byte-quota counters. HEAD / COPY / resumable multipart deferred to v1.1. See CAPABILITIES.md. |
Vector search (native vector(N), HNSW, <->/<#>/<=>) |
✅ Shipped | Planner auto-routes ORDER BY x <-> $1 LIMIT k. No pg_vector install needed. |
WASM UDFs (CREATE FUNCTION … LANGUAGE wasm) |
✅ v0.1 scalar | i32/i64/f64 plus text/bytea/timestamptz args + returns (variable-length values cross a basin_alloc/basin_dealloc (ptr,len) ABI); JSONB rides the text path. Per-row invocation; a dedicated jsonb arg type and vectorized (whole-array) calls are deferred. |
| Cron, HTTP-from-SQL, trgm, geo, continuous matviews | ✅ via Basin-flavored crates | See "Postgres-extension equivalents". No CREATE EXTENSION required. |
What we don't ship — and on purpose:
- Edge functions (Cloudflare Workers / Supabase Edge Functions shape) — see ADR 0019: in-engine WASM UDFs plus declarative inbound webhooks + RPC mount solve the "compute close to data" need for Basin's wedge. Geographically distributed V8 isolates are a different concept and a maintenance burden the wedge doesn't justify.
- Triggers / PL/pgSQL — replaced by declarative lifecycle columns + SQL-bodied reactors +
LANGUAGE sqlfunctions per ADR 0012. - Postgres extensions (
.so) — see ADR 0002.
Long-form companion with code samples per piece: docs/batteries.md. Fine-grained matrix: CAPABILITIES.md.
This is the and-the-rest-of-the-stack-is-here line. The structural primitive (project = bucket prefix, ~310 KiB/conn) is the wedge; this is what closes the sale.
| Surface | Status |
|---|---|
| sqllogictest (curated PG-style suite) | 100% (50/50) as of b7114e8 |
| ORM corpus (Drizzle / Prisma / sqlx / Diesel / TypeORM, 99 representative shapes) | 100% (102/102) on the 2026-06-11 integrity run — Drizzle 100%, sqlx 95%, Diesel 95%, TypeORM 94%, Prisma 90%; all 5 failures are typed errors, 0 regressions. pg_index / pg_sequence / pg_enum introspection now populated, so ORM startup/migration introspection (Prisma, Drizzle, SQLAlchemy, ActiveRecord, sqlx, Django, Hibernate) resolves cleanly. |
Per-fragment SQL matrix (docs/sql-support.md, 975 fragments tested) |
~88.5% Default config / ~91.5% non-excluded (863/975) |
| Wasm UDFs | i32 / i64 / f64 plus text / bytea / timestamptz args + returns shipped (via a basin_alloc/basin_dealloc + linear-memory (ptr,len) ABI); JSONB rides the text path. Deferred: a dedicated jsonb arg type and vectorized (whole-Arrow-array) args |
| Wire protocol | pgwire v3, simple + extended query, TLS (rustls), COPY FROM STDIN / TO STDOUT (CSV + PG binary COPY format), prepared statements with binary parameters (native JSONB / UUID / BYTEA / ARRAY binary wire formats). NUMERIC rides the text wire format (lenient drivers handle it fine; binary varlena encoding deferred to v0.2). |
| Differential PG-oracle harness | tests/integration/tests/differential_pg.rs — every release runs identical SQL against Basin and a real PostgreSQL; build fails on any cell-level divergence |
Per-statement breakdown with every red row linked to its planner / parser / executor owner: docs/sql-support.md. Public capability matrix: CAPABILITIES.md.
Intentionally out of scope (per ADR 0002): CREATE TRIGGER, CREATE OPERATOR, composite CREATE TYPE, multirange column types / OID / REGCLASS / BIT / PG_LSN. (Multirange expressions and operators work on literals — only the DDL column type is rejected.)
LISTEN / NOTIFY / UNLISTEN are supported — SQL-level pub/sub via a per-engine notify registry, with PG-accurate transaction buffering (NOTIFY inside a transaction is queued and fanned out only on COMMIT, discarded on ROLLBACK). See tests/integration/tests/listen_notify.rs.
Want to skip the build and start querying right now?
See the 5-Minute Docker Quickstart — one
docker run command, no Rust toolchain required.
Ready to go deeper? The Getting Started / Tutorial walks you through CRUD, auth, RLS policies, the REST API, a React/Vite frontend snippet, and the first-deployment path — about 15 minutes end-to-end.
Want to see a complete app?
Two reference apps are in examples/:
examples/saas-starter/— multi-tenant SaaS app: Drizzle ORM, basin-auth, RLS policies, basin-rest auto-generated REST surface.examples/ai-rag-app/— AI/RAG app: document chunking + embedding pipeline, basin-vector similarity retrieval, Wasm function calling an inference endpoint.
Install basin, point it at a data dir, run. No external object store is needed for local development.
BASIN_DATA_DIR=/tmp/basin cargo run -p basin-serverThat gives you pgwire on 127.0.0.1:5433, durable WAL + Vortex columnar
files (the catalog default; Parquet is opt-in per-table) under
/tmp/basin/, and a volatile in-memory catalog for fast local iteration.
Set BASIN_CATALOG=postgres://... for restart-safe metadata.
The full production-shaped boot layers WAL, shard owner, connection pool, JWT auth, and REST in one process:
BASIN_BIND=127.0.0.1:5433 \
BASIN_CATALOG=postgres://postgres@127.0.0.1:5432/postgres \
BASIN_DATA_DIR=/tmp/basin \
BASIN_WAL_DIR=/tmp/basin/wal \
BASIN_PROJECTS='alice=*,bob=*' \
BASIN_SHARD_ENABLED=1 \
BASIN_POOL_ENABLED=1 \
BASIN_AUTH_ENABLED=1 \
BASIN_AUTH_JWT_SECRET=$(openssl rand -hex 32) \
BASIN_AUTH_SMTP_HOST=smtp.example.com BASIN_AUTH_SMTP_PORT=587 \
BASIN_AUTH_SMTP_USERNAME=u BASIN_AUTH_SMTP_PASSWORD=p \
BASIN_AUTH_SMTP_FROM=noreply@example.com BASIN_AUTH_SMTP_TLS=starttls \
BASIN_REST_ENABLED=1 BASIN_REST_BIND=127.0.0.1:5434 \
cargo run -p basin-serverBASIN_PROJECTS is the project-list env var — name is historical, projects in the
public API. Required vars for production-shaped durability: BASIN_BIND,
BASIN_CATALOG=postgres://..., BASIN_DATA_DIR or BASIN_STORAGE_BACKEND,
BASIN_WAL_DIR, BASIN_PROJECTS, and BASIN_AUTH_ENABLED (if you want auth).
Everything else is optional.
To run the same binary against object storage, set
BASIN_STORAGE_BACKEND=s3|tigris plus the S3-compatible endpoint, bucket,
region, and credentials documented by basin-storage.
Connect with any Postgres driver:
psql -h 127.0.0.1 -p 5433 -U aliceRun real SQL:
-- Standard tables, standard SQL.
CREATE TABLE events (id BIGINT NOT NULL, body TEXT NOT NULL);
INSERT INTO events VALUES (1, 'hello'), (2, 'world');
SELECT * FROM events WHERE id = 2;
-- Native vector search, no pg_vector required.
CREATE TABLE docs (id BIGINT, embedding vector(384));
INSERT INTO docs VALUES (1, '[0.01, 0.02, ...]');
SELECT id FROM docs ORDER BY embedding <-> '[...]' LIMIT 10;Confirm the data hit object storage under the project prefix:
find /tmp/basin/projects -name '*.vortex' # default format
# /tmp/basin/projects/01HABCD…/tables/events/data/2026/05/01/01HEFG….vortex
# Tables created with WITH (basin.file_format='parquet') write *.parquet instead.That's a real bucket-native database. The prefix is the IAM boundary; one bucket policy revokes all access to a project's data even if every other layer is bypassed.
Four layers, each with one job:
pgwire clients (any Postgres driver — psql, tokio-postgres, asyncpg, JDBC)
│
▼
Routers (stateless) parses SQL, applies RLS, routes by project
│
▼
Shard owners (stateful) in-memory state for many projects per process,
│ eviction on idle, lazy load from WAL + Parquet
▼
WAL durable append path; flushes to object storage
│
▼
Object storage + catalog /projects/{id}/... Vortex (default) or Parquet + Iceberg-style metadata
local FS, S3, Tigris (S3-compatible) — same binary, different bucket
The full architecture document is in docs/architecture.md. Every "no" we've recorded is in docs/decisions/.
Built on: Apache Arrow · Apache Iceberg (table format) · Vortex (default columnar format, LFAI incubation) · Apache Parquet (opt-in, interchange) · Apache DataFusion (SQL planner) · Tokio · pgwire-rs · openraft (single-process Raft WAL simulation today; cross-process distributed WAL is v0.2). Pure Rust, #![forbid(unsafe_code)] across every crate.
Basin's query engine is built on Apache DataFusion, the open-source SQL query engine from the Apache Software Foundation. Basin does not fork DataFusion — every query plan runs through upstream operators with Basin-shaped rules layered on top (RLS injection, project isolation, partition pruning).
- Postgres-compatible SQL — pgwire v3, simple + extended query protocol, TLS (rustls),
COPY FROM STDIN/COPY TO STDOUT(CSV + binary). Works withpsql,tokio-postgres,asyncpg, JDBC, Diesel, SeaORM, any Postgres ORM. A curated ORM/driver-compat suite plus a PG-oracle differential harness (differential_pg.rs) gate every release against a real PostgreSQL. - CRUD + DDL —
CREATE TABLE(incl.CREATE TABLE AS … WITH NO DATA), multi-rowINSERT,SELECT,UPDATE,DELETE(Iceberg copy-on-write today; hot-tier tombstone fast path env-gated in87ef24b),ON CONFLICT DO NOTHING / DO UPDATEupsert,ALTER TABLE … CLUSTER BY (…) / SET BLOOM FILTERS ON / SET row_group_rows / SET cold_after / ENABLE ROW LEVEL SECURITY / CREATE POLICY,SHOW TABLES. Prepared statements with parameter bind (text + binary, including native JSONB / UUID / ARRAY binary; NUMERIC binds over the text wire format — binary varlena deferred to v0.2). - Honest enforcement, not silent no-ops —
CREATE UNIQUE INDEXactually enforces uniqueness,VARCHAR(n)/CHAR(n)length is enforced, RLSWITH CHECKis enforced on write,TABLESAMPLEactually samples, advisory locks are real, and unsupportedCREATE TRIGGER/MERGEhonest-reject with a SQLSTATE instead of silently doing nothing. A wave of silent-corruption CRITICALs surfaced by the differential harness were fixed. - Expanded SQL surface — JSONPath (
jsonb_path_query,@?,@@,jsonb_path_query_array); JSONB mutators (jsonb_set/insert/strip_nulls/pretty/typeof);json_build_object/json_build_array; INET/CIDR containment;regexp_match/matches/split_to_array/split_to_table,format,encode/decode; datetimeage/to_char/to_date/date_bin; windowIGNORE NULLS;SAVEPOINT/ROLLBACK TO; data-modifying CTEs; correlated +LATERALjoins (incl.CROSS JOIN LATERAL generate_series); bounded full-text search (tsvector/tsquery/@@); ordered-set aggregates (percentile_disc,mode() WITHIN GROUP); range/multirange arithmetic; real transaction semantics (deferred commits,ROLLBACKundo, SAVEPOINT stack, aborted state). - Snapshot-isolated transactions + HTAP fast-path DML — explicit transactions take snapshot-stable reads (an open txn sees a consistent table view across statements even as other sessions commit; hot-tier MVCC sequence prevents overlay writes from leaking into open transactions). In-transaction single-row DML routes through a transaction-scoped overlay fast path, and untouched-table in-tx reads are served from a pinned snapshot without re-planning through DataFusion. RMW
SET col = <expr>has a fast path; single-key UPDATE prunes its cold pre-image read. - Prepared-statement AST execution — bound statements execute without re-parsing on every call; parameter types are inferred through UDF argument positions for clients that don't pre-declare them.
- Auto-index advisor — the engine observes query shapes and recommends/auto-builds secondary B-tree indexes for repeated point/IN-list probes; sorted-PK skip handles IN-lists in O(log n) zone-prune.
- Time travel — Iceberg-style snapshots.
Catalog::rollback_to_snapshot(project, table, snapshot_id)rewinds;Catalog::fork_table(project, src, dst)clones a table's metadata + snapshot history into a new sibling that diverges on next commit. Zero data copy until divergence. - Native vector search —
vector(N)+<->/<#>/<=>operators, HNSW per file segment.CREATE INDEX … USING hnsw WITH (m = …, ef_construction = …)accepts pgvector-style build params, and the planner routesORDER BY x <-> $1 LIMIT kto the index whose opclass (vector_l2_ops/vector_cosine_ops/vector_ip_ops) matches the query's distance operator. Nopg_vector. - Postgres-extension equivalents —
pg_cron(basin-cron),pg_net+http(basin-net),pg_trgm(basin-trgm),PostGISsubset (basin-geo, incl.ST_AsGeoJSON/ST_GeomFromGeoJSONGeoJSON I/O and R-tree envelope candidates for&&/ST_DWithin),TimescaleDBcontinuous aggregates (basin-cv),pgcrypto+uuid-osspUDFs. - Auth + REST in the OSS bundle — basin-auth (signup, JWT, refresh-token rotation, email-link login, per-project API keys) + basin-rest (PostgREST-shape CRUD, cursor pagination + NDJSON streaming, OpenAPI 3.0 schema generation at
GET /rest/v1/_openapi.json).auth.uid(),auth.role(),auth.jwt()SQL session functions let you write Supabase-style RLS policies. - Per-project connection URLs —
POST /admin/v1/projectsreturnspostgres://<user>:<password>@host:5433/<db>. Password bcrypt-validated on every pgwire startup; mismatch → SQLSTATE28P01. Rotate viaPOST /admin/v1/projects/{user}/rotate. - Durable catalog — Iceberg-style catalog backed by Postgres when
BASIN_CATALOG=postgres://...; tables, snapshots, project credentials, andbasin-auth's identity tables survive process restart. - Cheap retention — Vortex (default, ~1.95× smaller than ZSTD Parquet on audit-log) or Parquet, 1.9× smaller than Postgres heap on the mixed 100k SaaS+OLAP shape (parity on the latest 1M card) and 102× smaller on real S3 at 100k on the append-only audit-log shape; per-file catalog
column_stats+ per-file bloom filters onbasin.sort_bycolumns skip footer fetches and file opens when the predicate prunes the file. - Analytical path — a single DataFusion engine with Vortex/Parquet projection + predicate pushdown, catalog-statistics file pruning, per-file blooms, and incremental continuous materialized views. Approximate-cardinality and approximate-quantile UDFs (
APPROX_COUNT_DISTINCT,APPROX_PERCENTILE) sit alongside exact counterparts for dashboard workloads. Heavy scans use stateless pooled compute over shared object storage. - Multi-schema isolation (phase A) —
SchemaName/QualifiedTableNametypes, a schema-aware in-memory and Postgres-backed catalog, abasin_schemastable, andCREATE/DROP SCHEMA+ cross-schema queries with differential coverage. Phases B–E (full name resolution / search_path semantics / wider DDL) are still in progress — see Status. - Operations — connection pooling, per-project pgwire rate limiting (token-bucket via
governor), cost-based query rejection (BASIN_QUERY_COST_LIMIT_ROWS), per-project counters (ops / bytes_read / bytes_written / errors / p99), OpenTelemetry traces wired through router → engine → shard → storage → WAL.
The full capability matrix (with what's planned and what's deferred): CAPABILITIES.md. The fine-grained per-syntax matrix derived from automated tests: docs/sql-support.md.
| Phase | Description | Status |
|---|---|---|
| 0 | Validate the wedge — customer interviews, design partners | open (the gate; engineering is mature enough to need customer signal next) |
| 1 | Storage substrate — Vortex (default) / Parquet on object_store, Iceberg-style catalog | shipped |
| 2 | WAL service — sub-5 ms write acks | v0.1 shipped (single-node; Raft is v0.2) |
| 3 | Shard owners — per-project state, eviction, compactor | v0.1 shipped (in-process; placement service is v0.2) |
| 4 | Routers + SQL — pgwire v3, extended query, TLS, COPY, native JSONB / UUID binding | shipped — real single-shard transaction semantics (deferred commits, ROLLBACK undo, SAVEPOINT stack, aborted state) landed; cross-shard 2PC remains v0.2 (ADR 0011) |
| 4.5 | PostgreSQL SQL-compatibility push — silent-corruption CRITICAL fixes, JSONPath / JSONB-mutating / INET-CIDR / regexp / datetime function families, ARRAY binary wire format (NUMERIC stays text-wire pending v0.2 varlena binary), PG-oracle differential harness (differential_pg.rs) |
shipped — Default config at ~88.5% / ~91.5% non-excluded (863/975); long-tail exotic-DDL parser gaps remain v0.2 |
| 5 | Analytical path — single DataFusion engine, Vortex/Parquet pushdown + per-file bloom + catalog pruning, continuous pre-aggregation, APPROX_COUNT_DISTINCT/APPROX_PERCENTILE UDFs |
v0.1 shipped |
| 5.0a | Vortex storage format — ~1.95× smaller than ZSTD Parquet; aggregate_full ~15–40× via catalog-stats metadata path; per-file blooms flip point_eq from a loss to a win at every scale |
shipped as the DEFAULT (ADR 0015), zero-regression vs Parquet baseline. Parquet first-class per-table via WITH (basin.file_format='parquet'). HTAP hot-tier (ADR 0016) is Phase 5.14.C — closes the residual OLTP point-read and the UPDATE/DELETE write floor. |
| 5.14 | Durable Basin moat — per-file catalog blooms (shipped), APPROX_COUNT_DISTINCT + APPROX_PERCENTILE UDFs (shipped), catalog-aware WindowExec sort-elision (shipped), HTAP hot tier on by default (bed431c) — DELETE + UPDATE fast paths default-ON, kill-switch BASIN_HOTTIER_FASTPATH_DISABLE=1, merge-on-read via TombstoneFilterExec + UpdateOverlayExec wired in both DataFusion (HtapUnionTable::scan) and fast_select paths, gate-matrix locked by 16 tests, C6 differential harness extended with Mode D fastpath-on, TxCommit WAL marker (ADR 0020 §6) emitted explicitly with backward-compat replay. The 3-month investment that is not subsumed by upstream Vortex / DataFusion improvements. |
shipped |
| 5.15 | Unified docs platform — OSS-repo markdown with YAML frontmatter (spec), basin-cloud webapp consumes via npm run dev:docs build-time fetch |
OSS side shipped (5.15.A/B/C, frontmatter spec + 24-doc migration + top-level index + CI gate); basin-cloud webapp side (5.15.E–I) deferred to that repo |
| 5.5 | Sharding axes — partitioning, compute sharding, tiered storage | shipped |
| 5.6 | RLS with CREATE POLICY (UNION / CTE coverage) |
shipped |
| 5.7 | Caches + bloom + A4 catalog stats + B1 secondary B-tree indexes + B2 cluster-by + B3 row-group sizing | shipped; the per-row-group GIN-on-tsvector (@@, 5.20.E) and GIN-on-jsonb (@>) read-path wiring has since shipped too |
| 5.8 | pg_cron + pg_net SQL surfaces |
shipped |
| 5.9 | Postgres-extension equivalents (basin-geo / -trgm / -cv, JSONB, UUID, pgcrypto) | shipped |
| 5.10 | Identity + REST (basin-auth, basin-rest, OpenAPI, pagination, streaming, API keys, refresh rotation, per-project connection URLs, auth.uid() / auth.role() / auth.jwt() session functions) |
shipped |
| 5.11 | Multi-schema isolation | phase A shipped — SchemaName/QualifiedTableName types, schema-aware in-memory + Postgres catalog, basin_schemas table, CREATE/DROP SCHEMA + cross-schema queries with differential coverage. Phases B–E in progress — full qualified-name resolution, search_path semantics, wider schema-scoped DDL |
| 6 | Production hardening | partial — telemetry / pooling / rate-limit / cost-rejection / catalog-PITR / fork shipped; multi-region (ADR 0009), catalog replication (ADR 0010), cross-shard 2PC (ADR 0011) all locked architecturally and gated on customer demand |
| 6.x | SQL long-tail (still pending) | planned — COPY FROM STDIN ergonomics, server-side PREPARE/EXECUTE over text protocol edge cases, plpgsql DO blocks, full MERGE, exotic types (BIT / OID / REGCLASS), and the parser-refused exotic DDL forms |
| 7 | Launch | gated on Phase 0 |
Six-month wedge slice: WEDGE.md. Full plan: TASK.md. Decision log: docs/decisions/.
Postgres is the right answer for single-project, high-frequency OLTP and for workloads that need microsecond point-mutation latency at 1M+ rows. Basin's hot-tier UPDATE/DELETE fast paths are on by default as of Phase 5.14 closure, and three work waves of OLTP scale fixes cut point UPDATE at 1M to 1.24ms (vs PG 0.012ms) — far better than the prior ~162ms, but PG's index-and-heap is still structurally faster on that shape. Basin is not trying to be Postgres on those shapes. Where Basin wins: many-isolated-projects (per-environment / per-customer / per-region), append-shaped workloads where the columnar bytes-at-rest savings compound (102× smaller on real S3 at 100k), bulk ingest (bulk INSERT 3.9× faster at 1M, with the durability disclosure), the columnar-scan analytical shapes (113× faster correlated subquery, 462× LATERAL JOIN, 81× range scan at 1M), and the structural RAM-per-connection economics for connection-heavy front ends.
Neon is serverless Postgres with branching — terrific for single-DB workloads that want copy-on-write forks. Basin matches the branching story (Iceberg forks are zero-copy too) but stores data on plain S3 rather than a managed page server. Neon's per-project minimum scales with project count (O(provisioned pool)); Basin's per-project cost is O(bytes), so many isolated projects stay cheap.
Supabase is "BaaS in a box" — Postgres + Auth + Edge Functions + Storage + Realtime. Basin covers the SQL + Auth + REST surface in one binary, with auth.uid() / auth.role() / auth.jwt() working identically. Where Basin differs is the data-layer economics: Vortex/Parquet on S3 instead of Postgres heap on block storage. Multi-project SaaS that has outgrown Supabase's per-project pricing can migrate the database to Basin via pgwire and keep Supabase Auth / Edge Functions / Realtime for the parts of the stack they handle well. Edge Functions / Realtime / Storage are out of scope per ADRs 0005/0006.
Nile is "Postgres for multi-tenant SaaS" — same problem space as Basin, but built directly on real PostgreSQL with per-tenant virtual databases. That choice gives Nile real PG semantics, real OLTP, real JSONB, real extensions, and real PL/pgSQL for free, which is exactly where Basin still trails today. Basin's structural answer is the substrate economics: Nile's per-tenant cost is bounded by the underlying PG instance's per-tenant cost (heap pages, connection slots, autovacuum overhead); Basin's per-tenant cost is O(bytes-on-S3) with shared compute, so cold or low-traffic tenants stay near-zero. The right card to read is cost per tenant per month at p99 latency, at scale — single-instance PG-based multi-tenancy gets expensive at 10k+ tenants in a way object-storage-native multi-tenancy doesn't. If your workload is point-mutation-heavy and JSONB-heavy and you have <1k tenants, Nile is probably the easier answer today. If you have many idle or low-traffic tenants, append-shaped data, or want columnar bytes-at-rest economics, Basin is the cheaper substrate.
Turso is the right answer for edge-distributed apps with many tiny SQLite-class databases. Basin is for centralized apps that want Postgres SQL on cheap object storage with a real wire-protocol surface that ORMs already speak.
ClickHouse and DuckDB are analytical engines — phenomenal at OLAP scans, not designed for transactional point reads or per-row inserts. Basin will lose to a dedicated warehouse on heavy GROUP BY / window / recursive-CTE shapes, and inherits DataFusion's upstream limits on a handful of shapes (recursive CTE, exact COUNT(DISTINCT)). Basin's pitch is the unified path: one engine, one binary, pgwire on the wire, columnar substrate underneath — useful when "Snowflake plus a Postgres" is overkill for the workload size. The HTAP hot-tier (ADR 0016) closes the OLTP point-read and point-write floor on TB-scale tables — the missing piece that lets one engine span OLAP and OLTP without a second system.
Per the ADRs:
- Single-project high-frequency OLTP → Postgres / Aurora / Neon / Nile
- Edge / local-first → Turso / libSQL / Cloudflare D1
- Geospatial primary store → PostGIS
- Embedding-only workload → dedicated vector DB (Qdrant, Pinecone) — but Basin's native
vector(N)works fine alongside tabular data - Embedded SQLite-class library → SQLite
- Globally strongly-consistent writes across regions → Spanner / CockroachDB
- Heavy OLAP workhorse → Snowflake / ClickHouse / DuckDB
benchmark/BENCHMARKS.md— methodology, caveats, reproduction stepsbenchmark/RESULTS_localfs.md— 10k / 100k / 1M SaaS+OLAP cards (Vortex + Parquet), every shapebenchmark/RESULTS_seaweedfs.md— local S3-compatible (SeaweedFS) batterybenchmark/RESULTS_real.md— real-cloud S3benchmark/index_localfs.html/benchmark/index_real.html— live dashboards (no server needed)CAPABILITIES.md— public capability matrixdocs/sql-support.md— per-syntax SQL matrix (auto-generated)docs/decisions/— ADRs (every "no" with the trigger that would change our mind)docs/architecture.md— the four-layer stack in detail
Basin (this repo) is the data plane. Three sibling repos sit around it:
bas-in/basin-cloud— control plane and dashboard (Rust axum + sqlx backend + React/Vite SPA, Apache-2.0). Manages orgs, projects, billing; runs Basin engines on Fly Machines per project. Operators who want a managed UI use it. Operators running a single self-hosted engine do not — basin-server alone is sufficient.bas-in/basin-cli— operator daily-driver (Go, Apache-2.0, stdlib-only).basin login,basin projects list,basin sql run, release artefacts are Sigstore-signed. Talks to basin-cloud's/v1/*API.bas-in/basin-js— TypeScript SDK (MIT). Supabase-shapedcreateClient(url, anonKey)that talks directly to a Basin engine (pgwire + REST), not through basin-cloud. Browser, Node, Deno, Bun, Cloudflare Workers.jsr:@bas-in/basin-jsandnpm:@bas-in/basin-js.- Planned client SDKs — basin-py, basin-rs, basin-go, basin-dart, basin-swift, basin-kotlin. All will follow the same engine-direct shape as basin-js.
Licensing rationale. Server-side projects (basin engine, basin-cloud, basin-cli) are Apache-2.0 to carry the patent grant operators expect from infrastructure. Client SDKs (basin-js and future siblings) are MIT to match the norm of the SDK ecosystems they sit in.
crates/
basin-common shared types, errors, telemetry
basin-storage Vortex (default) / Parquet + object_store under project prefixes
basin-catalog Iceberg-style catalog (in-memory + Postgres-backed durable)
basin-wal file-backed WAL (Raft-backed in v0.2)
basin-shard in-process shard owner with WAL → Vortex/Parquet compactor
basin-engine single DataFusion engine — point reads + analytical pool, per-project sessions
basin-router pgwire v3 (simple + extended query)
basin-vector native HNSW vector search
basin-hottier in-memory hot tier (read-merge + tombstone fast path; on-by-default rollout in 5.14.C5/C6)
basin-placement (Phase 3 v0.2) (project, partition) → owner mapping
services/
basin-server single-process binary
benchmark/ dashboard + auto-regenerated RESULTS_*.md
docs/
architecture.md the four-layer stack, in detail
multi-project.md the multi-project SaaS story (per-project isolation, scheduler, cost math)
decisions/ ADRs — every "no" with the trigger that would change our mind
sql-compatibility.md hand-written compatibility narrative (planner / catalog scope)
sql-support.md auto-generated per-syntax matrix (sql_support_matrix.rs)
tests/integration/ cross-crate viability + scaling + Postgres comparisons
# Workspace build:
cargo build --workspace
cargo test --workspace
# Run the benchmark suite + regenerate dashboard / RESULTS_localfs.md:
cargo test -p basin-integration-tests --tests -- --nocapture
python3 benchmark/bundle.py
# Then open the dashboard (no server required):
open benchmark/index_localfs.htmlApache-2.0 — see LICENSE.
Contributions welcome. The project is opinionated about scope (docs/decisions/) — open an issue before writing a PR that adds new surface area. The OSS code is the database; commercial cloud orchestration lives in a separate private repo and never affects what OSS users get.
Basin is a Postgres-compatible, bucket-native, multi-tenant database on object storage, with Vortex (default columnar, LFAI) and Apache Parquet (opt-in, interchange) storage, an Apache Iceberg catalog, a file-backed WAL with a Raft WAL simulation toward distributed v0.2, native vector search (HNSW), per-file catalog bloom filters for point-query pruning, HTAP hot tier shipped on by default (ADR 0016), and pgwire protocol support that works with psql, tokio-postgres, asyncpg, JDBC, Diesel, SeaORM, and any other Postgres driver. Basin compares to Postgres, Neon, Supabase, Nile, Turso, PlanetScale, Aurora, ClickHouse, SingleStore, DuckDB, and CockroachDB for cheap-storage SaaS, audit-log, RAG / vector, HTAP, and multi-project use cases. Self-hostable, Apache-2.0 licensed, written in Rust.