Multi-tenant Buzz relay: community_id as a server-resolved key (comprehensive rewrite) #1321

Merged
tlongwell-block merged 106 commits into main from rewrite/relay-mt-comprehensive on Jun 29, 2026
tlongwell-block commented Jun 27, 2026

Multi-tenant Buzz relay — community_id as a first-class, server-resolved key

Makes community_id a first-class, server-resolved key on every scoped row: the relay derives a connection's tenant from durable data (the request host → communities row), never from caller input, and threads a &TenantContext through every scoped DB read and every Redis publish. This is the foundation for hosting Buzz multi-tenant on shared infra with provable cross-community isolation, checked against the machine-proven safety spec landed in #1285.

The fence

TenantContext can only be minted on the host-resolution path (bind_community / the reaper's per-row TenantContext::resolved). Everywhere downstream takes &TenantContext by reference and reads it — nothing else constructs one from client input. Read-path caches take CommunityId; write/invalidate publishers take &TenantContext (the Redis topic key needs the host).
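
The fence described above can be sketched roughly as follows. This is a minimal illustration, not the buzz-core definition: the field layout, the `u128` stand-in for a UUID, and the `channel_topic` helper are assumptions. Since `resolved()` is a public constructor, the guarantee rests on convention and review rather than on the compiler.

```rust
// Illustrative sketch only: field layout and helper names are assumptions,
// not the actual buzz-core definitions.

/// Server-resolved community identifier (stand-in: the real type wraps a UUID).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct CommunityId(u128);

/// Tenant identity for one connection. Deliberately has no `Default` and no
/// deserialization path, so it cannot be built from a client payload.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TenantContext {
    community: CommunityId,
    host: String,
}

impl TenantContext {
    /// Intended to be called only from host resolution or the reaper's per-row path.
    pub fn resolved(community: CommunityId, host: String) -> Self {
        Self { community, host }
    }

    pub fn community(&self) -> CommunityId {
        self.community
    }

    pub fn host(&self) -> &str {
        &self.host
    }
}

/// Hypothetical downstream publisher: it borrows the context and reads it,
/// but has no parameter through which a caller-chosen community could enter.
pub fn channel_topic(ctx: &TenantContext, channel: &str) -> String {
    format!("buzz:{}:channel:{}", ctx.community().0, channel)
}
```

Downstream functions that accept only `&TenantContext` make a missing or caller-supplied tenant visible at review time, because the only way to obtain a value is to trace it back to a `resolved()` call.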

What's in this PR

Path-partitioned lanes, each scoping a subsystem to its tenant:

  • core: TenantContext + CommunityId, the server-resolved tenant fence every other lane threads.
  • db: community_id-native schema, EventQuery::for_community, by-id/by-channel reads scoped, reaper RETURNING (community, host) per archived row, archived-identities composite-PK fix, api_tokens lookups scoped.
  • auth — community-scoped rate limiter, shared community-scoped NIP-98 replay seen-set (any-pod minting, Redis-backed), host-binding side door + access-checker fence, TTL/key-case hardening; NIP-98 u-URL host is per-tenant, not config-global.
  • pubsub — Redis topics scoped by community (buzz:{community}:channel:{channel} / :global), tenant-scoped topic refcounts; presence/typing isolation covered.
  • search — Typesense removed; Postgres FTS, community-scoped; ChannelScope enum closes the channel-less fence hole; privacy-kind exclusions restored at the FTS storage layer.
  • audit — per-community hash chain on the frozen audit_log DDL; NewAuditEntry.community_id widened to CommunityId.
  • workflow — workflow execution and approval lookups scoped to their community (community-scoped get_workflow); community-scoped durable claim wired into the scheduler; cold-start anchor seeded so new schedules fire.
  • relay wiring — every door (WS/bridge/media/NIP-05) fails closed on an unmapped host (no default tenant, no host echo); NIP-42 AUTH relay tag bound to the per-tenant host; media + git substrate scoped by tenant; local-echo dedup, DM command writes, and reminder claim/release scoped by community; background loops get tenant from the DB row they act on (reaper per-row) or the configured relay_url (dev/CI reconciler, reminder scheduler); deployment-community cases with no connection tenant (git hook/finalize, workflow sink) resolve via the same seam.
  • NIP-28 proxy removal — removed the unused buzz-proxy crate, proxy just recipes/scripts/docs, and the proxy:submit auth-scope bypasses; relay ingest now keeps the stricter direct-auth rule for non-gift-wrap events. The desktop localhost media proxy and token consumer paths remain intact.
  • CI/schema hardening — CI setup now re-attaches Postgres partitions after pgschema apply, and schema/schema.sql matches the privacy-kind FTS exclusions.
  • desktop scroll compatibility — after merging latest origin/main, reconciled the anchored-scroll/load-older virtualizer changes and fixed the bottom target for timeline rows that include dividers.
  • admin CLI: resolve_admin_tenant reads RELAY_URL host → lookup_community_by_host, fail-closed; membership-list publish, reconcile, and existence query scoped.
  • conformance harness — a runtime trace schema + an independent replay checker that observes the live ingest seam and proves isolation obligations end-to-end (two-host A/B), plus the conformance row series filling each obligation (host-binding, search FTS, channel membership, users/NIP-05, api-tokens/NIP-98, pubsub presence, workflow trigger). Obligations whose wire path isn't landed yet (e.g. approval-token minting, gated on WF-08) are left as explicit pending_lane breadcrumbs rather than faked green.
  • Dropped the Typesense-only reindex_kind0 backfill binary and the Typesense subsystem from the Helm chart, local dev/test stacks, and CI (obsolete under Postgres FTS).

Behavior changes called out for review

  • NIP-05 .well-known/nostr.json is now host-bound (was single-tenant off config.relay_url): binds community from the request Host header, falls through to empty {names,relays} on an unmapped host.
  • BUZZ_RECONCILE_CHANNELS reconciles the configured community only (dev/CI single-community). In a multi-community deploy the safe failure mode is incomplete reconcile, never cross-tenant access.
  • Huddle audio is "unavailable" under horizontal scaling (§5b, Tyler-decided): the relay surfaces a clear client-handleable signal on huddle join in multi-pod deploys; single-pod keeps old behavior.

Verification

  • Current PR head (8356aa4d2143f734e7c3c7a375c258b736acf929) has GitHub status checks green: 26 SUCCESS, 4 expected SKIPPED, no pending/failed checks.
  • Local follow-up validation included pnpm --dir desktop check and focused desktop/tests/e2e/scroll-history.spec.ts smoke coverage (12/12) after the origin/main merge.
  • cargo check --workspace green; cargo fmt --check clean; clippy clean across the touched crates (-D warnings).
  • Per-crate test suites green against local Postgres (buzz-db, buzz-audit, buzz-relay incl. --include-ignored --test-threads=1, buzz-admin compiles).
  • Provenance guard test added — audit_records_caller_actor_not_relay_signer_for_relay_signed_event — proven to bite by reverting the fix (recorded the relay signer instead of the caller actor; restored, green).
  • Conformance rows are wire-live two-host isolation tests, each with a documented mutate-bite (drop the community fence → the row goes red).

Provenance

Every commit carries a Signed-off-by trailer, so the branch is DCO-clean. Authorship/trailers vary across the branch: earlier subsystem lanes are authored and self-signed-off by tlongwell-block; later agent-authored fix/test lanes generally carry the responsible human operator in Co-authored-by + Signed-off-by: Tyler Longwell <tlongwell@block.xyz>, with the remaining agent-authored commits self-signed by the committing agent.

Based on the #1285 safety floor (main@2ecdcce7b); current origin/main was later merged into the branch with a normal merge commit (16f955760, no rebase/force-push). Supersedes #1259 (Typesense removal folded in here).

tlongwell-block and others added 24 commits June 26, 2026 20:36
…fence

buzz-core gets the zero-I/O tenant identity types every scoped layer
shares. TenantContext encodes conformance row-zero in the type system:
no Default, no Deserialize, no public constructor except resolved(),
which is meant to be called only from host resolution. Downstream code
holds &TenantContext and can read but not mint a community, so
client-chosen-community cannot type-check outside resolution.

Co-authored-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
The frozen base for the multi-tenant rewrite. Consolidated 0001 schema
makes community_id a first-class, server-resolved key on every scoped
row, mapped table-by-table to docs/multi-tenant-conformance.md.

Schema highlights:
- channels PK is (community_id, id): the same channel UUID may legitimately
  co-exist in two communities; child FKs (channel_members, workflows,
  thread_metadata) are composite (community_id, channel_id) so a child can
  never reference a cross-community channel — DB-enforced, not by handler
  discipline. channels.community_id is immutable (BEFORE UPDATE trigger).
- communities.host uniqueness is UNIQUE(lower(host)); normalize_host applies
  the same rule on the resolution side, so case/dot/default-port variants
  can never split one tenant into two.
- every scoped unique/PK leads with community_id; cross-community dedup of
  the same signed event is allowed, within-community dup rejected.
- new tables: communities (host map), scheduled_workflow_fires (the cron
  at-most-once claim), audit_log (per-community chain), and an explicit
  _operator_global_tables registry the migration lint reads.

buzz-core:
- normalize_host(host): the one shared host-canonicalization rule.
- TenantContext fence doc corrected to say plainly it is a lint-and-review
  fence, not a compiler fence (resolved()/from_uuid are pub) — honest about
  the guarantee the API actually gives.
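
A host-canonicalization rule of the kind described here might look like the sketch below. It is an assumption-laden illustration, not the buzz-core function: in particular, treating 80 and 443 as the "default" ports is a guess, and the body is written against std only.

```rust
/// Sketch of a host-canonicalization rule. Which ports count as "default"
/// (80 and 443 here) is an assumption, not taken from buzz-core.
pub fn normalize_host(host: &str) -> String {
    // Case variants collapse to one lowercase form.
    let lowered = host.trim().to_ascii_lowercase();
    // Strip a default port first, so "example.com.:443" also loses its dot below.
    let without_port = lowered
        .strip_suffix(":443")
        .or_else(|| lowered.strip_suffix(":80"))
        .unwrap_or(lowered.as_str());
    // A trailing root dot names the same host, so drop it.
    without_port.trim_end_matches('.').to_string()
}
```

Applying the same function on the write side (the `communities.host` key) and on the resolution side is what keeps two spellings of one host from resolving to two tenants.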

Schema proven against Postgres with an adversarial fence suite (re-tenant
rejected, cross-community FKs rejected, same-UUID/same-event cross-community
allowed, host-case collision rejected). buzz-core: 189 tests + 2 doctests
green.

Folds in review round 1 from Mari (channel global-uniqueness leak, host
normalization, fence-claim honesty) and Sami (NIP-98 localhost normalization
to be dropped in the auth lane).

Co-authored-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Closes the last Lane-0 schema items before the frozen base:

- events.search_tsv TSVECTOR GENERATED ALWAYS AS (to_tsvector('simple',
  content)) STORED + GIN idx_events_search_tsv. The Typesense->Postgres FTS
  data shape, landed in Lane 0 because it touches the just-locked events
  table (Quinn option A). GENERATED ALWAYS = single source of truth: proven
  against PG that a client cannot forge search_tsv out of sync with content
  (generated_always rejection). Index left minimal single-column GIN; the
  search lane picks the final spelling after EXPLAIN (Max's caveat).
- Delete stale 0002_backfill_d_tag.sql / 0003_event_reminders.sql. In the
  consolidated-from-scratch model 0001 already carries d_tag, not_before,
  delivered_at, and idx_events_not_before; re-running the old additive
  migrations would error (duplicate column / duplicate index name).

audit_log DDL shape confirmed for the audit-crate collapse (Dawn's lane):
PRIMARY KEY (community_id, seq), UNIQUE (community_id, hash), community_id
NOT NULL on every row. 0001 is the single source; buzz-audit drops its own
schema.rs / AUDIT_SCHEMA_SQL / ensure_schema() in the audit lane.

Re-proven against real Postgres — full fence suite green: T1 re-tenant
rejected, T6 cross-community member FK rejected, T6b same-community ok, T7
same channel UUID in two communities allowed, T8 host case-collision
rejected, T9 same event id in two communities allowed, plus the FTS
generated+GIN match and the forge-rejection. buzz-core: 189 + 2 doctests.

Co-authored-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Co-authored-by: Mari <95cae996907d7cab9f5dbf43c0f53edeac6ab0b032a6feae4abfd784e467b3f5@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Co-authored-by: Mari <95cae996907d7cab9f5dbf43c0f53edeac6ab0b032a6feae4abfd784e467b3f5@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Add EventQuery::for_community so relay call sites can keep concise
struct updates without restoring a tenantless Default. The constructor
requires the server-resolved CommunityId and preserves the old optional
filter defaults everywhere else.
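
As a rough illustration of the constructor pattern, the sketch below shows how struct-update syntax keeps working without a `Default` impl. The filter fields (`kinds`, `limit`) and the `recent_notes` helper are hypothetical; only `EventQuery::for_community` and `CommunityId` are named in the commit.

```rust
// Sketch: every field except the community is a hypothetical placeholder.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CommunityId(u128);

#[derive(Debug, Clone, PartialEq)]
pub struct EventQuery {
    pub community: CommunityId,
    pub kinds: Option<Vec<u32>>,
    pub limit: Option<u32>,
}

impl EventQuery {
    /// No `Default` impl exists, so this is the only way to seed a query,
    /// and it cannot be called without a community.
    pub fn for_community(community: CommunityId) -> Self {
        Self { community, kinds: None, limit: None }
    }
}

/// Call sites keep the struct-update shorthand, seeded from the scoped constructor.
pub fn recent_notes(community: CommunityId) -> EventQuery {
    EventQuery { limit: Some(50), ..EventQuery::for_community(community) }
}
```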

Return the owning community host from the ephemeral-channel reaper by
joining communities in the archive UPDATE. Reaper consumers can now build
TenantContext per archived row from DB-resolved community+host instead of
hoisting or forging a batch-level tenant.

Co-authored-by: Mari <95cae996907d7cab9f5dbf43c0f53edeac6ab0b032a6feae4abfd784e467b3f5@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
RateLimiter::check_and_increment now takes &TenantContext, and
rate_limit_key emits buzz:{community}:ratelimit:{pubkey_hex}:{suffix}.
Same pubkey active in two communities consumes two independent quotas,
matching the S1 cross-community isolation fence in the buzz-relay
rewrite spec.
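
The key shape can be illustrated with a small sketch. The real function takes `&TenantContext` and a `LimitType`; the plain string parameters below are stand-ins chosen to keep the example self-contained.

```rust
/// Sketch of the key shape described above. Plain strings stand in for
/// `&TenantContext` and `LimitType`, which this example does not model.
pub fn rate_limit_key(community: &str, pubkey_hex: &str, suffix: &str) -> String {
    format!("buzz:{community}:ratelimit:{pubkey_hex}:{suffix}")
}
```

Because the community leads the key, the same pubkey in two communities maps to two distinct counters.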

check_ip_connection stays operator-global by design. The IP fence runs
at connection acceptance, before host->community resolution has
completed (or, on resolve failure, instead of it). Threading
&TenantContext through it would invert the order of operations. Per-
(community, IP) caps, if ever needed as a tenant-fairness signal,
belong in an additive LimitType keyed on (community, ip) — not in this
trait.

RedisRateLimiter in buzz-pubsub follows the new trait signature.
AlwaysAllowRateLimiter test impl mirrors it. Two new tests pin the
behavior: the key includes the community prefix, and same-pubkey-two-
communities yields two distinct Redis keys.

Local cargo test -p buzz-auth: 36 passed. Local cargo test -p
buzz-pubsub: 3 passed, 6 Redis-required ignored. Workspace-wide check
not run locally (sqlx 0.9.0 requires rustc 1.94, local toolchain is
1.89 — same constraint Max hit on the pubsub lane); relying on CI for
the full integration compile.

(cherry picked from commit 6a92f0b)

Co-authored-by: Sami <f4a42a97e594b77bdbd8ee35191c8b28a94a4cb871d96f32921558275421fb68@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Adds the §5 pre-build gate for multi-tenant replay protection.

buzz-auth gains a Nip98ReplayGuard trait plus the
nip98_replay_key(ctx, event_id) helper. The trait's try_mark contract
requires atomic set-if-absent semantics; an in-process cache (moka,
DashMap) does not carry the freshness proof across pods under the
"any pod, any connection" architecture (§4B), so the production
implementation MUST be shared state. The Redis-backed impl lives in
buzz-pubsub as RedisNip98ReplayGuard and uses a single SET key 1 NX
EX <ttl> per claim.

Key shape: buzz:{community}:nip98:{event_id_hex}. Event ids are
content-addressed so natural cross-community collision is zero, but
the gate is fail-closed isolation — a same-id replay across
communities must consult two distinct seen-set rows, not one shared
row. Tests pin both the prefix and the cross-community isolation
guarantee.

TTL floor is DEFAULT_REPLAY_TTL_SECS = 120, matching the §5 gate
requirement and the doubled NIP-98 ±60s timestamp tolerance.
Implementations MAY clamp sub-floor TTLs up to the floor; they MUST
NOT honor smaller values. The Redis impl clamps.

Caller contract documented in the trait: verify first, then mark.
Burning a seen-set slot on a forgery would let an attacker who learns
a future event id DoS the legitimate event. On Err (Redis
unreachable) callers MUST fail closed.
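
The verify-then-mark contract can be modeled as below. This is a simplified, synchronous sketch: the real `try_mark` signature is not shown in this commit message, and `InMemoryGuard` and `accept` are hypothetical names. The in-memory set exists only to exercise the contract; as noted above, a per-process set does not protect a multi-pod deployment.

```rust
use std::collections::HashSet;

/// Key shape from the commit message; `community` stands in for `&TenantContext`.
pub fn nip98_replay_key(community: &str, event_id_hex: &str) -> String {
    format!("buzz:{community}:nip98:{event_id_hex}")
}

/// Simplified stand-in for the trait; the signature here is an assumption.
pub trait Nip98ReplayGuard {
    /// Ok(true) = first claim, Ok(false) = replay, Err = backend failure.
    fn try_mark(&mut self, key: &str) -> Result<bool, String>;
}

/// Test-only model of set-if-absent. Production needs shared state
/// (the Redis `SET key 1 NX EX <ttl>` described above).
#[derive(Default)]
pub struct InMemoryGuard {
    seen: HashSet<String>,
}

impl Nip98ReplayGuard for InMemoryGuard {
    fn try_mark(&mut self, key: &str) -> Result<bool, String> {
        // `insert` returns true only when the key was not already present.
        Ok(self.seen.insert(key.to_string()))
    }
}

/// Caller contract: verify the signature first, then mark; fail closed on Err.
pub fn accept<G: Nip98ReplayGuard>(guard: &mut G, signature_ok: bool, key: &str) -> bool {
    if !signature_ok {
        return false; // never burn a seen-set slot on a forgery
    }
    matches!(guard.try_mark(key), Ok(true))
}
```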

Not wired into a call site in this commit — there is no NIP-98 HTTP
handler in Lane 0 yet. Eva's relay-wiring lane will consume the trait
when the HTTP path lands; the contract is documented for that
integration.

Validation:
- cargo test -p buzz-auth --lib ✅ 40 passed (4 new in nip98_replay).
- cargo test -p buzz-pubsub --lib ✅ 3 passed, 9 Redis-required
  ignored (3 new in nip98_replay).
- cargo test -p buzz-pubsub --lib nip98_replay -- --ignored against
  local Redis ✅ 3 passed: first-claim/replay, cross-community
  isolation, sub-floor TTL lifted to floor.
- Workspace check not run locally (sqlx 0.9.0 / rustc 1.94 vs local
  1.89); CI catches it.

(cherry picked from commit a2a9ef4)

Co-authored-by: Sami <f4a42a97e594b77bdbd8ee35191c8b28a94a4cb871d96f32921558275421fb68@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
…acing

Red-team pass against the auth lane surfaced one real bug and two
robustness gaps. All three are now caught by tests; the bug was verified
by temporarily reverting the fix and watching the test fail with the
real Redis error.

1. Real bug: a caller passing ttl_secs > i64::MAX (e.g. u64::MAX from
a config bug) caused Redis to return "ResponseError: value is not an
integer or out of range" from `SET NX EX <ttl>`. RedisNip98ReplayGuard
then returned Err, the trait contract forces callers to fail closed,
and every NIP-98-gated request from that point would have errored
with no visible link back to the bad config. Fix: introduce
MAX_REPLAY_TTL_SECS (1 hour — 30× the natural physical maximum, well
inside i64::MAX) and clamp ttl_secs into [DEFAULT, MAX] before the
SET. New ignored-Redis test `above_ceiling_ttl_is_clamped` exercises
the path with u64::MAX and asserts the claim+replay sequence
succeeds, which it only does with the clamp.

2. Robustness: pin "all rate-limit and replay key components are
lowercase ASCII" as a unit-level invariant. If pubkey::to_hex,
Uuid::Display, or LimitType::key_suffix ever started emitting
uppercase, the same logical (community, pubkey/event_id) would map to
two distinct Redis keys — silently doubling the rate-limit quota or
breaking the seen-set's identity. Two new tests
(`rate_limit_key_components_are_lowercase`,
`key_components_are_lowercase`) catch the regression in CI rather
than production.

3. Robustness: structured tracing on every Redis failure path with
`community = %ctx.community()` as a structured field, so ops can
group log alerts by tenant without needing the community id to be
embedded in the AuthError string. The user-facing AuthError::Internal
payload stays the existing convention (consistent with rate_limit.rs
neighbors); the per-tenant context lives in tracing fields, not in
the error string.

Also: add `ttl_floor_below_ceiling` and `max_ttl_fits_in_redis_signed_ex`
unit tests so the two TTL constants can't drift past each other or
above Redis's signed-EX limit in a future edit.
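
The clamp itself is small enough to sketch. The constant names and values follow the text above (120 seconds floor, 1 hour ceiling); `clamp_replay_ttl` is a hypothetical helper name, and the real code may inline the clamp differently.

```rust
pub const DEFAULT_REPLAY_TTL_SECS: u64 = 120;
pub const MAX_REPLAY_TTL_SECS: u64 = 3_600;

/// Clamp a caller-supplied TTL into [floor, ceiling] before it reaches Redis.
/// `clamp_replay_ttl` is a hypothetical helper name.
pub fn clamp_replay_ttl(ttl_secs: u64) -> u64 {
    ttl_secs.clamp(DEFAULT_REPLAY_TTL_SECS, MAX_REPLAY_TTL_SECS)
}
```

With this in place a misconfigured `u64::MAX` becomes one hour rather than an out-of-range `EX` argument.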

Out of scope for this lane (flagged to other lane owners):
- AuthError::Internal generally embeds raw downstream error strings
  (existing pattern across rate_limit.rs and nip98_replay.rs). Could
  leak community/tenant identifiers if those strings ever surface to
  clients. Audit lane (Quinn) owns the error-message safety rule per
  Eva's [6] lane split.
- check_ip_connection MUST be called before host resolution / on
  every connection (including failed-host-resolution attempts).
  Otherwise an attacker who picks a non-matching host header bypasses
  the IP cap. Wiring lives in the relay-wiring lane (Eva).

Validation:
- cargo test -p buzz-auth --lib: 44 passed (4 new red-team tests).
- cargo test -p buzz-pubsub --lib: 3 passed, 10 Redis-required
  ignored.
- cargo test -p buzz-pubsub --lib nip98_replay -- --ignored against
  local Redis: 4 passed (1 new ceiling-clamp test).
- Bug verified: with the clamp temporarily reverted, the
  above_ceiling_ttl_is_clamped test fails with the real Redis error
  "value is not an integer or out of range" — proving the test
  catches the regression, not just the fix.

(cherry picked from commit f54d728)

Co-authored-by: Sami <f4a42a97e594b77bdbd8ee35191c8b28a94a4cb871d96f32921558275421fb68@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Two adversarially-proven multi-tenant fences for the auth lane on the frozen Lane 0 SHA:

1. NIP-98 verifier: drop loopback aliasing unconditionally.
   normalize_url() collapsed localhost / ::1 -> 127.0.0.1 — a testing convenience that
   becomes a row-zero side door under multi-tenant. The u-tag host is the community
   binding (docs/multi-tenant-conformance.md, NIP-98 row); collapsing the three would
   let an event signed for localhost pass against a 127.0.0.1-resolved community (or
   vice versa). Inverted the localhost test to bite the new strict rule: signed-for-one
   vs expected-other now REJECTS, identity still passes. Adversarial: re-introduced
   the aliasing -> test goes red -> restored.

2. ChannelAccessChecker: thread &TenantContext through every method.
   Frozen 0001 has channels PK (community_id, id), so the same UUID legitimately
   co-exists across communities. An implementation that filters on the bare id alone
   would be a cross-community existence oracle. Mirror of buzz-db rule 4a.1 on the auth side.
   MockAccessChecker keyed on (community, pubkey, channel_id); new test
   access_does_not_cross_communities bites the bare-id direction. Adversarial:
   dropped the community filter from the mock -> test goes red -> restored.

No external impl of ChannelAccessChecker in-tree (DB uses a separate free function
under Mari's lane), so the trait signature change is contained.

cargo test -p buzz-auth: 45 passed / 0 failed.

Lane: auth (buzz-auth). Base: e349d76 (frozen Lane 0).
(cherry picked from commit 3df6179)

Co-authored-by: Sami <f4a42a97e594b77bdbd8ee35191c8b28a94a4cb871d96f32921558275421fb68@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
(cherry picked from commit 69237ef)

Co-authored-by: Max <d8473ee32b973aa31a21a65adddcc4b69cc2a8a4dee8121ecd51926e0cddbc02@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
(cherry picked from commit 4af7348)

Co-authored-by: Max <d8473ee32b973aa31a21a65adddcc4b69cc2a8a4dee8121ecd51926e0cddbc02@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
The Lane-0 freeze landed `events.search_tsv TSVECTOR GENERATED ALWAYS AS
(to_tsvector('simple', content)) STORED` + `GIN (search_tsv)` directly in
the schema. With that in place the entire Typesense apparatus is dead
weight: there is nothing to index out-of-band, no consistency window to
reason about, no client-forgeable index/content drift. Indexing is the
SQL write.

This rewrites `crates/buzz-search/` from scratch around that:

  - `query.rs`: one SQL builder. `community_id = $ctx` is the first
    predicate of every executed statement and is unconditional —
    `SearchQuery` requires a `CommunityId` at the type level (no
    construction path omits it). `search_tsv @@ websearch_to_tsquery(...)`
    is the FTS predicate; `ts_rank_cd DESC, created_at DESC, id` is the
    order. Channel scope replaces today's `__global__` sentinel with
    `channel_id IS NULL`. Empty query short-circuits without a roundtrip.

  - `lib.rs`: thin `SearchService { pool }`. Takes `&PgPool` directly so
    the crate stays a leaf — no buzz-db dependency. Re-exports
    `CommunityId` for callers that need to mint the fence.

  - `error.rs`: collapsed to one variant (`Db(sqlx::Error)`); empty
    queries are not errors.

  - Deleted `collection.rs` and `index.rs` (Typesense HTTP client and
    indexer). Dropped `reqwest`/`serde`/`serde_json`/`chrono`/`nostr`
    from `Cargo.toml`.

  - Added `tests/fts_integration.rs` — 8 integration tests against real
    Postgres, each on its own throwaway schema applying the frozen
    `migrations/0001_initial_schema.sql` via `include_str!`. The
    load-bearing one is `search_does_not_return_other_community_events`:
    mutating the `community_id = $ctx` predicate to `1=1` makes that
    test go red (verified, then reverted) — the fence bites where it
    has to.
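
A builder in the spirit of `query.rs` might look like the following sketch. The function name, the selected column, the placeholder numbering, and the `'simple'` configuration on the query side are all assumptions; only the predicate order, the FTS operator, and the sort order come from the description above.

```rust
/// Sketch of a single-statement builder. Placeholder numbering, the selected
/// column, and the 'simple' config on the query side are assumptions.
pub fn build_search_sql(query_text: &str) -> Option<String> {
    if query_text.trim().is_empty() {
        return None; // empty query: short-circuit without a database round trip
    }
    Some(
        // The community predicate comes first and is never conditional.
        "SELECT id FROM events \
         WHERE community_id = $1 \
         AND search_tsv @@ websearch_to_tsquery('simple', $2) \
         ORDER BY ts_rank_cd(search_tsv, websearch_to_tsquery('simple', $2)) DESC, \
         created_at DESC, id \
         LIMIT $3"
            .to_string(),
    )
}
```

Keeping the community predicate in the fixed prefix, rather than appending it conditionally, is what lets a single isolation test cover every query the builder can emit.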

Conformance row 50 — search re-auth and one-shot NIP-50 — is unchanged
in shape: the relay refetches canonical events per hit through buzz-db's
scoped fetcher and runs the access predicate. Search is never the
access boundary; this crate just returns candidate ids. The row's
Typesense prose rewrite is owned by Eva's integration lane (one writer
per path).

EXPLAIN ANALYZE evidence on a 200k-row community confirms the planner
picks `Bitmap Index Scan on events_p<...>_search_tsv_idx` for the
populated partition (full plan in RESEARCH/SEARCH_LANE_FTS_EXPLAIN.md
in the workspace). Single-column `GIN (search_tsv)` is sufficient at
this scale — no `btree_gin` needed (Max's caveat holds).

Cross-lane removals owed to Eva (relay-wiring lane, not this commit):
  - relay state.rs: remove `search_index_tx` mpsc + worker
  - relay main.rs: remove `search.ensure_collection()` call
  - relay handlers/event.rs: remove `search_index_tx.send()`
  - relay api/bridge.rs::handle_bridge_search: rewrite to new API
  - relay handlers/req.rs::handle_search_req: rewrite to new API
  - relay handlers/req.rs::build_search_channel_scope_filter: delete
  - relay bin/reindex_kind0.rs: delete
  - docker-compose.yml: drop typesense service + volume
  - docs/multi-tenant-conformance.md row 50: rewrite Typesense prose

Tests: `cargo test -p buzz-search --test fts_integration --
--include-ignored --test-threads=1` — 8 passed, 0 failed.
Clippy: `cargo clippy -p buzz-search --all-targets -- -D warnings` — clean.

(cherry picked from commit e31c098)

Co-authored-by: Quinn <96f056ad5f2305c8ddf637dc65d048aa4c12d7daeb8867690e34fca46b0ef64c@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
The legacy 2x2 `(channel_ids: Option<Vec<Uuid>>, include_channel_less: bool)` shape could not unambiguously express "channel-less events only" — both `Some(vec![]) + true` and `None + true` fell into the no-constraint branch, silently broadening to all community channels rather than restricting to `channel_id IS NULL`. That matched the legacy Typesense `channel_id:=__global__` sentinel one way (per-channel + global) but not the other (global only).

Replace with a single `ChannelScope` enum whose four variants are 1-to-1 with the legacy `(accessible_channels, include_global)` matrix:

  - non-empty + true  -> ChannelsOrChannelLess(accessible)
  - non-empty + false -> Channels(accessible)
  - empty     + true  -> ChannelLessOnly  (the variant the old shape could not express)
  - empty     + false -> caller short-circuits to EOSE, doesn't call search

Emitted SQL fragments are byte-identical to the legacy match for the three carry-over cases; `ChannelLessOnly` adds `AND channel_id IS NULL` — the fence the old type could not express.
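
The mapping above can be sketched as an enum plus two helpers. Variant names come from this commit message; the `String` payload (in place of `Uuid`), the `$2` placeholder, the exact fragment text, and the `from_legacy` and `sql_fragment` names are assumptions. Only the three variants named in the list are modeled here.

```rust
// Sketch: payload type, placeholder, and helper names are assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ChannelScope {
    Channels(Vec<String>),
    ChannelsOrChannelLess(Vec<String>),
    ChannelLessOnly,
}

impl ChannelScope {
    /// Map the legacy (accessible_channels, include_global) pair onto one variant.
    /// `None` means the caller should short-circuit to EOSE without searching.
    pub fn from_legacy(accessible: Vec<String>, include_global: bool) -> Option<Self> {
        match (accessible.is_empty(), include_global) {
            (false, true) => Some(Self::ChannelsOrChannelLess(accessible)),
            (false, false) => Some(Self::Channels(accessible)),
            (true, true) => Some(Self::ChannelLessOnly),
            (true, false) => None,
        }
    }

    /// SQL fragment appended after the unconditional community predicate.
    pub fn sql_fragment(&self) -> &'static str {
        match self {
            Self::Channels(_) => " AND channel_id = ANY($2)",
            Self::ChannelsOrChannelLess(_) => {
                " AND (channel_id = ANY($2) OR channel_id IS NULL)"
            }
            Self::ChannelLessOnly => " AND channel_id IS NULL",
        }
    }
}
```

Because the match in `sql_fragment` is exhaustive, adding a variant forces a decision about its predicate instead of falling into a no-constraint branch.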

Verification:
- Full package `cargo test -p buzz-search -- --include-ignored --test-threads=1`: 9/9 green (8 existing + 1 new `channel_less_only_excludes_per_channel_events`).
- Adversarial mutation: replaced the `ChannelLessOnly` SQL emission with a no-op (the buggy semantic the old shape produced); new test went RED with 3 hits instead of 1, restored, green again. The fix is the emitted predicate, not the variant name.
- clippy -D warnings clean; fmt clean.
- Empty-vec edge cases are intentionally not special-cased: `Channels(vec![])` emits `channel_id = ANY('{}')` (false-for-all, zero hits, preserves the old early-return semantic via SQL); `ChannelsOrChannelLess(vec![])` is equivalent to `ChannelLessOnly`.

Coordinated with Eva ahead of relay-wiring sweep at req.rs and bridge.rs so call sites land against the final type, not the buggy one.

(cherry picked from commit c8cd333)

Co-authored-by: Quinn <96f056ad5f2305c8ddf637dc65d048aa4c12d7daeb8867690e34fca46b0ef64c@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Convert the audit log from one global hash chain to an independent
per-community chain, conforming to the frozen Lane-0 0001 schema.

- Collapse to one DDL: delete schema.rs / AUDIT_SCHEMA_SQL and their
  lib.rs exports. The 0001 migration is the sole owner of audit_log.
- Chain shape: PK (community_id, seq), seq monotonic per-community,
  UNIQUE (community_id, hash); hash/prev_hash/actor_pubkey as BYTEA;
  object_id TEXT generalizes the old event_id/channel_id; detail JSONB.
- community_id is folded into the SHA-256 (leads the hash) so a row
  cannot be lifted out of one community's chain and re-verified in
  another. Per-community advisory lock — communities never serialize
  each other's audit writes (no throughput bottleneck, no timing oracle).
- verify_chain / get_entries scoped to a CommunityId.
- Error variants carry only per-community seq (meaningless without its
  chain) — never community_id, hash values, or raw action strings.
- AUTH-body protection becomes caller discipline + the AuditAction enum
  (AuthSuccess/AuthFailure carry outcome metadata, never the token);
  the dropped event_kind column is not persisted.

13/13 green (7 unit + 6 Postgres isolation). Adversarial: disabling the
community_id line in compute_hash turns community_id_is_part_of_identity
RED (two communities hash identically); restored to green.
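
The structure of a chain link with the community folded in can be sketched as below. To stay std-only, this example swaps SHA-256 for `DefaultHasher`, which is not cryptographic and is used here purely as a stand-in; the field set and their order after the community id are also assumptions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Structural sketch of a per-community chain link. `DefaultHasher` is a
/// non-cryptographic stand-in so the example needs only std; the source uses
/// SHA-256. Field order after the community id is an assumption.
pub fn compute_hash(community_id: &[u8; 16], seq: u64, prev_hash: u64, action: &str) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(community_id); // community leads the hash input
    h.write_u64(seq);
    h.write_u64(prev_hash);
    h.write(action.as_bytes());
    h.finish()
}
```

Two rows that differ only in community produce different link hashes, so a row copied into another community's chain no longer verifies against its neighbors.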

(cherry picked from commit ba11d66)

Co-authored-by: Dawn <c6237ef84fa537c78dcee78efd2d4e59f728859c7f194da42ac51ededfa0be05@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Make the provenance fence visible in the type signature, not a
per-call-site convention. `NewAuditEntry.community_id` becomes
`CommunityId` (the server-resolved newtype) instead of a raw `Uuid`,
so a wiring call site can no longer pass an arbitrary UUID off the
event/channel being acted on — the only doors to a `CommunityId` are
host resolution or a server-scoped DB row, never client input.

The DB-row type `AuditEntry` stays `Uuid`: sqlx reads/writes it directly
and `compute_hash` does `.as_bytes()` on it, so the stored hash bytes are
byte-for-byte identical and the already-integrated chain stays valid — no
migration, no re-hash. The `as_uuid()` dereference moves inside
`AuditService::log` at the DB boundary, where the column is written; the
advisory-lock key is unchanged (CommunityId's Display delegates to Uuid).

Drop the now-orphaned `Serialize`/`Deserialize` derive (and the
`#[serde(default)]` on `detail`) from `NewAuditEntry`: it has no
serde consumer — it travels only through the in-process audit sink
(mpsc), never a wire/DB boundary. Keeping it non-deserializable
reinforces the fence: no client blob can mint a NewAuditEntry.

Full package green (13/13, incl. the 6 PG isolation tests and the
community_id_is_part_of_identity fence); clippy -D warnings + fmt clean.
Adversarially verified the fence is non-vacuous: dropping community_id
from compute_hash turns community_id_is_part_of_identity RED, restored.

(cherry picked from commit 284cc69)

Co-authored-by: Dawn (sprout agent) <c6237ef84fa537c78dcee78efd2d4e59f728859c7f194da42ac51ededfa0be05@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
…ind)

Conformance row zero: req.community = resolve_host(connection.host), bound
before any handler observes tenant data. This lands the relay-side seam:

- HostResolver trait (native async fn, no async-trait dep) — buzz-db's
  Db::resolve_host satisfies it; the relay depends on the trait, not the
  query, so the binding is testable without a database. Callers are generic
  over R, no dyn dispatch (the relay holds a concrete Db).
- bind_community(): normalizes the host with the one shared rule, resolves
  it, and fails closed on BOTH unmapped host AND lookup error — there is no
  path that yields a default/fallback community. UnmappedHost is a distinct
  variant the call site turns into a GENERIC reject (no host echo, no
  unmapped-vs-error distinction) so an unauthenticated caller can't probe
  which hosts exist.
- TenantContext carries the normalized host, so downstream NIP-05/audit
  labelling and the NIP-98 u-host check all see the canonical form the
  community was resolved from.

Tests (4, green) cover known-host bind, variant normalization (case/dot/
default-port can't split a tenant), unmapped fail-closed, and lookup-error
fail-closed-not-default. Adversarially verified: mutating the None arm to
fall through to a nil default community turns unmapped_host_fails_closed RED.

Seam contract for the buzz-db lane (Mari): Db::resolve_host(&self,
normalized_host: &str) -> Result<Option<CommunityId>, DbError>, a SELECT id
FROM communities WHERE host = $1 on the normalized key. Router call site
(nip11_or_ws_handler) + threading TenantContext through handle_connection
land next in this lane.

(cherry picked from commit 0be8532e0e94e5ecd6529f2f3f52255dd36f6009)

Co-authored-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
…ing (§5b)

Plan §5b, decided by Tyler: rather than sticky-routing huddles or shipping a
silent split-room, a horizontally-scaled deployment surfaces a clear,
client-handleable unavailable signal on huddle join.

- config: huddle_audio_available bool, env BUZZ_HUDDLE_AUDIO_AVAILABLE.
  Defaults true so single-pod (N=1) deployments keep today's huddle
  behavior unchanged. Operators running multiple relay pods set it false.
- audio handler: after auth + membership pass and BEFORE get_or_create joins
  a room, if huddle_audio_available is false we send
  {type:error, code:huddle_audio_unavailable, message:...} and return — no
  silent room join whose frames never cross pods.
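
A rough sketch of the flag and the join gate described above. The accepted spellings for "false", the message text, and the `JoinOutcome` type are assumptions made for illustration; only the env name, the default of true, and the error code come from the commit message.

```rust
/// Parse BUZZ_HUDDLE_AUDIO_AVAILABLE. Unset or unrecognized values keep the
/// default (true), so single-pod deployments behave as before.
/// Which spellings count as "false" is an assumption of this sketch.
pub fn huddle_audio_available(raw: Option<&str>) -> bool {
    match raw.map(|s| s.trim().to_ascii_lowercase()).as_deref() {
        Some("false") | Some("0") | Some("no") | Some("off") => false,
        _ => true,
    }
}

/// Error frame sent instead of joining a room. The message text is illustrative.
pub fn huddle_unavailable_frame() -> String {
    r#"{"type":"error","code":"huddle_audio_unavailable","message":"huddle audio is unavailable on this deployment"}"#
        .to_string()
}

/// Hypothetical result type for the join handler.
pub enum JoinOutcome {
    Joined,
    Unavailable(String),
}

/// Runs after auth and membership checks pass, before any room is created.
pub fn on_huddle_join(available: bool) -> JoinOutcome {
    if !available {
        // Return early: no room join whose frames would never cross pods.
        return JoinOutcome::Unavailable(huddle_unavailable_frame());
    }
    JoinOutcome::Joined
}
```

In a real config loader the argument would come from something like `std::env::var("BUZZ_HUDDLE_AUDIO_AVAILABLE").ok().as_deref()`.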

Why a config flag and not pod-count self-detection: the relay can't reliably
count its own pods; an explicit operator flag is the honest model and keeps
the §4 fork-B (any-pod-any-connection) generic routing free of huddle
stickiness. The real fix is the out-of-relay media/SFU service (Tyler's
long-term target), out of scope for this rewrite.

Tests: default-true (N=1 compat) and env-false-disables, both green. Full
buzz-relay --lib green at --test-threads=1 (374). Note for this lane: there
is a pre-existing parallel-run env-var race (global_presence_pubsub test
calls Config::from_env without the config tests' ENV_MUTEX guard) — not a
regression from this change; flagged to fix in the wiring lane.

(cherry picked from commit cc2bc29d4429da9e1a3e80217936340a4c1ca721)

Co-authored-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Executable form of docs/multi-tenant-conformance.md: one module per
obligation-table surface row (14 surfaces, 18 isolation tests) plus the
N=1 parity gate documented against the existing e2e suites.

Each A/B isolation test addresses two hosts (RELAY_URL_A/RELAY_URL_B)
on the SAME relay process — one binary, one Postgres, one Redis, two
communities — proving no tenant-observable state crosses a boundary
derived from host, never caller input. All #[ignore] (need a running
two-host relay) so a normal cargo test run reports 0 passed / 18 ignored;
they cannot fake-pass.

Rows the lane hasn't landed yet panic via pending_lane(lane, obligation),
which names the exact obligation for the owner to fill in and makes the
remaining work one grep. Lane ownership tagged per module.

(cherry picked from commit 9d6d35f07a17fcf5ccd8a6f20fdede3349e67024)

Co-authored-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
…row)

The conformance obligation for the NIP-11 surface: RelayInfo::build must
not grow unscoped DB/search/audit inputs, so an unauthenticated NIP-11
read can never become a cross-community enumeration oracle.

Binds RelayInfo::build to its exact allowed signature via a const fn
pointer. Adding a &Db / &AppState / search / audit input makes the
function-pointer type stop matching and breaks the build at the fence —
a silent cross-tenant leak becomes a hard compile error, deny-lint style.
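
The technique can be illustrated with a small sketch. `RelayConfig` and the fields shown are hypothetical; the actual allowed signature of `RelayInfo::build` is not given in the commit message.

```rust
/// Hypothetical static configuration: the only input the builder may take.
pub struct RelayConfig {
    pub name: String,
    pub relay_url: String,
}

pub struct RelayInfo {
    pub name: String,
    pub url: String,
}

impl RelayInfo {
    pub fn build(cfg: &RelayConfig) -> RelayInfo {
        RelayInfo { name: cfg.name.clone(), url: cfg.relay_url.clone() }
    }
}

// Signature fence: the fn item must coerce to exactly this pointer type.
// Adding a parameter (for example a database handle) to `build` changes its
// type, so this line stops compiling with a mismatched-types error.
const _: fn(&RelayConfig) -> RelayInfo = RelayInfo::build;
```

Because the const is evaluated at compile time and never used at runtime, the fence costs nothing in the binary; it exists purely to fail the build.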

Adversarially proven: injecting a &AppState param into build() produces
error[E0308] mismatched types at the fence const (plus E0061 at the call
sites); reverted to confirm the fence, not the call sites alone, is the
guard. buzz-relay package 374 green at --test-threads=1.

(cherry picked from commit 76a4044c7cfb1c96a6817be1e81c7ae42d1ea3da)

Co-authored-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Archived identity state is tenant-local; a pubkey archived in one community must not read as archived in another. Thread CommunityId through the archived identity queries and DB wrappers, and bind the composite key used by the migration.

Co-authored-by: Mari <95cae996907d7cab9f5dbf43c0f53edeac6ab0b032a6feae4abfd784e467b3f5@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Threads server-resolved `community_id`/`TenantContext` through the whole
relay call graph and the operator CLI against the v3 DB/pubsub API, so
every scoped row read and every Redis publish names a community the relay
derived from data, never from caller input.

Relay (`crates/buzz-relay`):
- Read-path caches take `CommunityId`; write/invalidate publishers take
  `&TenantContext` (the Redis topic key needs the host). The cross-node
  fan-out path only has the community, so caches stay constructible there.
- Doors fail closed: WS/bridge/media/NIP-05 bind community from the request
  host via `bind_community`, falling through to an empty/404 response on an
  unmapped host — no default tenant, no host echo.
- Background loops get tenant from the DB row they act on: the reaper builds
  `TenantContext::resolved(row.community_id, row.host)` per archived channel
  from the reaper RETURNING; the dev/CI reconciler and reminder scheduler
  resolve the one configured community from `relay_url`, fail-closed.
- Deployment-community cases with no connection tenant (git hook/finalize,
  workflow sink) resolve via the same host-resolution seam.
- Drop the Typesense-only `reindex_kind0` backfill binary, obsolete under
  the Postgres FTS migration and referenced nowhere.

Admin CLI (`crates/buzz-admin`):
- New `resolve_admin_tenant` reads `RELAY_URL` host (the CLI runs
  `compose exec relay buzz-admin`, sharing the relay's env) and resolves it
  via `lookup_community_by_host`, fail-closed on an unmapped host.
- Scope the NIP-43 membership-list publish (`EventTopic::Global`), channel
  reconcile, `get_members`, and the kind:39000 existence `EventQuery`
  (`..EventQuery::for_community`). Drop the now-dead `uuid` dep.

Workspace gate: `cargo check --workspace` green; buzz-db 97/97, buzz-audit
13/13, buzz-relay 375 + main 1 (`--include-ignored --test-threads=1`),
buzz-admin compiles, fmt + buzz-admin clippy clean.

Co-authored-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Relay E2E applies schema/schema.sql as declarative desired state before the relay starts. The multi-tenant migration added FKs to communities, but the snapshot did not define the table, so pgschema failed before tests ran.

Co-authored-by: Mari <95cae996907d7cab9f5dbf43c0f53edeac6ab0b032a6feae4abfd784e467b3f5@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
The CI Rust Lint and Windows Rust jobs run `cargo clippy --workspace --all-targets -- -D warnings`; the community_id/tenant args pushed six fns to 8 arguments against clippy's limit of 7, and the new NIP-98 replay code tripped the clamp and const-assert lints. Resolve at the bar, matching existing repo conventions:

- 6x #[allow(clippy::too_many_arguments)] on the fns that gained a tenant/community arg (same convention already used across buzz-db/relay).
- buzz-pubsub replay TTL: .max().min() -> .clamp() (floor 120 < ceiling 3600, cannot panic; behavior identical, incl. the u64::MAX clamp test).
- buzz-auth replay const-drift tripwires: scoped #[allow(clippy::assertions_on_constants)] — the assert-on-constant IS the design (fails if someone drifts the TTL constants).
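
A small sketch of the clamp and a constant-drift tripwire. The constant and function names are hypothetical, the unit (seconds) is assumed, and the tripwire is shown as a compile-time `const` assertion, which may differ from how the buzz-auth tests express it.

```rust
// Hypothetical names; the 120 and 3600 bounds are the ones quoted above.
pub const REPLAY_TTL_FLOOR_SECS: u64 = 120;
pub const REPLAY_TTL_CEILING_SECS: u64 = 3600;

// Tripwire: fails the build if someone drifts the constants so that the
// floor is no longer below the ceiling (which would make clamp panic).
const _: () = assert!(REPLAY_TTL_FLOOR_SECS < REPLAY_TTL_CEILING_SECS);

/// Equivalent to `.max(FLOOR).min(CEILING)` when FLOOR <= CEILING.
pub fn replay_ttl_secs(requested: u64) -> u64 {
    requested.clamp(REPLAY_TTL_FLOOR_SECS, REPLAY_TTL_CEILING_SECS)
}
```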

Co-authored-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Relay E2E builds the database from schema/schema.sql via pgschema apply, while the rewrite migration had moved the first-class community_id schema forward. The snapshot was still mostly pre-MT, so it produced unscoped tables such as channels(id) instead of channels(community_id, id).

Make the declarative snapshot match migrations/0001_initial_schema.sql exactly so the schema path and migration path create the same tenant-scoped shape.

Co-authored-by: Mari <95cae996907d7cab9f5dbf43c0f53edeac6ab0b032a6feae4abfd784e467b3f5@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
@tlongwell-block tlongwell-block force-pushed the rewrite/relay-mt-comprehensive branch from 3fb25c5 to f1b459b Compare June 27, 2026 01:15
tlongwell-block and others added 3 commits June 26, 2026 21:24
The relay resolves each connection's tenant from the durable communities
host map and fails closed on an unmapped host. Under the MT schema,
channels.community_id is NOT NULL with a FK to communities, so the
pre-MT e2e seed (unscoped channel/member INSERTs against an empty
communities table) fails, and every e2e client connection 404s at
host-binding. The relay never auto-seeds a community
(ensure_configured_community has no callers).

Seed the deployment community (host=localhost:3000, matching
RELAY_URL=ws://localhost:3000 after normalize_host keeps the non-default
port) and thread community_id through the channel/member INSERTs:

- setup-desktop-test-data.sh: insert the community row first, then scope
  every channel/member INSERT (Desktop E2E Integration).
- start-relay-for-tests.sh: seed the community after schema apply
  (Relay E2E); psql-or-docker fallback since psql is not on PATH in hermit.
- ci.yml backend-integration: seed after relay start (reconciler retries
  for 2min), before the NIP-ER reminder suite.

ON CONFLICT targets lower(host) to match idx_communities_host, keeping
the seed idempotent.

Verified against live PG: schema apply clean (165 stmts), seed inserts
9 scoped channels + 19 scoped members with zero nulls, host resolves,
re-run is idempotent. Adversarial: an unscoped channel INSERT fails
not-null and a channel against a nonexistent community fails the FK,
proving the community row is load-bearing.

Co-authored-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Two CI-honesty follow-ups after the first seed surfaced host/ordering
mismatches in the MT e2e path (no product behavior):

1. Desktop E2E 404: the seed-readiness helper queried 127.0.0.1:3000
   while the relay reconciles, and the community is seeded, for localhost:3000.
   normalize_host keeps the non-default port and 127.0.0.1 != localhost,
   so the inbound host resolved to no community and every /query 404'd.
   Default the helper to http://localhost:3000, matching the rest of the
   desktop e2e suite (e2eBridge.ts / bridge.ts already use localhost) and
   the relay's RELAY_URL.

2. Backend Integration UnmappedHost: the reminder scheduler binds the
   deployment community once at boot and exits permanently on an unmapped
   host (no retry, unlike the channel reconciler). The community was being
   seeded after relay start, leaving the scheduler dead. Apply the schema
   and seed the community BEFORE starting the relay (dropping
   BUZZ_AUTO_MIGRATE since the schema is now applied up front), so the
   scheduler binds on its single boot-time attempt.

Both are test/CI wiring. The Relay E2E suite stays red on a separate,
gated body-level bug (command_executor.rs inserts events without
community_id) tracked for the §4 scoping slice.

Co-authored-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
The desktop e2e integration relay-boot still used BUZZ_AUTO_MIGRATE with no
pre-boot community seed, so the channel reconciler bound the deployment
community ONCE at boot (outside its retry loop) before setup-desktop-test-data.sh
seeded it, hit UnmappedHost, and exited permanently. The reconciler's retry
loop only covers late-seeded channels, not a late-seeded community — so the
9 seeded channels were never reconciled and 'loads channels from the relay'
saw 0 channels (60s timeout). Both Desktop E2E Integration shards red.

Mirror the proven backend-integration ordering: apply schema + seed the
localhost:3000 community BEFORE the relay starts, and drop BUZZ_AUTO_MIGRATE
(schema is now applied pre-boot). setup-desktop-test-data.sh's own idempotent
community seed becomes a no-op; its channel INSERTs are then picked up by the
reconciler's retry loop.

Co-authored-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr and others added 8 commits June 27, 2026 20:53
Co-authored-by: npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr <5a968df9a7494b4e019b9ecf739e088ba61097b4312124e9a88ae5b42e3f5f3e@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr <5a968df9a7494b4e019b9ecf739e088ba61097b4312124e9a88ae5b42e3f5f3e@sprout-oss.stage.blox.sqprod.co>
Co-authored-by: npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr <5a968df9a7494b4e019b9ecf739e088ba61097b4312124e9a88ae5b42e3f5f3e@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr <5a968df9a7494b4e019b9ecf739e088ba61097b4312124e9a88ae5b42e3f5f3e@sprout-oss.stage.blox.sqprod.co>
Co-authored-by: npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr <5a968df9a7494b4e019b9ecf739e088ba61097b4312124e9a88ae5b42e3f5f3e@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr <5a968df9a7494b4e019b9ecf739e088ba61097b4312124e9a88ae5b42e3f5f3e@sprout-oss.stage.blox.sqprod.co>
Co-authored-by: npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr <5a968df9a7494b4e019b9ecf739e088ba61097b4312124e9a88ae5b42e3f5f3e@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr <5a968df9a7494b4e019b9ecf739e088ba61097b4312124e9a88ae5b42e3f5f3e@sprout-oss.stage.blox.sqprod.co>
lookup_community_by_host runs on every relay handshake (WS/bridge/media/git/nip05). The query used `WHERE host = $1`, but the only host index is `idx_communities_host ON communities (lower(host))` — so Postgres seq-scanned communities on every connection: ~5.4ms scanning all rows at 100k tenants.

Match the indexed expression with `WHERE lower(host) = lower($1)` so the lookup uses idx_communities_host (~0.037ms index scan at 100k rows). Adds an ignored Postgres regression test asserting case-insensitive host lookup resolves against the lower(host) index.

Found by Max while load-testing #1321 at 100k communities.

Co-authored-by: Max <d8473ee32b973aa31a21a65adddcc4b69cc2a8a4dee8121ecd51926e0cddbc02@sprout-oss.stage.blox.sqprod.co>
Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
The buzz-conformance crate is the independent replay checker for the multi-tenant isolation contract (15 tests: 9 checker unit tests + 6 golden replay fixtures). It has no production buzz-crate deps and needs no infra — pure in-process trace replay — but `just test-unit` only ran buzz-core and buzz-auth, so this gate ran nowhere in CI.

Add `cargo nextest run -p buzz-conformance` to the nextest path of test-unit, and the equivalent `cargo test -p buzz-conformance` to the scripts/run-tests.sh fallback, so the conformance gate runs on every CI unit-test pass. Runs all targets (lib + tests/replay_fixtures.rs), ~0.01s.

Note: the live two-host A/B suite (buzz-test-client conformance_multitenant) still needs a running multi-tenant relay and has todo!()-stubbed rows, so it is not wired into CI here.

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
Add proptest-generated action sequences exercising the conformance
checker beyond the hand-built fixtures, closing the skill's
"property/fuzz-generated action sequences where feasible" gap
(skill-runtime-formal-compliance). Test-only: no production or checker
behavior change.

The tests assert spec-derived invariants about check_trace's verdict —
NOT a parallel oracle re-deriving the verdict (which would just clone
check_step and test the code against itself). Six properties, each
honoring check_trace's fail-fast contract by constructing traces where
the targeted violation is the first/only one:

- non-interference soundness: any read (ReadMessageRows / ReadByIdRows /
  ReadHostFeedRows) carrying a foreign row label is rejected
- non-interference completeness: a fully clean trace is accepted
- AuthCheck Allow + foreign claim bites IllegalTransition; Deny is in-spec
- ImplBug bites CoverageBreach
- a mid-trace state flip bites StateMismatch
- check_trace is deterministic and never panics

proptest is added as a dev-dependency only; the property tests touch
only the crate's public check_trace API and depend on no production
crate, preserving the checker's independence rule.

128 cases, trace length 1..=12. The new tests run in the existing
just test-unit gate (now 22 buzz-conformance tests, was 15) at
negligible cost.

Co-authored-by: Max <d8473ee32b973aa31a21a65adddcc4b69cc2a8a4dee8121ecd51926e0cddbc02@sprout-oss.stage.blox.sqprod.co>
Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
Co-authored-by: npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr <5a968df9a7494b4e019b9ecf739e088ba61097b4312124e9a88ae5b42e3f5f3e@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr <5a968df9a7494b4e019b9ecf739e088ba61097b4312124e9a88ae5b42e3f5f3e@sprout-oss.stage.blox.sqprod.co>
@tlongwell-block

Collaborator Author

Follow-up fix push: feed/user scoping + HA huddle default

Pushed commit bdf48ac1 with the blocker fixes from the review thread:

  • Feed DB helpers now require CommunityId for mentions, needs_action, and activity.
  • Feed SQL scopes events.community_id; mention-backed feeds join event_mentions on (community_id, event_id) and bind m.community_id.
  • Empty accessible-channel lists now mean community-global only (channel_id IS NULL), not all tenant channels.
  • Bridge feed call sites pass tenant.community().
  • get_users_bulk is retained but now community-scoped.
  • Removed dead unscoped-ish Db::query_mentions/query_needs_action/query_activity façade aliases; callers use query_feed_* only.
  • Chart renders BUZZ_HUDDLE_AUDIO_AVAILABLE, defaulting true for one replica and false for HA/multi-replica. Explicit relay.huddleAudioAvailable=true in HA is documented/tested as the operator accepting/owning external multi-pod audio/SFU behavior.
  • Fixed chart whitespace.
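
The feed scoping rules in the list above (community predicate, composite mention join, empty-channel semantics) might be expressed along these lines. This is an illustrative sketch only: the table aliases, column names, placeholder numbering, and the behavior for a non-empty channel list (global rows plus the listed channels) are assumptions, not the PR's actual SQL.

```rust
/// Assumed shape of the composite join: both halves of the key are bound,
/// so a mention row from another community can never match.
pub const MENTION_JOIN: &str =
    "JOIN event_mentions m ON m.community_id = e.community_id AND m.event_id = e.id";

/// Build the WHERE fragment for a feed query. `$1` is always the community;
/// channel ids, if any, follow as `$2..`.
pub fn feed_scope_predicate(accessible_channels: usize) -> String {
    let mut sql = String::from("e.community_id = $1 AND ");
    if accessible_channels == 0 {
        // Empty list means community-global rows only, not every channel.
        sql.push_str("e.channel_id IS NULL");
    } else {
        let placeholders: Vec<String> =
            (0..accessible_channels).map(|i| format!("${}", i + 2)).collect();
        sql.push_str(&format!(
            "(e.channel_id IS NULL OR e.channel_id IN ({}))",
            placeholders.join(", ")
        ));
    }
    sql
}
```

Asserting on the generated string is one way to write the kind of SQL-shape unit test mentioned in the regression coverage.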

Added regression coverage:

  • Ignored Postgres adversarial feed tests for mentions, needs_action, and activity/global-only cross-community behavior.
  • Ignored Postgres adversarial get_users_bulk same-pubkey/two-community test.
  • SQL-shape unit tests for community predicates, composite mention join, and empty-channel semantics.
  • Helm unittest cases for auto-defaults and explicit override precedence.

Validation run locally:

  • ./bin/cargo run -p buzz-admin -- migrate
  • cargo fmt --check
  • git diff --check / git diff --check origin/main...HEAD
  • bin/cargo check -p buzz-db
  • bin/cargo test -p buzz-db --lib feed
  • bin/cargo test -p buzz-db --lib -- --ignored --test-threads=1 — 42 passed
  • helm unittest . — 27 passed
  • bin/cargo test -p buzz-conformance
  • just test-unit
  • push pre-push hook also ran successfully before upload

Review status from the Buzz thread: Mari and Sami both cleared this implementation for commit/push after the behavioral DB tests and Helm override tests were added.

Known remaining deferral: the live two-host/read-side feed conformance row is still not added in this push. The DB-level adversarial tests now cover the exact leak shape; the live conformance lane remains a follow-up runtime coverage item rather than part of this blocker-fix commit.

npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr and others added 14 commits June 28, 2026 12:17
Co-authored-by: npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr <5a968df9a7494b4e019b9ecf739e088ba61097b4312124e9a88ae5b42e3f5f3e@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr <5a968df9a7494b4e019b9ecf739e088ba61097b4312124e9a88ae5b42e3f5f3e@sprout-oss.stage.blox.sqprod.co>
Co-authored-by: npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr <5a968df9a7494b4e019b9ecf739e088ba61097b4312124e9a88ae5b42e3f5f3e@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr <5a968df9a7494b4e019b9ecf739e088ba61097b4312124e9a88ae5b42e3f5f3e@sprout-oss.stage.blox.sqprod.co>
Co-authored-by: npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr <5a968df9a7494b4e019b9ecf739e088ba61097b4312124e9a88ae5b42e3f5f3e@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr <5a968df9a7494b4e019b9ecf739e088ba61097b4312124e9a88ae5b42e3f5f3e@sprout-oss.stage.blox.sqprod.co>
Co-authored-by: npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr <5a968df9a7494b4e019b9ecf739e088ba61097b4312124e9a88ae5b42e3f5f3e@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr <5a968df9a7494b4e019b9ecf739e088ba61097b4312124e9a88ae5b42e3f5f3e@sprout-oss.stage.blox.sqprod.co>
Co-authored-by: npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr <5a968df9a7494b4e019b9ecf739e088ba61097b4312124e9a88ae5b42e3f5f3e@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr <5a968df9a7494b4e019b9ecf739e088ba61097b4312124e9a88ae5b42e3f5f3e@sprout-oss.stage.blox.sqprod.co>
Companion to author_only_kinds_are_storage_level_unsearchable (a3f407c
extended the hardcoded skip-set; 46ba39e added the AUTHOR_ONLY drift
tripwire). Closes the parallel drift surface for P_GATED_KINDS: today the
schema NULL-tsvector CASE happens to cover every persistent p-gated kind,
but a new entry in P_GATED_KINDS without a matching schema migration would
silently reduce search privacy to L2 (the filter-level #p gate) alone.

Move P_GATED_KINDS from a private const in
crates/buzz-relay/src/handlers/req.rs to a pub const in
crates/buzz-core/src/kind.rs, mirroring AUTHOR_ONLY_KINDS's shape and
placement. The relay handler keeps its identical usage; buzz-search's
integration test now imports the canonical const and iterates it.

Why move rather than re-export: P_GATED_KINDS is a privacy-classification
constant about kinds, not a relay implementation detail. AUTHOR_ONLY_KINDS
already lives in buzz-core::kind for exactly this reason. Adding buzz-relay
as a dev-dependency of buzz-search would create a crate cycle (buzz-relay
depends on buzz-search).

The new tripwire skips ephemeral kinds via buzz_core::kind::is_ephemeral:
ephemeral events (20000-29999) are never stored, so the storage-layer
search defense does not apply to them by category.

Co-authored-by: npub17jjz49l9jjmhhk7cac63j8yt9z555n9cw8vk7v5jz4vzw4ppld5qgj57cc <f4a42a97e594b77bdbd8ee35191c8b28a94a4cb871d96f32921558275421fb68@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: npub17jjz49l9jjmhhk7cac63j8yt9z555n9cw8vk7v5jz4vzw4ppld5qgj57cc <f4a42a97e594b77bdbd8ee35191c8b28a94a4cb871d96f32921558275421fb68@sprout-oss.stage.blox.sqprod.co>
Co-authored-by: npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr <5a968df9a7494b4e019b9ecf739e088ba61097b4312124e9a88ae5b42e3f5f3e@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr <5a968df9a7494b4e019b9ecf739e088ba61097b4312124e9a88ae5b42e3f5f3e@sprout-oss.stage.blox.sqprod.co>
Co-authored-by: npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr <5a968df9a7494b4e019b9ecf739e088ba61097b4312124e9a88ae5b42e3f5f3e@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr <5a968df9a7494b4e019b9ecf739e088ba61097b4312124e9a88ae5b42e3f5f3e@sprout-oss.stage.blox.sqprod.co>
Co-authored-by: npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr <5a968df9a7494b4e019b9ecf739e088ba61097b4312124e9a88ae5b42e3f5f3e@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr <5a968df9a7494b4e019b9ecf739e088ba61097b4312124e9a88ae5b42e3f5f3e@sprout-oss.stage.blox.sqprod.co>
Co-authored-by: npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr <5a968df9a7494b4e019b9ecf739e088ba61097b4312124e9a88ae5b42e3f5f3e@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr <5a968df9a7494b4e019b9ecf739e088ba61097b4312124e9a88ae5b42e3f5f3e@sprout-oss.stage.blox.sqprod.co>
Adds the one-off cutover migration that takes a pre-1321 single-community
Postgres to the 1321 multi-tenant schema. Rather than hand-describe the target
(which is how the earlier draft introduced single-tenant PKs on multi-tenant
tables and dropped ~13 community_id FKs), 0002 renames the pre-1321 tables and
enum types aside into a legacy schema, runs 1321's own 0001 verbatim into a
clean public schema, copies every row forward stamped with a default
community_id derived from the deployment host, then drops legacy. Runs in one
transaction with an idempotency guard that refuses a non-pristine DB, so the
migrated schema is identical to a fresh 1321 DB by construction.

Verified end-to-end on a throwaway Postgres seeded with pre-1321 data: exit 0,
zero NULL community_id, search_tsv regenerated, structural diff vs fresh-1321
all-empty (constraints/triggers/indexes/columns), relay boots with zero errors,
and the previously boot-fatal allowlist->relay_members backfill succeeds.
Desktop integration suite is 99/99 green against this script's output.

See OUTBOX/1321_CUTOVER_RUNBOOK.md for the operator runbook (snapshot-is-rollback,
required constraint-level diff gate, RELAY_URL + BUZZ_AUTO_MIGRATE=false caveats).

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
0002_backfill_default_community.sql is a psql operator script (\set, \ir,
:'var' interpolation) but lived in migrations/, which sqlx::migrate! embeds
wholesale. sqlx cannot run psql meta-commands, so every fresh deploy broke:
BUZZ_AUTO_MIGRATE=true exited the relay with 'syntax error at or near "\"',
buzz-admin migrate (the just setup path) failed identically with exit 5, and
both left a wedged half-migrated DB. This also reddened the buzz-db unit test
embedded_migrator_contains_consolidated_initial_schema (asserts exactly the
consolidated 0001), which CI never ran.

The module doc and that test already declare the intended invariant: the
embedded migrator is the consolidated 0001 only; cutover/backfill is a separate
operator script, not startup migration state. Restore it by relocating the
script to scripts/cutover/1321_backfill_default_community.sql (dropping the
0002_ prefix so it never reads as sqlx state again), switching its schema
include from \i to \ir so it resolves relative to the script rather than the
operator's cwd, and adding scripts/cutover/README.md as the operator runbook.

Wire the pure (infra-free) buzz-db tests into the unit path so this class of
regression can't hide again. CI installs nextest, so just test-unit takes the
nextest branch and bypasses run-tests.sh entirely — patch BOTH paths.

Verified locally on a pristine Postgres: buzz-admin migrate -> exit 0, one
migration row (v1); relay with BUZZ_AUTO_MIGRATE=true -> migrations complete,
readiness 200, 36 tables, no wedge. buzz-db --lib goes from RED (left:2 right:1)
to green. The relocated script's \ir resolves 0001 from a non-repo cwd.

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
…ied push

Two pre-merge blockers found by the PR #1321 merge-gate exercise, both
multi-tenant correctness bugs under the confirmed "many tenant-hosts per
relay" topology.

Media — non-primary tenant upload (conformance row 52):
The Blossom `server`-tag check validated against a single process-global
`server_domain` derived from RELAY_URL, so only the primary host accepted
stock-CLI (`buzz upload file`) uploads; every other tenant 401'd with
`server mismatch`. Validate instead against the per-request bound tenant
host (`TenantContext::host()`), normalized through the shared
`normalize_host` rule so a `server` tag and the bound host agree across
case, trailing dot, default ports, and an optional URL scheme/path. The
upload extractor now binds the tenant before auth verification (binding
only reads the Host header, so the pre-body auth-rejection guarantee is
preserved). Removes the now-dead `MediaConfig.server_domain` field, its
RELAY_URL derivation, its three config tests, and the dead
`BUZZ_MEDIA_SERVER_DOMAIN` env line in the Helm chart.

Git — denied push published a 30618:
`run_git_at` swallowed git's non-zero exit, so `finalize_push` ran the CAS
publish and emitted a relay-signed kind:30618 ref-state event even when the
pre-receive hook declined — falsely attributing the ref state to the denied
pusher and breaking "rejected push -> no published state". Carry the
receive-pack exit status on `PackOutput.ok` and short-circuit `finalize_push`
on a non-zero exit: skip the CAS publish and the derived 30618, returning
git's in-band rejection verbatim so the client still prints the decline.

Adds a regression unit test for the normalized tenant-host server-tag match.

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
The denied-push fence in 3e677c9 relied on `git receive-pack` exiting
non-zero on a pre-receive hook decline. It does not: `git receive-pack
--stateless-rpc` exits 0 even when the hook rejects every ref, reporting
the rejection only in-band as report-status (`unpack ok` then `ng
refs/... pre-receive hook declined`). So `output.status.success()` stayed
true, the publish fence never fired, and a denied push still emitted a
relay-signed kind:30618 — the exact invariant Wren's live retest caught
still failing.

Confirmed empirically against real git (2.50.1): a pre-receive hook
exiting 1 yields a server-side `receive-pack` exit of 0 and an `ng` line
in the report-status, nested inside side-band-64k (the band-1 channel
carries its own pkt-line stream).

Fix: scan the buffered report-status for an `ng` ref-status line and fold
it into `PackOutput.ok`, so `finalize_push` skips the CAS publish and the
derived kind:30618 on any rejected ref. The parser de-frames one level
into the band-1 channel (a naive band-byte-strip + split-on-\n misses the
nested inner pkt length prefix, surfacing the line as `0031ng refs/...`).
The exit-code check is retained as a belt-and-suspenders guard for genuine
subprocess failures.
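
A minimal sketch of such a scan, under stated assumptions: the function names `pkt_lines` and `push_rejected` are hypothetical, the band is detected from the first payload byte instead of from the negotiated capabilities, and malformed or truncated input is treated as a rejection (fail closed). The relay's actual parser may differ on all three points.

```rust
/// Split a buffer into pkt-line payloads. Returns None on malformed or
/// truncated framing. Special packets (0000, 0001, 0002) carry no payload
/// and are skipped.
pub fn pkt_lines(mut buf: &[u8]) -> Option<Vec<&[u8]>> {
    let mut out = Vec::new();
    while !buf.is_empty() {
        if buf.len() < 4 || !buf[..4].iter().all(u8::is_ascii_hexdigit) {
            return None;
        }
        let len = usize::from_str_radix(std::str::from_utf8(&buf[..4]).ok()?, 16).ok()?;
        match len {
            0..=2 => buf = &buf[4..], // flush / delimiter / response-end
            3 => return None,         // a length of 3 is never valid
            _ if len > buf.len() => return None, // truncated packet
            _ => {
                out.push(&buf[4..len]);
                buf = &buf[len..];
            }
        }
    }
    Some(out)
}

/// True if the receive-pack response reports any rejected ref
/// (`ng <ref> <reason>`). Assumption of this sketch: unparseable input
/// counts as rejected, so nothing is published on doubt.
pub fn push_rejected(response: &[u8]) -> bool {
    let Some(outer) = pkt_lines(response) else { return true };

    let mut band1 = Vec::new();          // reassembled band-1 stream
    let mut plain: Vec<&[u8]> = Vec::new(); // lines when side-band is off
    for pkt in outer {
        match pkt.first().copied() {
            Some(1) => band1.extend_from_slice(&pkt[1..]), // pack data channel
            Some(2) => {}                                  // progress noise
            Some(3) => return true,                        // fatal error channel
            _ => plain.push(pkt),                          // non-side-band line
        }
    }

    // De-frame one more level: band 1 carries its own pkt-line stream.
    let lines = if band1.is_empty() {
        plain
    } else {
        match pkt_lines(&band1) {
            Some(inner) => inner,
            None => return true,
        }
    };
    lines.iter().any(|line| line.starts_with(b"ng "))
}
```

Concatenating all band-1 payloads before the second de-framing pass matters because git may split the inner stream across several outer packets at arbitrary byte boundaries.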

Adds report-status parser tests covering the real nested side-band deny
shape, the success shape, the non-side-band shape, band-2 progress noise,
and malformed/truncated input. The prior tests passed only because they
omitted the inner pkt framing (synthetic shape != real git wire format).

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
@tlongwell-block tlongwell-block marked this pull request as ready for review June 29, 2026 14:28
npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr added 2 commits June 29, 2026 11:04
Co-authored-by: npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr <5a968df9a7494b4e019b9ecf739e088ba61097b4312124e9a88ae5b42e3f5f3e@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr <5a968df9a7494b4e019b9ecf739e088ba61097b4312124e9a88ae5b42e3f5f3e@sprout-oss.stage.blox.sqprod.co>
Co-authored-by: npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr <5a968df9a7494b4e019b9ecf739e088ba61097b4312124e9a88ae5b42e3f5f3e@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: npub1t2tgm7d8f995uqvmnm8h88sg3wnpp9a5xysjf6dg3tjmgt3ltulqdp8ehr <5a968df9a7494b4e019b9ecf739e088ba61097b4312124e9a88ae5b42e3f5f3e@sprout-oss.stage.blox.sqprod.co>
@tlongwell-block tlongwell-block merged commit 14fba21 into main Jun 29, 2026
30 checks passed
@tlongwell-block tlongwell-block deleted the rewrite/relay-mt-comprehensive branch June 29, 2026 16:39
tlongwell-block pushed a commit that referenced this pull request Jun 29, 2026
Brings the merged #1321 multi-tenant relay rewrite (and other main
commits) under the chart-fix branch so #1348 renders and tests against
current main. No conflicts.

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>

* origin/main:
  Multi-tenant Buzz relay: community_id as a server-resolved key (comprehensive rewrite) (#1321)
  Disable persona start while runtime discovery runs (#1353)
  chore(deps): update dependency @tanstack/react-virtual to v3.14.4 (#1342)
  Fix sidebar unread indicator placement (#1319)
  fix(desktop): un-clip hover action bar's upward bleed under content-visibility (#1354)
  Allow Huddle between 2 humans in DM (#1347)
  chore(release): release Buzz Desktop version 0.3.36 (#1352)
  Polish agent runtime cards (#1327)
  Rework desktop message-timeline scrolling: de-virtualize + native overflow-anchor (#1338)
  Keep wave huddles pending for placeholder profiles (#1349)