fix(workflows): in-place updates, deletion-forgery fix, webhook-secret & channel immutability by AaronGoldsmith · Pull Request #1340 · block/buzz

AaronGoldsmith · 2026-06-29T01:52:18Z

Summary

Workflows that were edited or deleted kept running concurrently. The root cause: the relay called create_workflow, which generated a random DB primary key on every publish, so each edit inserted a new active row (the old one kept firing) and client-side deletions never matched a row.

This PR makes the client-supplied UUID (the Nostr parameterized-replaceable-event d tag) the stable primary key, so edits update in place and deletions resolve correctly. It also fixes four security/stability issues found in review:

In-place updates — handle_workflow_def resolves the existing row by UUID and calls update_workflow instead of unconditionally inserting. No more duplicate concurrent cron jobs.
Deletion forgery (security) — NIP-09 a-tag deletion previously only checked the actor matched the coordinate pubkey, then deleted by UUID without confirming the DB row's owner. A forged 30620:<attacker>:<victim_uuid> could delete another user's workflow. Now the row is loaded and owner_pubkey is verified before deleting.
Webhook secret preservation — editing a webhook workflow regenerated its _webhook_secret, breaking existing callers. The existing secret is now extracted from the stored row and re-injected on update.
Channel immutability & info-leak prevention — channel membership is checked before any DB lookup (so unauthorized users can't probe for valid UUIDs via differing errors), and record.channel_id != Some(channel_id) rejects moving a workflow between channels, including the legacy NULL-channel case.
NIP-33 stale-write protection on the command path — workflow-def events are routed to the command executor and never reached replace_parameterized_event, so an older event delivered after a newer one (reconnect/retry) could overwrite the row with stale YAML. The lock + (created_at, id) dominance check + soft-delete is now extracted into a shared buzz_db::nip33_stale_write_guard, called by both replace_parameterized_event and persist_command_event so the two paths can't drift. Dominated events become idempotent duplicates and skip the mutation.
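To make the in-place update idea concrete, here is a minimal in-memory sketch. It is not the relay's code: `Store`, `WorkflowRow`, `Outcome`, and `publish` are hypothetical names, and a `HashMap` stands in for the workflows table. It only illustrates why keying on the client-supplied d tag leaves a single active row after repeated edits.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a row in the workflows table (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct WorkflowRow {
    name: String,
    definition: String,
    active: bool,
}

/// In-memory stand-in for the DB, keyed by the client-supplied d-tag value.
#[derive(Default)]
struct Store {
    rows: HashMap<String, WorkflowRow>,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Created,
    Updated,
}

impl Store {
    /// Update the row in place when the d tag is already known, otherwise insert it.
    fn publish(&mut self, d_tag: &str, name: &str, definition: &str) -> Outcome {
        match self.rows.get_mut(d_tag) {
            Some(row) => {
                row.name = name.to_string();
                row.definition = definition.to_string();
                Outcome::Updated
            }
            None => {
                let row = WorkflowRow {
                    name: name.to_string(),
                    definition: definition.to_string(),
                    active: true,
                };
                self.rows.insert(d_tag.to_string(), row);
                Outcome::Created
            }
        }
    }

    /// Number of rows a scheduler would pick up.
    fn active_count(&self) -> usize {
        self.rows.values().filter(|r| r.active).count()
    }
}

fn main() {
    let mut store = Store::default();
    store.publish("wf-1", "nightly", "v1");
    store.publish("wf-1", "nightly", "v2");
    println!("active rows after two publishes: {}", store.active_count());
}
```

With a random primary key per publish, the second call would have inserted a second active row; with the d tag as the key it overwrites the first.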

Changes

buzz-db/src/workflow.rs — create_workflow now takes id: Uuid instead of generating one internally.
buzz-db/src/lib.rs — extracted nip33_stale_write_guard(&mut conn, …); replace_parameterized_event now calls it (behavior unchanged) so the command path can share it.
buzz-relay/src/handlers/command_executor.rs — handle_workflow_def parses the d-tag UUID, looks up any existing row (after the membership gate), enforces owner + channel-immutability, preserves the webhook secret, and routes to update_workflow vs create_workflow. persist_command_event applies the shared NIP-33 stale-write guard for parameterized-replaceable command kinds.
buzz-relay/src/handlers/side_effects.rs — handle_a_tag_deletion loads the workflow and asserts owner_pubkey before deleting; missing rows are logged and treated as a no-op.
buzz-test-client/tests/e2e_workflows.rs — test_workflow_update_and_delete expanded to assert in-place update, webhook-secret stability, channel-change rejection, and deletion-forgery rejection (asserting the row persists).

Verification

cargo fmt --check                       # clean
cargo clippy (relay, db, test-client)   # no warnings
just test-unit                          # all passed

# Integration (against a relay built from THIS branch):
BUZZ_BIND_ADDR=0.0.0.0:3001 RELAY_URL=ws://localhost:3001 \
  BUZZ_HEALTH_PORT=8081 BUZZ_METRICS_PORT=9103 cargo run -p buzz-relay
RELAY_URL=ws://localhost:3001 \
  cargo test -p buzz-test-client --test e2e_workflows \
  test_workflow_update_and_delete -- --ignored
# test test_workflow_update_and_delete ... ok

Manual before/after: a 1-minute cron edited to v2/v3 previously left multiple duplicate jobs firing and could not be deleted; after the fix only the latest definition fires and a NIP-09 deletion stops it immediately.

Reviewer note — pre-existing duplication left out of scope

While extracting nip33_stale_write_guard, I deduped the two NIP-33 replacement copies the reviewer flagged (replace_parameterized_event + the new command path). A third copy of the FNV-1a advisory-lock-key snippet still lives in Db::replace_addressable_event (relay-signed NIP-29 group metadata, kinds 39000–39002). It predates this PR (present on main) and was intentionally left untouched here:

It keys on channel_id (the relay is the author, so pubkey isn't the natural key), so it isn't a drop-in for the same helper.
Folding it in would widen this PR's blast radius beyond the workflow fix.

Flagging it as a good standalone follow-up (factor the lock-key into a shared fnv_coordinate_lock_key and have all three call it).
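For reference, a rough sketch of what such a shared helper might look like. The FNV-1a 64-bit constants are the standard ones, but everything else is an assumption: the `kind:author:d_tag` input layout, the function signature, and the cast to `i64` (chosen here only because Postgres advisory locks take a signed 64-bit key). The existing snippets may hash different bytes.

```rust
/// Standard 64-bit FNV-1a over a byte slice.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    bytes
        .iter()
        .fold(OFFSET_BASIS, |hash, byte| (hash ^ u64::from(*byte)).wrapping_mul(PRIME))
}

/// Hypothetical shared helper: derive an advisory-lock key from a coordinate.
/// The "kind:author:d_tag" layout is an assumption made for this sketch.
fn fnv_coordinate_lock_key(kind: u32, author: &str, d_tag: &str) -> i64 {
    let coordinate = format!("{kind}:{author}:{d_tag}");
    // Reinterpret the bits as a signed 64-bit integer.
    fnv1a_64(coordinate.as_bytes()) as i64
}

fn main() {
    let key = fnv_coordinate_lock_key(30620, "owner-pubkey-hex", "workflow-uuid");
    println!("lock key: {key}");
}
```

The channel-keyed caller could pass its channel_id in place of the author field, which is one way all three call sites could share the hashing step while keeping their own notion of the key.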

AaronGoldsmith

Self-review: flagging the non-obvious / higher-blast-radius bits so they're easy to scrutinize.

AaronGoldsmith · 2026-06-29T01:52:51Z

+                match state.db.get_workflow(wf_id).await {
+                    Ok(wf) => {
+                        if wf.owner_pubkey != owner_bytes {
+                            return Err(anyhow::anyhow!("forbidden: deletion owner mismatch"));


🔒 Security (deletion forgery). NIP-09 a-tag deletion previously only checked the actor matched the coordinate pubkey, then deleted by UUID. A forged 30620:<attacker>:<victim_uuid> could delete another user's workflow. We now load the row and assert owner_pubkey before deleting. This extra lookup is the ownership gate, not a redundant query.
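The two checks can be sketched in isolation as follows. This is a simplified model, not the handler itself: `delete_by_a_tag`, `parse_coordinate`, and `DeleteError` are hypothetical names, pubkeys are plain strings, and a `HashMap` from d tag to owner stands in for the workflows table.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum DeleteError {
    Malformed,
    ActorMismatch,
    OwnerMismatch,
}

/// Split an a-tag coordinate of the form "kind:pubkey:d_tag".
fn parse_coordinate(a_tag: &str) -> Option<(u32, &str, &str)> {
    let mut parts = a_tag.splitn(3, ':');
    let kind = parts.next()?.parse().ok()?;
    let pubkey = parts.next()?;
    let d_tag = parts.next()?;
    Some((kind, pubkey, d_tag))
}

/// `rows` maps d_tag -> owner pubkey. Returns Ok(true) if a row was deleted,
/// Ok(false) if no such row exists (treated as a no-op).
fn delete_by_a_tag(
    rows: &mut HashMap<String, String>,
    actor: &str,
    a_tag: &str,
) -> Result<bool, DeleteError> {
    let (_kind, coord_pubkey, d_tag) =
        parse_coordinate(a_tag).ok_or(DeleteError::Malformed)?;

    // Old check: the actor must match the pubkey written in the coordinate.
    if coord_pubkey != actor {
        return Err(DeleteError::ActorMismatch);
    }

    // New check: the stored row's owner must also match the actor.
    let owner_matches = match rows.get(d_tag) {
        None => return Ok(false),
        Some(owner) => owner.as_str() == actor,
    };
    if !owner_matches {
        return Err(DeleteError::OwnerMismatch);
    }

    rows.remove(d_tag);
    Ok(true)
}

fn main() {
    let mut rows = HashMap::new();
    rows.insert("victim-uuid".to_string(), "victim".to_string());
    let forged = delete_by_a_tag(&mut rows, "attacker", "30620:attacker:victim-uuid");
    println!("forged delete: {forged:?}, rows left: {}", rows.len());
}
```

The forged coordinate passes the first check (the attacker signs as themselves) and is stopped only by the second, which is why the extra lookup is not redundant.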

AaronGoldsmith · 2026-06-29T01:52:51Z

    }

+    // Check if workflow already exists to perform update or create checks
+    let existing = state.db.get_workflow(workflow_id).await;


🔍 Ordering is load-bearing (info-leak). The is_member_cached gate above (line 580) runs before this lookup on purpose, so unauthorized users can't probe for valid workflow UUIDs via differing error responses. Please don't hoist this lookup earlier for efficiency — it silently reintroduces the leak.
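A small model of why the ordering matters, under stated assumptions: `State`, `Plan`, and `plan_workflow_def` are hypothetical, and the real `IngestError` has more variants than shown. The point is only that a non-member gets the same response whether or not the UUID exists.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, PartialEq)]
enum IngestError {
    Forbidden(&'static str),
}

#[derive(Debug, PartialEq)]
enum Plan {
    Create,
    Update,
}

struct State {
    /// (channel, member) pairs.
    members: HashSet<(String, String)>,
    /// workflow id -> owner.
    workflows: HashMap<String, String>,
}

fn plan_workflow_def(
    state: &State,
    actor: &str,
    channel: &str,
    workflow_id: &str,
) -> Result<Plan, IngestError> {
    // 1. Membership gate first: nothing about the workflow is consulted yet.
    if !state.members.contains(&(channel.to_string(), actor.to_string())) {
        return Err(IngestError::Forbidden("not a channel member"));
    }
    // 2. Only members reach the lookup, so only they can observe its result.
    match state.workflows.get(workflow_id) {
        None => Ok(Plan::Create),
        Some(owner) if owner.as_str() == actor => Ok(Plan::Update),
        Some(_) => Err(IngestError::Forbidden("workflow owned by another user")),
    }
}

fn main() {
    let state = State {
        members: HashSet::from([("chan".to_string(), "alice".to_string())]),
        workflows: HashMap::from([("wf-1".to_string(), "alice".to_string())]),
    };
    // An outsider probing a real and a made-up ID sees identical errors.
    println!("{:?}", plan_workflow_def(&state, "mallory", "chan", "wf-1"));
    println!("{:?}", plan_workflow_def(&state, "mallory", "chan", "wf-unknown"));
}
```

If the lookup ran first, the outsider would get an ownership error for real IDs and a membership error for unknown ones, which is the probe the comment warns about.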

AaronGoldsmith · 2026-06-29T01:52:51Z

+                    "forbidden: cannot update a workflow owned by another user".into(),
+                ));
+            }
+            if record.channel_id != Some(channel_id) {


⛓️ Channel immutability + legacy NULL. record.channel_id != Some(channel_id) is deliberate: it rejects both some→some channel moves and the legacy NULL→some case. A naive unwrap/compare would let legacy global workflows be silently re-scoped.
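The difference between the strict comparison and a lenient one is easy to see in a standalone snippet. `ChannelId` is a `u32` stand-in for the real UUID type, and `naive_same_channel` is one hypothetical lenient variant, written only for contrast.

```rust
/// Stand-in for the real channel UUID type.
type ChannelId = u32;

/// Mirrors the PR's check: the stored channel must be exactly Some(incoming).
fn same_channel(stored: Option<ChannelId>, incoming: ChannelId) -> bool {
    stored == Some(incoming)
}

/// A lenient variant that only compares when a channel is stored.
/// It treats a legacy NULL channel as "anything goes".
fn naive_same_channel(stored: Option<ChannelId>, incoming: ChannelId) -> bool {
    stored.map_or(true, |c| c == incoming)
}

fn main() {
    let cases = [(Some(1), 1), (Some(1), 2), (None, 2)];
    for (stored, incoming) in cases {
        println!(
            "stored={stored:?} incoming={incoming}: strict={} naive={}",
            same_channel(stored, incoming),
            naive_same_channel(stored, incoming)
        );
    }
}
```

Both versions reject a some→some move; only the strict one also rejects NULL→some.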

AaronGoldsmith · 2026-06-29T01:52:51Z

-        let secret = webhook_secret::generate_webhook_secret();
-        webhook_secret::inject_secret(&mut definition_json, &secret);
-        Some(secret)
+        let existing_secret = existing_record


🔑 Webhook secret preservation. On update we re-inject the existing secret from the stored row (returning None so it isn't rotated/re-surfaced). Previously every edit regenerated _webhook_secret, breaking existing webhook callers.
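A minimal sketch of the create-versus-update behavior, assuming a flat string map in place of the real JSON definition. `apply_secret` and the injected `generate` closure are hypothetical; the actual code uses `webhook_secret::generate_webhook_secret` and `webhook_secret::inject_secret`, whose signatures are not reproduced here.

```rust
use std::collections::BTreeMap;

/// Flat stand-in for the JSON workflow definition.
type Definition = BTreeMap<String, String>;

const SECRET_KEY: &str = "_webhook_secret";

/// Inject a webhook secret into `definition`.
/// Returns Some(secret) only when a fresh secret was generated (create path),
/// and None when the stored secret was carried over (update path).
fn apply_secret(
    definition: &mut Definition,
    existing: Option<&Definition>,
    generate: impl FnOnce() -> String,
) -> Option<String> {
    if let Some(secret) = existing.and_then(|d| d.get(SECRET_KEY)) {
        // Update: re-inject the stored secret so existing callers keep working.
        definition.insert(SECRET_KEY.to_string(), secret.clone());
        return None;
    }
    // Create (or no stored secret): generate one and surface it once.
    let secret = generate();
    definition.insert(SECRET_KEY.to_string(), secret.clone());
    Some(secret)
}

fn main() {
    let mut v1 = Definition::new();
    let created = apply_secret(&mut v1, None, || "secret-1".to_string());
    println!("create surfaced: {created:?}");

    let mut v2 = Definition::new();
    let updated = apply_secret(&mut v2, Some(&v1), || "secret-2".to_string());
    println!("update surfaced: {updated:?}, stored: {:?}", v2.get(SECRET_KEY));
}
```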

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9e34ed4b9e

chatgpt-codex-connector · 2026-06-29T01:58:00Z

+        state
+            .db
+            .update_workflow(workflow_id, &workflow_name, &definition_json_final, &hash)
+            .await
+            .map_err(|e| IngestError::Internal(format!("error: db update_workflow: {e}")))?;


Ignore stale workflow definition replays

When an older kind:30620 event for the same workflow is delivered after a newer one (for example after a reconnect/retry, while still inside the relay's timestamp window), this unconditional update_workflow overwrites the workflows table with the stale YAML. The command path uses persist_command_event rather than the NIP-33 replace_parameterized_event stale-write check, so clients will still see the newer event as the latest while the workflow engine reads and runs the older definition from the DB. Please compare the incoming event's (created_at, id) against the current live coordinate before applying the update, or route workflow defs through the same replacement logic.

🤖 Addressed in efa13ed.

The command path now goes through the same NIP-33 replacement logic as the normal store path. Rather than duplicate the check, I extracted the lock + (created_at, id) dominance test + soft-delete into a shared buzz_db::nip33_stale_write_guard(&mut conn, …) and call it from both Db::replace_parameterized_event and persist_command_event, so the two paths cannot drift.

A dominated (older/retried) event now returns PersistResult::Duplicate, so the handler skips update_workflow entirely — the stale YAML never reaches the DB, and clients no longer see a newer event than the engine runs.

Among command kinds, only KIND_WORKFLOW_DEF (30620) is parameterized-replaceable (DM/approval kinds are 41xxx/46xxx), so behavior is unchanged for the rest.

Covered by test_workflow_update_and_delete (passing against a branch-built relay). Note: Nostr created_at is second-granularity, so same-second edits are resolved by the id tie-break — documented inline where the test sleeps 1s to stay deterministic.
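The dominance rule itself is small enough to sketch standalone. This version assumes the NIP-01 tie-break for replaceable events (on equal created_at, the lexically lowest id is retained); the thread does not spell out the direction used by the guard, so treat that as an assumption. `EventKey` and `guard` are hypothetical names, and the `PersistResult` enum here is a local mirror with only two variants; locking and soft-delete are omitted.

```rust
/// The fields that decide which version of a coordinate is live.
#[derive(Debug, Clone, PartialEq, Eq)]
struct EventKey {
    created_at: u64,
    id: String, // hex event id
}

/// Local mirror of the result type named in the thread (two variants only).
#[derive(Debug, PartialEq)]
enum PersistResult {
    Stored,
    Duplicate,
}

/// True when `incoming` should replace `live`.
/// Assumes the NIP-01 tie-break: same second, lowest id wins.
fn incoming_wins(incoming: &EventKey, live: &EventKey) -> bool {
    incoming.created_at > live.created_at
        || (incoming.created_at == live.created_at && incoming.id < live.id)
}

/// Decide whether the handler should go on to mutate the workflows row.
fn guard(live: Option<&EventKey>, incoming: &EventKey) -> PersistResult {
    match live {
        Some(current) if !incoming_wins(incoming, current) => PersistResult::Duplicate,
        _ => PersistResult::Stored,
    }
}

fn main() {
    let newer = EventKey { created_at: 200, id: "bb".into() };
    let stale = EventKey { created_at: 100, id: "aa".into() };
    // A stale replay arriving after the newer event is reported as a duplicate.
    println!("{:?}", guard(Some(&newer), &stale));
}
```

A redelivery of the live event itself also falls out as a duplicate, since an event never dominates itself under this rule.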

👍 Useful catch — thanks.

Workflow def events (kind 30620) are routed to the command executor and never reach Db::replace_parameterized_event, so persist_command_event inserted them with plain ON CONFLICT DO NOTHING and no (created_at, id) replacement check. An older event delivered after a newer one (reconnect/retry within the timestamp window) was inserted and applied, overwriting the workflows row with stale YAML while clients still saw the newer event as latest.

Extract the lock + dominance check + soft-delete into a single shared buzz_db::nip33_stale_write_guard(&mut conn, ...) and call it from both Db::replace_parameterized_event and persist_command_event, so the command and normal store paths apply identical NIP-33 replacement rules and cannot drift. A dominated event is reported as an idempotent duplicate, so the handler skips its mutation.

Only KIND_WORKFLOW_DEF among command kinds is parameterized-replaceable (DM/approval kinds are 41xxx/46xxx), so behavior is unchanged for the rest.

Document in the e2e test why a 1s sleep is required: created_at is second-granularity and same-second edits are resolved by the id tie-break.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Aaron Goldsmith <aargoldsmith@gmail.com>

wesbillman

These comments are from my buzz agent :)

Adversarial review: I found one CHANGE blocker.

CHANGE: workflow deletion removes the DB row but leaves the live kind:30620 definition event queryable

The PR fixes the DB-side workflow delete path, but the NIP-09 a-tag workflow branch still does not soft-delete the parameterized-replaceable event row. In handle_a_tag_deletion, the KIND_WORKFLOW_DEF UUID branch loads the workflow, checks owner, and calls state.db.delete_workflow(wf_id), but then stops there (crates/buzz-relay/src/handlers/side_effects.rs). The generic NIP-33 soft_delete_by_coordinate branch is explicitly skipped for workflow defs.

That means an authorized workflow deletion can stop the scheduler/webhook DB record, but the latest kind:30620 definition event remains live in events.

Why this matters:

Desktop/CLI workflow reads are event-backed, not DB-backed. Desktop get_channel_workflows queries relay events with kinds: [30620], "#h": [channel_id], and get_workflow queries kinds: [30620], "#d": [workflow_id] (desktop/src-tauri/src/commands/workflows.rs). CLI does the same (crates/buzz-cli/src/commands/workflows.rs).
Since the workflow definition event is not soft-deleted, a deleted workflow can still appear in the workflows list/detail after refresh even though the DB row is gone.
The new e2e coverage only checks db.get_workflow(workflow_id) returns NotFound after delete, so it misses the client-facing read model regression.

Suggested fix: when a workflow delete is authorized, also soft-delete the live kind:30620 coordinate (KIND_WORKFLOW_DEF, owner_pubkey, d_tag) inside the workflow branch (or fall through/share the generic NIP-33 coordinate delete after the owner check). Then add a regression assertion that a relay query for kind:30620 + #d=<workflow_id> returns no live definition after authorized deletion.
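To illustrate the gap and the suggested fix, here is a toy model with both stores side by side. All names are hypothetical (`Relay`, `Event`, `delete_workflow`, `query_live`); the vectors and maps stand in for the events and workflows tables, and the soft-delete is a boolean flag rather than the relay's soft_delete_by_coordinate.

```rust
use std::collections::HashMap;

const KIND_WORKFLOW_DEF: u32 = 30620;

struct Event {
    kind: u32,
    pubkey: String,
    d_tag: String,
    deleted: bool,
}

#[derive(Default)]
struct Relay {
    /// d_tag -> owner pubkey (what the scheduler reads).
    workflows: HashMap<String, String>,
    /// Event store (what Desktop/CLI queries read).
    events: Vec<Event>,
}

impl Relay {
    fn publish(&mut self, owner: &str, d_tag: &str) {
        self.workflows.insert(d_tag.to_string(), owner.to_string());
        self.events.push(Event {
            kind: KIND_WORKFLOW_DEF,
            pubkey: owner.to_string(),
            d_tag: d_tag.to_string(),
            deleted: false,
        });
    }

    /// Authorized delete: drop the DB row AND soft-delete the live coordinate.
    fn delete_workflow(&mut self, actor: &str, d_tag: &str) -> bool {
        let is_owner = self
            .workflows
            .get(d_tag)
            .map_or(false, |owner| owner.as_str() == actor);
        if !is_owner {
            return false;
        }
        self.workflows.remove(d_tag);
        for ev in self.events.iter_mut() {
            if ev.kind == KIND_WORKFLOW_DEF && ev.pubkey == actor && ev.d_tag == d_tag {
                ev.deleted = true;
            }
        }
        true
    }

    /// Count of live kind:30620 events for a given #d value.
    fn query_live(&self, d_tag: &str) -> usize {
        self.events
            .iter()
            .filter(|e| e.kind == KIND_WORKFLOW_DEF && e.d_tag == d_tag && !e.deleted)
            .count()
    }
}

fn main() {
    let mut relay = Relay::default();
    relay.publish("alice", "wf-1");
    relay.delete_workflow("alice", "wf-1");
    println!("live events after delete: {}", relay.query_live("wf-1"));
}
```

Without the loop over `events`, the row would be gone while `query_live` still returned the definition, which is the client-facing regression described above.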

Otherwise

The in-place update model, owner/channel checks before update, webhook secret preservation, and shared NIP-33 stale-write guard are the right direction. I specifically like moving workflow-def command events onto the same (created_at, id) dominance rule as normal parameterized-replaceable events.

Verdict: Request changes until workflow deletion also removes the live definition event from event queries.

After delete_workflow removes the DB row, also call soft_delete_by_coordinate so relay REQs stop returning the live kind:30620 event. Clients (Desktop/CLI) read workflows from events, not the DB, so a DB-only delete left deleted workflows visible.

Adds regression assertion in test_workflow_update_and_delete: after authorized NIP-09 a-tag delete, a relay REQ for kind:30620 + #d=<id> must return no events.

Fixes: wesbillman CHANGES_REQUESTED blocker on PR block#1340
Co-authored-by: Aaron Goldsmith <aargoldsmith@gmail.com>
Signed-off-by: Aaron Goldsmith <aargoldsmith@gmail.com>

The update_workflow SQL only set name/definition/definition_hash, leaving updated_at frozen at create time. There is no DB trigger to auto-update it, so DB consumers and diagnostics always saw stale modification times. Add updated_at = NOW() to the SET clause.

Co-authored-by: Aaron Goldsmith <aargoldsmith@gmail.com>
Signed-off-by: Aaron Goldsmith <aargoldsmith@gmail.com>
Signed-off-by: Aaron Goldsmith <aarong@squareup.com>
Co-authored-by: Codex <noreply@openai.com>

AaronGoldsmith added 3 commits June 28, 2026 18:26

fix(workflows): update workflows in-place and support deletion by cli…

67f8c53

…ent UUID Signed-off-by: Aaron Goldsmith <aargoldsmith@gmail.com>

fix(workflows): resolve deletion forgery, preserve webhook secrets, a…

e6956dd

…nd enforce channel immutability on updates Signed-off-by: Aaron Goldsmith <aargoldsmith@gmail.com>

security(workflows): prevent information leak and cover legacy NULL c…

9e34ed4

…hannel_id update scenario Signed-off-by: Aaron Goldsmith <aargoldsmith@gmail.com>

AaronGoldsmith commented Jun 29, 2026

chatgpt-codex-connector Bot reviewed Jun 29, 2026

AaronGoldsmith force-pushed the bugfix/workflow-update-delete branch from 8889232 to efa13ed · June 29, 2026 02:15

wesbillman requested changes Jun 29, 2026

npub15xw7rrn3u0cy0uq76ze2x9xvd99ezrcpjerhstttl8sqq23rnnsqu75v77 and others added 2 commits June 29, 2026 07:54

AaronGoldsmith requested a review from wesbillman June 29, 2026 16:01

AaronGoldsmith wants to merge 6 commits into block:main from AaronGoldsmith:bugfix/workflow-update-delete

2 participants
