etl: add full entity manager handler suite with prod parity#200

Merged
raymondjacobson merged 43 commits into main from rj/etl-parity-and-block-compat
Apr 4, 2026

raymondjacobson commented Apr 4, 2026

Summary

Implements the complete set of entity manager handlers for the Go ETL indexer, matching the Python discovery-provider's behavior. This enables running the Go ETL against a production database clone to validate parity.

Entity Manager Handlers (53 total)

  • User: Create, Update, Verify, Mute, Unmute
  • Track: Create, Update, Delete, Download, Mute, Unmute
  • Playlist: Create, Update, Delete
  • Social: Follow/Unfollow, Save/Unsave, Repost/Unrepost, Subscribe/Unsubscribe, Share
  • Comment: Create, Update, Delete, React/Unreact, Pin/Unpin, Report, Mute/Unmute
  • Notification: Create, View, PlaylistSeen
  • DeveloperApp: Create, Update, Delete
  • Grant: Create, Delete, Approve, Reject
  • Event: Create, Update, Delete
  • Email: EncryptedEmail Create, EmailAccess Update
  • AssociatedWallet: Create, Delete
  • DashboardWalletUser: Create, Delete
  • Tip: Reaction

Indexer Improvements

  • Sequential em_block assignment matching Python's pattern (only for blocks with EM txs)
  • Auto-resume from core_indexed_blocks / etl_blocks — no --start flag needed on prod clones
  • Block hash threading through all handlers
  • Sequential transaction processing within blocks (prevents race conditions)
  • Block prefetcher for overlapping RPC fetch with DB processing
  • Improved logging: WARN for validation rejections, throughput stats, -v flag for debug

Prod Parity Fixes

  • Relaxed entity existence checks to match Python (is_current only, not is_delete)
  • All migration index names match production schema exactly
  • FanClub entity type support for comments
  • Removed ON CONFLICT on subscriptions (prod table has no unique constraint)

Parity Testing Tooling

  • pkg/etl/parity/ — cross-DB compare tool for validating Go vs Python output
  • 15 new migrations for domain tables matching production schema
  • Comprehensive test suite for all handlers

Test plan

  • go build ./... and go test ./... pass
  • ETL runs against production database clone, processes blocks and EM transactions
  • Compare tool validates field-level parity against live production database
  • Let ETL run for extended period, review validation rejection rates

🤖 Generated with Claude Code

raymondjacobson and others added 30 commits March 31, 2026 11:35
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement remaining entity manager handlers for indexing parity:

- Playlist Create/Update/Delete with route generation
- Follow/Unfollow, Save/Unsave, Repost/Unrepost
- DeveloperApp Create/Update/Delete
- Grant Create/Delete/Approve/Reject

Each handler follows the existing pattern: stateless validation,
stateful validation, domain table write. All handlers have tests
covering success, duplicate/conflict rejection, and ownership checks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
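
The three-stage handler pattern described in the commit above (stateless validation, stateful validation, domain table write) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `Params`, `Store`, `HandleFollow`, and the self-follow rule are hypothetical names and rules, and an in-memory map stands in for the domain table.

```go
package main

import (
	"errors"
	"fmt"
)

// Params is a hypothetical, simplified view of a ManageEntity transaction.
type Params struct {
	UserID     int64
	EntityID   int64
	EntityType string
	Action     string
}

// Store is a hypothetical in-memory stand-in for a domain table.
type Store struct {
	follows map[[2]int64]bool // (follower, followee) -> row exists
}

func NewStore() *Store { return &Store{follows: map[[2]int64]bool{}} }

// ErrRejected marks a validation rejection (as opposed to an infrastructure error).
var ErrRejected = errors.New("validation rejected")

// validateStateless inspects only the transaction itself; no lookups.
func validateStateless(p Params) error {
	if p.UserID <= 0 || p.EntityID <= 0 {
		return fmt.Errorf("%w: ids must be positive", ErrRejected)
	}
	if p.UserID == p.EntityID {
		// Illustrative rule only; the real handlers may differ.
		return fmt.Errorf("%w: cannot follow self", ErrRejected)
	}
	return nil
}

// validateStateful checks the transaction against existing rows.
func validateStateful(s *Store, p Params) error {
	if s.follows[[2]int64{p.UserID, p.EntityID}] {
		return fmt.Errorf("%w: duplicate follow", ErrRejected)
	}
	return nil
}

// HandleFollow runs the three stages in order and writes only if both
// validation stages pass.
func HandleFollow(s *Store, p Params) error {
	if err := validateStateless(p); err != nil {
		return err
	}
	if err := validateStateful(s, p); err != nil {
		return err
	}
	s.follows[[2]int64{p.UserID, p.EntityID}] = true
	return nil
}
```

Keeping the stateless stage first means malformed transactions are rejected before any lookup is made, and a distinct sentinel error lets callers log rejections separately from real failures.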
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…er dispatch

Discovery-provider dispatches (User, Mute) and (User, Unmute), not (MutedUser, Mute/Unmute).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…in/Report/Mute/Unmute)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…Mute/Unmute handlers

Track Mute/Unmute dispatches as (Track, Mute/Unmute) and writes to
comment_notification_settings with entity_type from the transaction.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Social features (Follow, Save, Repost) receive the entity being acted on as
entity_type (User, Track, Playlist), not the action name. Add EntityTypeAny
wildcard with dispatcher fallback so these handlers match any entity_type for
their action. Derive save/repost type from params.EntityType first, matching
Python discovery-provider behavior.

Follow/Unfollow now also creates/deletes a Subscription record to match
Python parity (action_to_record_types maps Follow -> [Follow, Subscription]).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
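
The wildcard fallback described in the commit above might look something like the sketch below. `EntityTypeAny` is the name used in the commit message; its value, the `Dispatcher` type, and the handler signature are assumptions made for illustration.

```go
package main

// EntityTypeAny is named in the commit message; the "*" value is an assumption.
const EntityTypeAny = "*"

// handlerKey identifies a handler by (entity_type, action).
type handlerKey struct {
	EntityType string
	Action     string
}

// HandlerFunc is a hypothetical handler signature; it receives the
// entity_type carried by the transaction.
type HandlerFunc func(entityType string) string

// Dispatcher is a hypothetical registry of handlers.
type Dispatcher struct {
	handlers map[handlerKey]HandlerFunc
}

func NewDispatcher() *Dispatcher {
	return &Dispatcher{handlers: map[handlerKey]HandlerFunc{}}
}

// Register binds a handler to an (entity_type, action) pair.
func (d *Dispatcher) Register(entityType, action string, h HandlerFunc) {
	d.handlers[handlerKey{entityType, action}] = h
}

// Lookup prefers an exact (entity_type, action) match and falls back to
// the wildcard entry for the same action.
func (d *Dispatcher) Lookup(entityType, action string) (HandlerFunc, bool) {
	if h, ok := d.handlers[handlerKey{entityType, action}]; ok {
		return h, true
	}
	h, ok := d.handlers[handlerKey{EntityTypeAny, action}]
	return h, ok
}
```

Checking the exact key first lets entity-specific handlers such as (User, Mute) and (Track, Mute) coexist with wildcard social handlers such as Repost, which then read the save or repost type from the entity_type they are handed.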
Standalone subscription handlers for explicit subscribe/unsubscribe actions.
Note: Follow/Unfollow also implicitly create subscription records (see
social_follow.go) matching Python parity.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Share records content shares (track/playlist/album). Unlike Save/Repost,
shares allow duplicates and have no Unshare counterpart.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Records track downloads with optional geo metadata (city, region, country).
Resolves parent_track_id for stems and deduplicates by txhash.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Event handlers support remix contests and general events with end_date
validation, ownership checks, and remix contest rules (no duplicate
active contests, no contests on remix tracks).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
EncryptedEmail (AddEmail) stores encrypted user emails with initial
access grants. EmailAccess (Update) enables chain-of-custody access
delegation where grantors must already have access to grant others.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
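
The chain-of-custody rule above (a grantor must already hold access before granting it to someone else) can be illustrated with a small in-memory model. All type and method names here are hypothetical, and the sketch assumes access exists only where it was seeded at creation or granted later; whether the email owner holds access implicitly is not stated in the commit.

```go
package main

// accessKey identifies one user's access to one owner's encrypted email.
type accessKey struct {
	OwnerID   int64
	GranteeID int64
}

// EmailAccess is a hypothetical in-memory stand-in for the access table.
type EmailAccess struct {
	grants map[accessKey]bool
}

func NewEmailAccess() *EmailAccess {
	return &EmailAccess{grants: map[accessKey]bool{}}
}

// Seed records the initial access grants stored when the email is created.
func (e *EmailAccess) Seed(ownerID int64, granteeIDs ...int64) {
	for _, g := range granteeIDs {
		e.grants[accessKey{ownerID, g}] = true
	}
}

// HasAccess reports whether userID can read ownerID's email.
func (e *EmailAccess) HasAccess(ownerID, userID int64) bool {
	return e.grants[accessKey{ownerID, userID}]
}

// Grant delegates access from grantorID to granteeID. It is rejected
// (returns false) unless the grantor already has access.
func (e *EmailAccess) Grant(ownerID, grantorID, granteeID int64) bool {
	if !e.HasAccess(ownerID, grantorID) {
		return false
	}
	e.grants[accessKey{ownerID, granteeID}] = true
	return true
}
```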
Dashboard wallet users link external wallets to Audius user accounts.
Supports dual-signer authorization (either user or wallet can sign).
Full ETH ecrecover signature verification is TODO pending go-ethereum
dependency addition to the ETL module.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tip reaction handler (TIP + Update) records emoji reactions to tips.
Looks up tip sender via user_tips table signature and records a reaction
with validated value (1-4).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Associated wallets link external ETH/SOL wallets to user accounts.
Enforces exclusive ownership (one user per wallet per chain).
Full chain-specific signature verification (ecrecover/ed25519) is TODO
pending crypto dependency additions to the ETL module.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds a CLI tool for comparing Go ETL output against existing
discovery-provider data. Three commands:
- snapshot: captures baseline row counts and max block height
- diff: compares domain table growth, validates structural integrity
- cleanup: drops parity metadata

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Wrap etl_proof_status enum creation in DO/EXCEPTION block
- Add DROP TRIGGER IF EXISTS before CREATE TRIGGER
- Add DROP MATERIALIZED VIEW IF EXISTS before CREATE MATERIALIZED VIEW
- Name materialized view indexes for IF NOT EXISTS support

This ensures migrations can run safely against databases where
these objects already exist (e.g., production clones).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Insert into blocks and core_indexed_blocks tables for each chain block
  so domain table FK constraints (blocknumber → blocks.number) are satisfied
- Compute em_block offset from core_indexed_blocks at startup to continue
  the existing blocks.number sequence on production databases
- Add 0015 migration for core_indexed_blocks table (idempotent)
- Fix parity snapshot to use core_indexed_blocks.height (chain height)
  instead of blocks.number (Python indexer numbering)
- Add INFO-level progress logging every 10 seconds
- Update FEATURE.md with all 53 handlers, migration list, block numbering
  docs, parity testing workflow, and chain rollover limitation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lace

Unwrap nested "data" envelope in ManageEntity metadata (production format
is {"cid":"...", "data": {actual fields}}). Convert all update/delete
handlers from markNotCurrent+INSERT to UPDATE since is_current is legacy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
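
As a rough sketch of the envelope handling described above, the function below returns the inner "data" object when the metadata has the production shape {"cid":"...", "data": {...}}. The function name is hypothetical, and passing flat metadata through unchanged is an assumption, not something the commit states.

```go
package main

import "encoding/json"

// unwrapMetadata returns the inner "data" object when raw is an envelope of
// the form {"cid": "...", "data": {...}}. If there is no object-valued "data"
// field, raw is returned unchanged (an assumption for this sketch).
func unwrapMetadata(raw []byte) (json.RawMessage, error) {
	var env struct {
		CID  string          `json:"cid"`
		Data json.RawMessage `json:"data"`
	}
	if err := json.Unmarshal(raw, &env); err != nil {
		return nil, err
	}
	// Only unwrap when "data" is a JSON object, not null, a string, etc.
	if len(env.Data) > 0 && env.Data[0] == '{' {
		return env.Data, nil
	}
	return json.RawMessage(raw), nil
}
```

Using `json.RawMessage` defers decoding of the inner fields, so each handler can unmarshal the unwrapped bytes into its own metadata struct.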
Diff now resolves em_block from core_indexed_blocks for domain table
queries. Added cleanup.sql for resetting test data by em_block/chain_height.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add parity compare subcommand that connects to both ETL clone and
production DB, comparing rows field-by-field with prod-ahead skipping
and known-divergence tracking. Add ValidateAccessConditions matching
Python's gating rules (stem gating, USDC splits, stream/download parity).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Decouples block fetching from indexing by running a prefetcher goroutine
that fills a buffered channel (50 blocks ahead). The indexer reads from
the channel instead of making synchronous RPC calls, so RPC latency and
DB writes overlap. Progress log now shows prefetch buffer depth.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
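
The prefetcher described above could be structured along these lines: a goroutine fetches blocks in height order into a buffered channel while the indexer drains it. This is a sketch under assumptions; `Block`, `FetchFunc`, `StartPrefetcher`, `IndexRange`, and `fakeFetch` are hypothetical names, and the "indexing" step only records heights.

```go
package main

import "context"

// Block is a minimal stand-in for a fetched chain block.
type Block struct{ Height int64 }

// FetchFunc is a hypothetical stand-in for the synchronous RPC call.
type FetchFunc func(ctx context.Context, height int64) (Block, error)

// fetched carries either a block or the error that ended prefetching.
type fetched struct {
	Block Block
	Err   error
}

// StartPrefetcher fetches heights start..end (inclusive) in order and
// buffers up to `ahead` results so RPC latency overlaps with DB work.
func StartPrefetcher(ctx context.Context, fetch FetchFunc, start, end int64, ahead int) <-chan fetched {
	out := make(chan fetched, ahead)
	go func() {
		defer close(out)
		for h := start; h <= end; h++ {
			b, err := fetch(ctx, h)
			select {
			case out <- fetched{Block: b, Err: err}:
			case <-ctx.Done():
				return // consumer went away; stop fetching
			}
			if err != nil {
				return // surface the error once, then stop
			}
		}
	}()
	return out
}

// IndexRange consumes prefetched blocks in order and returns the heights
// it processed.
func IndexRange(fetch FetchFunc, start, end int64, ahead int) ([]int64, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel() // releases the prefetcher on early return
	var seen []int64
	for f := range StartPrefetcher(ctx, fetch, start, end, ahead) {
		if f.Err != nil {
			return seen, f.Err
		}
		seen = append(seen, f.Block.Height)
	}
	return seen, nil
}

// fakeFetch is a trivial fetcher used to exercise the sketch.
func fakeFetch(_ context.Context, h int64) (Block, error) {
	return Block{Height: h}, nil
}
```

Because a single goroutine writes to the channel, blocks arrive in height order, which keeps processing sequential. `len(out)` on the channel would give the buffer depth that the progress log reports.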
…mments

Set VerifiedAddress default to 0xbeef8E42e8B5964fDD2b7ca8efA0d9aef38AA996.
Replace COALESCE-to-empty-string with proper NULL handling via
sql.NullString and *string throughout user handlers. Simplify
getCurrentUserForVerify to only fetch verification fields. Remove all
comments referencing discovery-provider.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
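
The NULL handling change above replaces COALESCE-to-empty-string with `sql.NullString` and `*string`. A minimal sketch of the conversion in both directions follows; the helper names `toPtr` and `toNull` are hypothetical.

```go
package main

import "database/sql"

// toPtr converts a scanned nullable column into *string.
// SQL NULL becomes nil; an empty string stays a non-nil pointer to "".
func toPtr(ns sql.NullString) *string {
	if !ns.Valid {
		return nil
	}
	s := ns.String
	return &s
}

// toNull converts *string into a value usable as a query argument.
// nil is written as SQL NULL rather than as an empty string.
func toNull(p *string) sql.NullString {
	if p == nil {
		return sql.NullString{}
	}
	return sql.NullString{String: *p, Valid: true}
}
```

The point of the round trip is that NULL and the empty string remain distinguishable, so an update that does not touch a nullable field does not rewrite NULL as "".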
…st social

Add 13 unit tests for ValidateAccessConditions covering all validation
rules (stem gating, USDC splits, condition count, stream/download parity).
Add DB-backed tests for track create/update with gating rejection,
release date handling, playlist save/repost types, album saves, and
user nullable field preservation through create and update.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
raymondjacobson and others added 8 commits April 3, 2026 23:29
The cleanup script caused data loss (deleted users can't be recovered),
and snapshot/diff are redundant given the field-by-field compare tool.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Index names in our migrations differed from production (01_schema.sql),
causing CREATE INDEX IF NOT EXISTS to build new indexes on large tables
instead of skipping. This made migrations hang on prod database clones.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Downloads are events that happened — they should be recorded even if
the track was later deleted. Use trackExists (checks is_current only)
instead of trackExistsActive (which also checks is_delete = false).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…not is_delete)

Python's discovery indexer only checks is_current=True for entity existence,
never is_delete. Our Go handlers were overly strict, rejecting valid operations
on soft-deleted entities. Also adds FanClub entity type support for comments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resume now falls back to core_indexed_blocks.height (chain height) instead
of blocks.number when etl_blocks is empty. Offset computation handles NULL
em_block by deriving it from MAX(blocks.number) - MAX(chain height).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… txs

Python only assigns a blocks.number (em_block) for chain blocks that contain
ManageEntity transactions, incrementing sequentially. Our Go ETL was using a
constant offset applied to every chain block, causing number divergence.

Now matches Python exactly:
- Scan each block for EM txs before assigning em_block
- Only increment and insert into blocks table when EM txs present
- Set parenthash and is_current matching Python's pattern
- Write em_block=NULL to core_indexed_blocks for non-EM blocks
- Resume from MAX(blocks.number) on startup

Also fixes compare tool to derive the boundary from core_indexed_blocks
instead of the removed _parity_meta table.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
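
The numbering rule described above can be sketched as a pure function: em_block advances only for chain blocks that contain ManageEntity transactions, and other blocks get no em_block (NULL). The type and function names are hypothetical; `lastEMBlock` stands in for the MAX(blocks.number) value read on startup.

```go
package main

// ChainBlock is a minimal stand-in for a chain block after it has been
// scanned for ManageEntity (EM) transactions.
type ChainBlock struct {
	Height   int64
	HasEMTxs bool
}

// Assignment pairs a chain height with its em_block.
// EMBlock is nil for blocks without EM transactions (stored as NULL).
type Assignment struct {
	Height  int64
	EMBlock *int64
}

// AssignEMBlocks continues the em_block sequence after lastEMBlock,
// incrementing only for blocks that contain EM transactions.
func AssignEMBlocks(lastEMBlock int64, blocks []ChainBlock) []Assignment {
	out := make([]Assignment, 0, len(blocks))
	next := lastEMBlock
	for _, b := range blocks {
		a := Assignment{Height: b.Height}
		if b.HasEMTxs {
			next++
			n := next // copy so each assignment owns its value
			a.EMBlock = &n
		}
		out = append(out, a)
	}
	return out
}
```

For example, resuming at em_block 7 with chain heights 100 to 103 where only 100 and 103 carry EM transactions yields em_block 8 for height 100, no em_block for 101 and 102, and em_block 9 for 103. A constant offset would instead have assigned four consecutive numbers, which is the divergence the commit fixes.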
…unique constraint)

Production's subscriptions table has no PK or unique constraint, so
ON CONFLICT DO NOTHING fails. The follow handler already validates
for duplicates before inserting, making the clause unnecessary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use etl_blocks (Go-only table) to find Go ETL's chain height range
- Determine prod cutoff by mapping Go's max chain height to prod's
  em_block via prod's core_indexed_blocks
- Only skip entities that prod modified beyond the shared window

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
raymondjacobson merged commit 75dc4b4 into main on Apr 4, 2026
6 checks passed
raymondjacobson deleted the rj/etl-parity-and-block-compat branch on Apr 4, 2026 09:03
raymondjacobson added a commit that referenced this pull request Apr 4, 2026
Generated files were out of sync after the entity manager handler suite
was added in #200.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>