Intern queued query text in shared HTAB to bound DSA usage #92
Pull request overview
This PR introduces a shared-memory query-text interner for the pg_stat_ch queue so repeated identical normalized queries share a single DSA-backed copy, bounding DSA usage to the number of distinct live query texts instead of the number of queued events.
Changes:
- Add a partition-locked shared HTAB + refcounted DSA objects for interned query text (`query_intern.{h,c}`).
- Route queue slot query text through the interner (acquire on enqueue, resolve+release on dequeue) and adjust shmem sizing/lock tranche usage (`shmem.cc`).
- Add a TAP test that stresses tight DSA settings with many repeated long EXECUTEs (`t/032_query_intern.pl`).
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| t/032_query_intern.pl | New TAP test to validate that repeated long normalized query text does not exhaust a tight DSA pool. |
| src/queue/shmem.cc | Integrates query interning into enqueue/dequeue and adjusts shared memory sizing + LWLock tranche allocation. |
| src/queue/query_intern.h | Declares the shared query-text interner API and documents its design. |
| src/queue/query_intern.c | Implements partition-locked HTAB interning with refcounted DSA-backed query bodies. |
| src/queue/psch_dsa.h | Adds a forward typedef so PschSharedState can be referenced cleanly from C translation units. |
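
For readers skimming the lock layout: the PR description mentions 32 partitions and a request for `1 + 32` LWLocks (queue lock + partition locks). A minimal sketch of how a hash code might map onto such a lock array follows. The slot ordering (queue lock first) and the name `partition_lock_index` are assumptions for illustration; the actual `PartitionLockFor` in the PR may differ.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PARTITIONS 32u                    /* partition count from the PR description */
#define NUM_LOCKS      (1u + NUM_PARTITIONS)  /* queue lock + partition locks */

/*
 * Assumed layout (not confirmed by the PR): slot 0 is the queue lock,
 * slots 1..32 are the partition locks. Hypothetical helper name.
 */
static uint32_t partition_lock_index(uint32_t hashcode)
{
    return 1u + (hashcode % NUM_PARTITIONS);
}
```

Because the same hash code always selects the same slot, an acquire and the matching release for one key contend on a single partition lock rather than on one global lock.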
    hashcode = get_hash_value(psch_query_intern_htab, &key);
    partition = PartitionLockFor(hashcode);

    LWLockAcquire(partition, LW_EXCLUSIVE);
I really don't like doing a lock acquire during a lock release operation. We can check or assert that we are not the lock holder, but I'm torn on whether we'd want to do either, and then, which one?
    // Lost the race AND the winner stored different bytes (collision).
    // Don't disturb the winner; back out and report miss.
    LWLockRelease(partition);
I also loathe this, but it's a really rare case (like literally never happen rare) and the handling is graceful, so I'm electing to leave it alone.
Without interning, every queued event owned a private DSA copy of the normalized query text: live DSA usage grew as `queued_events * query_len` and exhausted the bounded DSA pool well before the queue reached capacity. Repeated long normalized queries were the worst case.

Add a shared, partition-locked HTAB whose entries point at refcount-managed DSA bodies, and route TryEnqueueLocked / PschDequeueEvent through it for query text. Live DSA usage drops to `distinct_live_query_texts * query_len`. Error messages stay per-event for now (separate optimization).

The pattern mirrors pg_stat_statements (shared HTAB sized via hash_estimate_size + ShmemInitHash) and pgstat_shmem (refcounted DSA bodies freed only after the HTAB entry is removed).

Adds t/032_query_intern.pl: 6000 EXECUTEs of a long normalized query through an 8MB DSA pool exit with dsa_oom_count == 0; the same workload without interning would push ~12MB through an 8MB pool and OOM.
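
The ~12MB figure can be checked with back-of-the-envelope arithmetic. The sketch below assumes a query length of about 2047 bytes (the value quoted in the PR description) and ignores per-object header overhead; the function names are illustrative, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

#define QUERY_LEN 2047u                 /* approximate bytes per long normalized query */
#define DSA_POOL  (8u * 1024u * 1024u)  /* the 8MB pool used by the TAP test */

/* Live DSA bytes when every queued event owns a private copy of the text. */
static size_t dsa_bytes_per_event(size_t queued_events, size_t query_len)
{
    return queued_events * query_len;
}

/* Live DSA bytes when identical texts share one interned body. */
static size_t dsa_bytes_interned(size_t distinct_live_texts, size_t query_len)
{
    return distinct_live_texts * query_len;
}
```

With 6000 queued events this gives 6000 * 2047 = 12,282,000 bytes, which exceeds the 8,388,608-byte pool, while a single interned text needs only 2047 bytes of body storage.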
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…you think Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 686c413.

Summary
- Live DSA usage drops from `queued_events * query_len` to `distinct_live_query_texts * query_len`.
- The pattern mirrors `pg_stat_statements` (shared HTAB sized via `hash_estimate_size` + `ShmemInitHash`) and `pgstat_shmem` (refcounted DSA bodies freed only after the HTAB entry is removed).

Why
Without interning, every queued event owned a private DSA copy of the normalized query text. With `query_len ~= 2047` bytes, the DSA string pool was exhausted well before the queue itself reached capacity, especially under workloads that repeatedly executed the same long normalized query.

What changed
- `src/queue/query_intern.{h,c}`: pure-C interner; 32-partition LWLock HTAB → DSA-allocated `PschQueryInternObject` (key + magic + bytes), refcount on entry. `Acquire` allocates outside the partition lock and re-checks under it; `ResolveAndRelease` copies bytes (caller's slot is the live reference) then drops the refcount; on last release the entry is removed and the DSA body freed outside the partition lock.
- `src/queue/shmem.cc`: split shmem sizing into `[state + ring + DSA]` (passed to `ShmemInitStruct`) and the HTAB pool (allocated by `ShmemInitHash` from the same `RequestAddinShmemSpace` reservation). Request `1 + 32` LWLocks in the existing `pg_stat_ch` named tranche. Init the interner under `AddinShmemInitLock`. Replace the per-event `PschDsaAllocString` / `PschDsaResolveString` calls for query text with the new `Acquire` / `ResolveAndRelease`.
- `src/queue/psch_dsa.h`: add `typedef struct PschSharedState PschSharedState;` so the bare type is usable from pure C.

Failure modes (best-effort telemetry preserved)
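- DSA allocation fails → `InvalidDsaPointer`, caller sets `query_len = 0` (numeric data preserved). `dsa_oom_count` still bumped.
- HTAB insert fails (`HASH_ENTER_NULL`) → free loser allocation, `InvalidDsaPointer`, `query_len = 0`.
- Entry already present with the same bytes → `refcount++`.
- Key collision (same `(dbid, queryid, query_hash, query_len)`, different bytes) → treat as miss, return `InvalidDsaPointer` so we export empty rather than wrong SQL.

Test plan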
- `t/032_query_intern.pl`: drives 6000 EXECUTEs of a long normalized query through an 8MB DSA pool. Asserts `enqueued >= 5000` and `dsa_oom_count == 0`. The same workload without interning would push ~12MB through an 8MB pool and OOM.
- Passing: `001-009`, `015`, `017`, `020`, `022`.
- Others (`010`, `011`, `021`, `023-025`, `027`, `031`, `016` single-cycle) verified failing locally on a clean `main` worktree with the same deterministic checksum error: pre-existing local container/version issue, not introduced here. CI should be authoritative.
- `028`, `029` reference a `pg_stat_ch.debug_throw_in_export` GUC that doesn't exist in the tree: pre-existing, unrelated.

🤖 Generated with Claude Code
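
To make the `Acquire` / `ResolveAndRelease` lifecycle described above easier to follow, here is a toy single-process model of the refcount protocol. It is only a sketch under loud assumptions: `malloc` stands in for DSA, a small fixed array stands in for the shared HTAB, locking is indicated by comments only, and release looks the entry up by key rather than through the slot's DSA pointer. All names (`InternKey`, `intern_acquire`, `intern_resolve_and_release`) are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical key mirroring (dbid, queryid, query_hash, query_len). */
typedef struct { uint32_t dbid; uint64_t queryid; uint32_t hash; uint32_t len; } InternKey;
typedef struct { InternKey key; uint32_t refcount; char *body; } InternEntry;

#define TABLE_SLOTS 8
static InternEntry table[TABLE_SLOTS];   /* stand-in for the shared HTAB */

static int key_eq(const InternKey *a, const InternKey *b)
{
    return a->dbid == b->dbid && a->queryid == b->queryid &&
           a->hash == b->hash && a->len == b->len;
}

/* A slot is live only while its refcount is non-zero. */
static InternEntry *find(const InternKey *k)
{
    for (int i = 0; i < TABLE_SLOTS; i++)
        if (table[i].refcount > 0 && key_eq(&table[i].key, k))
            return &table[i];
    return NULL;
}

/* Returns the shared body, or NULL on a miss (OOM, table full, or collision). */
static const char *intern_acquire(InternKey k, const char *text)
{
    char *copy = malloc(k.len);          /* allocate before "locking" */
    if (copy == NULL)
        return NULL;                     /* OOM: caller drops the text */
    memcpy(copy, text, k.len);

    /* A real implementation would take the partition lock here. */
    InternEntry *e = find(&k);
    if (e != NULL) {
        int same = memcmp(e->body, text, k.len) == 0;
        if (same)
            e->refcount++;               /* share the existing body */
        free(copy);                      /* our copy lost the race */
        return same ? e->body : NULL;    /* differing bytes: report a miss */
    }
    for (int i = 0; i < TABLE_SLOTS; i++) {
        if (table[i].refcount == 0) {    /* claim a free slot */
            table[i].key = k;
            table[i].refcount = 1;
            table[i].body = copy;
            return copy;
        }
    }
    free(copy);                          /* table full */
    return NULL;
}

/* Copies the text into out, drops one reference, frees after the last one. */
static int intern_resolve_and_release(InternKey k, char *out)
{
    /* The partition lock would be held from here ... */
    InternEntry *e = find(&k);
    if (e == NULL)
        return 0;
    memcpy(out, e->body, k.len);
    char *dead = NULL;
    if (--e->refcount == 0) {            /* last reference: retire the entry */
        dead = e->body;
        e->body = NULL;
    }
    /* ... to here; the body is freed only after the entry is gone. */
    free(dead);
    return 1;
}
```

Acquiring the same text twice yields one body with a refcount of 2, acquiring different bytes under the same key reports a miss without touching the winner, and the body is freed only after the second release removes the entry.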
Note
Medium Risk
New partitioned shared-memory hash table, refcounts, and concurrent acquire/release on the hot enqueue/dequeue path; failures drop query text but preserve numeric telemetry.
Overview
Adds a shared query-text interner so queued events no longer each allocate their own DSA copy of normalized SQL. Ring enqueue now calls `PschQueryInternAcquire` (refcounted shared HTAB + DSA body keyed by dbid/queryid/hash/length); dequeue uses `PschQueryInternResolveAndRelease` instead of per-slot query DSA alloc/free.

`shmem` reserves extra add-in shmem for the intern HTAB, requests 1 + 32 LWLocks in the existing `pg_stat_ch` tranche (queue lock + partition locks), and initializes the interner at startup. Error messages stay on the existing per-event DSA path.

New TAP test `t/032_query_intern.pl` piles thousands of identical long EXECUTEs into a full queue with a tight 8MB string area and asserts `dsa_oom_count == 0`.

Reviewed by Cursor Bugbot for commit ba3ee92. Bugbot is set up for automated code reviews on this repo.