Skip to content

Intern queued query text in shared HTAB to bound DSA usage#92

Merged
serprex merged 9 commits into
mainfrom
dsa-query-intern
Jun 2, 2026
Merged

Intern queued query text in shared HTAB to bound DSA usage#92
serprex merged 9 commits into
mainfrom
dsa-query-intern

Conversation

@iskakaushik
Copy link
Copy Markdown
Collaborator

@iskakaushik iskakaushik commented May 7, 2026

Summary

  • Add a shared, partition-locked HTAB whose entries point at refcount-managed DSA bodies, and route queued query text through it. Live DSA usage drops from queued_events * query_len to distinct_live_query_texts * query_len.
  • Pattern mirrors pg_stat_statements (shared HTAB sized via hash_estimate_size + ShmemInitHash) and pgstat_shmem (refcounted DSA bodies freed only after the HTAB entry is removed).
  • Error messages stay per-event for now (separate optimization).

Why

Without interning, every queued event owned a private DSA copy of the normalized query text. With query_len ~= 2047 bytes, the DSA string pool was exhausted well before the queue itself reached capacity, especially under workloads that repeatedly executed the same long normalized query.

What changed

  • src/queue/query_intern.{h,c} — pure-C interner: 32-partition LWLock HTAB → DSA-allocated PschQueryInternObject (key + magic + bytes), refcount on entry. Acquire allocates outside the partition lock and re-checks under it; ResolveAndRelease copies bytes (caller's slot is the live reference) then drops the refcount; on last release the entry is removed and the DSA body freed outside the partition lock.
  • src/queue/shmem.cc — split shmem sizing into [state + ring + DSA] (passed to ShmemInitStruct) and the HTAB pool (allocated by ShmemInitHash from the same RequestAddinShmemSpace reservation). Request 1 + 32 LWLocks in the existing pg_stat_ch named tranche. Init the interner under AddinShmemInitLock. Replace the per-event PschDsaAllocString / PschDsaResolveString calls for query text with the new Acquire / ResolveAndRelease.
  • src/queue/psch_dsa.h — add typedef struct PschSharedState PschSharedState; so the bare type is usable from pure C.

Failure modes (best-effort telemetry preserved)

  • DSA OOM on a new body → InvalidDsaPointer, caller sets query_len = 0 (numeric data preserved). dsa_oom_count still bumped.
  • Hash full (HASH_ENTER_NULL) → free loser allocation, InvalidDsaPointer, query_len = 0.
  • Insert race won by another backend → free loser, return existing body, refcount++.
  • Hash key collision (different bytes for same (dbid, queryid, query_hash, query_len)) → treat as miss, return InvalidDsaPointer so we export empty rather than wrong SQL.

Test plan

  • t/032_query_intern.pl — drives 6000 EXECUTEs of a long normalized query through an 8MB DSA pool. Asserts enqueued >= 5000 and dsa_oom_count == 0. The same workload without interning would push ~12MB through an 8MB pool and OOM.
  • All 11 SQL regression tests pass.
  • Non-Docker TAP tests pass: 001-009, 015, 017, 020, 022.
  • ClickHouse-dependent TAP tests (010, 011, 021, 023-025, 027, 031, 016 single-cycle) verified failing locally on a clean main worktree with the same deterministic checksum error — pre-existing local container/version issue, not introduced here. CI should be authoritative.
  • 028, 029 reference a pg_stat_ch.debug_throw_in_export GUC that doesn't exist in the tree — pre-existing, unrelated.

🤖 Generated with Claude Code


Note

Medium Risk
New partitioned shared-memory hash table, refcounts, and concurrent acquire/release on the hot enqueue/dequeue path; failures drop query text but preserve numeric telemetry.

Overview
Adds a shared query-text interner so queued events no longer each allocate their own DSA copy of normalized SQL. Ring enqueue now calls PschQueryInternAcquire (refcounted shared HTAB + DSA body keyed by dbid/queryid/hash/length); dequeue uses PschQueryInternResolveAndRelease instead of per-slot query DSA alloc/free.

shmem reserves extra add-in shmem for the intern HTAB, requests 1 + 32 LWLocks in the existing pg_stat_ch tranche (queue lock + partition locks), and initializes the interner at startup. Error messages stay on the existing per-event DSA path.

New TAP test t/032_query_intern.pl piles thousands of identical long EXECUTEs into a full queue with a tight 8MB string area and asserts dsa_oom_count == 0.

Reviewed by Cursor Bugbot for commit ba3ee92. Bugbot is set up for automated code reviews on this repo. Configure here.

Copilot AI review requested due to automatic review settings May 7, 2026 19:34
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a shared-memory query-text interner for the pg_stat_ch queue so repeated identical normalized queries share a single DSA-backed copy, bounding DSA usage to the number of distinct live query texts instead of the number of queued events.

Changes:

  • Add a partition-locked shared HTAB + refcounted DSA objects for interned query text (query_intern.{h,c}).
  • Route queue slot query text through the interner (acquire on enqueue, resolve+release on dequeue) and adjust shmem sizing/lock tranche usage (shmem.cc).
  • Add a TAP test that stresses tight DSA settings with many repeated long EXECUTEs (t/032_query_intern.pl).

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
t/032_query_intern.pl New TAP test to validate that repeated long normalized query text does not exhaust a tight DSA pool.
src/queue/shmem.cc Integrates query interning into enqueue/dequeue and adjusts shared memory sizing + LWLock tranche allocation.
src/queue/query_intern.h Declares the shared query-text interner API and documents its design.
src/queue/query_intern.c Implements partition-locked HTAB interning with refcounted DSA-backed query bodies.
src/queue/psch_dsa.h Adds a forward typedef so PschSharedState can be referenced cleanly from C translation units.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/queue/query_intern.h Outdated
Comment thread src/queue/query_intern.h Outdated
Comment thread t/032_query_intern.pl
Comment thread src/queue/query_intern.c Outdated
Copilot AI review requested due to automatic review settings May 26, 2026 01:10
@serprex serprex force-pushed the dsa-query-intern branch from 48c0153 to ed58e47 Compare May 26, 2026 01:10
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comment thread src/queue/psch_dsa.h Outdated
@serprex serprex force-pushed the dsa-query-intern branch from 37fd9c5 to 79b08a5 Compare May 26, 2026 01:22
Copilot AI review requested due to automatic review settings May 26, 2026 01:22
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comment thread src/queue/shmem.c Outdated
Comment thread src/queue/query_intern.h Outdated
Comment thread src/queue/query_intern.h Outdated
Comment thread src/queue/query_intern.c
@serprex serprex force-pushed the dsa-query-intern branch from 79b08a5 to fb778a7 Compare May 26, 2026 01:58
Copilot AI review requested due to automatic review settings June 1, 2026 21:51
JoshDreamland added a commit that referenced this pull request Jun 1, 2026
…yhash overlay), #101 (CI ClickHouse-test enablement) so PR #92 builds + tests against the corrected dep set
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment thread src/queue/query_intern.c
Comment thread src/queue/query_intern.c
hashcode = get_hash_value(psch_query_intern_htab, &key);
partition = PartitionLockFor(hashcode);

LWLockAcquire(partition, LW_EXCLUSIVE);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like doing a lock acquire during a lock release operation. We can check or assert that we are not the lock holder, but I'm torn on whether we'd want to do either, and then, which one?

Comment thread src/queue/query_intern.c

// Lost the race AND the winner stored different bytes (collision).
// Don't disturb the winner; back out and report miss.
LWLockRelease(partition);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also loathe this, but it's a really rare case (like literally never happen rare) and the handling is graceful, so I'm electing to leave it alone.

iskakaushik and others added 3 commits June 2, 2026 17:20
Without interning, every queued event owned a private DSA copy of the
normalized query text — live DSA usage grew as `queued_events * query_len`
and exhausted the bounded DSA pool well before the queue reached capacity.
Repeated long normalized queries were the worst case.

Add a shared, partition-locked HTAB whose entries point at refcount-managed
DSA bodies, and route TryEnqueueLocked / PschDequeueEvent through it for
query text. Live DSA usage drops to `distinct_live_query_texts * query_len`.
Error messages stay per-event for now (separate optimization).

The pattern mirrors pg_stat_statements (shared HTAB sized via
hash_estimate_size + ShmemInitHash) and pgstat_shmem (refcounted DSA bodies
freed only after the HTAB entry is removed).

Adds t/032_query_intern.pl: 6000 EXECUTEs of a long normalized query
through an 8MB DSA pool exit with dsa_oom_count == 0; the same workload
without interning would push ~12MB through an 8MB pool and OOM.
Copilot AI review requested due to automatic review settings June 2, 2026 17:20
@serprex serprex force-pushed the dsa-query-intern branch from 921a564 to c2331c4 Compare June 2, 2026 17:20
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment thread src/queue/query_intern.c Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 2, 2026 17:25
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comment thread src/queue/query_intern.h
Comment thread src/queue/query_intern.h Outdated
Comment thread t/032_query_intern.pl
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 2, 2026 17:33
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment thread t/032_query_intern.pl Outdated
serprex and others added 2 commits June 2, 2026 17:41
…you think

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 2, 2026 17:42
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment thread t/032_query_intern.pl
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 686c413. Configure here.

Comment thread src/queue/query_intern.c
@serprex serprex merged commit 4f2ec55 into main Jun 2, 2026
12 checks passed
@serprex serprex deleted the dsa-query-intern branch June 2, 2026 23:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants