Skip to content

Fix ORDER BY with named aggregates that expand to subqueries#34

Open
nicosuave wants to merge 42 commits into
mainfrom
nicosuave/fix-orderby-subquery
Open

Fix ORDER BY with named aggregates that expand to subqueries#34
nicosuave wants to merge 42 commits into
mainfrom
nicosuave/fix-orderby-subquery

Conversation

@nicosuave
Copy link
Copy Markdown
Member

Summary

Fixes #28. When AGGREGATE(...) AT (...) expressions expand to correlated subqueries and are referenced by alias in ORDER BY, DuckDB errors with "Alias has subquery - not yet supported." This wraps the inner query in a subquery (SELECT * FROM (...) _q ORDER BY ...) when ORDER BY references a subquery-aliased SELECT item, so aliases resolve as plain column references.

The wrapping is targeted: it only activates when ORDER BY actually references a subquery alias (via word-boundary matching), avoiding breakage for queries that use qualified column names like o.prodName in ORDER BY.

Adds 3 integration tests and 3 unit tests covering expression-based ORDER BY (revenue/year_total), simple alias ORDER BY, and DESC ordering.

When AGGREGATE(...) AT (...) expressions expand to correlated subqueries
and are referenced by alias in ORDER BY, DuckDB errors with "Alias has
subquery - not yet supported." Fix by wrapping the inner query in a
subquery when ORDER BY references a subquery-aliased SELECT item.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a1b14ae6ec

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread yardstick-rs/src/sql/measures.rs Outdated
Comment on lines +5174 to +5175
let select_start = upper.find("SELECT").map(|p| p + 6).unwrap_or(0);
let select_text = &sql[select_start..from_pos];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Find outer SELECT before extracting subquery aliases

extract_subquery_aliases_from_select uses upper.find("SELECT"), which picks the first SELECT anywhere in the SQL text rather than the outer query's SELECT. For statements with CTEs (for example WITH cte AS (SELECT ...) SELECT ..., (SELECT ...) AS year_total ... ORDER BY year_total), this slices the wrong SELECT list, so year_total is missed (or aliases from the CTE are picked instead). In that case needs_wrap stays false and the original DuckDB Alias has subquery error still occurs for ORDER BY on the subquery alias.

Useful? React with 👍 / 👎.

Fixes a bug where extract_subquery_aliases_from_select would match a
SELECT inside a CTE rather than the outer query's SELECT, causing the
ORDER BY wrapping to be skipped for CTE-based queries.
@nicosuave
Copy link
Copy Markdown
Member Author

@codex review

1 similar comment
@nicosuave
Copy link
Copy Markdown
Member Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e25309ea64

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread yardstick-rs/src/sql/measures.rs Outdated
Comment on lines +5262 to +5266
while let Some(pos) = text_upper[search_from..].find(&ident_upper) {
let abs = search_from + pos;
let before_ok = abs == 0 || {
let c = text.as_bytes()[abs - 1];
!c.is_ascii_alphanumeric() && c != b'_' && c != b'.'
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Ignore literals/comments when detecting ORDER BY alias refs

identifier_appears_in does a raw substring scan and boundary check, so alias names inside string literals/comments are treated as real identifier references. In that case needs_wrap becomes true and the query is wrapped, which can invalidate otherwise-valid ORDER BY expressions that rely on original table qualifiers (for example o.col) because those qualifiers no longer exist in the outer SELECT * FROM (...) _q. This creates a regression for queries where the alias text appears only as literal/comment text.

Useful? React with 👍 / 👎.

Comment thread yardstick-rs/src/sql/measures.rs Outdated
Comment on lines +5185 to +5189
b'\'' => {
// Skip single-quoted strings
let mut j = i + 1;
while j < select_text.len() {
if bytes[j] == b'\'' {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 Badge Advance past quoted strings when scanning SELECT items

In the single-quote branch, the scanner computes j to skip a string literal but never updates the loop index, so commas inside '...' are still processed as top-level separators. That can split one SELECT item into fragments and hide a subquery alias (e.g. (SELECT 1) || 'a,b' AS x), preventing the wrap logic from triggering and leaving the original DuckDB alias/subquery error unfixed.

Useful? React with 👍 / 👎.

Fixes two issues from PR review:
1. extract_subquery_aliases_from_select did not advance past quoted
   strings, so commas inside literals could split SELECT items
2. identifier_appears_in matched alias names inside string literals
   and comments, potentially triggering unnecessary query wrapping
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 50e133e095

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread yardstick-rs/src/sql/measures.rs Outdated
b'(' => depth += 1,
b')' => depth -= 1,
_ if depth == 0 => {
if i + 4 <= item.len() && upper[i..].starts_with(" AS ") {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid UTF-8 boundary panics in alias AS scan

subquery_alias_from_item walks item by byte index and slices upper[i..] at each top-level byte. If a multibyte character appears outside parentheses (for example AGGREGATE(...) AT (...) || 'é' AS year_total), i can point into the middle of a UTF-8 code point and panic with byte index ... is not a char boundary. This path is executed for any query with top-level ORDER BY, so queries that previously executed can now crash before SQL reaches DuckDB.

Useful? React with 👍 / 👎.

Comment thread yardstick-rs/src/sql/measures.rs Outdated
Comment on lines +5269 to +5271
let c = stripped.as_bytes()[abs - 1];
!c.is_ascii_alphanumeric() && c != b'_' && c != b'.'
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Exclude quoted qualified names from alias detection

The before_ok boundary check only rejects matches when the immediate previous byte is .. For ORDER BY o."year_total", the previous byte at year_total is ", so this function incorrectly treats it as an unqualified alias reference. That sets needs_wrap = true, and wrapping to SELECT * FROM (...) _q removes table alias o, causing valid qualified ORDER BY clauses to fail with missing-table binder errors.

Useful? React with 👍 / 👎.

- Use char_indices() in subquery_alias_from_item to avoid slicing
  at invalid UTF-8 boundaries when multibyte characters appear
- Reject matches preceded by " or ` in identifier_appears_in to
  avoid treating o."year_total" as an unqualified alias reference
Keep both ORDER BY subquery wrap tests (#28) and expression dimension
tests (#29).
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c4bb0a6d89

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread yardstick-rs/src/sql/measures.rs Outdated
'(' => depth += 1,
')' => depth -= 1,
_ if depth == 0 && ch == ' ' => {
if upper[i..].starts_with(" AS ")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid indexing upper with byte offsets from item

Fresh evidence: although this now uses char_indices(), i comes from item while the slice is taken from upper = item.to_uppercase(), whose byte length can differ for Unicode case mappings. For inputs like (SELECT 1) || 'ıı…' AS year_total, upper becomes shorter and upper[i..] can panic with an out-of-bounds/char-boundary error before SQL reaches DuckDB, so a valid query can crash the process.

Useful? React with 👍 / 👎.

Comment thread yardstick-rs/src/sql/measures.rs Outdated
let abs = search_from + pos;
let before_ok = abs == 0 || {
let c = stripped.as_bytes()[abs - 1];
!c.is_ascii_alphanumeric() && c != b'_' && c != b'.' && c != b'"' && c != b'`'
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Match unqualified quoted aliases in ORDER BY detection

The boundary check rejects matches when the preceding byte is " or `, which avoids o."alias" false positives but also drops legitimate unqualified references like ORDER BY "year_total" (or backtick-quoted equivalents). In that case needs_wrap stays false, so queries that alias subquery expressions with quoted identifiers still hit DuckDB's Alias has subquery failure instead of being wrapped.

Useful? React with 👍 / 👎.

- subquery_alias_from_item: compare " AS " directly on item bytes
  instead of slicing the uppercased string, which can have different
  byte offsets for certain Unicode case mappings
- identifier_appears_in: only reject " and ` before a match when
  preceded by . (qualified reference like o."alias"), allowing bare
  quoted identifiers like "year_total" to match correctly
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e630dbb883

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

if before_ok && after_ok {
return true;
}
search_from = abs + 1;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Advance identifier scan by char boundary

identifier_appears_in advances with search_from = abs + 1, but abs is a byte offset in UTF-8. With non-ASCII aliases (for example AS "é"), a non-boundary match attempt (like scanning ORDER BY té) can set search_from into the middle of a multibyte character, and the next stripped_upper[search_from..] slice panics. This makes otherwise valid queries crash during SQL expansion instead of returning a planner error/result.

Useful? React with 👍 / 👎.

Comment thread yardstick-rs/src/sql/measures.rs Outdated
')' => depth -= 1,
' ' if depth == 0 => {
if item[i..].len() >= 4
&& item[i..].as_bytes()[1..4].eq_ignore_ascii_case(b"AS ")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Parse AS aliases with general whitespace

subquery_alias_from_item only recognizes aliases when the token pattern is exactly " AS ", so valid SQL formatting like (SELECT ...) AS\nyear_total or (SELECT ...) AS\tyear_total is missed. In those cases the subquery alias is not collected, needs_wrap stays false, and ORDER BY on that alias still hits DuckDB's Alias has subquery failure that this patch is meant to avoid.

Useful? React with 👍 / 👎.

- identifier_appears_in: advance search_from to the next valid char
  boundary after a non-matching hit, preventing panics on non-ASCII text
- subquery_alias_from_item: match AS with any surrounding whitespace
  (tabs, newlines, multiple spaces), not just single spaces
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7d6f736c00

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread yardstick-rs/src/sql/measures.rs Outdated
Comment on lines +5527 to +5528
if !item.to_uppercase().contains("(SELECT ") {
return None;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Relax subquery detection beyond literal "(SELECT "

subquery_alias_from_item only proceeds when the SELECT item text contains the exact substring "(SELECT ", so valid scalar subqueries formatted as ( SELECT ...), (SELECT\n...), or (WITH ... SELECT ...) are skipped. In those cases the alias is never collected, needs_wrap stays false, and ORDER BY on that alias still hits DuckDB’s "Alias has subquery" error this patch is meant to avoid.

Useful? React with 👍 / 👎.

Comment thread yardstick-rs/src/sql/measures.rs Outdated
Comment on lines +5588 to +5592
while let Some(pos) = stripped_upper[search_from..].find(&ident_upper) {
let abs = search_from + pos;
let before_ok = abs == 0 || {
let c = stripped.as_bytes()[abs - 1];
if c == b'"' || c == b'`' {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid matching aliases in ORDER BY keyword contexts

The alias check is a raw whole-word search, so aliases that are also SQL tokens (for example year) are treated as references when they appear in syntax like EXTRACT(YEAR FROM o.ts). That false positive sets needs_wrap = true and rewrites to SELECT * FROM (...) _q ORDER BY ...; after wrapping, qualified references such as o.ts are no longer valid and previously-working queries can fail with binder errors.

Useful? React with 👍 / 👎.

- item_has_subquery: detect subqueries with whitespace after opening
  paren and CTE-style (WITH ... SELECT ...) patterns
- identifier_appears_in: track parenthesis depth and only match at
  top level, preventing false positives like EXTRACT(YEAR FROM ...)
  triggering unnecessary query wrapping
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4ec199d84b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread yardstick-rs/src/sql/measures.rs Outdated
Comment on lines +5627 to +5629
// Only match at top level (outside parentheses)
let at_top_level = depths[abs] == 0;
let before_ok = abs == 0 || {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Match aliases inside parenthesized ORDER BY expressions

identifier_appears_in only considers matches when depths[abs] == 0, so alias references nested in parentheses are ignored. That means common ORDER BY forms like COALESCE(year_total, 0) or (year_total) will not trigger wrapping even when year_total is a scalar-subquery alias, and DuckDB will still raise the same Alias has subquery error this patch is intended to avoid.

Useful? React with 👍 / 👎.

Comment thread yardstick-rs/src/sql/measures.rs Outdated
Comment on lines +5570 to +5573
if i + 2 <= bytes.len()
&& bytes[i..i + 2].eq_ignore_ascii_case(b"AS")
&& (i + 2 >= bytes.len() || bytes[i + 2].is_ascii_whitespace())
{
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Support subquery aliases declared without AS

subquery_alias_from_item only records aliases when it finds the AS token pattern, but SQL allows select-item aliases without AS (e.g. (SELECT ...) year_total). In that valid form, the subquery alias is never collected, needs_wrap stays false, and ORDER BY year_total still hits DuckDB's subquery-alias planner error.

Useful? React with 👍 / 👎.

- identifier_appears_in: remove depth=0 restriction so aliases inside
  COALESCE(year_total, 0) or (year_total) are correctly detected
- subquery_alias_from_item: detect implicit aliases without AS keyword,
  e.g. (SELECT ...) year_total
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f83ac55ebb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread yardstick-rs/src/sql/measures.rs Outdated
Comment on lines +5536 to +5540
if j + 6 <= bytes.len() && bytes[j..j + 6].eq_ignore_ascii_case(b"SELECT") {
return true;
}
if j + 4 <= bytes.len() && bytes[j..j + 4].eq_ignore_ascii_case(b"WITH") {
return true;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Require token boundaries for SELECT/WITH subquery checks

item_has_subquery treats any parenthesized text that starts with SELECT or WITH as a subquery, but it does not verify a keyword boundary. Expressions like (selected_value) AS x or (with_tax) AS x are therefore misclassified as subqueries, so expand_aggregate_with_at can wrap the whole query when ORDER BY x is present. That unnecessary wrap changes name resolution context and can break otherwise-valid qualified sort keys (for example ORDER BY x, o.col) because o is no longer visible outside SELECT * FROM (...) _q.

Useful? React with 👍 / 👎.

item_has_subquery now verifies the character after SELECT/WITH is not
alphanumeric or underscore, preventing identifiers like selected_value
or with_tax from being misclassified as subquery keywords.
@nicosuave
Copy link
Copy Markdown
Member Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d430c8a954

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread yardstick-rs/src/sql/measures.rs Outdated
Comment on lines +5531 to +5538
if bytes[i] == b'(' {
let mut j = i + 1;
while j < bytes.len() && bytes[j].is_ascii_whitespace() {
j += 1;
}
if j + 6 <= bytes.len()
&& bytes[j..j + 6].eq_ignore_ascii_case(b"SELECT")
&& (j + 6 >= bytes.len() || !bytes[j + 6].is_ascii_alphanumeric() && bytes[j + 6] != b'_')
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Skip literals/comments when detecting subquery aliases

item_has_subquery scans raw bytes and treats any ("SELECT"|"WITH") sequence as a real subquery, even when it appears inside a string literal or comment. That makes extract_subquery_aliases_from_select misclassify non-subquery SELECT items as subquery aliases, which can trigger the new outer wrapping path unnecessarily. In aggregate queries this can turn previously valid ORDER BY clauses (especially ones that also reference qualified columns like t.col) into binder errors after wrapping because the original table alias is no longer in scope.

Useful? React with 👍 / 👎.

item_has_subquery now skips single-quoted strings, line comments, and
block comments so that text like '(SELECT ...)' inside a literal is
not misclassified as a real subquery.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7740e275cb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +5602 to +5604
b'(' => depth += 1,
b')' => {
depth -= 1;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Skip string/comment parens in subquery alias scan

subquery_alias_from_item updates depth for every ( / ) byte but does not skip quoted strings or comments, so parentheses inside literals/comments after a scalar subquery can drive depth negative before the alias is parsed. For inputs like (SELECT ...) || ')' AS year_total, last_alias_start stays unset, the alias is missed, and the ORDER BY wrapper is not applied, so the original DuckDB "Alias has subquery" failure still occurs.

Useful? React with 👍 / 👎.

Parentheses inside string literals or comments (e.g. (SELECT ...) || ')'
AS year_total) no longer corrupt the depth counter, which previously
caused the alias to be missed and the ORDER BY wrapper to not apply.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: afd6e29159

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread yardstick-rs/src/sql/measures.rs Outdated
}
}
}
b',' if depth == 0 => {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Skip comments when splitting SELECT items

extract_subquery_aliases_from_select only skips single-quoted literals while scanning for top-level commas, so commas inside --//* ... */ comments are treated as real item separators. In inputs like SELECT (SELECT 1) /* note,comma */ AS year_total ... ORDER BY year_total, the item is split before the alias, subquery_alias_from_item misses year_total, and expand_aggregate_with_at does not wrap the query, so the original DuckDB Alias has subquery failure still occurs.

Useful? React with 👍 / 👎.

extract_subquery_aliases_from_select now skips -- line comments and
/* block comments */ so commas inside comments are not treated as
SELECT item separators.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ef2686efea

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread yardstick-rs/src/sql/measures.rs Outdated
Comment on lines +5728 to +5731
while let Some(pos) = stripped_upper[search_from..].find(&ident_upper) {
let abs = search_from + pos;
let before_ok = abs == 0 || {
let c = bytes[abs - 1];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Restrict alias matching to outer ORDER BY scope

identifier_appears_in does a whole-word scan across the full ORDER BY text and intentionally matches at any parenthesis depth, so alias names appearing inside nested subqueries are treated as references to the outer SELECT alias. In cases like ORDER BY (SELECT year_total FROM aux), o.prodName, this makes needs_wrap true even though the outer sort key never uses the year_total alias; the subsequent wrap to SELECT * FROM (...) _q can then invalidate qualified keys such as o.prodName. The wrap decision should ignore matches inside nested SELECT scopes.

Useful? React with 👍 / 👎.

identifier_appears_in now strips (SELECT ...) and (WITH ...) blocks
before scanning, so aliases appearing only inside nested subqueries
in ORDER BY do not falsely trigger the query wrapper.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0303c34e23

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +5740 to +5743
let after_pos = abs + ident_upper.len();
let after_ok = after_pos >= stripped_upper.len() || {
let c = bytes[after_pos];
!c.is_ascii_alphanumeric() && c != b'_'
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Guard byte indexing after Unicode case expansion

Fresh evidence: even after advancing search_from on char boundaries, this branch still indexes cleaned.as_bytes() with offsets computed from cleaned.to_uppercase(), whose byte layout can grow for some Unicode characters (for example ʼn). In ORDER BY text like ʼnyear_totalx, after_pos is < stripped_upper.len() but equals bytes.len(), so bytes[after_pos] panics with out-of-bounds instead of safely returning false; this can crash SQL expansion on otherwise valid input.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 55134a0895

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +6793 to +6794
let order_qualifiers =
collect_order_by_wrapper_qualifiers(&from_info, primary_alias.as_deref());
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Include joined aliases when wrapping ORDER BY

When the query still contains AGGREGATE(...) AT (...), extract_from_clause_info_ffi(&sql) can fall back to only the primary table (as noted by the nearby fallback path for syntax the parser cannot parse), but the wrapper uses this qualifier set to decide which qualified ORDER BY keys to rewrite. In a joined query such as FROM sales_v s JOIN salesdetails_v sd ... ORDER BY year_total, sd.product, wrapping is triggered by year_total, yet sd.product is left unchanged and becomes invalid outside SELECT * FROM (...) _q. The wrapper needs joined aliases even on the fallback path, or it will break valid joined AT queries with qualified tie-breakers.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in ba4f9f1: when parser FROM extraction cannot parse AGGREGATE(...) AT (...), the fallback now collects top-level FROM/JOIN table aliases and merges them into the wrapper qualifier set. Added a joined sd.product ORDER BY regression plus a focused fallback unit test.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ba4f9f1789

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +6400 to +6406
.unwrap_or_else(|| {
hidden_order_alias(
&order_text[i..name_end],
&mut hidden_selects,
&mut hidden_aliases,
&mut hidden_by_expr,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid hiding aggregate ORDER BY arguments as raw columns

When wrapping is triggered and a sort key contains an aggregate expression like ORDER BY year_total, SUM(o.revenue), this fallback treats the qualified argument o.revenue as an unprojected column and adds o.revenue AS __yardstick_order_0 to the inner grouped query. That makes the generated inner SELECT invalid because o.revenue is neither grouped nor aggregated, and the outer ORDER BY is also changed to SUM(_q.__yardstick_order_0) in a non-grouped wrapper. The requalification needs to avoid descending into aggregate-function arguments or project the aggregate expression itself instead of the raw column.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2bd4f0a: aggregate ORDER BY calls are now captured as hidden inner aggregate expressions before the qualifier scanner descends into their arguments. The outer wrapper orders by the hidden aggregate output instead of SUM(_q.raw_column). Added unit coverage for SUM(o.revenue) and a SQL regression for ORDER BY year_total, SUM(o.year).

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2bd4f0a3fb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread yardstick-rs/src/sql/measures.rs Outdated
Comment on lines +6472 to +6476
if let Some(function_end) =
parenthesized_expr_end(order_text, after_function_name)
{
let output_name = hidden_order_alias(
&order_text[i..function_end],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve FILTER clauses when hiding aggregate sort keys

When wrapping is triggered and an ORDER BY key uses a filtered aggregate, e.g. ORDER BY year_total, SUM(o.revenue) FILTER (WHERE o.region = 'US'), this branch only captures the aggregate call through the closing ) and replaces it with _q.__yardstick_order_0. The remaining FILTER (...) is then emitted in the outer query after a plain column reference, while the hidden inner select contains the unfiltered aggregate, so the rewritten SQL is invalid or sorts by the wrong value. The hidden expression needs to include aggregate modifiers such as FILTER before replacing the sort key.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 9105e23: hidden aggregate ORDER BY expressions now include trailing aggregate modifiers before replacement, including FILTER (...), WITHIN GROUP (...), and OVER (...)/named windows. Added unit coverage for filtered aggregates and a SQL regression for ORDER BY year_total, SUM(o.year) FILTER (...).

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9105e23c9c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread yardstick-rs/src/sql/measures.rs Outdated
cursor = next;
}

while let Some(join_pos) = find_top_level_keyword(sql, "JOIN", cursor) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Include comma-joined tables in wrapper qualifiers

When parser FROM extraction falls back for AGGREGATE ... AT syntax, this scanner records the first table and only subsequent JOIN targets, so comma joins like FROM sales_v s, salesdetails_v sd ... ORDER BY year_total, sd.product omit sd from the qualifier set. Wrapping is still triggered by year_total, but sd.product is left unchanged in the outer SELECT * FROM (...) _q ORDER BY ..., where sd is no longer in scope, causing a binder error for a valid tie-breaker query. The fallback needs to parse comma-separated table refs as well as explicit joins.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in affcb70: fallback FROM extraction now scans top-level comma-separated table refs as well as explicit JOIN targets, so comma-joined aliases are included in the ORDER BY wrapper qualifier set. Added unit and SQL regressions for the comma-join sd.product tie-breaker.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: affcb70cc1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +6618 to +6624
.unwrap_or_else(|| {
hidden_order_alias(
&order_text[i..name_end],
&mut hidden_selects,
&mut hidden_aliases,
&mut hidden_by_expr,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve scalar ORDER BY expressions when hiding sort keys

When wrapping is triggered and a secondary sort key is a scalar expression over a grouped dimension, this fallback hides only the qualified column it encounters instead of the whole expression. For example, SELECT LOWER(o.region) AS r, ... AS year_total ... GROUP BY LOWER(o.region) ORDER BY year_total, LOWER(o.region) gets an inner projection like o.region AS __yardstick_order_0; that raw column is not grouped, so the generated inner grouped query fails even though the original ORDER BY LOWER(o.region) is valid. The rewrite should capture/project the full scalar sort expression (or map it to an existing SELECT alias) rather than hiding the raw argument.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 5e9a31f: scalar ORDER BY function calls are now matched to aliased SELECT expressions before wrapping, and unmapped scalar sort expressions are hidden as the full expression rather than as a raw argument column. Added unit coverage for alias mapping, hidden scalar expressions, semicolon-terminated ORDER BY text, and a SQL regression for LOWER(o.region).

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5e9a31f360

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread yardstick-rs/src/sql/measures.rs Outdated
return alias.clone();
}

let alias = format!("__yardstick_order_{}", hidden_aliases.len());
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid colliding hidden ORDER BY aliases

When wrapping needs a hidden sort key and the user already projects a column named __yardstick_order_0, this fixed alias collides with the real output. For example, a query selecting o.year AS __yardstick_order_0 and ordering by an unprojected grouped key will add another AS __yardstick_order_0, then the outer * EXCLUDE (__yardstick_order_0) can drop the user's column or make the wrapped query ambiguous. Generate helper names that cannot collide with existing SELECT outputs.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in dd100e2: hidden ORDER BY helper aliases now avoid existing SELECT output names before generating __yardstick_order_N. Added unit coverage for the collision case and a SQL regression that preserves a user-projected __yardstick_order_0 column.

Comment thread yardstick-rs/src/sql/measures.rs Outdated
}
}

i >= bytes.len() || bytes[i] == b',' || bytes[i] == b';'
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Hide full composite scalar sort expressions

Fresh evidence: the current completeness check accepts a function sort key only when the next token is comma/semicolon/end, so valid grouped expressions such as ORDER BY year_total, LOWER(o.region) || '' or LOWER(o.region) COLLATE NOCASE are not hidden as a full expression. In those cases wrapping is already triggered by year_total, but the scanner descends into the function and hides raw o.region, adding an ungrouped column to the inner grouped SELECT and causing a binder error even though the original ORDER BY expression is valid.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

@nicosuave nicosuave May 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in dd100e2: scalar ORDER BY handling now captures the full top-level sort expression, excluding only ORDER BY direction/null ordering suffixes, before deciding whether to map or hide it. Added unit coverage for concatenated and collated scalar sort expressions plus a SQL regression for LOWER(o.region) || ''.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dd100e2385

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +6943 to +6947
if let Some(output_name) = outputs.get(&name) {
out.push_str(output_name);
i = qualifier_end;
continue;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Hide unqualified grouped ORDER BY keys

When wrapping is triggered, an unqualified secondary sort key that is valid only in the inner grouped query is left unchanged unless it already appears in outputs. For example, SELECT o.year, ... AS year_total FROM sales_v o GROUP BY o.year, o.region ORDER BY year_total, region is valid before wrapping because region is grouped, but this path emits an outer ORDER BY year_total, region while the inner projection exposes only year and year_total, so region no longer binds. Unqualified grouped keys need the same hidden-projection handling added for qualified keys.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 77ebcf3: unqualified ORDER BY items that are not projected outputs now get hidden as helper projections when wrapping, so grouped keys like region remain available to the outer ORDER BY. Added unit coverage for bare and composite unqualified sort keys plus a SQL regression for ORDER BY year_total, region.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 77ebcf3352

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread yardstick-rs/src/sql/measures.rs Outdated
Comment on lines +5933 to +5940
&& bytes[j..j + 2].eq_ignore_ascii_case(b"AS")
&& (j + 2 >= bytes.len() || bytes[j + 2].is_ascii_whitespace())
{
let mut end = j + 2;
while end < bytes.len() && bytes[end].is_ascii_whitespace() {
end += 1;
}
last_alias_start = Some(end);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Treat comments as whitespace when finding subquery aliases

When a scalar-subquery select item has comments between the subquery, AS, and the alias (for example AGGREGATE(...) AT (...) AS/*comment*/year_total, which expands to (SELECT ...) AS/*comment*/year_total), this scanner only skips ASCII whitespace after the closing ) and after AS. SQL comments are whitespace, so the valid alias is either missed or parsed as AS; identifier_appears_in then never sees year_total, the wrapper is skipped, and DuckDB still hits the subquery-alias ORDER BY failure this change is meant to avoid.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f23fdc2: subquery alias parsing now treats SQL comments as whitespace after the closing subquery, around AS, and before implicit aliases. Added unit coverage for block and line comment gaps plus a SQL regression for AGGREGATE(...) AT (...) AS/comment/year_total. This commit also switches parser FFI expression metadata reads to public DuckDB accessors to clear the next-binary protected-field CI failures.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f23fdc2661

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +6928 to +6929
hidden_order_alias(
&order_text[i..name_end],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Hide full qualified-leading sort expressions

Fresh evidence: the current qualified-reference branch still hides only o.region even when the ORDER BY item is a larger expression starting with that reference. When wrapping is triggered by a subquery alias, a valid grouped query such as SELECT o.region || '' AS r, ... AS year_total FROM sales_v o GROUP BY o.region || '' ORDER BY year_total, o.region || '' gets an inner helper projection o.region AS __yardstick_order_0; because the inner query groups by o.region || '' rather than raw o.region, that added select is not grouped and DuckDB will reject the rewritten SQL. This path should capture/map the whole top-level sort expression before falling back to hiding a raw qualified column.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

@nicosuave nicosuave May 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 7d48314: qualified-leading ORDER BY items now capture the full top-level sort expression before falling back to a raw qualified column. If the expression is already projected, the wrapper orders by the projected alias; otherwise it adds the full expression as a hidden helper projection. Added unit coverage for mapped and hidden qualified-leading expressions plus a SQL regression for the qualified concatenation sort key. Verified with cargo test test_requalify_order_by_for_wrapper, cargo test test_subquery_alias_from_item_whitespace_variants, and make test.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7d4831488b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +4143 to +4145
let mut i = skip_ws_and_comments(sql, start);
if i >= end || bytes[i] == b'(' {
return None;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Capture derived-table aliases in FROM fallback

When the parser path cannot parse the custom AGGREGATE ... AT SQL and this fallback is used, this early return skips any table reference that starts with (, so aliases for derived tables are never added to the qualifier set. In a query such as FROM sales_v s JOIN (SELECT ... ) sd ... ORDER BY year_total, sd.product, wrapping is triggered by year_total but sd.product is left unchanged in the outer SELECT ... FROM (...) _q ORDER BY ..., where sd is no longer in scope; the rewriter should consume the parenthesized table ref and record its alias.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 559c128: fallback FROM extraction now consumes parenthesized SELECT/WITH table refs and records the following alias, including derived tables introduced by JOIN. Added focused fallback coverage for JOIN (SELECT ...) sd and a SQL regression for ORDER BY year_total, sd.product through a derived table. Verified with cargo test test_extract_from_clause_info_fallback_join_with_at_syntax, cargo test test_requalify_order_by_for_wrapper, and make test.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 559c1283a8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread yardstick-rs/src/sql/measures.rs Outdated
Comment on lines +6356 to +6361
let (expr, alias_clause) = split_item_expr_and_alias(item);
let output_name = if let Some(alias_clause) = alias_clause {
parse_alias_span(alias_clause, 0).map(|(_, alias)| alias)
} else {
extract_last_qualified_identifier_token(expr)
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Recognize implicit SELECT aliases before hiding sort keys

When wrapping is triggered for a subquery-backed alias, this output map only understands explicit AS aliases. For a valid query like SELECT o.region r, ... AS year_total FROM sales_v o GROUP BY o.region ORDER BY year_total, r, r is not recorded as an exposed output, so the requalifier treats the secondary sort key as unprojected and appends r AS __yardstick_order_0 to the inner SELECT. At that point r is only a SELECT-list alias, not an input column, so the rewritten grouped query can fail to bind even though the original ORDER BY r was valid.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 351710c: collect_order_by_wrapper_outputs now recognizes parser-confirmed implicit SELECT aliases and has a fallback for simple column aliases, so SELECT o.region r reserves r as an exposed output instead of adding r AS __yardstick_order_0 to the grouped inner query. Added a focused requalifier regression and a SQL regression for ORDER BY year_total, r. Verified with cargo test test_requalify_order_by_for_wrapper and make test.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 351710cb31

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread yardstick-rs/src/sql/measures.rs Outdated
let order_end = find_first_top_level_keyword(
&result_sql,
expr_start,
&["LIMIT", "OFFSET"],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Split FETCH before hiding ORDER BY keys

When a wrapped query uses a standard FETCH FIRST ... ROWS ONLY tail, this boundary search leaves the FETCH clause inside order_text because only LIMIT and OFFSET are split out. If wrapping is triggered and a secondary sort key must be hidden, e.g. ORDER BY year_total, region FETCH FIRST 10 ROWS ONLY, the rewriter treats region FETCH FIRST 10 ROWS ONLY as the sort expression and appends it to the inner SELECT as ... AS __yardstick_order_0, producing invalid SQL instead of preserving the row limit. Include FETCH as an ORDER BY tail boundary before requalifying/hiding sort keys.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 7b775fe: the ORDER BY wrapper now treats top-level FETCH as an order tail boundary alongside LIMIT and OFFSET, so hidden sort-key insertion only sees the sort expression and preserves FETCH FIRST ... ROWS ONLY outside the inner SELECT. Added a SQL regression for ORDER BY year_total, region FETCH FIRST 2 ROWS ONLY where region is a hidden sort key. Verified with cargo test test_requalify_order_by_for_wrapper and make test.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7b775fea38

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +7103 to +7105
if function_call_has_unmapped_qualified_ref(
expr, qualifiers, outputs,
) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Hide unqualified function sort keys

Fresh evidence: in the current code this branch only hides a scalar function sort expression when function_call_has_unmapped_qualified_ref(...) finds a table-qualified reference, and that helper ignores unqualified columns. When wrapping is triggered by a subquery alias, a valid grouped query such as SELECT o.year, ... AS year_total FROM sales_v o GROUP BY o.year, o.region ORDER BY year_total, LOWER(region) leaves the outer sort as LOWER(region) even though the wrapper subquery only exposes year and year_total, causing a bind error; unqualified function expressions over grouped but unprojected keys need the same hidden-projection handling as qualified ones.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in cab1241: scalar function ORDER BY items now detect unmapped unqualified identifiers and hide the full expression, so ORDER BY year_total, LOWER(region) becomes an outer sort on a hidden projected key instead of trying to bind region outside the wrapper. Projected aliases such as LOWER(r) are left untouched. Also moved FETCH FIRST coverage into the Rust boundary test because the Linux next image rejects FETCH syntax before the wrapper can be asserted. Verified with git diff --check, cargo test test_requalify_order_by_for_wrapper, and make test.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cab12410b5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +6903 to +6904
if !is_order_item_expr_start(order_text, start) {
return None;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Handle parenthesized sort keys before hiding columns

When wrapping is triggered and a secondary sort key is parenthesized, the first identifier is no longer considered the start of the ORDER BY item, so the full expression is not captured. For example, ORDER BY year_total, (o.region || '') on a query grouped by o.region || '' falls through to hiding just o.region, adding an ungrouped raw column to the inner grouped SELECT even though the original parenthesized expression was valid.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

@nicosuave nicosuave May 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 00de41c: parenthesized non-subquery ORDER BY items are now handled as whole sort expressions before the scanner descends into their identifiers, so a parenthesized grouped expression is hidden as the full expression instead of projecting a raw column such as o.region. Added unit coverage and a SQL regression for the parenthesized grouped expression case. Verified with git diff --check, cargo test test_requalify_order_by_for_wrapper, and make test.

Comment thread yardstick-rs/src/sql/measures.rs Outdated
Comment on lines +7042 to +7044
if let Some(next) = skip_quoted_sql_token(bytes, i, bytes.len()) {
i = next;
continue;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Inspect quoted identifiers inside function sort keys

In function ORDER BY expressions, double-quoted identifiers are skipped together with string literals, so an unprojected grouped key like LOWER("region") is not detected and hidden before the wrapper is built. When year_total triggers wrapping, the outer query is left with ORDER BY LOWER("region") even though the inner subquery only projects the visible SELECT columns, causing quoted grouped columns that were valid in the original ORDER BY to fail to bind.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 00de41c: function sort-key scanning now skips only single-quoted string literals and parses quoted identifiers, so LOWER("region") is detected as an unprojected grouped key and hidden before wrapping. Added unit coverage and a SQL regression for ORDER BY year_total, LOWER("region"). Verified with git diff --check, cargo test test_requalify_order_by_for_wrapper, and make test.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 00de41c25c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +7316 to +7317
hidden_order_alias(
&order_text[i..name_end],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Hide whole unary sort expressions

When wrapping is triggered by a subquery alias and a secondary sort key starts with a unary operator, e.g. ORDER BY year_total, -o.year on a query grouped by -o.year but not projecting year, the scanner reaches o.year after the leading -, so the expression-start check above does not capture the whole sort item. This fallback then adds a hidden projection for raw o.year instead of -o.year, making the generated inner grouped query select an ungrouped column and fail to bind even though the original ORDER BY expression was valid.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

order by named aggregates results in an error

1 participant