Fix ORDER BY with named aggregates that expand to subqueries#34
Fix ORDER BY with named aggregates that expand to subqueries#34nicosuave wants to merge 42 commits into
Conversation
When AGGREGATE(...) AT (...) expressions expand to correlated subqueries and are referenced by alias in ORDER BY, DuckDB errors with "Alias has subquery - not yet supported." Fix by wrapping the inner query in a subquery when ORDER BY references a subquery-aliased SELECT item.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a1b14ae6ec
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let select_start = upper.find("SELECT").map(|p| p + 6).unwrap_or(0); | ||
| let select_text = &sql[select_start..from_pos]; |
There was a problem hiding this comment.
Find outer SELECT before extracting subquery aliases
extract_subquery_aliases_from_select uses upper.find("SELECT"), which picks the first SELECT anywhere in the SQL text rather than the outer query's SELECT. For statements with CTEs (for example WITH cte AS (SELECT ...) SELECT ..., (SELECT ...) AS year_total ... ORDER BY year_total), this slices the wrong SELECT list, so year_total is missed (or aliases from the CTE are picked instead). In that case needs_wrap stays false and the original DuckDB Alias has subquery error still occurs for ORDER BY on the subquery alias.
Useful? React with 👍 / 👎.
Fixes a bug where extract_subquery_aliases_from_select would match a SELECT inside a CTE rather than the outer query's SELECT, causing the ORDER BY wrapping to be skipped for CTE-based queries.
|
@codex review |
1 similar comment
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e25309ea64
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| while let Some(pos) = text_upper[search_from..].find(&ident_upper) { | ||
| let abs = search_from + pos; | ||
| let before_ok = abs == 0 || { | ||
| let c = text.as_bytes()[abs - 1]; | ||
| !c.is_ascii_alphanumeric() && c != b'_' && c != b'.' |
There was a problem hiding this comment.
Ignore literals/comments when detecting ORDER BY alias refs
identifier_appears_in does a raw substring scan and boundary check, so alias names inside string literals/comments are treated as real identifier references. In that case needs_wrap becomes true and the query is wrapped, which can invalidate otherwise-valid ORDER BY expressions that rely on original table qualifiers (for example o.col) because those qualifiers no longer exist in the outer SELECT * FROM (...) _q. This creates a regression for queries where the alias text appears only as literal/comment text.
Useful? React with 👍 / 👎.
| b'\'' => { | ||
| // Skip single-quoted strings | ||
| let mut j = i + 1; | ||
| while j < select_text.len() { | ||
| if bytes[j] == b'\'' { |
There was a problem hiding this comment.
Advance past quoted strings when scanning SELECT items
In the single-quote branch, the scanner computes j to skip a string literal but never updates the loop index, so commas inside '...' are still processed as top-level separators. That can split one SELECT item into fragments and hide a subquery alias (e.g. (SELECT 1) || 'a,b' AS x), preventing the wrap logic from triggering and leaving the original DuckDB alias/subquery error unfixed.
Useful? React with 👍 / 👎.
Fixes two issues from PR review: 1. extract_subquery_aliases_from_select did not advance past quoted strings, so commas inside literals could split SELECT items 2. identifier_appears_in matched alias names inside string literals and comments, potentially triggering unnecessary query wrapping
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 50e133e095
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| b'(' => depth += 1, | ||
| b')' => depth -= 1, | ||
| _ if depth == 0 => { | ||
| if i + 4 <= item.len() && upper[i..].starts_with(" AS ") { |
There was a problem hiding this comment.
Avoid UTF-8 boundary panics in alias AS scan
subquery_alias_from_item walks item by byte index and slices upper[i..] at each top-level byte. If a multibyte character appears outside parentheses (for example AGGREGATE(...) AT (...) || 'é' AS year_total), i can point into the middle of a UTF-8 code point and panic with byte index ... is not a char boundary. This path is executed for any query with top-level ORDER BY, so queries that previously executed can now crash before SQL reaches DuckDB.
Useful? React with 👍 / 👎.
| let c = stripped.as_bytes()[abs - 1]; | ||
| !c.is_ascii_alphanumeric() && c != b'_' && c != b'.' | ||
| }; |
There was a problem hiding this comment.
Exclude quoted qualified names from alias detection
The before_ok boundary check only rejects matches when the immediate previous byte is .. For ORDER BY o."year_total", the previous byte at year_total is ", so this function incorrectly treats it as an unqualified alias reference. That sets needs_wrap = true, and wrapping to SELECT * FROM (...) _q removes table alias o, causing valid qualified ORDER BY clauses to fail with missing-table binder errors.
Useful? React with 👍 / 👎.
- Use char_indices() in subquery_alias_from_item to avoid slicing at invalid UTF-8 boundaries when multibyte characters appear - Reject matches preceded by " or ` in identifier_appears_in to avoid treating o."year_total" as an unqualified alias reference
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c4bb0a6d89
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| '(' => depth += 1, | ||
| ')' => depth -= 1, | ||
| _ if depth == 0 && ch == ' ' => { | ||
| if upper[i..].starts_with(" AS ") |
There was a problem hiding this comment.
Avoid indexing
upper with byte offsets from item
Fresh evidence: although this now uses char_indices(), i comes from item while the slice is taken from upper = item.to_uppercase(), whose byte length can differ for Unicode case mappings. For inputs like (SELECT 1) || 'ıı…' AS year_total, upper becomes shorter and upper[i..] can panic with an out-of-bounds/char-boundary error before SQL reaches DuckDB, so a valid query can crash the process.
Useful? React with 👍 / 👎.
| let abs = search_from + pos; | ||
| let before_ok = abs == 0 || { | ||
| let c = stripped.as_bytes()[abs - 1]; | ||
| !c.is_ascii_alphanumeric() && c != b'_' && c != b'.' && c != b'"' && c != b'`' |
There was a problem hiding this comment.
Match unqualified quoted aliases in ORDER BY detection
The boundary check rejects matches when the preceding byte is " or `, which avoids o."alias" false positives but also drops legitimate unqualified references like ORDER BY "year_total" (or backtick-quoted equivalents). In that case needs_wrap stays false, so queries that alias subquery expressions with quoted identifiers still hit DuckDB's Alias has subquery failure instead of being wrapped.
Useful? React with 👍 / 👎.
- subquery_alias_from_item: compare " AS " directly on item bytes instead of slicing the uppercased string, which can have different byte offsets for certain Unicode case mappings - identifier_appears_in: only reject " and ` before a match when preceded by . (qualified reference like o."alias"), allowing bare quoted identifiers like "year_total" to match correctly
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e630dbb883
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if before_ok && after_ok { | ||
| return true; | ||
| } | ||
| search_from = abs + 1; |
There was a problem hiding this comment.
Advance identifier scan by char boundary
identifier_appears_in advances with search_from = abs + 1, but abs is a byte offset in UTF-8. With non-ASCII aliases (for example AS "é"), a non-boundary match attempt (like scanning ORDER BY té) can set search_from into the middle of a multibyte character, and the next stripped_upper[search_from..] slice panics. This makes otherwise valid queries crash during SQL expansion instead of returning a planner error/result.
Useful? React with 👍 / 👎.
| ')' => depth -= 1, | ||
| ' ' if depth == 0 => { | ||
| if item[i..].len() >= 4 | ||
| && item[i..].as_bytes()[1..4].eq_ignore_ascii_case(b"AS ") |
There was a problem hiding this comment.
Parse AS aliases with general whitespace
subquery_alias_from_item only recognizes aliases when the token pattern is exactly " AS ", so valid SQL formatting like (SELECT ...) AS\nyear_total or (SELECT ...) AS\tyear_total is missed. In those cases the subquery alias is not collected, needs_wrap stays false, and ORDER BY on that alias still hits DuckDB's Alias has subquery failure that this patch is meant to avoid.
Useful? React with 👍 / 👎.
- identifier_appears_in: advance search_from to the next valid char boundary after a non-matching hit, preventing panics on non-ASCII text - subquery_alias_from_item: match AS with any surrounding whitespace (tabs, newlines, multiple spaces), not just single spaces
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7d6f736c00
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if !item.to_uppercase().contains("(SELECT ") { | ||
| return None; |
There was a problem hiding this comment.
Relax subquery detection beyond literal "(SELECT "
subquery_alias_from_item only proceeds when the SELECT item text contains the exact substring "(SELECT ", so valid scalar subqueries formatted as ( SELECT ...), (SELECT\n...), or (WITH ... SELECT ...) are skipped. In those cases the alias is never collected, needs_wrap stays false, and ORDER BY on that alias still hits DuckDB’s "Alias has subquery" error this patch is meant to avoid.
Useful? React with 👍 / 👎.
| while let Some(pos) = stripped_upper[search_from..].find(&ident_upper) { | ||
| let abs = search_from + pos; | ||
| let before_ok = abs == 0 || { | ||
| let c = stripped.as_bytes()[abs - 1]; | ||
| if c == b'"' || c == b'`' { |
There was a problem hiding this comment.
Avoid matching aliases in ORDER BY keyword contexts
The alias check is a raw whole-word search, so aliases that are also SQL tokens (for example year) are treated as references when they appear in syntax like EXTRACT(YEAR FROM o.ts). That false positive sets needs_wrap = true and rewrites to SELECT * FROM (...) _q ORDER BY ...; after wrapping, qualified references such as o.ts are no longer valid and previously-working queries can fail with binder errors.
Useful? React with 👍 / 👎.
- item_has_subquery: detect subqueries with whitespace after opening paren and CTE-style (WITH ... SELECT ...) patterns - identifier_appears_in: track parenthesis depth and only match at top level, preventing false positives like EXTRACT(YEAR FROM ...) triggering unnecessary query wrapping
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4ec199d84b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // Only match at top level (outside parentheses) | ||
| let at_top_level = depths[abs] == 0; | ||
| let before_ok = abs == 0 || { |
There was a problem hiding this comment.
Match aliases inside parenthesized ORDER BY expressions
identifier_appears_in only considers matches when depths[abs] == 0, so alias references nested in parentheses are ignored. That means common ORDER BY forms like COALESCE(year_total, 0) or (year_total) will not trigger wrapping even when year_total is a scalar-subquery alias, and DuckDB will still raise the same Alias has subquery error this patch is intended to avoid.
Useful? React with 👍 / 👎.
| if i + 2 <= bytes.len() | ||
| && bytes[i..i + 2].eq_ignore_ascii_case(b"AS") | ||
| && (i + 2 >= bytes.len() || bytes[i + 2].is_ascii_whitespace()) | ||
| { |
There was a problem hiding this comment.
Support subquery aliases declared without AS
subquery_alias_from_item only records aliases when it finds the AS token pattern, but SQL allows select-item aliases without AS (e.g. (SELECT ...) year_total). In that valid form, the subquery alias is never collected, needs_wrap stays false, and ORDER BY year_total still hits DuckDB's subquery-alias planner error.
Useful? React with 👍 / 👎.
- identifier_appears_in: remove depth=0 restriction so aliases inside COALESCE(year_total, 0) or (year_total) are correctly detected - subquery_alias_from_item: detect implicit aliases without AS keyword, e.g. (SELECT ...) year_total
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f83ac55ebb
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if j + 6 <= bytes.len() && bytes[j..j + 6].eq_ignore_ascii_case(b"SELECT") { | ||
| return true; | ||
| } | ||
| if j + 4 <= bytes.len() && bytes[j..j + 4].eq_ignore_ascii_case(b"WITH") { | ||
| return true; |
There was a problem hiding this comment.
Require token boundaries for SELECT/WITH subquery checks
item_has_subquery treats any parenthesized text that starts with SELECT or WITH as a subquery, but it does not verify a keyword boundary. Expressions like (selected_value) AS x or (with_tax) AS x are therefore misclassified as subqueries, so expand_aggregate_with_at can wrap the whole query when ORDER BY x is present. That unnecessary wrap changes name resolution context and can break otherwise-valid qualified sort keys (for example ORDER BY x, o.col) because o is no longer visible outside SELECT * FROM (...) _q.
Useful? React with 👍 / 👎.
item_has_subquery now verifies the character after SELECT/WITH is not alphanumeric or underscore, preventing identifiers like selected_value or with_tax from being misclassified as subquery keywords.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d430c8a954
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if bytes[i] == b'(' { | ||
| let mut j = i + 1; | ||
| while j < bytes.len() && bytes[j].is_ascii_whitespace() { | ||
| j += 1; | ||
| } | ||
| if j + 6 <= bytes.len() | ||
| && bytes[j..j + 6].eq_ignore_ascii_case(b"SELECT") | ||
| && (j + 6 >= bytes.len() || !bytes[j + 6].is_ascii_alphanumeric() && bytes[j + 6] != b'_') |
There was a problem hiding this comment.
Skip literals/comments when detecting subquery aliases
item_has_subquery scans raw bytes and treats any ("SELECT"|"WITH") sequence as a real subquery, even when it appears inside a string literal or comment. That makes extract_subquery_aliases_from_select misclassify non-subquery SELECT items as subquery aliases, which can trigger the new outer wrapping path unnecessarily. In aggregate queries this can turn previously valid ORDER BY clauses (especially ones that also reference qualified columns like t.col) into binder errors after wrapping because the original table alias is no longer in scope.
Useful? React with 👍 / 👎.
item_has_subquery now skips single-quoted strings, line comments, and block comments so that text like '(SELECT ...)' inside a literal is not misclassified as a real subquery.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7740e275cb
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| b'(' => depth += 1, | ||
| b')' => { | ||
| depth -= 1; |
There was a problem hiding this comment.
Skip string/comment parens in subquery alias scan
subquery_alias_from_item updates depth for every ( / ) byte but does not skip quoted strings or comments, so parentheses inside literals/comments after a scalar subquery can drive depth negative before the alias is parsed. For inputs like (SELECT ...) || ')' AS year_total, last_alias_start stays unset, the alias is missed, and the ORDER BY wrapper is not applied, so the original DuckDB "Alias has subquery" failure still occurs.
Useful? React with 👍 / 👎.
Parentheses inside string literals or comments (e.g. (SELECT ...) || ')' AS year_total) no longer corrupt the depth counter, which previously caused the alias to be missed and the ORDER BY wrapper to not apply.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: afd6e29159
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| } | ||
| } | ||
| } | ||
| b',' if depth == 0 => { |
There was a problem hiding this comment.
Skip comments when splitting SELECT items
extract_subquery_aliases_from_select only skips single-quoted literals while scanning for top-level commas, so commas inside --//* ... */ comments are treated as real item separators. In inputs like SELECT (SELECT 1) /* note,comma */ AS year_total ... ORDER BY year_total, the item is split before the alias, subquery_alias_from_item misses year_total, and expand_aggregate_with_at does not wrap the query, so the original DuckDB Alias has subquery failure still occurs.
Useful? React with 👍 / 👎.
extract_subquery_aliases_from_select now skips -- line comments and /* block comments */ so commas inside comments are not treated as SELECT item separators.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ef2686efea
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| while let Some(pos) = stripped_upper[search_from..].find(&ident_upper) { | ||
| let abs = search_from + pos; | ||
| let before_ok = abs == 0 || { | ||
| let c = bytes[abs - 1]; |
There was a problem hiding this comment.
Restrict alias matching to outer ORDER BY scope
identifier_appears_in does a whole-word scan across the full ORDER BY text and intentionally matches at any parenthesis depth, so alias names appearing inside nested subqueries are treated as references to the outer SELECT alias. In cases like ORDER BY (SELECT year_total FROM aux), o.prodName, this makes needs_wrap true even though the outer sort key never uses the year_total alias; the subsequent wrap to SELECT * FROM (...) _q can then invalidate qualified keys such as o.prodName. The wrap decision should ignore matches inside nested SELECT scopes.
Useful? React with 👍 / 👎.
identifier_appears_in now strips (SELECT ...) and (WITH ...) blocks before scanning, so aliases appearing only inside nested subqueries in ORDER BY do not falsely trigger the query wrapper.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0303c34e23
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let after_pos = abs + ident_upper.len(); | ||
| let after_ok = after_pos >= stripped_upper.len() || { | ||
| let c = bytes[after_pos]; | ||
| !c.is_ascii_alphanumeric() && c != b'_' |
There was a problem hiding this comment.
Guard byte indexing after Unicode case expansion
Fresh evidence: even after advancing search_from on char boundaries, this branch still indexes cleaned.as_bytes() with offsets computed from cleaned.to_uppercase(), whose byte layout can grow for some Unicode characters (for example ʼn). In ORDER BY text like ʼnyear_totalx, after_pos is < stripped_upper.len() but equals bytes.len(), so bytes[after_pos] panics with out-of-bounds instead of safely returning false; this can crash SQL expansion on otherwise valid input.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 55134a0895
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let order_qualifiers = | ||
| collect_order_by_wrapper_qualifiers(&from_info, primary_alias.as_deref()); |
There was a problem hiding this comment.
Include joined aliases when wrapping ORDER BY
When the query still contains AGGREGATE(...) AT (...), extract_from_clause_info_ffi(&sql) can fall back to only the primary table (as noted by the nearby fallback path for syntax the parser cannot parse), but the wrapper uses this qualifier set to decide which qualified ORDER BY keys to rewrite. In a joined query such as FROM sales_v s JOIN salesdetails_v sd ... ORDER BY year_total, sd.product, wrapping is triggered by year_total, yet sd.product is left unchanged and becomes invalid outside SELECT * FROM (...) _q. The wrapper needs joined aliases even on the fallback path, or it will break valid joined AT queries with qualified tie-breakers.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in ba4f9f1: when parser FROM extraction cannot parse AGGREGATE(...) AT (...), the fallback now collects top-level FROM/JOIN table aliases and merges them into the wrapper qualifier set. Added a joined sd.product ORDER BY regression plus a focused fallback unit test.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ba4f9f1789
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| .unwrap_or_else(|| { | ||
| hidden_order_alias( | ||
| &order_text[i..name_end], | ||
| &mut hidden_selects, | ||
| &mut hidden_aliases, | ||
| &mut hidden_by_expr, | ||
| ) |
There was a problem hiding this comment.
Avoid hiding aggregate ORDER BY arguments as raw columns
When wrapping is triggered and a sort key contains an aggregate expression like ORDER BY year_total, SUM(o.revenue), this fallback treats the qualified argument o.revenue as an unprojected column and adds o.revenue AS __yardstick_order_0 to the inner grouped query. That makes the generated inner SELECT invalid because o.revenue is neither grouped nor aggregated, and the outer ORDER BY is also changed to SUM(_q.__yardstick_order_0) in a non-grouped wrapper. The requalification needs to avoid descending into aggregate-function arguments or project the aggregate expression itself instead of the raw column.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 2bd4f0a: aggregate ORDER BY calls are now captured as hidden inner aggregate expressions before the qualifier scanner descends into their arguments. The outer wrapper orders by the hidden aggregate output instead of SUM(_q.raw_column). Added unit coverage for SUM(o.revenue) and a SQL regression for ORDER BY year_total, SUM(o.year).
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2bd4f0a3fb
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if let Some(function_end) = | ||
| parenthesized_expr_end(order_text, after_function_name) | ||
| { | ||
| let output_name = hidden_order_alias( | ||
| &order_text[i..function_end], |
There was a problem hiding this comment.
Preserve FILTER clauses when hiding aggregate sort keys
When wrapping is triggered and an ORDER BY key uses a filtered aggregate, e.g. ORDER BY year_total, SUM(o.revenue) FILTER (WHERE o.region = 'US'), this branch only captures the aggregate call through the closing ) and replaces it with _q.__yardstick_order_0. The remaining FILTER (...) is then emitted in the outer query after a plain column reference, while the hidden inner select contains the unfiltered aggregate, so the rewritten SQL is invalid or sorts by the wrong value. The hidden expression needs to include aggregate modifiers such as FILTER before replacing the sort key.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 9105e23: hidden aggregate ORDER BY expressions now include trailing aggregate modifiers before replacement, including FILTER (...), WITHIN GROUP (...), and OVER (...)/named windows. Added unit coverage for filtered aggregates and a SQL regression for ORDER BY year_total, SUM(o.year) FILTER (...).
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9105e23c9c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| cursor = next; | ||
| } | ||
|
|
||
| while let Some(join_pos) = find_top_level_keyword(sql, "JOIN", cursor) { |
There was a problem hiding this comment.
Include comma-joined tables in wrapper qualifiers
When parser FROM extraction falls back for AGGREGATE ... AT syntax, this scanner records the first table and only subsequent JOIN targets, so comma joins like FROM sales_v s, salesdetails_v sd ... ORDER BY year_total, sd.product omit sd from the qualifier set. Wrapping is still triggered by year_total, but sd.product is left unchanged in the outer SELECT * FROM (...) _q ORDER BY ..., where sd is no longer in scope, causing a binder error for a valid tie-breaker query. The fallback needs to parse comma-separated table refs as well as explicit joins.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in affcb70: fallback FROM extraction now scans top-level comma-separated table refs as well as explicit JOIN targets, so comma-joined aliases are included in the ORDER BY wrapper qualifier set. Added unit and SQL regressions for the comma-join sd.product tie-breaker.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: affcb70cc1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| .unwrap_or_else(|| { | ||
| hidden_order_alias( | ||
| &order_text[i..name_end], | ||
| &mut hidden_selects, | ||
| &mut hidden_aliases, | ||
| &mut hidden_by_expr, | ||
| ) |
There was a problem hiding this comment.
Preserve scalar ORDER BY expressions when hiding sort keys
When wrapping is triggered and a secondary sort key is a scalar expression over a grouped dimension, this fallback hides only the qualified column it encounters instead of the whole expression. For example, SELECT LOWER(o.region) AS r, ... AS year_total ... GROUP BY LOWER(o.region) ORDER BY year_total, LOWER(o.region) gets an inner projection like o.region AS __yardstick_order_0; that raw column is not grouped, so the generated inner grouped query fails even though the original ORDER BY LOWER(o.region) is valid. The rewrite should capture/project the full scalar sort expression (or map it to an existing SELECT alias) rather than hiding the raw argument.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 5e9a31f: scalar ORDER BY function calls are now matched to aliased SELECT expressions before wrapping, and unmapped scalar sort expressions are hidden as the full expression rather than as a raw argument column. Added unit coverage for alias mapping, hidden scalar expressions, semicolon-terminated ORDER BY text, and a SQL regression for LOWER(o.region).
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5e9a31f360
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| return alias.clone(); | ||
| } | ||
|
|
||
| let alias = format!("__yardstick_order_{}", hidden_aliases.len()); |
There was a problem hiding this comment.
Avoid colliding hidden ORDER BY aliases
When wrapping needs a hidden sort key and the user already projects a column named __yardstick_order_0, this fixed alias collides with the real output. For example, a query selecting o.year AS __yardstick_order_0 and ordering by an unprojected grouped key will add another AS __yardstick_order_0, then the outer * EXCLUDE (__yardstick_order_0) can drop the user's column or make the wrapped query ambiguous. Generate helper names that cannot collide with existing SELECT outputs.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in dd100e2: hidden ORDER BY helper aliases now avoid existing SELECT output names before generating __yardstick_order_N. Added unit coverage for the collision case and a SQL regression that preserves a user-projected __yardstick_order_0 column.
| } | ||
| } | ||
|
|
||
| i >= bytes.len() || bytes[i] == b',' || bytes[i] == b';' |
There was a problem hiding this comment.
Hide full composite scalar sort expressions
Fresh evidence: the current completeness check accepts a function sort key only when the next token is comma/semicolon/end, so valid grouped expressions such as ORDER BY year_total, LOWER(o.region) || '' or LOWER(o.region) COLLATE NOCASE are not hidden as a full expression. In those cases wrapping is already triggered by year_total, but the scanner descends into the function and hides raw o.region, adding an ungrouped column to the inner grouped SELECT and causing a binder error even though the original ORDER BY expression is valid.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in dd100e2: scalar ORDER BY handling now captures the full top-level sort expression, excluding only ORDER BY direction/null ordering suffixes, before deciding whether to map or hide it. Added unit coverage for concatenated and collated scalar sort expressions plus a SQL regression for LOWER(o.region) || ''.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dd100e2385
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if let Some(output_name) = outputs.get(&name) { | ||
| out.push_str(output_name); | ||
| i = qualifier_end; | ||
| continue; | ||
| } |
There was a problem hiding this comment.
Hide unqualified grouped ORDER BY keys
When wrapping is triggered, an unqualified secondary sort key that is valid only in the inner grouped query is left unchanged unless it already appears in outputs. For example, SELECT o.year, ... AS year_total FROM sales_v o GROUP BY o.year, o.region ORDER BY year_total, region is valid before wrapping because region is grouped, but this path emits an outer ORDER BY year_total, region while the inner projection exposes only year and year_total, so region no longer binds. Unqualified grouped keys need the same hidden-projection handling added for qualified keys.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 77ebcf3: unqualified ORDER BY items that are not projected outputs now get hidden as helper projections when wrapping, so grouped keys like region remain available to the outer ORDER BY. Added unit coverage for bare and composite unqualified sort keys plus a SQL regression for ORDER BY year_total, region.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 77ebcf3352
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| && bytes[j..j + 2].eq_ignore_ascii_case(b"AS") | ||
| && (j + 2 >= bytes.len() || bytes[j + 2].is_ascii_whitespace()) | ||
| { | ||
| let mut end = j + 2; | ||
| while end < bytes.len() && bytes[end].is_ascii_whitespace() { | ||
| end += 1; | ||
| } | ||
| last_alias_start = Some(end); |
There was a problem hiding this comment.
Treat comments as whitespace when finding subquery aliases
When a scalar-subquery select item has comments between the subquery, AS, and the alias (for example AGGREGATE(...) AT (...) AS/*comment*/year_total, which expands to (SELECT ...) AS/*comment*/year_total), this scanner only skips ASCII whitespace after the closing ) and after AS. SQL comments are whitespace, so the valid alias is either missed or parsed as AS; identifier_appears_in then never sees year_total, the wrapper is skipped, and DuckDB still hits the subquery-alias ORDER BY failure this change is meant to avoid.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in f23fdc2: subquery alias parsing now treats SQL comments as whitespace after the closing subquery, around AS, and before implicit aliases. Added unit coverage for block and line comment gaps plus a SQL regression for AGGREGATE(...) AT (...) AS/comment/year_total. This commit also switches parser FFI expression metadata reads to public DuckDB accessors to clear the next-binary protected-field CI failures.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f23fdc2661
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| hidden_order_alias( | ||
| &order_text[i..name_end], |
There was a problem hiding this comment.
Hide full qualified-leading sort expressions
Fresh evidence: the current qualified-reference branch still hides only o.region even when the ORDER BY item is a larger expression starting with that reference. When wrapping is triggered by a subquery alias, a valid grouped query such as SELECT o.region || '' AS r, ... AS year_total FROM sales_v o GROUP BY o.region || '' ORDER BY year_total, o.region || '' gets an inner helper projection o.region AS __yardstick_order_0; because the inner query groups by o.region || '' rather than raw o.region, that added select is not grouped and DuckDB will reject the rewritten SQL. This path should capture/map the whole top-level sort expression before falling back to hiding a raw qualified column.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 7d48314: qualified-leading ORDER BY items now capture the full top-level sort expression before falling back to a raw qualified column. If the expression is already projected, the wrapper orders by the projected alias; otherwise it adds the full expression as a hidden helper projection. Added unit coverage for mapped and hidden qualified-leading expressions plus a SQL regression for the qualified concatenation sort key. Verified with cargo test test_requalify_order_by_for_wrapper, cargo test test_subquery_alias_from_item_whitespace_variants, and make test.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7d4831488b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let mut i = skip_ws_and_comments(sql, start); | ||
| if i >= end || bytes[i] == b'(' { | ||
| return None; |
There was a problem hiding this comment.
Capture derived-table aliases in FROM fallback
When the parser path cannot parse the custom AGGREGATE ... AT SQL and this fallback is used, this early return skips any table reference that starts with (, so aliases for derived tables are never added to the qualifier set. In a query such as FROM sales_v s JOIN (SELECT ... ) sd ... ORDER BY year_total, sd.product, wrapping is triggered by year_total but sd.product is left unchanged in the outer SELECT ... FROM (...) _q ORDER BY ..., where sd is no longer in scope; the rewriter should consume the parenthesized table ref and record its alias.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 559c128: fallback FROM extraction now consumes parenthesized SELECT/WITH table refs and records the following alias, including derived tables introduced by JOIN. Added focused fallback coverage for JOIN (SELECT ...) sd and a SQL regression for ORDER BY year_total, sd.product through a derived table. Verified with cargo test test_extract_from_clause_info_fallback_join_with_at_syntax, cargo test test_requalify_order_by_for_wrapper, and make test.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 559c1283a8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let (expr, alias_clause) = split_item_expr_and_alias(item); | ||
| let output_name = if let Some(alias_clause) = alias_clause { | ||
| parse_alias_span(alias_clause, 0).map(|(_, alias)| alias) | ||
| } else { | ||
| extract_last_qualified_identifier_token(expr) | ||
| }; |
There was a problem hiding this comment.
Recognize implicit SELECT aliases before hiding sort keys
When wrapping is triggered for a subquery-backed alias, this output map only understands explicit AS aliases. For a valid query like SELECT o.region r, ... AS year_total FROM sales_v o GROUP BY o.region ORDER BY year_total, r, r is not recorded as an exposed output, so the requalifier treats the secondary sort key as unprojected and appends r AS __yardstick_order_0 to the inner SELECT. At that point r is only a SELECT-list alias, not an input column, so the rewritten grouped query can fail to bind even though the original ORDER BY r was valid.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 351710c: collect_order_by_wrapper_outputs now recognizes parser-confirmed implicit SELECT aliases and has a fallback for simple column aliases, so SELECT o.region r reserves r as an exposed output instead of adding r AS __yardstick_order_0 to the grouped inner query. Added a focused requalifier regression and a SQL regression for ORDER BY year_total, r. Verified with cargo test test_requalify_order_by_for_wrapper and make test.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 351710cb31
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let order_end = find_first_top_level_keyword( | ||
| &result_sql, | ||
| expr_start, | ||
| &["LIMIT", "OFFSET"], |
There was a problem hiding this comment.
Split FETCH before hiding ORDER BY keys
When a wrapped query uses a standard FETCH FIRST ... ROWS ONLY tail, this boundary search leaves the FETCH clause inside order_text because only LIMIT and OFFSET are split out. If wrapping is triggered and a secondary sort key must be hidden, e.g. ORDER BY year_total, region FETCH FIRST 10 ROWS ONLY, the rewriter treats region FETCH FIRST 10 ROWS ONLY as the sort expression and appends it to the inner SELECT as ... AS __yardstick_order_0, producing invalid SQL instead of preserving the row limit. Include FETCH as an ORDER BY tail boundary before requalifying/hiding sort keys.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 7b775fe: the ORDER BY wrapper now treats top-level FETCH as an order tail boundary alongside LIMIT and OFFSET, so hidden sort-key insertion only sees the sort expression and preserves FETCH FIRST ... ROWS ONLY outside the inner SELECT. Added a SQL regression for ORDER BY year_total, region FETCH FIRST 2 ROWS ONLY where region is a hidden sort key. Verified with cargo test test_requalify_order_by_for_wrapper and make test.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7b775fea38
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if function_call_has_unmapped_qualified_ref( | ||
| expr, qualifiers, outputs, | ||
| ) { |
There was a problem hiding this comment.
Hide unqualified function sort keys
Fresh evidence: in the current code this branch only hides a scalar function sort expression when function_call_has_unmapped_qualified_ref(...) finds a table-qualified reference, and that helper ignores unqualified columns. When wrapping is triggered by a subquery alias, a valid grouped query such as SELECT o.year, ... AS year_total FROM sales_v o GROUP BY o.year, o.region ORDER BY year_total, LOWER(region) leaves the outer sort as LOWER(region) even though the wrapper subquery only exposes year and year_total, causing a bind error; unqualified function expressions over grouped but unprojected keys need the same hidden-projection handling as qualified ones.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in cab1241: scalar function ORDER BY items now detect unmapped unqualified identifiers and hide the full expression, so ORDER BY year_total, LOWER(region) becomes an outer sort on a hidden projected key instead of trying to bind region outside the wrapper. Projected aliases such as LOWER(r) are left untouched. Also moved FETCH FIRST coverage into the Rust boundary test because the Linux next image rejects FETCH syntax before the wrapper can be asserted. Verified with git diff --check, cargo test test_requalify_order_by_for_wrapper, and make test.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cab12410b5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if !is_order_item_expr_start(order_text, start) { | ||
| return None; |
There was a problem hiding this comment.
Handle parenthesized sort keys before hiding columns
When wrapping is triggered and a secondary sort key is parenthesized, the first identifier is no longer considered the start of the ORDER BY item, so the full expression is not captured. For example, ORDER BY year_total, (o.region || '') on a query grouped by o.region || '' falls through to hiding just o.region, adding an ungrouped raw column to the inner grouped SELECT even though the original parenthesized expression was valid.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 00de41c: parenthesized non-subquery ORDER BY items are now handled as whole sort expressions before the scanner descends into their identifiers, so a parenthesized grouped expression is hidden as the full expression instead of projecting a raw column such as o.region. Added unit coverage and a SQL regression for the parenthesized grouped expression case. Verified with git diff --check, cargo test test_requalify_order_by_for_wrapper, and make test.
| if let Some(next) = skip_quoted_sql_token(bytes, i, bytes.len()) { | ||
| i = next; | ||
| continue; |
There was a problem hiding this comment.
Inspect quoted identifiers inside function sort keys
In function ORDER BY expressions, double-quoted identifiers are skipped together with string literals, so an unprojected grouped key like LOWER("region") is not detected and hidden before the wrapper is built. When year_total triggers wrapping, the outer query is left with ORDER BY LOWER("region") even though the inner subquery only projects the visible SELECT columns, causing quoted grouped columns that were valid in the original ORDER BY to fail to bind.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 00de41c: function sort-key scanning now skips only single-quoted string literals and parses quoted identifiers, so LOWER("region") is detected as an unprojected grouped key and hidden before wrapping. Added unit coverage and a SQL regression for ORDER BY year_total, LOWER("region"). Verified with git diff --check, cargo test test_requalify_order_by_for_wrapper, and make test.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 00de41c25c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| hidden_order_alias( | ||
| &order_text[i..name_end], |
There was a problem hiding this comment.
Hide whole unary sort expressions
When wrapping is triggered by a subquery alias and a secondary sort key starts with a unary operator, e.g. ORDER BY year_total, -o.year on a query grouped by -o.year but not projecting year, the scanner reaches o.year after the leading -, so the expression-start check above does not capture the whole sort item. This fallback then adds a hidden projection for raw o.year instead of -o.year, making the generated inner grouped query select an ungrouped column and fail to bind even though the original ORDER BY expression was valid.
Useful? React with 👍 / 👎.
Summary
Fixes #28. When
AGGREGATE(...) AT (...)expressions expand to correlated subqueries and are referenced by alias inORDER BY, DuckDB errors with "Alias has subquery - not yet supported." This wraps the inner query in a subquery (SELECT * FROM (...) _q ORDER BY ...) whenORDER BYreferences a subquery-aliased SELECT item, so aliases resolve as plain column references.The wrapping is targeted: it only activates when
ORDER BYactually references a subquery alias (via word-boundary matching), avoiding breakage for queries that use qualified column names likeo.prodNameinORDER BY.Adds 3 integration tests and 3 unit tests covering expression-based ORDER BY (
revenue/year_total), simple alias ORDER BY, and DESC ordering.