@kyleconroy
Collaborator

No description provided.

claude added 29 commits January 1, 2026 18:22
Add checks for empty nested arrays in explainAliasedExpr to properly
render arrays containing empty subarrays as Function array instead of
Literal. This matches the behavior in explainLiteral.

The fix adds:
- Check for nested arrays that are empty or contain empty arrays
- containsEmptyArraysRecursive check for deeply nested empty arrays
- containsTuplesRecursive check for deeply nested tuples

This resolves 10+ failing explain tests across multiple test suites:
- 00909_arrayEnumerateUniq (10 statements)
- 00548_slice_of_nested
- 02699_polygons_sym_difference_rollup
- 02699_polygons_sym_difference_total_analyzer
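
The recursive empty-array check described above can be sketched roughly as follows. This is only an illustration: arrays are modeled as `[]any`, and `containsEmptyArray` is a hypothetical stand-in for `containsEmptyArraysRecursive`, whose real signature and AST types are not shown in this PR text.

```go
package main

import "fmt"

// containsEmptyArray reports whether any nested array inside elems is empty,
// at any depth. Modeling arrays as []any is an assumption for illustration;
// the real code presumably walks AST nodes instead.
func containsEmptyArray(elems []any) bool {
	for _, e := range elems {
		sub, ok := e.([]any)
		if !ok {
			continue // scalar element, nothing to descend into
		}
		if len(sub) == 0 || containsEmptyArray(sub) {
			return true
		}
	}
	return false
}

func main() {
	// [[1], [[]]] contains an empty array two levels down.
	fmt.Println(containsEmptyArray([]any{[]any{1}, []any{[]any{}}}))
	// [[1], [2]] does not.
	fmt.Println(containsEmptyArray([]any{[]any{1}, []any{2}}))
}
```

Under this sketch, an array for which the check returns true would be rendered as Function array rather than Literal.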
Add readBinaryString() function to properly decode binary string
literals. Binary strings like b'0' should be decoded as byte values
where each bit (0 or 1) contributes to the resulting bytes.

For example:
- b'0' -> \0 (single bit 0, left-padded to 8 bits = 0x00)
- b'00110000' -> '0' (8 bits = ASCII 0x30)
- b'111001101011010110001011111010001010111110010101' -> '测试'

This fixes 9 failing explain tests in 02494_parser_string_binary_literal.
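
A minimal sketch of the bit-to-byte decoding described above. `decodeBinaryString` is a hypothetical name standing in for `readBinaryString` (whose actual signature is not shown here), and left-padding inputs of any length to a multiple of 8 bits is an assumption generalized from the `b'0'` example.

```go
package main

import (
	"fmt"
	"strings"
)

// decodeBinaryString takes the digits between the quotes of b'...' and
// returns the decoded bytes as a string. Hypothetical helper for illustration.
func decodeBinaryString(bits string) (string, error) {
	// Assumption: left-pad with zeros so the bit count is a multiple of 8.
	if rem := len(bits) % 8; rem != 0 {
		bits = strings.Repeat("0", 8-rem) + bits
	}
	out := make([]byte, 0, len(bits)/8)
	for i := 0; i < len(bits); i += 8 {
		var b byte
		for _, c := range bits[i : i+8] {
			switch c {
			case '0':
				b <<= 1
			case '1':
				b = b<<1 | 1
			default:
				return "", fmt.Errorf("invalid binary digit %q", c)
			}
		}
		out = append(out, b)
	}
	return string(out), nil
}

func main() {
	for _, in := range []string{"0", "00110000"} {
		s, err := decodeBinaryString(in)
		fmt.Printf("%q %v\n", s, err)
	}
}
```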
Two fixes in this commit:

1. Fix IS NOT DISTINCT FROM operator parsing:
   - Use <=> operator which maps to isNotDistinctFrom function
   - Use NOT_PREC for right side to correctly include lower-precedence operators like IN
   - IS DISTINCT FROM is wrapped as NOT(IS NOT DISTINCT FROM)

2. Add boolean support for IN list tuple literals:
   - Boolean literals in IN lists can now be combined into Literal Tuple_ format
   - Added allBooleansOrNull check to both explainInExpr functions

This fixes 9 statements in 02868_operator_is_not_distinct_from_priority
and 1 statement in 03214_join_on_tuple_comparison_elimination_bug.
Add checks for CAST expressions, binary expressions, and other non-literal
expressions when determining if an array should be rendered as Function array
vs Literal Array_. This ensures arrays containing CAST, function calls, etc.
are properly rendered while preserving correct behavior for:
- Simple arrays with primitive literals
- Arrays with negated numbers
- Nested arrays (handled separately)

This fixes 9 statements in 00502_sum_map and many other tests:
- 02916_analyzer_set_in_join (1 statement)
- 02708_dotProduct (2 statements)
- 02423_json_quote_float64 (2 statements)
- 02524_fuzz_and_fuss (1 statement)
- 00597_push_down_predicate_long (2 statements)
- 03727_concat_with_separator_subquery (3 statements)
Add handling for the STRICT modifier in asterisk and columns matcher
EXCEPT and REPLACE parsing functions. This ensures that queries like
`* EXCEPT STRICT i` and `* REPLACE STRICT i + 1 AS i` parse correctly.

This fixes 7 statements in 01470_columns_transformers:
- stmt12-15 (EXCEPT STRICT)
- stmt16-17, stmt22 (REPLACE STRICT)
The parser now handles RENAME DICTIONARY syntax in addition to RENAME TABLE.
This allows dictionary rename operations to be properly parsed and explained.

This fixes:
- 3 statements in 01155_rename_move_materialized_view
- 6 statements in 01191_rename_dictionary
- 3 statements in 02343_analyzer_column_transformers_strict (from STRICT fix)
Added parsing and EXPLAIN AST output for PostgreSQL-style DISTINCT ON
clause which specifies columns to determine row uniqueness.

Changes:
- Added DistinctOn field to SelectQuery AST node
- Added DISTINCT ON parsing in parseSelect() and parseFromSelectSyntax()
- Added DISTINCT ON output in explainSelectQuery (outputs Literal UInt64_1
  followed by ExpressionList of columns)

Fixed tests:
- 03363_hive_style_partition (3 statements)
- 01244_optimize_distributed_group_by_sharding_key (1 statement)
- 01952_optimize_distributed_group_by_sharding_key (8 statements)
Extended the lexer to properly handle PostgreSQL-style dollar-quoted strings
with custom tags like $doc$content$doc$. The lexer now:

1. Looks ahead to verify a matching closing tag exists before treating
   something as a dollar-quoted string
2. Falls back to identifier parsing for cases like $alias$name$ which are
   valid ClickHouse identifiers containing $ characters
3. Uses a new peekCharN function to peek multiple characters ahead

This fixes all 8 pending statements in test 01948_heredoc.
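
The look-ahead rule above could be sketched like this. It works on a plain string rather than the lexer's character stream, and `scanDollarQuoted` and `isTagChar` are hypothetical names; the real lexer uses peekCharN and its own state.

```go
package main

import (
	"fmt"
	"strings"
)

// isTagChar reports whether c may appear inside a dollar-quote tag.
// Assumption: tags consist of ASCII letters, digits, and underscores.
func isTagChar(c rune) bool {
	return c == '_' || (c >= '0' && c <= '9') ||
		(c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

// scanDollarQuoted checks whether input starts with $tag$...$tag$.
// It returns the body and the number of bytes consumed, or ok=false when no
// matching closing tag exists, in which case the caller would fall back to
// identifier parsing.
func scanDollarQuoted(input string) (body string, consumed int, ok bool) {
	if len(input) == 0 || input[0] != '$' {
		return "", 0, false
	}
	end := strings.IndexByte(input[1:], '$')
	if end < 0 {
		return "", 0, false
	}
	tag := input[:end+2] // includes both dollar signs, e.g. "$doc$"
	for _, c := range tag[1 : len(tag)-1] {
		if !isTagChar(c) {
			return "", 0, false
		}
	}
	rest := input[len(tag):]
	closeIdx := strings.Index(rest, tag)
	if closeIdx < 0 {
		return "", 0, false // no closing tag: not a dollar-quoted string
	}
	return rest[:closeIdx], 2*len(tag) + closeIdx, true
}

func main() {
	fmt.Println(scanDollarQuoted("$doc$content$doc$"))
	fmt.Println(scanDollarQuoted("$alias$name$"))
}
```

In this sketch `$alias$name$` is rejected because `$alias$` never recurs, which is the case the commit routes to identifier parsing.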
Added parsing and EXPLAIN AST output for PARTITION ID 'value' syntax
in ALTER TABLE commands (ATTACH, DETACH, DROP, REPLACE, FETCH, FREEZE).

Changes:
- Added PartitionIsID field to AlterCommand AST node
- Parse PARTITION ID 'value' in parseAlterCommand for multiple partition operations
- Output Partition_ID format in explainAlterCommand when PartitionIsID is true

Fixed 8 statements in test 01166_truncate_multiple_partitions and many
other tests that use PARTITION ID syntax.
Skip the RECURSIVE keyword after WITH when parsing CTEs. The parser
consumes the keyword silently because recursion does not change the
shape of the AST.

Fixed 8 statements in 03033_recursive_cte_basic and many other tests
using recursive CTEs.
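
As a rough illustration of silently consuming the keyword, assuming tokens are plain strings and `skipRecursive` is a hypothetical helper (the real parser works on its own token type):

```go
package main

import (
	"fmt"
	"strings"
)

// skipRecursive takes the index of the token that follows WITH and returns
// the index at which CTE parsing should start. If that token is RECURSIVE
// (matched case-insensitively, an assumption), it is skipped.
func skipRecursive(tokens []string, pos int) int {
	if pos < len(tokens) && strings.EqualFold(tokens[pos], "RECURSIVE") {
		return pos + 1
	}
	return pos
}

func main() {
	tokens := []string{"WITH", "RECURSIVE", "t", "AS", "(", "SELECT", "1", ")"}
	fmt.Println(tokens[skipRecursive(tokens, 1)])
}
```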
Multiple dictionary-related fixes:

1. Added REPLACE DICTIONARY support (equivalent to CREATE OR REPLACE)
2. Added EXCHANGE DICTIONARIES support (similar to EXCHANGE TABLES)
3. Fixed key-value pair parsing for dictionary SOURCE clause
   - Values like TABLE test are now properly parsed
4. Fixed RENAME output to not include database names when not specified

Fixed 8 statements in 03173_check_cyclic_dependencies_on_create_and_rename
and many other dictionary-related tests.
- Add AlterMaterializeTTL constant to ast/ast.go
- Add MATERIALIZE TTL parsing after MATERIALIZE keyword in parser.go
- Fix explainAlterCommand to omit (children N) when count is 0

This fixes 01070_materialize_ttl and several other tests that use MATERIALIZE TTL.
Handle column names like ip4Map.value for nested columns in INSERT statements.
This allows parsing INSERT INTO table(id, column.subcolumn, ...) VALUES (...)
without requiring backticks around the dotted names.

Fixes many tests using nested column syntax.
- Parse FORMAT before SETTINGS in ALTER TABLE statements
- Output FORMAT identifier before Set in explain output

This matches ClickHouse's FORMAT Null SETTINGS ... syntax.
ClickHouse allows trailing commas before clauses like FROM, WHERE, etc.
For example: SELECT a, b, FROM table

Add isClauseKeyword function to detect when a token is a clause keyword
that should terminate an expression list, vs when it's being used as
an identifier (which ClickHouse allows for many keywords).

The detection is context-aware - keywords followed by (, [, or = are
treated as expression continuations rather than clause terminators.
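
The context-aware check might look roughly like the sketch below. The keyword set and the two-string signature are assumptions for illustration; the actual isClauseKeyword operates on parser tokens and may cover more keywords.

```go
package main

import (
	"fmt"
	"strings"
)

// clauseKeywords is an assumed, partial set of keywords that start a clause.
var clauseKeywords = map[string]bool{
	"FROM": true, "WHERE": true, "GROUP": true,
	"HAVING": true, "ORDER": true, "LIMIT": true,
}

// isClauseKeyword reports whether tok should terminate an expression list.
// next is the token that follows tok; when it is "(", "[", or "=", tok is
// treated as part of an expression (for example an identifier or function
// name) rather than as a clause keyword.
func isClauseKeyword(tok, next string) bool {
	if !clauseKeywords[strings.ToUpper(tok)] {
		return false
	}
	switch next {
	case "(", "[", "=":
		return false
	}
	return true
}

func main() {
	// SELECT a, b, FROM table: FROM ends the list after the trailing comma.
	fmt.Println(isClauseKeyword("FROM", "table"))
	// SELECT a, limit = 1: limit is used as an identifier here.
	fmt.Println(isClauseKeyword("limit", "="))
}
```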
Format QueryParameter with type as Name:Type to match ClickHouse output.
For example: QueryParameter filter:FixedString(2)

Fixes many parameterized view tests.
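
A small sketch of the Name:Type formatting. `formatQueryParameter` is a hypothetical helper, and omitting the colon when no type is given is an assumption not stated in the commit.

```go
package main

import "fmt"

// formatQueryParameter renders a query parameter for explain output as
// "QueryParameter name:type". With an empty type it prints only the name
// (assumed behavior).
func formatQueryParameter(name, typ string) string {
	if typ == "" {
		return "QueryParameter " + name
	}
	return fmt.Sprintf("QueryParameter %s:%s", name, typ)
}

func main() {
	fmt.Println(formatQueryParameter("filter", "FixedString(2)"))
}
```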
- Add PARALLEL token to lexer keywords
- Add ParallelWithQuery AST node for chaining statements with PARALLEL WITH
- Add parseParallelWith in parser to handle statement chaining
- Fix table expression alias handling to not consume PARALLEL when followed by WITH
- Fix ExistsTableQuery explain output to match ClickHouse format (space alignment for missing database)

Tests fixed:
- 03305_parallel_with (all statements)
- 03604_parallel_with_query_lock (all statements)
- 01048_exists_query (all statements)
- 00101_materialized_views_and_insert_without_explicit_database (exists-related)
- 01073_attach_if_not_exists
…explain

- Add Identifier handling in FormatDataType to properly format
  AggregateFunction types that contain function name identifiers
- Escape string literals in CAST explain output to properly handle
  null bytes and other control characters

Tests fixed:
- 02688_aggregate_states (all statements)
- 02477_single_value_data_string_regression
- 02689_meaningless_data_types
- 02731_nothing_deserialization
- 02885_arg_min_max_combinator
- 03011_definitive_guide_to_cast
- 03210_variant_with_aggregate_function_type
- 03254_normalize_aggregate_states_with_named_tuple_args
- 03411_iceberg_bucket
…eElement explain

- Add FillStaleness field to OrderByElement AST
- Parse STALENESS clause in ORDER BY WITH FILL
- Simplify explainOrderByElement to always use direct children (no FillModifier)
- Fix explainInterpolateElement to correctly output value OR column identifier

Tests fixed:
- 03266_with_fill_staleness (all statements)
- 03266_with_fill_staleness_cases
- 03266_with_fill_staleness_errors
- 00995_order_by_with_fill
- 02016_order_by_with_fill_monotonic_functions_removal
- 02112_with_fill_interval
- 02366_with_fill_date
- 02560_with_fill_int256_int
- 02561_with_fill_date_datetime_incompatible
- 02861_interpolate_alias_precedence
- 03043_group_array_result_is_expected
- 03093_with_fill_support_constant_expression
- Add AlterModifyOrderBy command type to AST
- Add OrderByExpr field to AlterCommand struct
- Parse MODIFY ORDER BY (expr, ...) syntax in ALTER
- Explain output wraps multiple expressions in tuple function

Tests fixed:
- 00754_alter_modify_order_by (all statements)
- 00754_alter_modify_order_by_replicated_zookeeper_long
- 00910_crash_when_distributed_modify_order_by
- 01526_alter_add_and_modify_order_zookeeper
- 01532_primary_key_without_order_by_zookeeper
- 02484_substitute_udf_storage_args
- 02710_allow_suspicious_indices
- 02863_interpolate_subquery
- 03020_order_by_SimpleAggregateFunction
- 03263_forbid_materialize_sort_key
- 03578_ttl_column_in_order_by_validation
Previously, +Inf was parsed as a unary plus function applied to the
Inf identifier, so array literals containing +Inf were treated as
function calls instead of literal arrays. Now +Inf and -Inf are
recognized as special Float64 infinity literals.
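
One way to picture the change: when a sign token is directly followed by the Inf identifier, fold the pair into a single Float64 literal. `parseSignedInf` is a hypothetical helper, and the case-insensitive match on the identifier is an assumption.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// parseSignedInf folds a sign token followed by the Inf identifier into a
// float64 infinity. ok is false when the pair is not a signed infinity, in
// which case the caller would parse a normal unary expression instead.
func parseSignedInf(sign, ident string) (value float64, ok bool) {
	if !strings.EqualFold(ident, "inf") {
		return 0, false
	}
	switch sign {
	case "+":
		return math.Inf(1), true
	case "-":
		return math.Inf(-1), true
	}
	return 0, false
}

func main() {
	v, ok := parseSignedInf("+", "Inf")
	fmt.Println(v, ok)
}
```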
- Extended AttachQuery AST to support columns, engine, order by, and
  primary key clauses
- Added parsing for ATTACH TABLE with column definitions and ENGINE
  clause similar to CREATE TABLE
- Fixed PRIMARY KEY with multiple columns in CREATE TABLE to wrap in
  Function tuple
- Updated explain output for ATTACH TABLE to include columns and
  storage definitions
- Added MOVE PARTITION ... TO TABLE parsing in ALTER statements
- Added ToDatabase and ToTable fields to AlterCommand for destination
- Fixed OPTIMIZE TABLE to output database identifier for qualified names
Updated containsNonLiteralExpressions to accept unary minus of
literals (negative numbers) as literal-like expressions. This allows
nested arrays containing negative numbers to be formatted as
Literal Array_[Array_[...], ...] instead of Function array.
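
The literal-like rule can be sketched with a tiny stand-in AST. All type names here (`Literal`, `UnaryMinus`, `ArrayExpr`, `FuncCall`) and the function `containsNonLiteral` are hypothetical; the project's real node types and containsNonLiteralExpressions signature are not shown in this PR text.

```go
package main

import "fmt"

// Minimal stand-in AST, for illustration only.
type Expr interface{}
type Literal struct{ Value any }
type UnaryMinus struct{ Operand Expr }
type ArrayExpr struct{ Elems []Expr }
type FuncCall struct {
	Name string
	Args []Expr
}

// containsNonLiteral reports whether e contains anything other than
// literals, negated literals, or arrays made of those.
func containsNonLiteral(e Expr) bool {
	switch v := e.(type) {
	case Literal:
		return false
	case UnaryMinus:
		// A minus applied directly to a literal counts as a negative number.
		_, isLit := v.Operand.(Literal)
		return !isLit
	case ArrayExpr:
		for _, el := range v.Elems {
			if containsNonLiteral(el) {
				return true
			}
		}
		return false
	default:
		return true // function calls, casts, identifiers, and so on
	}
}

func main() {
	// [[-1, 2], [3]] stays literal-like under this rule.
	arr := ArrayExpr{Elems: []Expr{
		ArrayExpr{Elems: []Expr{UnaryMinus{Operand: Literal{Value: 1}}, Literal{Value: 2}}},
		ArrayExpr{Elems: []Expr{Literal{Value: 3}}},
	}}
	fmt.Println(containsNonLiteral(arr))
}
```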
Previously, only CONSTRAINT ... CHECK was supported. The parser now also
supports CONSTRAINT ... ASSUME, which is used for query optimization hints.
SYSTEM STOP/START DISTRIBUTED SENDS and SYSTEM FLUSH DISTRIBUTED
commands now output the table name as both database and table in
EXPLAIN output, matching ClickHouse behavior.
When using SQL standard syntax like trim(LEADING '' FROM 'foo'),
ClickHouse simplifies this to just the literal 'foo' in EXPLAIN output.
Added SQLStandard field to FunctionCall to distinguish SQL standard
TRIM syntax from direct trimLeft/trimRight/trimBoth function calls.
Add containsCastExpressions function to check whether array/tuple literals contain
CastExpr elements. This allows arrays like [1::UInt32, 2::UInt32] to be formatted
as Function array nodes, while arrays with only negative numbers (like [-1, -2, -3])
are still formatted as Literal Array_ nodes.

Fixes 01852_cast_operator_2 (6 statements) and related tests.
…mmands

- Add OnCluster and DuplicateTableOutput fields to SystemQuery AST
- Parse ON CLUSTER clause for SYSTEM commands (FLUSH DISTRIBUTED, etc.)
- For qualified table names (database.table), output identifiers twice
  in EXPLAIN to match ClickHouse's expected format

Fixes 01294_system_distributed_on_cluster (6 statements).
@kyleconroy kyleconroy merged commit b355de2 into main Jan 1, 2026
1 check passed

3 participants