Skip to content

Snowflake dialect support for Unparser #21592

@yonatan-sevenai

Description

@yonatan-sevenai

Is your feature request related to a problem or challenge?

The SQL unparser has no Snowflake dialect. Snowflake diverges from standard SQL in ways that go beyond identifier quoting and keyword support - certain logical plan nodes require structurally different SQL output.

The most impactful example is UNNEST. Snowflake has no UNNEST keyword. Its equivalent is LATERAL FLATTEN(INPUT => expr), a table function in the FROM clause whose output is accessed via alias."VALUE". This is not a syntax substitution - the generated SQL needs a different structure (a FROM-clause table factor with a CROSS JOIN, instead of a SELECT-clause expression).

Other Snowflake-specific behaviors that affect unparsing:

  • No support for empty select lists (SELECT FROM t is invalid)
  • Column aliases in table alias definitions are silently ignored in join contexts
  • Timestamp cast types differ from the default (TIMESTAMP_TZ vs TIMESTAMP)

CustomDialectBuilder cannot express these - FLATTEN requires new SQL generation logic, not just configuration knobs.

Currently, users targeting Snowflake must post-process generated SQL or avoid plan shapes involving UNNEST entirely.

Describe the solution you'd like

A SnowflakeDialect for the SQL unparser that produces valid Snowflake SQL from DataFusion logical plans. Specifically:

Dialect settings: Double-quote identifier quoting, NULLS FIRST/NULLS LAST support, no empty select lists, no column aliases in table aliases, Snowflake-appropriate timestamp cast types.

UNNESTLATERAL FLATTEN translation: When the logical plan contains UNNEST, the unparser should emit LATERAL FLATTEN(INPUT => expr, OUTER => bool) in the FROM clause and rewrite SELECT references to the FLATTEN output column (alias."VALUE").

This translation should handle the plan shapes that the DataFusion optimizer produces, including:

  • Inline array sources (UNNEST([1,2,3])) and table column sources
  • Limit or Sort nodes inserted between the outer Projection and the Unnest by the optimizer
  • SubqueryAlias nodes wrapping the unnest source (e.g. from virtual/passthrough tables)
  • Outer projections that apply functions to the unnest output (e.g. CAST(unnest_col AS INT))
  • Outer projections with multiple expressions referencing the unnest output

Struct-type unnest should be rejected with a clear error - Snowflake FLATTEN expands array elements into rows (vertical), while DataFusion expands struct fields into columns (horizontal). These are incompatible semantics.

Describe alternatives you've considered

  • String-level post-processing of generated SQL to rewrite UNNEST(...) to LATERAL FLATTEN(...). Fragile - doesn't handle the structural difference between a SELECT expression and a FROM-clause table factor.

  • Extending unnest_as_table_factor (BigQuery path) to cover Snowflake. FLATTEN's syntax (INPUT => expr, OUTER => bool) and output model (alias."VALUE") are structurally different from BigQuery's UNNEST(expr) table factor. Conflating them would complicate both code paths.

  • CustomDialectBuilder with existing knobs. Insufficient — there is no existing configuration option for FLATTEN-style rendering, and the translation requires plan-aware SQL restructuring that configuration alone cannot express.

Additional context

PR coming.

This would be the first dialect that requires the unparser to emit a fundamentally different SQL structure for a core plan node (UNNEST), rather than just varying syntax. Tried to design in a way that can support how future dialects with similar structural divergences could plug in.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request
    No fields configured for Feature.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions