feat: capture SQL security/governance policy applications and definitions #86

Closed

lustefaniak wants to merge 7 commits into main from feature/qua-228-kernel-cll-capture-securitygovernance-policy-applications

lustefaniak commented Jun 1, 2026 (Collaborator)

Adds parser/AST support for capturing security and governance policies: which masking/access/governance policies are applied (and to which columns), plus the policy definitions themselves. Previously these clauses were either parsed and discarded or skipped by the generic CREATE fallback, losing both the policy references and the lineage-bearing condition expressions.

What's captured

CREATE TABLE applications

  • Snowflake table-level ROW ACCESS / AGGREGATION / JOIN / STORAGE LIFECYCLE policies → new TablePolicy { kind, with, policy_name, columns }.
  • Snowflake table- and column-level TAG (k = 'v', ...) → new Tag { name, value } (was discarded).
  • Databricks WITH ROW FILTER f ON (cols) → TablePolicyKind::RowFilter; column MASK f USING COLUMNS (...) → ColumnMask { function, using_columns } (USING COLUMNS previously failed to parse).

Policy definitions (new Statement variants; bodies parsed as real Expr so subqueries/table refs stay visible)

  • Snowflake CREATE [OR REPLACE] {MASKING|ROW ACCESS|AGGREGATION|PROJECTION|JOIN} POLICY ... AS (sig) RETURNS type -> body → Statement::CreatePolicy.
  • Snowflake CREATE TAG ... [ALLOWED_VALUES ...] → Statement::CreateTag.
  • BigQuery CREATE [OR REPLACE] ROW ACCESS POLICY ... ON <table> [GRANT TO (...)] FILTER USING (<predicate>) → Statement::CreateRowAccessPolicy.

Every new construct is justified against the vendor grammar (doc links are in each commit body) and round-trips via Display. The BigQuery form is dispatched after the Snowflake one, whose AS check reverts on BigQuery's ... ON <table> shape, so neither shadows the other.

Validation

  • Full unit suite is green; new dialect tests assert that the AST exposes policy, tag, and table references.
  • Corpus: 0 regressions, and +5 files now parse structurally (3 Snowflake JOIN/AGGREGATION policy, 2 Databricks ROW FILTER/MASK). Snowflake/BigQuery definition files that previously only "parsed" via the skip-fallback now produce a real AST.

Follow-ups (later commits)

  • ALTER applications (Snowflake SET/UNSET POLICY|TAG, Databricks SET/DROP ROW FILTER|MASK, T-SQL ADD/DROP MASKED).
  • Redshift RLS/MASKING policy DDL; Postgres CREATE POLICY + RLS toggles; T-SQL CREATE SECURITY POLICY; BigQuery DROP ROW ACCESS POLICY.

Snowflake CREATE TABLE may attach security/governance policies at the table
level: ROW ACCESS, AGGREGATION, JOIN, and (dynamic tables) STORAGE LIFECYCLE.
These were parsed and discarded; this captures them in a new TablePolicy
{kind, with, policy_name, columns} on CreateTable so column-level lineage can
surface which policy guards a table and over which columns.

Each kind selects its column-list keyword: ROW ACCESS / STORAGE LIFECYCLE use
ON (cols), AGGREGATION uses ENTITY KEY (cols), JOIN uses ALLOWED JOIN KEYS
(cols); all column lists are optional (GET_DDL omits ON when the caller lacks
privilege). The optional WITH prefix round-trips.
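A minimal Rust sketch of how the per-kind column-list keyword and the optional WITH prefix could round-trip through Display. The TablePolicy/TablePolicyKind names and fields come from the text above; the field types, the `Option<Vec<String>>` column list, and the `STORAGE LIFECYCLE POLICY` spelling are assumptions for illustration, not the PR's actual code.

```rust
use std::fmt;

/// Table-level policy kinds that Snowflake CREATE TABLE can attach (sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TablePolicyKind {
    RowAccess,
    Aggregation,
    Join,
    StorageLifecycle,
}

impl TablePolicyKind {
    /// Keywords that name the policy kind in the clause.
    fn keyword(self) -> &'static str {
        match self {
            TablePolicyKind::RowAccess => "ROW ACCESS POLICY",
            TablePolicyKind::Aggregation => "AGGREGATION POLICY",
            TablePolicyKind::Join => "JOIN POLICY",
            // Assumed spelling for the dynamic-table form.
            TablePolicyKind::StorageLifecycle => "STORAGE LIFECYCLE POLICY",
        }
    }

    /// Each kind introduces its optional column list with a different keyword.
    fn column_list_keyword(self) -> &'static str {
        match self {
            TablePolicyKind::RowAccess | TablePolicyKind::StorageLifecycle => "ON",
            TablePolicyKind::Aggregation => "ENTITY KEY",
            TablePolicyKind::Join => "ALLOWED JOIN KEYS",
        }
    }
}

/// Hypothetical shape of `TablePolicy { kind, with, policy_name, columns }`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TablePolicy {
    pub kind: TablePolicyKind,
    /// Whether the source spelled the optional `WITH` prefix.
    pub with: bool,
    pub policy_name: String,
    /// `None` when the column list was omitted (e.g. GET_DDL without privilege).
    pub columns: Option<Vec<String>>,
}

impl fmt::Display for TablePolicy {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if self.with {
            write!(f, "WITH ")?;
        }
        write!(f, "{} {}", self.kind.keyword(), self.policy_name)?;
        if let Some(cols) = &self.columns {
            write!(f, " {} ({})", self.kind.column_list_keyword(), cols.join(", "))?;
        }
        Ok(())
    }
}
```

For example, an AGGREGATION policy with `with: true` and columns `id, region` renders as `WITH AGGREGATION POLICY agg_pol ENTITY KEY (id, region)`, while a `None` column list (the GET_DDL case) omits the keyword and parentheses entirely. Keeping `with` as a flag, rather than always emitting WITH, is what lets the source spelling survive the round-trip.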

Grammar per Snowflake docs:
https://docs.snowflake.com/en/sql-reference/sql/create-table
https://docs.snowflake.com/en/sql-reference/sql/create-aggregation-policy
https://docs.snowflake.com/en/sql-reference/sql/create-join-policy
https://docs.snowflake.com/en/sql-reference/sql/create-dynamic-table

Fixes 3 corpus test failures (Snowflake).

Snowflake CREATE TABLE may attach governance tags at the table level via
[WITH] TAG (tag_name = 'value', ...). These were consumed and discarded; this
captures them in a new Tag {name, value} list (table_tags on CreateTable) so
lineage can surface which tags are applied to a table.

Tag names may be qualified (db.schema.tag); values are string literals. Both
the WITH-prefixed and bare forms parse and normalize to the canonical
`WITH TAG (k = 'v', ...)` on round-trip. Removed the redundant pre-loop tag
discarder that previously swallowed a leading WITH TAG before the clause loop.
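A minimal sketch of the normalization described above: whichever spelling the source used, the captured list is rendered in the canonical `WITH TAG (k = 'v', ...)` form. The `Tag` fields mirror `Tag {name, value}`; storing the qualified name as path segments, the `render_tags` helper, and the quote-doubling escape are assumptions for illustration.

```rust
use std::fmt;

/// Hypothetical shape of `Tag { name, value }`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Tag {
    /// Possibly-qualified tag name, e.g. ["db", "schema", "tag"].
    pub name: Vec<String>,
    /// String-literal value, stored without its surrounding quotes.
    pub value: String,
}

impl fmt::Display for Tag {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Re-quote the value, doubling any embedded single quotes.
        write!(f, "{} = '{}'", self.name.join("."), self.value.replace('\'', "''"))
    }
}

/// Render tags canonically as `WITH TAG (...)`, independent of whether the
/// source used the WITH-prefixed or bare spelling. Empty list renders nothing.
pub fn render_tags(tags: &[Tag]) -> String {
    if tags.is_empty() {
        return String::new();
    }
    let items: Vec<String> = tags.iter().map(|t| t.to_string()).collect();
    format!("WITH TAG ({})", items.join(", "))
}
```

Because the AST does not record whether WITH was present, the bare `TAG (...)` form and the prefixed form both print identically, which is the normalization the commit describes.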

Grammar per Snowflake docs:
https://docs.snowflake.com/en/sql-reference/sql/create-table

Column definitions in Snowflake CREATE TABLE may carry [WITH] TAG (k = 'v', ...)
governance tags. These were consumed and discarded; this captures them in a new
`tags: Vec<Tag>` field on ColumnDef so lineage can surface per-column tagging.

Both the WITH-prefixed and bare forms parse and normalize to the canonical
`WITH TAG (...)` rendered after the column options/policy. All existing
ColumnDef constructions across the test suite gain the new (empty) field.

Grammar per Snowflake docs:
https://docs.snowflake.com/en/sql-reference/sql/create-table

Databricks attaches row filters and column masks via plain scalar UDFs:
  CREATE TABLE ... WITH ROW FILTER <func> ON (cols)
  <col> <type> MASK <func> [USING COLUMNS (<col>|<literal>, ...)]

ROW FILTER is captured as a new TablePolicyKind::RowFilter (uniform with the
Snowflake table policies; ON-list holds the function arguments). Column MASK is
upgraded from a bare ObjectName to a ColumnMask {function, using_columns} so the
USING COLUMNS arguments (other column names and/or constant literals) are
preserved for lineage — previously USING COLUMNS failed to parse.
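A minimal sketch of why upgrading MASK from a bare name to `ColumnMask {function, using_columns}` helps lineage: the USING COLUMNS arguments are kept as a typed list, so column references can be separated from constant literals. The `MaskArg` enum, keeping literals as raw SQL text, and `referenced_columns` are hypothetical names chosen for illustration.

```rust
use std::fmt;

/// One USING COLUMNS argument (hypothetical representation).
#[derive(Debug, Clone, PartialEq)]
pub enum MaskArg {
    /// Another column of the same table: a lineage input to the mask.
    Column(String),
    /// A constant literal, kept as its SQL source text (e.g. `'***'`).
    Literal(String),
}

/// Hypothetical shape of `ColumnMask { function, using_columns }`.
#[derive(Debug, Clone, PartialEq)]
pub struct ColumnMask {
    pub function: String,
    pub using_columns: Vec<MaskArg>,
}

impl ColumnMask {
    /// Columns the mask function reads besides the masked column itself.
    pub fn referenced_columns(&self) -> Vec<&str> {
        self.using_columns
            .iter()
            .filter_map(|a| match a {
                MaskArg::Column(c) => Some(c.as_str()),
                MaskArg::Literal(_) => None,
            })
            .collect()
    }
}

impl fmt::Display for ColumnMask {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "MASK {}", self.function)?;
        // USING COLUMNS is optional; only emit it when arguments exist.
        if !self.using_columns.is_empty() {
            let args: Vec<&str> = self
                .using_columns
                .iter()
                .map(|a| match a {
                    MaskArg::Column(c) | MaskArg::Literal(c) => c.as_str(),
                })
                .collect();
            write!(f, " USING COLUMNS ({})", args.join(", "))?;
        }
        Ok(())
    }
}
```

A mask such as `MASK mask_ssn USING COLUMNS (region, '***')` then reports `region` as an extra input column, while the `'***'` literal contributes no lineage edge.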

Grammar per Databricks docs:
https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-ddl-row-filter
https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-ddl-column-mask

Fixes 2 corpus test failures (Databricks).

…TION/PROJECTION/JOIN)

Snowflake security/governance policy definitions share one shape:
  CREATE [OR REPLACE] <KIND> POLICY [IF NOT EXISTS] <name>
    AS ( [<arg> <type>, ...] ) RETURNS <type> -> <body>
    [COMMENT = '...'] [EXEMPT_OTHER_POLICIES = { TRUE | FALSE }]

These previously fell through the generic CREATE skip-until-semicolon fallback,
discarding the masking/row-access condition. A new Statement::CreatePolicy
variant captures kind + name + typed signature + RETURNS type + the `-> body`
expression + trailing options. Parsing the body as a real Expr keeps any
subqueries/table references (e.g. an EXISTS lookup) visible to lineage.

Dispatched via maybe_parse before the generic fallback; the `AS` check makes it
revert for non-Snowflake shapes (BigQuery's `ROW ACCESS POLICY ... ON <table>`),
so those still fall back unchanged pending dedicated handling.
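A minimal sketch of why parsing the `-> body` as an expression matters: once the body is a tree, a walker can pull out the tables referenced by an EXISTS lookup, which the skip-until-semicolon fallback would have thrown away. The heavily simplified `Expr` enum, the field types on `CreatePolicy`, and the `collect_tables`/`referenced_tables` helpers are hypothetical; the real parser's expression type is far richer.

```rust
/// Hypothetical, heavily simplified expression tree.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Ident(String),
    Literal(String),
    Function { name: String, args: Vec<Expr> },
    /// Single-arm `CASE WHEN cond THEN then ELSE otherwise END`.
    Case { cond: Box<Expr>, then: Box<Expr>, otherwise: Box<Expr> },
    /// `EXISTS (SELECT ... FROM <table> WHERE <predicate>)`, reduced to its parts.
    Exists { table: String, predicate: Box<Expr> },
    BinaryOp { left: Box<Expr>, op: String, right: Box<Expr> },
}

impl Expr {
    /// Depth-first walk collecting every table referenced in the expression.
    pub fn collect_tables(&self, out: &mut Vec<String>) {
        match self {
            Expr::Ident(_) | Expr::Literal(_) => {}
            Expr::Function { args, .. } => {
                for a in args {
                    a.collect_tables(out);
                }
            }
            Expr::Case { cond, then, otherwise } => {
                cond.collect_tables(out);
                then.collect_tables(out);
                otherwise.collect_tables(out);
            }
            Expr::Exists { table, predicate } => {
                out.push(table.clone());
                predicate.collect_tables(out);
            }
            Expr::BinaryOp { left, right, .. } => {
                left.collect_tables(out);
                right.collect_tables(out);
            }
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PolicyKind {
    Masking,
    RowAccess,
    Aggregation,
    Projection,
    Join,
}

/// Hypothetical shape of `Statement::CreatePolicy`.
#[derive(Debug, Clone, PartialEq)]
pub struct CreatePolicy {
    pub or_replace: bool,
    pub kind: PolicyKind,
    pub if_not_exists: bool,
    pub name: String,
    /// Typed signature `AS (<arg> <type>, ...)` as (name, type) pairs.
    pub args: Vec<(String, String)>,
    pub returns: String,
    /// The `-> <body>` expression, kept as a tree instead of skipped text.
    pub body: Expr,
    pub comment: Option<String>,
    pub exempt_other_policies: Option<bool>,
}

impl CreatePolicy {
    pub fn referenced_tables(&self) -> Vec<String> {
        let mut out = Vec::new();
        self.body.collect_tables(&mut out);
        out
    }
}
```

For a row access policy whose body is `EXISTS (SELECT 1 FROM gov.entitlements WHERE role = CURRENT_ROLE())`, `referenced_tables()` yields the entitlements table, whereas a masking policy built only from CASE, identifiers, and literals yields none.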

Grammar per Snowflake docs:
https://docs.snowflake.com/en/sql-reference/sql/create-masking-policy
https://docs.snowflake.com/en/sql-reference/sql/create-row-access-policy
https://docs.snowflake.com/en/sql-reference/sql/create-aggregation-policy
https://docs.snowflake.com/en/sql-reference/sql/create-projection-policy
https://docs.snowflake.com/en/sql-reference/sql/create-join-policy

Snowflake tag definitions previously fell through the generic CREATE fallback.
A new Statement::CreateTag captures the tag name, the optional ALLOWED_VALUES
string list, and trailing key=value options (COMMENT, PROPAGATE, ON_CONFLICT),
so tag objects are represented in the AST alongside their applications.
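A minimal sketch of a `CreateTag` node that round-trips the name, the optional ALLOWED_VALUES string list, and trailing key=value options in source order. The field layout is hypothetical; the OR REPLACE and IF NOT EXISTS flags follow the Snowflake CREATE TAG grammar, and option values are kept as raw SQL text purely to keep the sketch short.

```rust
use std::fmt;

/// Hypothetical shape of `Statement::CreateTag`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CreateTag {
    pub or_replace: bool,
    pub if_not_exists: bool,
    pub name: String,
    /// `ALLOWED_VALUES 'a', 'b', ...`, stored unquoted.
    pub allowed_values: Option<Vec<String>>,
    /// Trailing options such as COMMENT, PROPAGATE, ON_CONFLICT, in source order.
    pub options: Vec<(String, String)>,
}

/// Quote a string literal, doubling embedded single quotes.
fn quote(s: &str) -> String {
    format!("'{}'", s.replace('\'', "''"))
}

impl fmt::Display for CreateTag {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "CREATE ")?;
        if self.or_replace {
            write!(f, "OR REPLACE ")?;
        }
        write!(f, "TAG ")?;
        if self.if_not_exists {
            write!(f, "IF NOT EXISTS ")?;
        }
        write!(f, "{}", self.name)?;
        if let Some(vals) = &self.allowed_values {
            let quoted: Vec<String> = vals.iter().map(|v| quote(v)).collect();
            write!(f, " ALLOWED_VALUES {}", quoted.join(", "))?;
        }
        for (key, value) in &self.options {
            write!(f, " {} = {}", key, value)?;
        }
        Ok(())
    }
}
```

Keeping options as an ordered list (rather than one field per known key) preserves their source order on round-trip and tolerates options the parser does not specifically model.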

Grammar per Snowflake docs:
https://docs.snowflake.com/en/sql-reference/sql/create-tag

BigQuery row-level security previously fell through the generic CREATE fallback,
discarding the target table and the filter predicate. A new
Statement::CreateRowAccessPolicy captures the policy name, the `ON <table>`
target, the optional `GRANT TO (...)` principal list, and the
`FILTER USING (<predicate>)` expression. Parsing the predicate as a real Expr
keeps any subquery table references (e.g. a lookup-table IN-subquery) visible to
lineage.

Dispatched after the Snowflake policy attempt (whose `AS` check reverts for this
`... ON <table>` shape), so Snowflake definitions are unaffected.
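A toy sketch of the dispatch order described above. The hypothetical `Parser` here splits on whitespace and its `maybe_parse` rewinds to a checkpoint when the attempt fails; the real parser's `maybe_parse` is assumed to behave similarly, but this is not its code. The input is the text after `CREATE ROW ACCESS POLICY`: the Snowflake attempt requires AS right after the name, so BigQuery's `<name> ON <table>` input fails it, rewinds, and reaches the BigQuery attempt.

```rust
/// Hypothetical whitespace-token cursor with checkpoint/rewind.
pub struct Parser {
    tokens: Vec<String>,
    pos: usize,
}

impl Parser {
    pub fn new(sql: &str) -> Self {
        Parser { tokens: sql.split_whitespace().map(String::from).collect(), pos: 0 }
    }

    fn next(&mut self) -> Option<String> {
        let t = self.tokens.get(self.pos).cloned();
        if t.is_some() {
            self.pos += 1;
        }
        t
    }

    fn expect(&mut self, kw: &str) -> Result<(), String> {
        match self.next() {
            Some(t) if t.eq_ignore_ascii_case(kw) => Ok(()),
            other => Err(format!("expected {kw}, found {other:?}")),
        }
    }

    /// Run `f`; on failure, rewind so the next alternative sees the same tokens.
    fn maybe_parse<T>(&mut self, f: impl FnOnce(&mut Self) -> Result<T, String>) -> Option<T> {
        let checkpoint = self.pos;
        match f(self) {
            Ok(v) => Some(v),
            Err(_) => {
                self.pos = checkpoint;
                None
            }
        }
    }
}

#[derive(Debug, PartialEq)]
pub enum Parsed {
    SnowflakePolicy { name: String },
    BigQueryRowAccessPolicy { name: String, table: String },
    Fallback,
}

/// Try the Snowflake shape first, then BigQuery, then the generic fallback.
pub fn parse_row_access_policy(sql: &str) -> Parsed {
    let mut p = Parser::new(sql);
    // Snowflake: `<name> AS (...)`. The AS check fails on BigQuery input.
    if let Some(v) = p.maybe_parse(|p| {
        let name = p.next().ok_or("missing name")?;
        p.expect("AS")?;
        Ok(Parsed::SnowflakePolicy { name })
    }) {
        return v;
    }
    // BigQuery: `<name> ON <table> ...`, attempted from the rewound position.
    if let Some(v) = p.maybe_parse(|p| {
        let name = p.next().ok_or("missing name")?;
        p.expect("ON")?;
        let table = p.next().ok_or("missing table")?;
        Ok(Parsed::BigQueryRowAccessPolicy { name, table })
    }) {
        return v;
    }
    Parsed::Fallback
}
```

Because each attempt rewinds on failure, the order only matters when both shapes could match the same input; the AS-versus-ON distinction keeps them disjoint, so neither form shadows the other.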

Grammar per BigQuery docs:
https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language

github-actions Bot commented Jun 1, 2026

Corpus Parsing Report

Total: 191252 passed, 2055 failed (98.9% pass rate)

✨ No changes in test results

By Dialect

Dialect     Passed  Failed  Total  Pass Rate  Delta
ansi           511      69    580      88.1%  -
athena          37       1     38      97.4%  -
bigquery     42295     114  42409      99.7%  -
clickhouse    2488     109   2597      95.8%  -
databricks    2852     212   3064      93.1%  +2
doris           22      18     40      55.0%  -
dremio          27       0     27     100.0%  -
duckdb        1124      45   1169      96.2%  -
exasol          54       7     61      88.5%  -
fabric           6       0      6     100.0%  -
generic         17      38     55      30.9%  -
hive            35      10     45      77.8%  -
materialize      6      14     20      30.0%  -
mssql         2301     482   2783      82.7%  -
mysql          151      37    188      80.3%  -
oracle        1025     380   1405      73.0%  -
postgres      1180     116   1296      91.0%  -
presto          55       8     63      87.3%  -
redshift     40428      60  40488      99.9%  -
singlestore    141       9    150      94.0%  -
snowflake    94730     151  94881      99.8%  +3
spark           90      20    110      81.8%  -
sqlite          51      16     67      76.1%  -
starrocks       29       4     33      87.9%  -
teradata        23      20     43      53.5%  -
trino         1409      81   1490      94.6%  -
tsql           165      34    199      82.9%  -

lustefaniak changed the title from "feat: capture security/governance policy applications & definitions (QUA-228 Phase 1)" to "feat: capture SQL security/governance policy applications and definitions" Jun 1, 2026
lustefaniak closed this Jun 1, 2026
lustefaniak deleted the feature/qua-228-kernel-cll-capture-securitygovernance-policy-applications branch June 1, 2026 22:43