feat: capture SQL security/governance policy applications and definitions #86
Closed
lustefaniak wants to merge 7 commits into
Snowflake CREATE TABLE may attach security/governance policies at the table
level: ROW ACCESS, AGGREGATION, JOIN, and (dynamic tables) STORAGE LIFECYCLE.
These were parsed and discarded; this captures them in a new TablePolicy
{kind, with, policy_name, columns} on CreateTable so column-level lineage can
surface which policy guards a table and over which columns.
Each kind selects its column-list keyword: ROW ACCESS / STORAGE LIFECYCLE use
ON (cols), AGGREGATION uses ENTITY KEY (cols), JOIN uses ALLOWED JOIN KEYS
(cols); all column lists are optional (GET_DDL omits ON when the caller lacks
privilege). The optional WITH prefix round-trips.
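The keyword selection and round-trip described above can be sketched roughly as follows. This is only an illustration: the field types (`bool`, `String`, `Option<Vec<String>>`), the `TablePolicyKind` variant names, and the helper methods `name` and `column_keyword` are assumptions, not the PR's actual definitions.

```rust
use std::fmt;

/// Policy kinds named in the commit message; variant names are assumed.
#[derive(Debug, Clone, PartialEq)]
enum TablePolicyKind {
    RowAccess,
    Aggregation,
    Join,
    StorageLifecycle,
}

impl TablePolicyKind {
    /// Keyword(s) rendered before `POLICY` (hypothetical helper).
    fn name(&self) -> &'static str {
        match self {
            TablePolicyKind::RowAccess => "ROW ACCESS",
            TablePolicyKind::Aggregation => "AGGREGATION",
            TablePolicyKind::Join => "JOIN",
            TablePolicyKind::StorageLifecycle => "STORAGE LIFECYCLE",
        }
    }

    /// Keyword that introduces the optional column list for this kind.
    fn column_keyword(&self) -> &'static str {
        match self {
            TablePolicyKind::RowAccess | TablePolicyKind::StorageLifecycle => "ON",
            TablePolicyKind::Aggregation => "ENTITY KEY",
            TablePolicyKind::Join => "ALLOWED JOIN KEYS",
        }
    }
}

/// Assumed shape of `TablePolicy {kind, with, policy_name, columns}`.
#[derive(Debug, Clone, PartialEq)]
struct TablePolicy {
    kind: TablePolicyKind,
    with: bool,                   // whether the optional WITH prefix was present
    policy_name: String,
    columns: Option<Vec<String>>, // None when the column list is omitted
}

impl fmt::Display for TablePolicy {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if self.with {
            write!(f, "WITH ")?;
        }
        write!(f, "{} POLICY {}", self.kind.name(), self.policy_name)?;
        if let Some(cols) = &self.columns {
            write!(f, " {} ({})", self.kind.column_keyword(), cols.join(", "))?;
        }
        Ok(())
    }
}
```

Keeping `with` as a flag rather than normalizing it away is what lets the prefix round-trip; keeping `columns` optional covers the GET_DDL output that omits the list.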
Grammar per Snowflake docs:
https://docs.snowflake.com/en/sql-reference/sql/create-table
https://docs.snowflake.com/en/sql-reference/sql/create-aggregation-policy
https://docs.snowflake.com/en/sql-reference/sql/create-join-policy
https://docs.snowflake.com/en/sql-reference/sql/create-dynamic-table
Fixes 3 corpus test failures (Snowflake).
Snowflake CREATE TABLE may attach governance tags at the table level via
[WITH] TAG (tag_name = 'value', ...). These were consumed and discarded; this
captures them in a new Tag {name, value} list (table_tags on CreateTable) so
lineage can surface which tags are applied to a table.
Tag names may be qualified (db.schema.tag); values are string literals. Both
the WITH-prefixed and bare forms parse and normalize to the canonical
`WITH TAG (k = 'v', ...)` on round-trip. Removed the redundant pre-loop tag
discarder that previously swallowed a leading WITH TAG before the clause loop.
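As a rough illustration of the normalization described above, the sketch below renders a tag list in the canonical `WITH TAG (...)` form regardless of which input form was parsed. The `String` field types, the `render_tags` helper, and the quote-doubling escape are assumptions made for the example, not the PR's code.

```rust
use std::fmt;

/// Assumed shape of `Tag {name, value}`.
#[derive(Debug, Clone, PartialEq)]
struct Tag {
    name: String,  // may be qualified, e.g. db.schema.tag
    value: String, // string literal contents, without the surrounding quotes
}

impl fmt::Display for Tag {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Assumption: embedded single quotes are escaped by doubling them.
        write!(f, "{} = '{}'", self.name, self.value.replace('\'', "''"))
    }
}

/// Hypothetical helper: always emits the WITH-prefixed form, so both the
/// bare and the WITH-prefixed inputs normalize to the same output.
fn render_tags(tags: &[Tag]) -> String {
    if tags.is_empty() {
        return String::new();
    }
    let items: Vec<String> = tags.iter().map(|t| t.to_string()).collect();
    format!("WITH TAG ({})", items.join(", "))
}
```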
Grammar per Snowflake docs:
https://docs.snowflake.com/en/sql-reference/sql/create-table
Column definitions in Snowflake CREATE TABLE may carry [WITH] TAG (k = 'v', ...) governance tags. These were consumed and discarded; this captures them in a new `tags: Vec<Tag>` field on ColumnDef so lineage can surface per-column tagging. Both the WITH-prefixed and bare forms parse and normalize to the canonical `WITH TAG (...)` rendered after the column options/policy. All existing ColumnDef constructions across the test suite gain the new (empty) field. Grammar per Snowflake docs: https://docs.snowflake.com/en/sql-reference/sql/create-table
Databricks attaches row filters and column masks via plain scalar UDFs:
CREATE TABLE ... WITH ROW FILTER <func> ON (cols)
<col> <type> MASK <func> [USING COLUMNS (<col>|<literal>, ...)]
ROW FILTER is captured as a new TablePolicyKind::RowFilter (uniform with the
Snowflake table policies; ON-list holds the function arguments). Column MASK is
upgraded from a bare ObjectName to a ColumnMask {function, using_columns} so the
USING COLUMNS arguments (other column names and/or constant literals) are
preserved for lineage — previously USING COLUMNS failed to parse.
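One possible shape for the upgraded mask node is sketched below. The `MaskArg` enum, the `String` representation of names and literals, and the `referenced_columns` helper are hypothetical and chosen only to show why preserving the USING COLUMNS arguments matters for lineage.

```rust
use std::fmt;

/// Hypothetical argument type: either another column or a constant literal.
#[derive(Debug, Clone, PartialEq)]
enum MaskArg {
    Column(String),
    Literal(String), // already-rendered literal text, e.g. 'EU' or 42
}

impl fmt::Display for MaskArg {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MaskArg::Column(c) => write!(f, "{}", c),
            MaskArg::Literal(l) => write!(f, "{}", l),
        }
    }
}

/// Assumed shape of `ColumnMask {function, using_columns}`.
#[derive(Debug, Clone, PartialEq)]
struct ColumnMask {
    function: String,
    using_columns: Vec<MaskArg>,
}

impl ColumnMask {
    /// Columns the mask function reads in addition to the masked column.
    fn referenced_columns(&self) -> Vec<&str> {
        self.using_columns
            .iter()
            .filter_map(|a| match a {
                MaskArg::Column(c) => Some(c.as_str()),
                MaskArg::Literal(_) => None,
            })
            .collect()
    }
}

impl fmt::Display for ColumnMask {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "MASK {}", self.function)?;
        if !self.using_columns.is_empty() {
            let args: Vec<String> = self.using_columns.iter().map(|a| a.to_string()).collect();
            write!(f, " USING COLUMNS ({})", args.join(", "))?;
        }
        Ok(())
    }
}
```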
Grammar per Databricks docs:
https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-ddl-row-filter
https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-ddl-column-mask
Fixes 2 corpus test failures (Databricks).
…TION/PROJECTION/JOIN)
Snowflake security/governance policy definitions share one shape:
CREATE [OR REPLACE] <KIND> POLICY [IF NOT EXISTS] <name>
AS ( [<arg> <type>, ...] ) RETURNS <type> -> <body>
[COMMENT = '...'] [EXEMPT_OTHER_POLICIES = { TRUE | FALSE }]
These previously fell through the generic CREATE skip-until-semicolon fallback,
discarding the masking/row-access condition. A new Statement::CreatePolicy
variant captures kind + name + typed signature + RETURNS type + the `-> body`
expression + trailing options. Parsing the body as a real Expr keeps any
subqueries/table references (e.g. an EXISTS lookup) visible to lineage.
Dispatched via maybe_parse before the generic fallback; the `AS` check makes it
revert for non-Snowflake shapes (BigQuery's `ROW ACCESS POLICY ... ON <table>`),
so those still fall back unchanged pending dedicated handling.
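The speculative dispatch described above can be illustrated with a toy parser. This is not the crate's `maybe_parse` implementation: the whitespace tokenizer, the `Stmt` enum, and every method below are simplifications assumed for the example. It only shows the pattern of attempting a parse, rewinding on failure, and letting the generic fallback take over.

```rust
#[derive(Debug, PartialEq)]
enum Stmt {
    SnowflakePolicy { name: String },
    Fallback, // stands in for the generic skip-until-semicolon path
}

struct Parser<'a> {
    tokens: Vec<&'a str>,
    index: usize,
}

impl<'a> Parser<'a> {
    fn new(sql: &'a str) -> Self {
        // Toy tokenizer: split on whitespace only.
        Parser { tokens: sql.split_whitespace().collect(), index: 0 }
    }

    fn next_token(&mut self) -> Option<&'a str> {
        let tok = self.tokens.get(self.index).copied();
        if tok.is_some() {
            self.index += 1;
        }
        tok
    }

    fn expect(&mut self, kw: &str) -> Option<()> {
        match self.next_token() {
            Some(t) if t.eq_ignore_ascii_case(kw) => Some(()),
            _ => None,
        }
    }

    /// Run `f`; if it fails, restore the token position so the caller can
    /// try another production from the same starting point.
    fn maybe_parse<T>(&mut self, f: impl FnOnce(&mut Self) -> Option<T>) -> Option<T> {
        let start = self.index;
        let result = f(self);
        if result.is_none() {
            self.index = start;
        }
        result
    }

    /// Snowflake shape: ROW ACCESS POLICY <name> AS ...
    fn parse_snowflake_policy(&mut self) -> Option<Stmt> {
        self.expect("ROW")?;
        self.expect("ACCESS")?;
        self.expect("POLICY")?;
        let name = self.next_token()?.to_string();
        // The AS check: a BigQuery `... ON <table>` shape fails here.
        self.expect("AS")?;
        Some(Stmt::SnowflakePolicy { name })
    }

    fn parse_create(&mut self) -> Option<Stmt> {
        self.expect("CREATE")?;
        if let Some(stmt) = self.maybe_parse(|p| p.parse_snowflake_policy()) {
            return Some(stmt);
        }
        Some(Stmt::Fallback)
    }
}
```

Because the position is restored on failure, the fallback sees the statement exactly as it would have without the new attempt, which is why non-Snowflake shapes behave as before.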
Grammar per Snowflake docs:
https://docs.snowflake.com/en/sql-reference/sql/create-masking-policy
https://docs.snowflake.com/en/sql-reference/sql/create-row-access-policy
https://docs.snowflake.com/en/sql-reference/sql/create-aggregation-policy
https://docs.snowflake.com/en/sql-reference/sql/create-projection-policy
https://docs.snowflake.com/en/sql-reference/sql/create-join-policy
Snowflake tag definitions previously fell through the generic CREATE fallback. A new Statement::CreateTag captures the tag name, the optional ALLOWED_VALUES string list, and trailing key=value options (COMMENT, PROPAGATE, ON_CONFLICT), so tag objects are represented in the AST alongside their applications. Grammar per Snowflake docs: https://docs.snowflake.com/en/sql-reference/sql/create-tag
BigQuery row-level security previously fell through the generic CREATE fallback, discarding the target table and the filter predicate. A new Statement::CreateRowAccessPolicy captures the policy name, the `ON <table>` target, the optional `GRANT TO (...)` principal list, and the `FILTER USING (<predicate>)` expression. Parsing the predicate as a real Expr keeps any subquery table references (e.g. a lookup-table IN-subquery) visible to lineage. Dispatched after the Snowflake policy attempt (whose `AS` check reverts for this `... ON <table>` shape), so Snowflake definitions are unaffected. Grammar per BigQuery docs: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language
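To show why parsing the predicate as a real expression matters, here is a minimal sketch of collecting table references from a filter. The tiny `Expr` and `Query` types and the `collect_tables` function are invented for the example and are far simpler than a real AST; they assume a single-table subquery.

```rust
/// Hypothetical, heavily reduced expression tree.
#[derive(Debug)]
enum Expr {
    Column(String),
    Literal(String),
    Eq(Box<Expr>, Box<Expr>),
    InSubquery { expr: Box<Expr>, subquery: Box<Query> },
}

/// Hypothetical single-table query: SELECT ... FROM <from_table> [WHERE <selection>].
#[derive(Debug)]
struct Query {
    from_table: String,
    selection: Option<Expr>,
}

/// Walk the predicate and record every table a subquery reads from.
fn collect_tables(expr: &Expr, out: &mut Vec<String>) {
    match expr {
        Expr::Column(_) | Expr::Literal(_) => {}
        Expr::Eq(left, right) => {
            collect_tables(left, out);
            collect_tables(right, out);
        }
        Expr::InSubquery { expr, subquery } => {
            collect_tables(expr, out);
            out.push(subquery.from_table.clone());
            if let Some(selection) = &subquery.selection {
                collect_tables(selection, out);
            }
        }
    }
}
```

For a predicate such as `region IN (SELECT region FROM lookup.allowed_regions WHERE ...)`, the walk yields `lookup.allowed_regions`. A statement skipped by the generic fallback would expose no such reference.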
Corpus Parsing Report
Total: 191252 passed, 2055 failed (98.9% pass rate)
✨ No changes in test results
Adds parser/AST support for capturing security & governance policies: which masking/access/governance policies are applied (and to which columns), plus the policy definitions themselves. Previously these clauses were parsed-and-discarded or skipped by the generic `CREATE` fallback, losing both the policy references and the lineage-bearing condition expressions.

What's captured

CREATE TABLE applications
- `ROW ACCESS/AGGREGATION/JOIN/STORAGE LIFECYCLE` policies → new `TablePolicy { kind, with, policy_name, columns }`.
- `TAG (k = 'v', ...)` → new `Tag { name, value }` (was discarded).
- `WITH ROW FILTER f ON (cols)` → `TablePolicyKind::RowFilter`; column `MASK f USING COLUMNS (...)` → `ColumnMask { function, using_columns }` (USING COLUMNS previously failed to parse).

Policy definitions (new `Statement` variants; bodies parsed as real `Expr` so subqueries/table refs stay visible)
- `CREATE [OR REPLACE] {MASKING|ROW ACCESS|AGGREGATION|PROJECTION|JOIN} POLICY ... AS (sig) RETURNS type -> body` → `Statement::CreatePolicy`.
- `CREATE TAG ... [ALLOWED_VALUES ...]` → `Statement::CreateTag`.
- `CREATE [OR REPLACE] ROW ACCESS POLICY ... ON <table> [GRANT TO (...)] FILTER USING (<predicate>)` → `Statement::CreateRowAccessPolicy`.

Every new construct is justified against the vendor grammar (doc links are in each commit body). Round-trips via `Display`; the BigQuery form is dispatched after the Snowflake one (whose `AS` check reverts), so neither shadows the other.

Validation

Follow-ups (later commits)
- `SET/UNSET POLICY|TAG`, Databricks `SET/DROP ROW FILTER|MASK`, T-SQL `ADD/DROP MASKED`).
- `CREATE POLICY` + RLS toggles; T-SQL `CREATE SECURITY POLICY`; BigQuery `DROP ROW ACCESS POLICY`.