Skip to content

Conversation

@amaannawab923
Copy link
Contributor

@amaannawab923 amaannawab923 commented Nov 17, 2025

SUMMARY

This PR adds SQLGlot-based SQL transpilation for AG Grid filter clauses, enabling cross-database compatibility.

Problem: AG Grid's server-side filtering generates SQL clauses using PostgreSQL-style syntax (e.g., ILIKE for case-insensitive matching, '' for quote escaping). When the
target database is MySQL, Presto, Snowflake, or another dialect, these clauses may fail or behave incorrectly.

Solution: Transpile filter SQL from generic syntax to the target database dialect using SQLGlot.

Key changes:

  1. transpile_to_dialect() function (superset/sql/parse.py): Parses SQL with SQLGlot's generic dialect and regenerates it with the correct syntax for the target database:
    - Quote escaping: 'O''Hara' → 'O'Hara' (for Snowflake, BigQuery, Databricks)
    - Operator conversion: ILIKE '%test%' → LOWER(col) LIKE LOWER('%test%') (for MySQL, Presto)
  2. transpile_to_dialect flag (QueryObjectExtras): Opt-in boolean that triggers transpilation before the existing sanitize_clause() validation.
  3. AG Grid integration (buildQuery.ts): Automatically enables transpilation when AG Grid filters are applied.
  4. Client-side escaping (stateConversion.ts): Escapes single quotes in filter values before sending to backend.

Supported dialects: PostgreSQL, MySQL, SQLite, Snowflake, BigQuery, Databricks, Presto, Trino, MSSQL, and others in SQLGLOT_DIALECTS.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@amaannawab923 amaannawab923 marked this pull request as draft November 17, 2025 11:02
@dosubot dosubot bot added the change:backend Requires changing the backend label Nov 17, 2025
Copy link

@korbit-ai korbit-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've completed my review and didn't find any issues.

Files scanned
File Path Reviewed
superset/utils/sql_sanitizer.py
superset-frontend/packages/superset-ui-core/src/query/types/Query.ts
superset-frontend/plugins/plugin-chart-ag-grid-table/src/buildQuery.ts
superset/common/query_object.py

Explore our documentation to understand the languages and file types we support and the files we ignore.

Check out our docs on how you can make Korbit work best for you and your team.

Loving Korbit!? Share us on LinkedIn Reddit and X

@codecov
Copy link

codecov bot commented Nov 17, 2025

Codecov Report

❌ Patch coverage is 86.66667% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.14%. Comparing base (cac6ffc) to head (85b13a3).
⚠️ Report is 4 commits behind head on master.

Files with missing lines Patch % Lines
superset/common/query_object.py 50.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #36136       +/-   ##
===========================================
+ Coverage        0   68.14%   +68.14%     
===========================================
  Files           0      632      +632     
  Lines           0    46464    +46464     
  Branches        0     5037     +5037     
===========================================
+ Hits            0    31664    +31664     
- Misses          0    13537    +13537     
- Partials        0     1263     +1263     
Flag Coverage Δ
hive 43.78% <13.33%> (?)
mysql 67.24% <20.00%> (?)
postgres 67.29% <20.00%> (?)
presto 47.38% <20.00%> (?)
python 68.11% <86.66%> (?)
sqlite 66.91% <20.00%> (?)
unit 100.00% <100.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

…sary tests & print statements from the test file
@amaannawab923 amaannawab923 marked this pull request as ready for review November 17, 2025 15:37
Copy link

@korbit-ai korbit-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've completed my review and didn't find any issues.

Files scanned
File Path Reviewed
superset/utils/sql_sanitizer.py
superset-frontend/packages/superset-ui-core/src/query/types/Query.ts
superset-frontend/plugins/plugin-chart-ag-grid-table/src/buildQuery.ts
superset/common/query_object.py

Explore our documentation to understand the languages and file types we support and the files we ignore.

Check out our docs on how you can make Korbit work best for you and your team.

Loving Korbit!? Share us on LinkedIn Reddit and X


if use_sqlglot_escaping:
database_backend = database.db_engine_spec.engine
clause = sanitize_sql_with_sqlglot(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also handle the case where the database_backend is for some reason undefined or None?

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces an opt-in SQLGlot-based SQL sanitization mechanism for WHERE and HAVING filter clauses in AG Grid queries. The solution provides database-agnostic SQL escaping and security validation to prevent SQL injection attacks through the extras parameter.

  • Adds a new sanitize_sql_with_sqlglot() function that parses SQL into an AST, validates for dangerous constructs (subqueries, set operations, DDL/DML), and regenerates with database-specific escaping
  • Integrates the sanitizer into query_object.py with an opt-in use_sqlglot_escaping flag
  • Enables SQLGlot escaping by default for AG Grid table queries with server-side filtering

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
superset/utils/sql_sanitizer.py New module implementing SQLGlot-based SQL sanitization with dialect mapping and security validation
superset/common/query_object.py Integration of SQLGlot sanitizer into filter sanitization pipeline with opt-in flag support
superset-frontend/plugins/plugin-chart-ag-grid-table/src/buildQuery.ts Enables use_sqlglot_escaping flag for AG Grid download queries with SQL clauses
superset-frontend/packages/superset-ui-core/src/query/types/Query.ts Adds TypeScript type definition for use_sqlglot_escaping boolean flag in query extras
tests/unit_tests/utils/sql_sanitizer_escaping_test.py Comprehensive test suite covering database-specific escaping, dangerous construct blocking, and edge cases
Comments suppressed due to low confidence (1)

superset/common/query_object.py:370

  • The SQLGlot sanitization result is assigned to clause but then immediately passed to the original sanitize_clause() function which performs essentially the same parsing and regeneration. This means the SQL is being parsed and regenerated twice, which is redundant and inefficient.

Since sanitize_sql_with_sqlglot already performs database-specific SQL parsing and regeneration (which is what sanitize_clause does), you should either:

  1. Skip calling sanitize_clause() when use_sqlglot_escaping is enabled, OR
  2. Only call sanitize_clause() and skip the SQLGlot sanitization when they're disabled

The current implementation performs duplicate work and may cause the clause to differ from what SQLGlot sanitized.

                    if use_sqlglot_escaping:
                        database_backend = database.db_engine_spec.engine
                        clause = sanitize_sql_with_sqlglot(
                            clause,
                            database_backend,
                            validate_structure=True,
                        )

                    engine = database.db_engine_spec.engine
                    sanitized_clause = sanitize_clause(clause, engine)
                    if sanitized_clause != clause:
                        self.extras[param] = sanitized_clause

@amaannawab923
Copy link
Contributor Author

Thanks for the review @sadpandajoe !!

Copy link
Member

@betodealmeida betodealmeida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amaannawab923 can you give more context on what problem we're trying to solve here?

I don't think we should try to figure out the meaning of adhoc SQL provided by users. If users provide adhoc SQL it should be valid; they should quote things correctly according to the DB dialect. In the past we had lenient parsers and that led to multiple security issues.

Can you give an example of where this would be needed?

I also don't think we need to worry about DML and other attacks in here. We should first build the whole query with whatever adhoc filters the user provided, and only then validate that it only queries tables the user has access to, is a single statement, and only includes DML if the database allows. We already do all of that, so we shouldn't need to do it here.

Copy link
Member

@betodealmeida betodealmeida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, thanks for the awesome work! One small nit, you can use r" strings in Python so that you don't need to double the backslash, it makes it easier to understand what's happening — otherwise people might think the database is adding the two backslashes instead of 1.

# SQLite - SQL-92 standard double single quotes
("name = 'O''Hara'", "sqlite", "name = 'O''Hara'"),
# Snowflake - backslash escaping
("name = 'O''Hara'", "snowflake", "name = 'O\\'Hara'"),
Copy link
Member

@betodealmeida betodealmeida Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit, you can use r" strings here and in other places to make it easier to read:

Suggested change
("name = 'O''Hara'", "snowflake", "name = 'O\\'Hara'"),
("name = 'O''Hara'", "snowflake", r"name = 'O\'Hara'"),

@geido geido merged commit 186693b into apache:master Nov 25, 2025
70 checks passed
LuisSanchez pushed a commit to LuisSanchez/superset that referenced this pull request Nov 25, 2025
aminghadersohi pushed a commit to aminghadersohi/superset that referenced this pull request Nov 25, 2025
kshi020302 pushed a commit to jl141/superset that referenced this pull request Nov 30, 2025
sadpandajoe pushed a commit that referenced this pull request Dec 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants