
Commit 0d6bdce

vladimirg-db authored and dongjoon-hyun committed
[SPARK-51113][SQL] Fix correctness with UNION/EXCEPT/INTERSECT inside a view or EXECUTE IMMEDIATE
### What changes were proposed in this pull request?

Fix correctness with UNION/EXCEPT/INTERSECT inside a view or `EXECUTE IMMEDIATE`.

In the following examples, the SQL parser treats the UNION/EXCEPT/INTERSECT keywords as aliases and drops the rest of the query:

```scala
spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4")
spark.sql("SELECT * FROM v1").show()
spark.sql("SELECT * FROM v1").queryExecution.analyzed

spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 EXCEPT SELECT 2 EXCEPT SELECT 1 EXCEPT SELECT 2")
spark.sql("SELECT * FROM v1").show()
spark.sql("SELECT * FROM v1").queryExecution.analyzed

spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 INTERSECT SELECT 1 INTERSECT SELECT 2 INTERSECT SELECT 2")
spark.sql("SELECT * FROM v1").show()
spark.sql("SELECT * FROM v1").queryExecution.analyzed

spark.sql("DECLARE v INT")
spark.sql("EXECUTE IMMEDIATE 'SELECT 1 UNION SELECT 2 UNION SELECT 3' INTO v")
spark.sql("EXECUTE IMMEDIATE 'SELECT 1 UNION SELECT 2 UNION SELECT 3' INTO v").queryExecution.analyzed
spark.sql("SELECT v").show()
```

![image](https://github.com/user-attachments/assets/ef726178-2375-4ebc-a7e3-88f1991d1016)
![image](https://github.com/user-attachments/assets/50b4b7ba-bc7d-4fc1-a921-f4cbfcab79a3)
![image](https://github.com/user-attachments/assets/85b65325-5dd9-4d74-b46d-8ea203ce1039)
![image](https://github.com/user-attachments/assets/c53c5e02-18c6-4e30-b834-94af619190c5)

Regular queries (without a view or `EXECUTE IMMEDIATE`) are not affected.
This appears to be because we use `ParserInterface.parsePlan` (the `singleStatement` rule in the Spark SQL grammar) for [regular queries](https://github.com/apache/spark/blob/b968ce1d3ac1b72019b30bf3d4e11d9574ba1205/sql/core/src/main/scala/org/apache/spark/sql/classic/SparkSession.scala#L490), but `ParserInterface.parseQuery` (the `query` rule) for [view bodies](https://github.com/apache/spark/blob/b968ce1d3ac1b72019b30bf3d4e11d9574ba1205/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala#L986) and [EXECUTE IMMEDIATE with INTO](https://github.com/apache/spark/blob/b968ce1d3ac1b72019b30bf3d4e11d9574ba1205/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/executeImmediate.scala#L167). The difference is that `singleStatement` [ends in EOF](https://github.com/apache/spark/blob/b968ce1d3ac1b72019b30bf3d4e11d9574ba1205/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4#L144) while `query` does not, so `parseQuery` can succeed after matching only a prefix of the input. This PR adds a `singleQuery` rule (`query EOF`) and uses it in `parseQuery`; the old behavior stays available behind the internal `spark.sql.legacy.parseQueryWithoutEof` flag (default `false`). I'm not sure what the exact root cause inside the parser is, because I don't know much about the SQL parser.

### Why are the changes needed?

Correctness issue fix.

### Does this PR introduce _any_ user-facing change?

Yes, queries on top of the affected views and `EXECUTE IMMEDIATE ... INTO` statements will now return correct results.

### How was this patch tested?

New `view-correctness` suite.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#49937 from vladimirg-db/vladimirg-db/fix-views-with-trivial-unions-2.

Authored-by: Vladimir Golubev <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
1 parent e92e12a commit 0d6bdce

10 files changed, +4375 -3 lines changed

sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 (+4)

```diff
@@ -475,6 +475,10 @@ commentSpec
     : COMMENT stringLit
     ;
 
+singleQuery
+    : query EOF
+    ;
+
 query
     : ctes? queryTerm queryOrganization
     ;
```

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AbstractSqlParser.scala (+13 -3)

```diff
@@ -24,6 +24,7 @@ import org.apache.spark.sql.catalyst.parser.ParserUtils.withOrigin
 import org.apache.spark.sql.catalyst.plans.logical.{CompoundPlanStatement, LogicalPlan}
 import org.apache.spark.sql.catalyst.trees.Origin
 import org.apache.spark.sql.errors.QueryParsingErrors
+import org.apache.spark.sql.internal.SQLConf
 
 /**
  * Base class for all ANTLR4 [[ParserInterface]] implementations.
@@ -72,9 +73,18 @@ abstract class AbstractSqlParser extends AbstractParser with ParserInterface {
   /** Creates LogicalPlan for a given SQL string of query. */
   override def parseQuery(sqlText: String): LogicalPlan =
     parse(sqlText) { parser =>
-      val ctx = parser.query()
-      withErrorHandling(ctx, Some(sqlText)) {
-        astBuilder.visitQuery(ctx)
+      if (!SQLConf.get.getConf(SQLConf.LEGACY_PARSE_QUERY_WITHOUT_EOF)) {
+        val ctx = parser.singleQuery()
+
+        withErrorHandling(ctx, Some(sqlText)) {
+          astBuilder.visitSingleQuery(ctx)
+        }
+      } else {
+        val ctx = parser.query()
+
+        withErrorHandling(ctx, Some(sqlText)) {
+          astBuilder.visitQuery(ctx)
+        }
       }
     }
 
```
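The effect of anchoring a rule with `EOF` can be illustrated with a toy model. The sketch below is hypothetical: it is not Spark's parser and not ANTLR's algorithm (ANTLR does not enumerate every parse like this), and `EofAnchorDemo`, `Select`, `Query` and `parseQuery(sql, legacyWithoutEof)` are illustrative names only. It lists every way to parse a prefix of the input, lets any word (including `UNION`) act as an alias the way a non-reserved keyword can, and then, mirroring the `if/else` in `parseQuery` above, either requires that all tokens be consumed or accepts the first parse even if tokens are left over.

```scala
// Hypothetical toy model, not Spark's parser: shows how requiring EOF
// changes which parse of the same token stream gets accepted.
object EofAnchorDemo {
  final case class Select(value: Int, alias: Option[String])
  final case class Query(branches: List[Select])

  private type Tokens = List[String]

  private def isInt(t: String): Boolean = t.nonEmpty && t.forall(_.isDigit)

  // Any word other than SELECT, AS or an integer may serve as an alias,
  // imitating a non-reserved keyword such as UNION used as an identifier.
  private def isIdent(t: String): Boolean = t != "SELECT" && t != "AS" && !isInt(t)

  // List-of-successes parsing: every way to parse a prefix of `ts`, each
  // paired with the tokens left unconsumed. Alias readings are listed first.
  private def select(ts: Tokens): List[(Select, Tokens)] = ts match {
    case "SELECT" :: n :: rest if isInt(n) =>
      val aliased = rest match {
        case "AS" :: a :: r if isIdent(a) => List((Select(n.toInt, Some(a)), r))
        case a :: r if isIdent(a)          => List((Select(n.toInt, Some(a)), r))
        case _                             => Nil
      }
      aliased :+ ((Select(n.toInt, None), rest))
    case _ => Nil
  }

  // query := select (UNION select)*
  private def query(ts: Tokens): List[(Query, Tokens)] =
    select(ts).flatMap { case (s, rest) =>
      val longer = rest match {
        case "UNION" :: r => query(r).map { case (q, left) => (Query(s :: q.branches), left) }
        case _            => Nil
      }
      longer :+ ((Query(List(s)), rest)) // stopping early is always a candidate
    }

  private def tokenize(sql: String): Tokens = sql.trim.split("\\s+").toList

  // Default: like `query EOF`, only a parse that consumed every token counts.
  // Legacy: like bare `query`, the first parse wins and leftover tokens are ignored.
  def parseQuery(sql: String, legacyWithoutEof: Boolean = false): Either[String, Query] = {
    val parses = query(tokenize(sql))
    val chosen =
      if (!legacyWithoutEof) parses.collectFirst { case (q, Nil) => q }
      else parses.headOption.map(_._1)
    chosen.toRight(s"cannot parse: $sql")
  }
}
```

For `SELECT 1 UNION SELECT 2`, the default path yields both branches, while `legacyWithoutEof = true` yields a single `SELECT 1` aliased as `UNION` with `SELECT 2` silently dropped, which is the failure mode the `spark.sql.legacy.parseQueryWithoutEof` doc string describes.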

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (+4)

```diff
@@ -670,6 +670,10 @@ class AstBuilder extends DataTypeAstBuilder
    * ******************************************************************************************** */
   protected def plan(tree: ParserRuleContext): LogicalPlan = typedVisit(tree)
 
+  override def visitSingleQuery(ctx: SingleQueryContext): LogicalPlan = withOrigin(ctx) {
+    visitQuery(ctx.query())
+  }
+
   /**
    * Create a top-level plan with Common Table Expressions.
    */
```

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (+12)

```diff
@@ -5520,6 +5520,18 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)
 
+  val LEGACY_PARSE_QUERY_WITHOUT_EOF = buildConf("spark.sql.legacy.parseQueryWithoutEof")
+    .internal()
+    .doc(
+      "When set to true, ParserInterface#parseQuery(...) is going to use base `query` grammar " +
+      "term without EOF resulting in some queries (like `SELECT 1 UNION SELECT 2`) to be parsed " +
+      "incorrectly - `UNION` will be treated as an alias, and the rest of SQL input will be " +
+      "thrown away."
+    )
+    .version("4.0.0")
+    .booleanConf
+    .createWithDefault(false)
+
   /**
    * Holds information about keys that have been deprecated.
    *
```
