-
Notifications
You must be signed in to change notification settings - Fork 28.7k
[SPARK-52709][SQL] Fix parsing of STRUCT<> #51480
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-52709][SQL] Fix parsing of STRUCT<> #51480
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ManosGEM Thanks for working on this issue. I have left a few comments on the PR. Some of them are just to make code more durable to future errors of this type. Please feel free to ping me for another round of review when you go through them.
sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4
Show resolved
Hide resolved
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
Outdated
Show resolved
Hide resolved
@@ -1979,4 +1979,49 @@ class PlanParserSuite extends AnalysisTest { | |||
assert(unresolvedRelation2.options == CaseInsensitiveStringMap.empty) | |||
assert(unresolvedRelation2.isStreaming) | |||
} | |||
|
|||
test("SPARK-52709: Parsing of empty and nested Struct<> followed by shiftRight operator") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, if we are adding tests to specific files, we try to reuse as much boiler plate code that is there. Could you please rewrite this test to have similar structure as the tests above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By this, I mean to reuse comparePlans
. Also, I would say we can move this test to SparkSqlParserSuite
as the point of this test is to enable parsing of different queries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try my best to rewrite it in the manner you propose.
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
Outdated
Show resolved
Hide resolved
nit: We usually like to name the PR the same way the ticket is named on Jira. Could you please align the PR name with Jira? |
remove the word "null" then? Because the only other difference is the [SQL] tag that I added according to the guidelines. |
Yeah, I meant the main name of the PR, the tags are all good. |
This commit addresses a parsing issue where the `STRUCT<>` data type was incorrectly handled, especially when appearing before operators like the bitwise shift (`>>`). The problem stemmed from the parser misinterpreting the angle brackets of `STRUCT<>` as the 'not equal' operator (`<>`), leading to syntax errors. The fix ensures `STRUCT<>` is correctly recognized as a data type. A new test case in `PlanParserSuite.scala` confirms that queries with `CAST(null AS STRUCT<>)`, nested structs, and `>>` now parse correctly.
88f22d9
to
81f7573
Compare
sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala
Outdated
Show resolved
Hide resolved
…ta types. This commit follows SPARK-52709 by: - Removing from parsing rules. - Relocating relevant parser tests from to . - Refactoring the test setup for complex data types into a reusable helper function. - Adding comprehensive tests for valid , nested , , and their combinations to . - Adding negative tests for invalid empty and types to confirm correct behavior.
81f7573
to
76c0544
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, @MaxGekk @cloud-fan Could you please have a last review/merge.
good catch! merging to master! |
@ManosGEM Can you open a branch 4.0 backport PR? thanks! |
@cloud-fan Everything should be the same in the PR but open it on the branch-4.0 ? Sorry but this is actually my first PR on Spark. |
It has merge conflicts, you will need to git cherry-pick this commit against branch-4.0 locally first, resolve merge conflicts, and open a new PR against branch-4.0 |
What changes were proposed in this pull request?
This PR fixes an issue in Spark SQL's parser where empty or nested STRUCT<> types cause incorrect parenthesis tracking and parsing failures. Previously, the parser increased the parenthesis depth counter upon encountering the keyword STRUCT. Due to the operator precedence in the lexer (e.g., NEQ is matched before LT), a construct like STRUCT<> could incorrectly be tokenized as STRUCT and NEQ. This caused the parser to increase the nesting counter without ever decreasing it, eventually resulting in a syntax error.
For example, the following valid query fails under the current logic:
SELECT cast(null as STRUCT<>), 2 >> 1;
To fix this, we adjusted the definition of the NEQ token in the SQL lexer so that it no longer matches <> when used in a complex data type. This ensures that the parser correctly interprets the angle brackets as part of a type specification rather than as a comparison operator.
Why are the changes needed?
bug fix.
By modifying the NEQ token rule to avoid incorrectly matching <> in this context, we ensure that:
This change improves Spark SQL’s compatibility with standard SQL syntax and user expectations.
Does this PR introduce any user-facing change?
No
How was this patch tested?
A new test case has been added to PlanParserSuite.scala to specifically verify that queries containing CAST(null AS STRUCT<>), nested STRUCT<> (if applicable), and the >> operator now parse successfully into the expected logical plan.
Was this patch authored or co-authored using generative AI tooling?
No
Closes SPARK-52709
Reported by : @mihailom-db