Skip to content

[SPARK-52709][SQL] Fix parsing of STRUCT<> #51480

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

ManosGEM
Copy link
Contributor

What changes were proposed in this pull request?

This PR fixes an issue in Spark SQL's parser where empty or nested STRUCT<> types cause incorrect parenthesis tracking and parsing failures. Previously, the parser increased the parenthesis depth counter upon encountering the keyword STRUCT. Due to the operator precedence in the lexer (e.g., NEQ is matched before LT), a construct like STRUCT<> could incorrectly be tokenized as STRUCT and NEQ. This caused the parser to increase the nesting counter without ever decreasing it, eventually resulting in a syntax error.
For example, the following valid query fails under the current logic:
SELECT cast(null as STRUCT<>), 2 >> 1;

To fix this, we adjusted the definition of the NEQ token in the SQL lexer so that it no longer matches <> when used in a complex data type. This ensures that the parser correctly interprets the angle brackets as part of a type specification rather than as a comparison operator.

Why are the changes needed?

bug fix.
By modifying the NEQ token rule to avoid incorrectly matching <> in this context, we ensure that:

  • Empty STRUCT types like STRUCT<> are parsed correctly.
    
  • Nested and complex STRUCT types are supported without breaking parsing logic.
    
  • Queries with complex data types and bitwise operations (e.g., >>), are no longer broken due to incorrect token handling.
    

This change improves Spark SQL’s compatibility with standard SQL syntax and user expectations.

Does this PR introduce any user-facing change?

No

How was this patch tested?

A new test case has been added to PlanParserSuite.scala to specifically verify that queries containing CAST(null AS STRUCT<>), nested STRUCT<> (if applicable), and the >> operator now parse successfully into the expected logical plan.

Was this patch authored or co-authored using generative AI tooling?

No

Closes SPARK-52709
Reported by : @mihailom-db

@github-actions github-actions bot added the SQL label Jul 14, 2025
Copy link
Contributor

@mihailom-db mihailom-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ManosGEM Thanks for working on this issue. I have left a few comments on the PR. Some of them are just to make code more durable to future errors of this type. Please feel free to ping me for another round of review when you go through them.

@@ -1979,4 +1979,49 @@ class PlanParserSuite extends AnalysisTest {
assert(unresolvedRelation2.options == CaseInsensitiveStringMap.empty)
assert(unresolvedRelation2.isStreaming)
}

test("SPARK-52709: Parsing of empty and nested Struct<> followed by shiftRight operator") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, if we are adding tests to specific files, we try to reuse as much boiler plate code that is there. Could you please rewrite this test to have similar structure as the tests above?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By this, I mean to reuse comparePlans. Also, I would say we can move this test to SparkSqlParserSuite as the point of this test is to enable parsing of different queries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try my best to rewrite it in the manner you propose.

@mihailom-db
Copy link
Contributor

nit: We usually like to name the PR the same way the ticket is named on Jira. Could you please align the PR name with Jira?

@ManosGEM ManosGEM changed the title [SPARK-52709][SQL] Fix parsing of null STRUCT<> [SPARK-52709][SQL] Fix parsing of STRUCT<> Jul 15, 2025
@ManosGEM
Copy link
Contributor Author

nit: We usually like to name the PR the same way the ticket is named on Jira. Could you please align the PR name with Jira?

remove the word "null" then? Because the only other difference is the [SQL] tag that I added according to the guidelines.

@mihailom-db
Copy link
Contributor

Yeah, I meant the main name of the PR, the tags are all good.

This commit addresses a parsing issue where the `STRUCT<>` data type
was incorrectly handled, especially when appearing before operators
like the bitwise shift (`>>`).

The problem stemmed from the parser misinterpreting the angle brackets
of `STRUCT<>` as the 'not equal' operator (`<>`), leading to syntax errors.
The fix ensures `STRUCT<>` is correctly recognized as a data type.

A new test case in `PlanParserSuite.scala` confirms that queries
with `CAST(null AS STRUCT<>)`, nested structs, and `>>` now parse correctly.
@ManosGEM ManosGEM force-pushed the SPARK-52709-fix-empty-struct-parsing branch from 88f22d9 to 81f7573 Compare July 18, 2025 08:23
…ta types.

This commit follows SPARK-52709 by:

- Removing  from  parsing rules.

- Relocating relevant parser tests from  to .

- Refactoring the test setup for complex data types into a reusable helper function.

- Adding comprehensive tests for valid , nested , ,  and their combinations to .

- Adding negative tests for invalid empty  and  types to confirm correct  behavior.
@ManosGEM ManosGEM force-pushed the SPARK-52709-fix-empty-struct-parsing branch from 81f7573 to 76c0544 Compare July 18, 2025 11:46
Copy link
Contributor

@mihailom-db mihailom-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, @MaxGekk @cloud-fan Could you please have a last review/merge.

@cloud-fan
Copy link
Contributor

cloud-fan commented Jul 24, 2025

good catch! merging to master!

@cloud-fan cloud-fan closed this in 64cada1 Jul 24, 2025
@cloud-fan
Copy link
Contributor

@ManosGEM Can you open a branch 4.0 backport PR? thanks!

@ManosGEM
Copy link
Contributor Author

@cloud-fan Everything should be the same in the PR but open it on the branch-4.0 ? Sorry but this is actually my first PR on Spark.

@cloud-fan
Copy link
Contributor

It has merge conflicts, you will need to git cherry-pick this commit against branch-4.0 locally first, resolve merge conflicts, and open a new PR against branch-4.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants