fix: Fallback to Spark when PARQUET_FIELD_ID_READ_ENABLED=true for new native scans #1757
Conversation
if (SQLConf.get.getConf(
    SQLConf.PARQUET_FIELD_ID_READ_ENABLED) && scanImpl != CometConf.SCAN_NATIVE_COMET) {
  withInfo(plan, s"Comet $scanImpl scan does not support PARQUET_FIELD_ID_READ_ENABLED")
  return plan
}
This is the change. The rest is refactoring.
This may make too many tests fall back because Spark may be enabling this by default in all tests ... investigating
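One quick way to check what a given session actually sees for this flag is to read it back through the public conf API; a minimal sketch (the local session setup below is illustrative and not part of this PR):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.internal.SQLConf

// Read back the effective value of the field-id read flag; outside of
// Spark's own sql/core test harness it defaults to false.
val spark = SparkSession.builder().master("local[1]").getOrCreate()
val fieldIdReads = spark.conf.get(SQLConf.PARQUET_FIELD_ID_READ_ENABLED.key)
println(s"${SQLConf.PARQUET_FIELD_ID_READ_ENABLED.key} = $fieldIdReads")
spark.stop()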
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #1757 +/- ##
============================================
+ Coverage 56.12% 58.56% +2.43%
- Complexity 976 1133 +157
============================================
Files 119 130 +11
Lines 11743 12686 +943
Branches 2251 2369 +118
============================================
+ Hits 6591 7429 +838
- Misses 4012 4065 +53
- Partials 1140 1192 +52
View full report in Codecov by Sentry.
Pre-emptively approving this. We can defer field_id support for native readers for the time being.
private[sql] object TestSQLContext {
/**
* A map used to store all confs that need to be overridden in sql/core unit tests.
*/
val overrideConfs: Map[String, String] =
Map(
// Fewer shuffle partitions to speed up testing.
SQLConf.SHUFFLE_PARTITIONS.key -> "5",
// Enable parquet read field id for tests to ensure correctness
// By default, if Spark schema doesn't contain the `parquet.field.id` metadata,
// the underlying matching mechanism should behave exactly like name matching
// which is the existing behavior. Therefore, turning this on ensures that we didn't
// introduce any regression for such mixed matching mode.
SQLConf.PARQUET_FIELD_ID_READ_ENABLED.key -> "true")
}
Ouch.
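For an individual suite that still wants to exercise the native scans, the sql/core default can be overridden per test; a minimal sketch assuming a suite that mixes in Spark's SharedSparkSession test trait (the suite name, test body, and expected answer are illustrative):

import org.apache.spark.sql.{QueryTest, Row}
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.test.SharedSparkSession

class NativeScanFieldIdSuite extends QueryTest with SharedSparkSession {
  test("native scan path with field-id reads explicitly disabled") {
    // Undo the sql/core test default so the guard added in this PR does not
    // force a fallback to Spark for this particular test.
    withSQLConf(SQLConf.PARQUET_FIELD_ID_READ_ENABLED.key -> "false") {
      withTempPath { dir =>
        spark.range(3).write.parquet(dir.getCanonicalPath)
        checkAnswer(spark.read.parquet(dir.getCanonicalPath), Seq(Row(0L), Row(1L), Row(2L)))
      }
    }
  }
}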
val scanImpl: String = COMET_NATIVE_SCAN_IMPL.get()
if (SQLConf.get.getConf(
    SQLConf.PARQUET_FIELD_ID_READ_ENABLED) && scanImpl != CometConf.SCAN_NATIVE_COMET) {
  withInfo(plan, s"Comet $scanImpl scan does not support PARQUET_FIELD_ID_READ_ENABLED")
withInfo(plan, s"Comet $scanImpl scan does not support PARQUET_FIELD_ID_READ_ENABLED") | |
withInfo(plan, s"Comet $scanImpl scan does not support with enabled `spark.sql.parquet.fieldId.read.enabled`") |
lgtm thanks @andygrove
I will close this since it will make all Spark SQL tests fall back to Spark. We seem to mostly support this feature in
Which issue does this PR close?
Part of #1758
Rationale for this change
Fix some Spark SQL test failures when the new native scans are enabled
What changes are included in this PR?
How are these changes tested?
I manually tested with the following test:
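A hypothetical sketch of what such a manual check could look like (the local session, temp path, and the expectation about the resulting plan are illustrative assumptions, not the author's actual test):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.internal.SQLConf

// Assuming the session is launched with Comet enabled and one of the new
// native scan implementations selected (that configuration is omitted here),
// turning on field-id reads should make the physical plan for a Parquet read
// show a Spark scan node rather than a Comet scan, with the reason recorded
// via withInfo.
val spark = SparkSession.builder().master("local[1]").getOrCreate()
spark.conf.set(SQLConf.PARQUET_FIELD_ID_READ_ENABLED.key, "true")
spark.range(10).write.mode("overwrite").parquet("/tmp/field_id_fallback_check")
spark.read.parquet("/tmp/field_id_fallback_check").explain("extended")
spark.stop()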