Upgrade Apache Calcite from 1.40.0 to 1.42.0#18658
Conversation
Bump calcite-core/babel to 1.42.0 (folds in both the 1.40->1.41 and 1.41->1.42 deltas, since master was never moved to 1.41) and pin joou-java-6 to 0.9.5 to resolve a dependency-convergence conflict between calcite-core and the transitive avatica-core 1.28.0. Sync the customized SQL parser (Parser.jj + the fmpp configs) to upstream 1.42, preserving all PINOT CUSTOMIZATION regions. The new babel feature flags (includeStarExclude, includeSelectBy, includeIntervalWithoutQualifier) are intentionally kept OFF: the grammar is synced but inactive, as the multi-stage engine has no downstream support for those features yet. The upstream colon-path field-access grammar was likewise synced and is inert under Pinot's BABEL conformance (isColonFieldAccessAllowed() returns false) - gated by conformance rather than an fmpp flag. The sync also makes the bitwise infix operators '<<', '&' and infix '^' parseable (upstream-stock 1.42 additions); they are not registered in PinotOperatorTable, so they are not yet supported end-to-end. Handle 1.42 behavioral changes: - CALCITE-7189: Validator disables non-strict GROUP BY (BABEL enables it in 1.41+, which NPEs for window functions combined with GROUP BY). - CALCITE-7379: PinotRelDecorrelator relaxes the post-decorrelation type assertion for the nullability-only divergence that still fires on some Pinot correlated-subquery shapes; it fails fast on structural changes. - CALCITE-7351: drop the now-final getMaxNumericScale/getMaxNumericPrecision overrides (they delegate to the type-specific overrides Pinot keeps). - Filtered MIN/MAX are now nullable via SqlOperatorBinding.hasEmptyGroup(). - Unsigned integer types (CALCITE-1466) parse under BABEL and are treated as their signed equivalents throughout (converters, literal folding, SUM return type, and arithmetic normalization), so e.g. SUM over an unsigned column widens to BIGINT instead of overflowing a 32-bit INT. Regenerate the EXPLAIN plan snapshots and update/extend the affected tests.
b8bf3a6 to
3cbe3eb
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #18658 +/- ##
==========================================
Coverage 64.45% 64.45%
- Complexity 1282 1291 +9
==========================================
Files 3352 3366 +14
Lines 207171 208152 +981
Branches 32348 32497 +149
==========================================
+ Hits 133534 134168 +634
- Misses 62910 63201 +291
- Partials 10727 10783 +56
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
xiangfu0
left a comment
There was a problem hiding this comment.
Found 1 high-signal issue; see inline comment.
| case UINTEGER: | ||
| // UBIGINT (0..2^64-1) has no wider signed type, so values above Long.MAX_VALUE wrap (two's-complement) - this is | ||
| // unavoidable and acceptable since Pinot has no unsigned storage type. | ||
| case UBIGINT: |
There was a problem hiding this comment.
Accepting UBIGINT here turns BIGINT UNSIGNED into a lossy signed LONG. Any value above Long.MAX_VALUE will now silently wrap, so this is a wrong-result regression rather than a harmless type downgrade. Since Pinot cannot represent the full unsigned 64-bit range, this needs to fail validation/planning instead of being mapped to LONG.
Summary
Upgrades Apache Calcite from 1.40.0 to 1.42.0. Pinot's master was never moved to 1.41, so this folds in both the 1.40→1.41 and 1.41→1.42 deltas in a single bump.
The bulk of the upgrade is a faithful re-sync of Pinot's customized SQL parser grammar to upstream 1.42, plus a handful of targeted workarounds for behavioral changes Calcite introduced across these two releases. No public Pinot API or wire/segment format changes.
Changes
Dependency
pom.xml:calcite.version→1.42.0(coverscalcite-coreandcalcite-babel).joou-java-6to0.9.5independencyManagementto resolve a dependency-convergence conflict:calcite-core1.42 pullsjoou-java-60.9.5 while the transitiveavatica-core1.28.0 still wants 0.9.4.LICENSE-binary: bumped the Calcite/Avatica entries to 1.42.0 / 1.28.0, addedorg.jooq:joou-java-6:0.9.5, and dropped the now-unbundledavatica-metrics(Avatica 1.28 no longer pulls it).SQL parser codegen sync (
pinot-common/src/main/codegen)templates/Parser.jj,config.fmpp, anddefault_config.fmppto upstream Calcite 1.42, preserving everyPINOT CUSTOMIZATIONregion.includeStarExclude(SELECT * EXCLUDE/REPLACE),includeSelectBy(SELECT ... BY), andincludeIntervalWithoutQualifier— are intentionally kept OFF. The grammar is synced but inactive: the multi-stage engine has no downstream support for these features, so enabling them would parse syntax the planner/runtime can't execute. They are wired throughdefault_config.fmppso a future change can flip them on deliberately.UNSIGNEDis added as a non-reserved keyword (needed for the unsigned integer types below).Behavioral workarounds for 1.41/1.42 changes
ANY_VALUE()), but the implementation NPEs when a window function is combined with GROUP BY (e.g.SELECT MIN(col) OVER() FROM t GROUP BY col).Validatornow uses aSqlDelegatingConformanceover BABEL that overridesisNonStrictGroupBy()tofalse. This is also the semantically correct behavior for Pinot, which requires all non-aggregated columns to appear in GROUP BY. The feature remains present (un-reverted) in 1.42, so the override is retained.Litmus.THROWtype assertion still fires for a nullability-only divergence.PinotRelDecorrelator(new, see below) relaxes that one assertion: it logs a warning when the row types differ only in nullability and continues, but still fails fast on any structural type change.RelDataTypeSystem#getMaxNumericScale/getMaxNumericPrecisionbecamefinal.TypeSystemdrops the now-illegal overrides; the equivalent behavior is preserved via the type-specificgetMaxScale/getMaxPrecision(DECIMAL)overrides Pinot already defines.SqlOperatorBinding#hasEmptyGroup().PinotMinMaxReturnTypeInferencenow also treats a possibly-empty group as nullable (alongside the existinggetGroupCount() == 0/hasFilter()checks), matching the runtime's null-on-empty behavior forMIN(x) FILTER (WHERE ...).Unsigned integer types (CALCITE-1466)
UTINYINT/USMALLINT/UINTEGER/UBIGINT. Pinot has no native unsigned storage, so these are accepted and treated as their signed equivalents everywhere aSqlTypeNameswitch handles integers:RelToPlanNodeConverter/ v2PRelToPlanNodeConverter: map to the narrowest signed type that holds the full range —UTINYINT/USMALLINT→INT,UINTEGER/UBIGINT→LONG(a signedINTwould wrapUINTEGERvalues above 2³¹).PinotEvaluateLiteralRule: fold a constant unsigned cast into its signed-equivalent type — Calcite cannot build aRexLiteralof an unsigned type, so without thisCAST(5 AS INTEGER UNSIGNED)failed constant-folding.TypeSystem.deriveSumType: widen to (signed)BIGINTlike the signed integer types, soSUMover a sub-64-bit unsigned column accumulates into 64 bits instead of silently overflowing a 32-bitINT.ArithmeticFunctionUtils.normalizeNumericType: normalize to the signed integral type so arithmetic over an unsigned operand stays integral rather than widening toDOUBLE.New class
org.apache.pinot.calcite.sql2rel.PinotRelDecorrelator— a minimal subclass of Calcite'sRelDecorrelatorthat exists solely to relax the CALCITE-7379 assertion described above. It lives under theorg.apache.calcite.sql2relpackage because the relevant members (CorelMap,decorrelate, etc.) are package/protected-subclass visible in Calcite.Testing & validation
pinot-query-plannerunit suite: 1262/1262 pass.pinot-query-runtimeresult-correctness vs H2 —ResourceBasedQueriesTest3571 pass / 0 fail (6 pre-existing skips),QueryRunnerTest130/130.OfflineClusterIntegrationTestrun locally; updatedtestQueryWithRepeatedColumnsV2to reflect that 1.42 now accepts repeated columns in GROUP BY but still rejects ambiguous repeated columns in ORDER BY.SUM/arithmetic return types over unsigned operands, and single-stage unsigned casts.pinot-query-planner/src/test/resources/queries/*.jsonEXPLAIN-plan snapshots are mechanically regenerated (label/whitespace deltas from upstream rule changes), not hand-edited.Behavior & compatibility notes
SELECT x, COUNT(*) FROM t GROUP BY x, x) previously failed validation and now succeeds — Calcite 1.42 de-duplicates the repeated key. This is a benign relaxation (no previously-working query breaks), but during a mixed-version rolling upgrade the same query is rejected by a 1.40 broker and accepted by a 1.42 broker. Repeated columns inORDER BYare still rejected as ambiguous (covered by the updatedtestQueryWithRepeatedColumnsV2).JoinPlans.json/PhysicalOptimizerPlans.json). This is a sound, plan-equivalent rewrite — Calcite only applies it where the join's right input is already distinct (e.g. fed by an aggregate) — and result-equivalence is covered by the H2-comparison suites (ResourceBasedQueriesTest) and the integration tests, which all pass.<<,&,^). The faithful parser sync makes these parseable for the first time (upstream-stock 1.42 additions toBinaryRowOperator, mapping toSqlStdOperatorTable.BIT_LEFT_SHIFT/BITAND_OPERATOR/BITXOR_OPERATOR); previously they were parse errors. No operator wiring is added by this PR. End-to-end status differs by engine: in the multi-stage enginePinotOperatorTableis a curated allow-list that does not register them, so such expressions fail at operator resolution (validation) — same status as the other synced-but-inert 1.42 grammar. In the single-stage engine the canonical names of&/^(bitand/bitxor) coincide with Pinot's existingbitAnd/bitXorscalar functions, soa & b/a ^ bmay resolve to those (a MySQL-style infix alias), while<<has no Pinot equivalent and errors. This v1/v2 divergence is an inherent consequence of faithfully syncing the upstream grammar; explicitly wiring up or rejecting these operators is left as a separate, deliberate decision.CAST(x AS INTEGER UNSIGNED)(and the other unsigned types) is now parseable in v1 too.DataTypeConversionFunctions.castmaps the unsigned type-literal to its signed equivalent (UTINYINT/USMALLINT → INT, UINTEGER/UBIGINT → LONG), mirroring the multi-stage converter, so v1 and v2 treat unsigned casts consistently (previously the v1 path would have errored at execution with "Unknown data type"). Covered byDataTypeConversionFunctionsTest#testCastToUnsignedTypes.Notes for reviewers
Parser.jja faithful upstream sync rather than a divergent fork. Flipping the three fmpp flags is a separate, future decision.AddOptionalColonPath/ColonBracketSegment) was also synced and is likewise inert under Pinot — but it is gated by the upstream conformance methodSqlConformance.isColonFieldAccessAllowed()(which returnsfalsefor BABEL), not by an fmpp flag. These productions are byte-for-byte from upstream Calcite 1.42.0 (verified againstcalcite-1.42.0Parser.jj), so they intentionally carry noPINOT CUSTOMIZATIONmarkers.PINOT CUSTOMIZATIONmarkers from the prior grammar are preserved.UBIGINTliteral aboveLong.MAX) wraps modulo the target signed type, identical to the existing single-stage behavior documented inRequestUtils.getLiteral. The final stored value is the same regardless of where the narrowing happens, so the MSE literal path is left consistent with single-stage rather than diverging.switches (the two converters,DataTypeConversionFunctions,ArithmeticFunctionUtils.normalizeNumericType,TypeSystem.deriveSumType). Each maps to a different target enum and usescaselabels (which must be compile-time constants, so they can't delegate to a shared predicate), so the per-switch listing is largely inherent. The one genuinely-collapsible duplication isRelToPlanNodeConverter.convertToColumnDataTypeand the v2PRelToPlanNodeConverter.convertToColumnDataType, which are byte-for-byte identicalpublic staticcopies — a pre-existing smell this diff merely extends. Collapsing those two (have v2 delegate to v1) is worth a dedicated refactor; left out here to keep the diff scoped to the version bump.hasEmptyGroup()) is a planning-time type-nullability correction — it lets the query validate under 1.42's stricter nullability checks. Pinot'sDataSchema/ColumnDataTypeerases nullability, so this does not change runtime values; the empty-filtered-group → NULL runtime semantics are pre-existing and unchanged. The fix is covered by a compile-time regression test (testFilteredMinMaxAggregateNullability).