fix(schema): permit opt-in timestamp-precision evolution#19029
Open
yihua wants to merge 4 commits into
4 tasks
2528a56 to
30810d2
Compare
Adds a write config `hoodie.write.schema.allow.timestamp.precision.evolution` (default false) that, when true, lets the internal-schema reconcile path correct the logical type of a column between timestamp-millis and timestamp-micros (and between the local-timestamp variants), and attach a missing local-timestamp logical type on top of a bare long. Default false preserves the existing strict rejection so no caller sees a behavior change.

The non-reconcile write path was already lenient via Avro reader/writer compatibility (both logical types share the same Avro long primitive). The internal-schema reconcile path, triggered when `hoodie.write.set.null.for.missing.columns=true`, instead rejected the logical-type correction through `SchemaChangeUtils.isTypeUpdateAllow`. This closes the parity gap and enables forward-fixing tables that earlier versions persisted with a timestamp-micros logical type but timestamp-millis values, or that dropped the local-timestamp logical type entirely and stored the column as bare long.

Threaded through SchemaChangeUtils -> TableChanges.ColumnUpdateChange -> AvroSchemaEvolutionUtils.reconcileSchema, with HoodieSchemaUtils, BaseHoodieWriteClient, HoodieMergeHelper, and FileGroupReaderBasedMergeHandle reading the config from the write properties.

Tests:
- TestAvroSchemaEvolutionUtils.testReconcileSchemaTimestampPrecisionEvolution covers default-strict reject and opt-in permit for all three shapes (timestamp precision swap, local-timestamp precision swap, long -> local-timestamp logical-type attach).
- testCOWLogicalRepair / testMORLogicalRepair parameterize on both setNullForMissingColumns and allowTimestampPrecisionEvolution; positive variants exercise the gated repair path on v6/v8/CURRENT fixtures; a negative variant asserts SchemaCompatibilityException when the reconcile path is on with the gate closed.
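The three gated shapes named in the commit message can be modeled in a few lines. This is a minimal, self-contained sketch and not Hudi's actual `SchemaChangeUtils.isTypeUpdateAllow`: the class name, the `ColType` enum, and the method `isTimestampUpdateAllowed` are hypothetical stand-ins. Every pair outside the three shapes is simply rejected here, which ignores the other type promotions the real switch handles.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative model only; names below are assumptions, not Hudi classes.
final class TimestampEvolutionSketch {

  // Hypothetical stand-ins for the column types involved in the gate.
  enum ColType {
    LONG, TIMESTAMP_MILLIS, TIMESTAMP_MICROS, LOCAL_TIMESTAMP_MILLIS, LOCAL_TIMESTAMP_MICROS
  }

  private static final Set<ColType> TIMESTAMPS =
      EnumSet.of(ColType.TIMESTAMP_MILLIS, ColType.TIMESTAMP_MICROS);
  private static final Set<ColType> LOCAL_TIMESTAMPS =
      EnumSet.of(ColType.LOCAL_TIMESTAMP_MILLIS, ColType.LOCAL_TIMESTAMP_MICROS);

  // Returns true when the src -> dst change is an identity, or when the gate
  // is open and the change is one of the three shapes the flag permits.
  static boolean isTimestampUpdateAllowed(ColType src, ColType dst, boolean allowTimestampPrecisionEvolution) {
    if (src == dst) {
      return true; // nothing to change
    }
    if (!allowTimestampPrecisionEvolution) {
      return false; // default-strict: reject any logical-type change
    }
    if (TIMESTAMPS.contains(src) && TIMESTAMPS.contains(dst)) {
      return true; // timestamp-millis <-> timestamp-micros
    }
    if (LOCAL_TIMESTAMPS.contains(src) && LOCAL_TIMESTAMPS.contains(dst)) {
      return true; // local-timestamp-millis <-> local-timestamp-micros
    }
    // long -> local-timestamp-millis / long -> local-timestamp-micros
    return src == ColType.LONG && LOCAL_TIMESTAMPS.contains(dst);
  }
}
```

Note that the sketch keeps `long -> timestamp-millis` and any `local-timestamp -> long` change rejected even with the gate open, since neither appears among the shapes the description lists.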
yihua
commented
Jun 17, 2026
30810d2 to
dc16a7a
Compare
yihua
commented
Jun 17, 2026
…s/SchemaChangeUtils.java
wombatu-kun
reviewed
Jun 17, 2026
.defaultValue(false)
.markAdvanced()
.sinceVersion("1.3.0")
.withDocumentation("Controls whether schema evolution may change a column between timestamp-millis and "
Contributor
This omits the third behavior the flag gates: attaching a logical type to a bare long (long -> local-timestamp-millis/micros), the logical-type-loss repair the PR description calls a primary motivation. A user whose 0.x table stored the column as bare long would not learn from "precision-only evolution between these logical types" that this flag applies. Suggest documenting the long -> local-timestamp attach case explicitly.
Contributor
Author
long -> local-timestamp is backward compatible for readers. The logical type local-timestamp is now supported on master, so we can keep this part out for simplicity.
Describe the issue this Pull Request addresses
Earlier Hudi versions mishandled long-backed timestamp logical types in `AvroInternalSchemaConverter`:
- `timestamp-millis` and `timestamp-micros` both collapsed into a single internal `TimestampType` and were always re-emitted as `timestampMicros()` on serialize. A source schema declaring `timestamp-millis` got persisted in the table with the wrong `timestamp-micros` logical type, while the underlying `long` values written to parquet remained epoch-millis. Pure logical-type drift.
- `local-timestamp-millis` and `local-timestamp-micros` had no branch at all. They fell through to the bare `LongType`, and the logical type was dropped from the table schema entirely. Logical-type loss.

In both cases the parquet values are correct; only the logical type on the field is wrong. Current converters recognize all four logical types as distinct, so the writer schema now declares the correct logical type. On every subsequent write the reconcile path compares writer schema against the persisted table schema, finds the logical-type mismatch, and rejects it.
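To make the drift case concrete, the sketch below shows why the logical type matters even though the stored `long` is unchanged: the same raw value decodes to very different instants depending on whether it is read as epoch-millis or epoch-micros. It uses only `java.time` and is independent of Hudi; the class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Illustrative only: decodes one raw long under the two timestamp precisions.
final class LogicalTypeDriftSketch {

  // Interpret the raw value as milliseconds since the epoch.
  static Instant asMillis(long raw) {
    return Instant.ofEpochMilli(raw);
  }

  // Interpret the same raw value as microseconds since the epoch.
  static Instant asMicros(long raw) {
    return Instant.EPOCH.plus(raw, ChronoUnit.MICROS);
  }

  public static void main(String[] args) {
    long stored = 1_700_000_000_000L; // written as epoch-millis
    System.out.println(asMillis(stored)); // 2023-11-14T22:13:20Z
    System.out.println(asMicros(stored)); // 1970-01-20T16:13:20Z
  }
}
```

A schema that labels such a column `timestamp-micros` therefore describes the data incorrectly, which is the mismatch the reconcile path detects once the writer schema declares `timestamp-millis`.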
- With `hoodie.write.set.null.for.missing.columns=false` the table schema already self-repairs on the next commit: `HoodieSchemaUtils.deduceWriterSchema` skips `reconcileSchema` entirely and lets `AvroSchemaCompatibility.checkReaderWriterCompatibility` validate. That check is logical-type-blind (both timestamps are `long` underneath), so it accepts the corrected logical type from the writer schema and the next commit rewrites the table schema's logical type accordingly. No change is needed for this path.
- With `hoodie.write.set.null.for.missing.columns=true` the repair is blocked. `HoodieSchemaUtils.deduceWriterSchema` instead calls `AvroSchemaEvolutionUtils.reconcileSchema`, which goes through `TableChanges.ColumnUpdateChange.updateColumnType` and `SchemaChangeUtils.isTypeUpdateAllow`. That switch had no case for `TIMESTAMP` or `TIMESTAMP_MILLIS`, so any logical-type change fell into `default: return false` and threw `SchemaCompatibilityException`. The reconcile path was therefore stricter than the non-reconcile path for the same scenario; this PR fixes only that one path.

Complements the read-side repair from #14161, which handles parquet files carrying the wrong logical type transparently. This PR closes the write-side gap so the table schema itself can be brought into agreement with the writer schema even when `set.null.for.missing.columns=true`.

Summary and Changelog
Users gain a per-write opt-in to forward-fix tables whose persisted schema carries a wrong or missing timestamp logical type, by allowing the internal-schema reconcile path to update the column's logical type to match the writer schema. The non-reconcile path (`set.null.for.missing.columns=false`) already repaired the logical type implicitly via the Avro reader/writer compatibility check; this PR brings the reconcile path to parity. Default behavior is unchanged.

Changes:
- `hoodie.write.schema.allow.timestamp.precision.evolution` on `HoodieCommonConfig` (default `false`, `sinceVersion("1.3.0")`).
- `SchemaChangeUtils.isTypeUpdateAllow` gains a `boolean allowTimestampPrecisionEvolution` parameter. When `true`, the switch permits:
  - `timestamp-millis ↔ timestamp-micros` (logical-type drift case, both directions)
  - `local-timestamp-millis ↔ local-timestamp-micros` (precision swap among the recognized variants)
  - `long → local-timestamp-millis` / `long → local-timestamp-micros` (logical-type loss case, attach the missing logical type)
- `AvroSchemaEvolutionUtils.reconcileSchema` gains an overload that threads the flag through to `TableChanges.ColumnUpdateChange`, which stores it on the change and passes it to `isTypeUpdateAllow`. Pre-existing `reconcileSchema` / `ColumnUpdateChange.get` overloads kept as delegates.
- `HoodieSchemaUtils.scala`, `BaseHoodieWriteClient`, `HoodieMergeHelper`, and `FileGroupReaderBasedMergeHandle` read the config from the write properties and pass it through to `reconcileSchema`.
- `TestAvroSchemaEvolutionUtils.testReconcileSchemaTimestampPrecisionEvolution` asserts default-strict reject and opt-in permit for all three permitted shapes.
- `testCOWLogicalRepair` / `testMORLogicalRepair` parameterize on both `setNullForMissingColumns` and `allowTimestampPrecisionEvolution`; positive variants exercise the gated repair path on the v6/v8/CURRENT logical-repair fixtures from "fix(ingest): Repair affected logical timestamp milli tables" #14161; a negative variant asserts `SchemaCompatibilityException` when the reconcile path is on with the gate closed.

Impact
One new overload on `AvroSchemaEvolutionUtils.reconcileSchema`, one new factory on `TableChanges.ColumnUpdateChange.get`. Pre-existing overloads kept as delegates, so no public-API breakage.

Risk Level
low
The gate defaults off, so existing writers are unaffected. Test coverage adds positive variants on the existing logical-repair fixtures and a negative variant locking in the default-strict behavior.
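The "defaults off" behavior can be illustrated with a small sketch of reading the opt-in from write properties. It uses plain `java.util.Properties` as an assumption for brevity; Hudi reads the value through its own config classes, and the class and helper names below are hypothetical. Only the config key itself comes from this PR.

```java
import java.util.Properties;

// Illustrative only: shows the default-false, opt-in semantics of the gate.
final class PrecisionEvolutionFlagSketch {

  // Config key introduced by this PR.
  static final String KEY = "hoodie.write.schema.allow.timestamp.precision.evolution";

  // Hypothetical helper: an absent key means the gate stays closed.
  static boolean allowTimestampPrecisionEvolution(Properties writeProps) {
    return Boolean.parseBoolean(writeProps.getProperty(KEY, "false"));
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    System.out.println(allowTimestampPrecisionEvolution(props)); // false: existing writers unaffected

    props.setProperty(KEY, "true"); // explicit per-write opt-in
    System.out.println(allowTimestampPrecisionEvolution(props)); // true
  }
}
```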
Documentation Update
New config documented inline on `HoodieCommonConfig.ALLOW_TIMESTAMP_PRECISION_EVOLUTION` with `sinceVersion("1.3.0")`. No website update needed.

Contributor's checklist