Schema evolution at the column level. #3171
CrispyCrafter
started this conversation in
General
Replies: 1 comment
-
There is no native way to do this currently. If you think you have a strong use case, you could try to contribute this feature to our schema evolution code.
-
We've been running DeltaLake with S3 backing in production for a few months now with excellent results.
Yesterday we came to realise that an opinionated decision to cast one column to int, as opposed to float, upstream of delta-lake had introduced marginal errors in an analytical workflow. To my surprise, DeltaLake does not seem to natively support updating a column's type in this situation. There are obviously certain data types that cannot be cast this way; however, int to float is a perfectly valid (widening) operation.
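To make the failure mode concrete, here is a minimal, hypothetical illustration (not our actual pipeline; the values and column are made up) of the kind of marginal error an upstream int cast introduces into a downstream aggregate:

```python
# Hypothetical sensor readings; in the real pipeline these landed in a Delta table.
readings = [2.4, 2.6, 2.4, 2.6]

# The upstream cast to int truncates each value before it reaches the table.
as_int = [int(x) for x in readings]  # [2, 2, 2, 2]

true_mean = sum(readings) / len(readings)  # 2.5
biased_mean = sum(as_int) / len(as_int)    # 2.0

# The analytical result is off by 0.5 -- "marginal", but real and systematic.
print(true_mean - biased_mean)  # 0.5
```

Fixing the cast upstream is easy; the problem is that the table's schema still says int for all the data already written.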
The only option I could find to resolve this was to rebuild the table entirely, i.e. using `overwrite` mode. This is expensive, wasteful, and technically not feasible given the volume of data present in S3.
Instead I opted to manually modify both `<>.checkpoint.parquet` and `_last_checkpoint`, which seems to have done the trick. Here is the pseudo workflow that I used:
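The core of such a workflow is rewriting the `schemaString` JSON that Delta stores in the checkpoint's `metaData` action. A minimal sketch of just that step in pure Python (the column and type names are illustrative, and reading/writing the actual checkpoint parquet around it, e.g. with pyarrow, is assumed rather than shown):

```python
import json

def widen_column(schema_string: str, column: str, new_type: str = "float") -> str:
    """Return a Delta schemaString with `column` retyped to `new_type`.

    `schema_string` is the JSON-encoded schema found in the metaData action
    of the checkpoint (and in the JSON commit files).
    """
    schema = json.loads(schema_string)
    for field in schema["fields"]:
        if field["name"] == column:
            field["type"] = new_type  # e.g. "integer" -> "float"
    return json.dumps(schema)

# Illustrative schemaString containing the offending int column:
before = json.dumps({
    "type": "struct",
    "fields": [
        {"name": "value", "type": "integer", "nullable": True, "metadata": {}},
    ],
})
after = widen_column(before, "value")
print(json.loads(after)["fields"][0]["type"])  # float
```

The rewritten string then has to be written back into the checkpoint parquet, with `_last_checkpoint` kept consistent, and the new type must be a valid Delta primitive name (`"integer"`, `"long"`, `"float"`, `"double"`, ...).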
This workflow updated the schema for all downstream consumers, such that the type is now listed as float in the `DeltaTable` interface.
Side note: we use the `rust` writer, in `append` mode with schema mode `merge`.
Surely there has to be a better, native way to support this kind of operation?