Add configurable truncation for string columns#19146
Add configurable truncation for string columns#19146jaykanakiya wants to merge 3 commits intoapache:masterfrom
Conversation
| @JsonProperty("createBitmapIndex") Boolean createBitmapIndex, | ||
| @JsonProperty("maxStringLength") @Nullable Integer maxStringLength |
There was a problem hiding this comment.
drive by comment (i'll have a closer look at rest of PR later)
instead of adding additional arguments here, I was hoping to deprecate these arguments in favor of adding a column format spec similar to was done for auto/json columns in #17762, which could serve as a reference for how this should be wired up. I was planning to move the existing createBitmapIndex and multiValueHandling into such a spec, but just haven't got to it yet. I think this would be much cleaner and less disruptive to call sites going forward. It also allows wiring up to IndexSpec to be able to define job level defaults as a middle place between per column and system wide.
There was a problem hiding this comment.
Thanks for taking a look @clintropolis. Adding something like StringCommonFormatColumnFormatSpec would make it cleaner and makes sense to consolidate the configs there. Since it seems like a bigger refactor, does it make sense to do it in a follow up? Let me know what you think.
Summary
Adds a configurable maximum string length for string dimension columns. Strings exceeding the limit are truncated during ingestion.
druid.indexing.formats.maxStringLengthmaxStringLengthfield in the dimension specRelease note
Added a new maxStringLength configuration for string dimensions that truncates values exceeding the specified length during ingestion. Can be set globally via druid.indexing.formats.maxStringLength or per-dimension in the ingestion spec.
Key changed/added classes in this PR
This PR has: