Fix float truncation in min/max derived field scripts#1292
Draft
jwils wants to merge 1 commit into
Draft
Conversation
The min/max derived field painless function coerced every incoming Number (and the stored value) to a long before comparing, to avoid a ClassCastException on mixed Integer/Long values. That truncated floating point values: a Float-sourced min_value/max_value field compared truncated values (so 2.4 vs 2.9 looked equal) and stored the truncated result (2.9 became 2), silently corrupting derived indices. Replace the coerce-then-Collections.min/max approach with a loop that tracks the winning element and a numeric-aware comparison function: integral values are compared via Long.compare (preserving full precision beyond 2^53), comparisons involving a floating point value use Double.compare (preserving fractional parts, including when JSON produces mixed Integer/Double values for the same field), and non-numeric values keep using their natural compareTo. The original (un-coerced) winning value is what gets stored. Also add a Float-sourced min/max derived field to the test schema and acceptance coverage for fractional and mixed integral/fractional values.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
min_value/max_valuederived fields sourced fromFloatfields silently corrupt data: the generated painless function coerces every incoming number (and the stored value) to along, so fractional values are stored truncated (2.9 becomes 2) and compared truncated (2.4 vs 2.9 compare as equal). The coercion was added in 8d3d664 to fix aClassCastExceptionon mixedInteger/Longvalues, but it over-coerces.What
Collections.min/maxapproach with a loop that tracks the winning element, using a new numeric-awareminOrMaxValue_compareValuespainless function: integral values compare viaLong.compare(preserving full precision beyond 2^53 forJsonSafeLong/LongStringsources), comparisons involving a floating point value useDouble.compare(handling JSON producing mixedInteger/Doublevalues for the sameFloatfield), and non-numeric values (Date/DateTime/LocalTimestrings) keep their naturalcompareTo.update_WidgetCurrency_from_Widget_*script ID changed since the script contents changed.Float-sourced min/max derived field to the test schema plus acceptance coverage for fractional values and mixed integral/fractional values (in both indexing orders), and unit coverage for the generated function text.Risk Assessment
Low. The derived-field update script contents (and therefore the MD5-derived script ID) change, so deployments will pick up the new script when schema artifacts are dumped and cluster configuration is applied. Behavior for existing integral and string-valued min/max fields is unchanged (verified by the existing regression tests).