PS-10481: Fix range optimizer full table scan for IN() with oversized values #5840
Open
percona-mhansson wants to merge 2 commits into percona:8.4
Fix range optimizer falling back to full table scan for oversized string values
The range optimizer returns nullptr ("always true") when a string value
exceeds the column's character capacity on strnxfrm collations
(Bug#35169384). This nullptr then poisons valid ranges via
tree_or(valid, nullptr) -> nullptr, causing a full table scan when
the optimizer should use an index.
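The poisoning effect can be sketched with a toy model. This is not the server's SEL_TREE code: ToyRangeTree and toy_tree_or are hypothetical names, and the only property carried over from the description above is that nullptr stands for "always true" and absorbs any valid range it is OR-ed with.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for a range tree on a single index (hypothetical type,
// not the real optimizer structure).
struct ToyRangeTree {
  std::vector<std::string> points;  // equality lookups the index can serve
};

using ToyTreePtr = std::shared_ptr<ToyRangeTree>;

// nullptr models "always true": no restriction the index could use.
// OR-ing anything with "always true" is again "always true".
ToyTreePtr toy_tree_or(const ToyTreePtr &a, const ToyTreePtr &b) {
  if (!a || !b) return nullptr;
  auto merged = std::make_shared<ToyRangeTree>(*a);
  merged->points.insert(merged->points.end(), b->points.begin(),
                        b->points.end());
  return merged;
}
```

In this model, a single nullptr operand discards every range collected so far, which is why one oversized value is enough to lose the index access path.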
Three fixes:
1) IN() list (Bug#118009, Bug#118486): When get_mm_parts() returns
nullptr for an IN() value, check if the value's character count
exceeds the field's char_length(). If so, skip it -- no row can
match. The check uses the table field's char_length(), not the
key_part's prefix clone, so values fitting the column but
exceeding a prefix are not incorrectly skipped.
2) Prefix index (Bug#119770): For EQ_FUNC on a prefix key
(HA_PART_KEY_SEG), the truncated value is a valid prefix lookup
key -- the prefix index stores truncated values by design. Fall
through with inexact=true so the WHERE filter rechecks full
equality, instead of bailing out.
3) OR branches (Bug#119867): When an OR branch returns nullptr for
a MULT_EQUAL predicate (produced by constant propagation from
simple col = const), check whether the value exceeds the column's
character capacity. If so, skip the branch. This check runs
before the first-branch assignment to prevent an oversized first
branch from poisoning all subsequent valid branches.
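A minimal sketch of the skip logic in fix 1, under stated assumptions: values are UTF-8 encoded, the column capacity is given in characters, and toy_char_count, toy_is_oversized and toy_in_list_points are hypothetical helpers, not functions from the patch. The point is only that the comparison is made in characters against the full column length, not in bytes and not against a prefix length.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Count characters in a UTF-8 string by counting the bytes that are
// not continuation bytes (10xxxxxx). Assumes well-formed input.
std::size_t toy_char_count(const std::string &utf8) {
  std::size_t n = 0;
  for (unsigned char c : utf8) {
    if ((c & 0xC0) != 0x80) ++n;
  }
  return n;
}

// True if the value has more characters than the column can hold,
// so no stored row can be equal to it (ignoring UCA contractions).
bool toy_is_oversized(const std::string &value,
                      std::size_t field_char_length) {
  return toy_char_count(value) > field_char_length;
}

// Collect lookup points for an IN() list, skipping oversized values
// instead of letting them turn the whole list into "always true".
std::vector<std::string> toy_in_list_points(
    const std::vector<std::string> &values, std::size_t field_char_length) {
  std::vector<std::string> points;
  for (const std::string &v : values) {
    if (toy_is_oversized(v, field_char_length)) continue;  // cannot match
    points.push_back(v);
  }
  return points;
}
```

For a column of 3 characters, an IN() list of 'ab', 'abcdef', 'c' would keep 'ab' and 'c' as lookup points in this model and drop 'abcdef'.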
UCA contractions (e.g. 'ae' = U+00E6 'æ') can make an N-char value
match an M-char stored value where N > M. When such a value is
skipped, the contraction match is missed, but this matches the
existing MySQL 8.0 behavior where the truncated sort key also fails
to match the contraction character.
Whenever we see an oversized value in an IN() predicate, bail out: we cannot know whether it will match anything. This misses some opportunities for a range scan, but at least it does not fall back on a table scan.
yakirgb reviewed on Feb 23, 2026
      if (!get_eq_field_and_value(down_cast<Item_func *>(&item), &eq_field,
                                  &eq_value) &&
          is_oversized_string_for_field(eq_field, eq_value))
        continue;
@percona-mhansson Shouldn't we apply the same fix here as the one you added to the IN() loop?