[SPARK-52817][SQL] Fix Like Expression performance #51510
Conversation
Could you add a test to cover this change?
Hi @wangyum, I have added UTs.
Could you add a description?
Can you provide more context in the PR description? I don't understand what you are doing in this PR.
I have added it.
private val endsWith = "%([^_%]+)".r
private val startsAndEndsWith = "([^_%]+)%([^_%]+)".r
private val contains = "%([^_%]+)%".r
private val startsWith = "([^_%]+)%+".r
So a single % is the same as more than one %? Can we leave a code comment to explain this case?
According to automata theory, consecutive % wildcards are equivalent to a single % wildcard, because each % already matches any sequence of zero or more characters.
I have added the code comment.
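The equivalence claimed above can be checked with a small standalone sketch. It is not Spark code: `LikeWildcardSketch`, `collapsePercents`, `likeToRegex`, and `like` are hypothetical helpers written for illustration, and the matcher ignores escape characters, which a real LIKE implementation has to handle.

```scala
import java.util.regex.Pattern

// Hypothetical helper object for illustration only; not part of Spark.
object LikeWildcardSketch {
  // Collapse every run of '%' into a single '%'. '_' is left untouched,
  // because each '_' must consume exactly one character.
  def collapsePercents(pattern: String): String =
    pattern.replaceAll("%+", "%")

  // Translate a LIKE pattern into a Java regex: '%' becomes ".*",
  // '_' becomes ".", and every other character is quoted literally.
  // Assumption: escape characters are not supported in this sketch.
  def likeToRegex(pattern: String): Pattern = {
    val sb = new StringBuilder
    pattern.foreach {
      case '%' => sb.append(".*")
      case '_' => sb.append(".")
      case c   => sb.append(Pattern.quote(c.toString))
    }
    Pattern.compile(sb.toString, Pattern.DOTALL)
  }

  // True when the whole input matches the LIKE pattern.
  def like(input: String, pattern: String): Boolean =
    likeToRegex(pattern).matcher(input).matches()

  def main(args: Array[String]): Unit = {
    val inputs = Seq("HotFocus", "xxHotFocusyy", "Hot", "")
    val patterns = Seq("%%HotFocus%%", "%%%HotFocus%%%", "Hot%%", "%%")
    // Each pattern must give the same answer as its collapsed form.
    for (p <- patterns; s <- inputs) {
      assert(like(s, p) == like(s, collapsePercents(p)))
    }
    println("all patterns agree with their collapsed form")
  }
}
```

Because `.*.*` accepts exactly the same strings as `.*`, collapsing a run of % never changes the result. The same is not true of `_`, which is why only % is collapsed here.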
+1, LGTM.
cc @peter-toth
Merged to master.
### What changes were proposed in this pull request?
Allow the contains function to be used for like expressions with multiple consecutive '%'.

### Why are the changes needed?
In some customers' cases, users write like expressions with multiple consecutive '%'. For example:
```
SELECT * FROM testData where value not like '%%HotFocus%%'
SELECT * FROM testData where value not like '%%%HotFocus%%%'
```
For these SQL queries, the Like expression cannot be converted to the contains function during logical planning, so performance is very poor.

### How was this patch tested?
Added new UTs and ran the existing UTs.

Closes apache#51510 from zhixingheyi-tian/fix-like.

Authored-by: zhixingheyi-tian <[email protected]>
Signed-off-by: Yuming Wang <[email protected]>
What changes were proposed in this pull request?
Allow the contains function to be used for like expressions with multiple consecutive '%'.
Why are the changes needed?
In some customers' cases, users write like expressions with multiple consecutive '%'.
For example: value not like '%%HotFocus%%' or value not like '%%%HotFocus%%%'.
For these SQL queries, the Like expression cannot be converted to the contains function during logical planning, so performance is very poor.
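A minimal sketch of the idea, assuming the simplification works by matching the LIKE pattern string against a few regexes. The regexes follow the shape of the ones quoted in the review thread, with every % widened to %+. That widening, the `Simplified` result types, and the `classify` and `matchesOldContains` helpers are assumptions made for illustration and are not Spark's actual code. Escape characters are ignored.

```scala
// Hypothetical sketch for illustration only; not Spark's optimizer rule.
object LikeClassifySketch {
  sealed trait Simplified
  case class StartsWith(prefix: String) extends Simplified
  case class EndsWith(suffix: String) extends Simplified
  case class Contains(infix: String) extends Simplified
  case object NotSimplified extends Simplified

  // "%+" accepts one or more '%', since a run of '%' behaves like a single '%'.
  private val startsWith = "([^_%]+)%+".r
  private val endsWith = "%+([^_%]+)".r
  private val contains = "%+([^_%]+)%+".r

  // The single-'%' form of the contains regex, kept for comparison.
  private val oldContains = "%([^_%]+)%".r

  // Scala regex extractors require the whole string to match.
  def classify(pattern: String): Simplified = pattern match {
    case startsWith(prefix) => StartsWith(prefix)
    case endsWith(suffix)   => EndsWith(suffix)
    case contains(infix)    => Contains(infix)
    case _                  => NotSimplified
  }

  // True when the single-'%' regex recognizes the whole pattern.
  def matchesOldContains(pattern: String): Boolean =
    oldContains.pattern.matcher(pattern).matches()
}
```

With these definitions, `LikeClassifySketch.classify("%%HotFocus%%")` yields `Contains("HotFocus")`, while `matchesOldContains("%%HotFocus%%")` is false. Under the single-'%' regex such a pattern would therefore not be recognized as a plain substring check.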
How was this patch tested?
Added new UTs and ran the existing UTs.