Description
Is your feature request related to a problem or challenge?
Part of #10918
Initial `StringView` support is provided by #11667, which covers some of the most performance-critical workloads, such as loading from Parquet, reasonably fast aggregation, and faster filtering. However, many operators are not yet supported, and using them causes DataFusion to cast `StringViewArray` to `StringArray`, which is often unnecessary and slow.
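A simplified model of why the cast is costly: an offsets-based array (like `StringArray`) needs every string in one contiguous buffer, so converting views to offsets has to allocate a new buffer and copy every byte. The types `ViewStrings` and `OffsetStrings` and the function `views_to_offsets` below are hypothetical toys, not arrow-rs APIs. The views here also omit the inline short-string optimization of the real format.

```rust
/// Hypothetical toy of an offset-based string array (like Arrow `StringArray`):
/// all string bytes live in one contiguous `values` buffer.
#[derive(Debug, PartialEq)]
struct OffsetStrings {
    offsets: Vec<i32>,
    values: Vec<u8>,
}

/// Hypothetical toy of a view-based array (like Arrow `StringViewArray`),
/// simplified: each entry points into one of several data buffers.
struct ViewStrings {
    views: Vec<(usize, usize, usize)>, // (buffer index, offset, length)
    buffers: Vec<Vec<u8>>,
}

/// Converting views to offsets must allocate a fresh buffer and copy every
/// byte: this is the work paid when an unsupported operator forces a cast.
fn views_to_offsets(input: &ViewStrings) -> OffsetStrings {
    let total: usize = input.views.iter().map(|&(_, _, len)| len).sum();
    let mut values = Vec::with_capacity(total);
    let mut offsets = Vec::with_capacity(input.views.len() + 1);
    offsets.push(0i32);
    for &(buf, start, len) in &input.views {
        values.extend_from_slice(&input.buffers[buf][start..start + len]);
        offsets.push(values.len() as i32);
    }
    OffsetStrings { offsets, values }
}

fn main() {
    // Three strings spread over two buffers, in a different order than stored.
    let views = ViewStrings {
        views: vec![(0, 0, 5), (1, 0, 3), (0, 5, 5)],
        buffers: vec![b"helloworld".to_vec(), b"abc".to_vec()],
    };
    println!("{:?}", views_to_offsets(&views));
}
```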
Describe the solution you'd like
We should gradually implement native `StringView` support for string operators, such as `length`, `reverse`, etc.
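As a minimal sketch of what native support for an operator such as `length` means: the function below matches on the input string type and reuses one generic kernel for every variant, so supporting `Utf8View` costs one extra match arm rather than a cast. `StringInput`, `LengthOutput`, `char_lengths` and `character_length` are hypothetical stand-ins, not DataFusion's actual types or functions. Returning `Int32` for `Utf8View` is an assumption made for illustration.

```rust
/// Hypothetical stand-in for the three Arrow string types. In Arrow they have
/// different physical layouts; here each variant just holds owned strings.
enum StringInput {
    Utf8(Vec<Option<String>>),
    LargeUtf8(Vec<Option<String>>),
    Utf8View(Vec<Option<String>>),
}

/// Hypothetical output, modelling an Arrow Int32 or Int64 array.
#[derive(Debug, PartialEq)]
enum LengthOutput {
    Int32(Vec<Option<i32>>),
    Int64(Vec<Option<i64>>),
}

/// Shared generic kernel: counts Unicode scalar values and keeps nulls as nulls.
fn char_lengths<T, F>(values: &[Option<String>], to_int: F) -> Vec<Option<T>>
where
    F: Fn(usize) -> T,
{
    values
        .iter()
        .map(|v| v.as_ref().map(|s| to_int(s.chars().count())))
        .collect()
}

fn character_length(input: &StringInput) -> LengthOutput {
    match input {
        StringInput::Utf8(v) => LengthOutput::Int32(char_lengths(v, |n| n as i32)),
        StringInput::LargeUtf8(v) => LengthOutput::Int64(char_lengths(v, |n| n as i64)),
        // The new arm: reuse the same kernel instead of casting to Utf8.
        // Choosing Int32 output for Utf8View is an assumption in this sketch.
        StringInput::Utf8View(v) => LengthOutput::Int32(char_lengths(v, |n| n as i32)),
    }
}

fn main() {
    let input = StringInput::Utf8View(vec![Some("héllo".to_string()), None]);
    println!("{:?}", character_length(&input));
}
```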
Here is a checklist to help implement the support:
- Add `Utf8View` to the function signature, e.g., https://github.com/apache/datafusion/blob/main/datafusion/functions/src/unicode/character_length.rs#L46-L50
- Handle `Utf8View` in the invocation. The function likely already handles `Utf8` and `LargeUtf8`, so we should just add a new case to the type match.
- Write tests for the new `Utf8View` case. Since there are now three string types, the test logic often needs restructuring to reduce duplicated code.
- Make sure the test cases cover both long (> 12 bytes) and short strings.
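A sketch of why both short and long strings matter: in the Arrow columnar format each view is 16 bytes and starts with a 4-byte length. Strings of at most 12 bytes are stored inline in the remaining 12 bytes, while longer strings store a 4-byte prefix, a buffer index and an offset into a separate data buffer. A kernel can therefore pass tests on short strings while mishandling the out-of-line path, or vice versa. The helpers `make_view` and `read_view` are hypothetical, written for illustration rather than taken from arrow-rs, and assume little-endian encoding.

```rust
use std::convert::TryInto;

/// Hypothetical encoder for one 16-byte view. Bytes 0..4 hold the length.
/// If len <= 12, bytes 4..16 hold the string inline (zero-padded); otherwise
/// bytes 4..8 hold a prefix, 8..12 the buffer index and 12..16 the offset.
fn make_view(s: &[u8], buffer_index: u32, offset: u32) -> [u8; 16] {
    let mut view = [0u8; 16];
    view[0..4].copy_from_slice(&(s.len() as u32).to_le_bytes());
    if s.len() <= 12 {
        view[4..4 + s.len()].copy_from_slice(s);
    } else {
        view[4..8].copy_from_slice(&s[0..4]);
        view[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        view[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    view
}

/// Hypothetical decoder: short strings never touch `buffers`.
fn read_view<'a>(view: &'a [u8; 16], buffers: &'a [Vec<u8>]) -> &'a [u8] {
    let len = u32::from_le_bytes(view[0..4].try_into().unwrap()) as usize;
    if len <= 12 {
        &view[4..4 + len]
    } else {
        let buf = u32::from_le_bytes(view[8..12].try_into().unwrap()) as usize;
        let off = u32::from_le_bytes(view[12..16].try_into().unwrap()) as usize;
        &buffers[buf][off..off + len]
    }
}

fn main() {
    // One data buffer holding the long string; the short one is inlined.
    let buffers = vec![b"13 bytes long".to_vec()];
    let short = make_view(b"twelve bytes", 0, 0); // exactly 12 bytes: inline
    let long = make_view(b"13 bytes long", 0, 0); // 13 bytes: out of line
    for view in [&short, &long] {
        println!("{:?}", std::str::from_utf8(read_view(view, &buffers)).unwrap());
    }
}
```

Test cases around the 12-byte boundary (12 and 13 bytes, plus the empty string) exercise both paths.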
Describe alternatives you've considered
No response
Additional context
No response