Optimize take kernel for BinaryViewArray and StringViewArray #6168
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -160,6 +160,34 @@ pub fn create_string_array_with_len<Offset: OffsetSizeTrait>( | |
.collect() | ||
} | ||
|
||
/// Creates a random (but fixed-seeded) string view array of a given size and null density. | ||
/// | ||
/// See `create_string_array` above for more details. | ||
pub fn create_string_view_array(size: usize, null_density: f32) -> StringViewArray { | ||
create_string_view_array_with_max_len(size, null_density, 400) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Using a max len of 400 means most of the string values will be "long" (not inlined views which happens for values less than 12 bytes in length). I don't think that is critical I am just pointing it out There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep it's a good point. I do think that the change is going to be more impactful for outlined vs inlined strings, as validation time is linear w.r.t. length of string. Though, even for the relatively short strings in TPC-H, the validation step was really significant (~22% of execution time) I have a flamegraph for q10 in vortex-data/vortex#476 (comment). |
||
} | ||
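To illustrate the inlined versus outlined distinction raised in the review comment, here is a minimal, std-only sketch of how a 16-byte view is populated under the Arrow view layout: values of at most 12 bytes are stored inline, while longer values keep a 4-byte prefix plus a buffer index and offset. The `View` enum, `make_view`, and `MAX_INLINE_LEN` are hypothetical names for illustration; arrow-rs itself stores each view as a packed `u128`, not as an enum.

```rust
/// Largest payload (in bytes) that fits inline in a 16-byte view.
/// Hypothetical constant name; the 12-byte limit comes from the Arrow view layout.
const MAX_INLINE_LEN: usize = 12;

/// Illustrative model of a view. This is NOT the arrow-rs representation,
/// which packs the same fields into a single `u128`.
#[derive(Debug, PartialEq)]
enum View {
    /// Short value: length plus up to 12 bytes of data, zero padded.
    Inline { len: u32, data: [u8; 12] },
    /// Long value: length, first 4 bytes, and the location of the full value.
    Outlined {
        len: u32,
        prefix: [u8; 4],
        buffer_index: u32,
        offset: u32,
    },
}

/// Builds a view for `value`, assuming that for long values the bytes
/// already live in data buffer `buffer_index` at `offset`.
fn make_view(value: &[u8], buffer_index: u32, offset: u32) -> View {
    let len = value.len() as u32;
    if value.len() <= MAX_INLINE_LEN {
        let mut data = [0u8; 12];
        data[..value.len()].copy_from_slice(value);
        View::Inline { len, data }
    } else {
        let mut prefix = [0u8; 4];
        prefix.copy_from_slice(&value[..4]);
        View::Outlined {
            len,
            prefix,
            buffer_index,
            offset,
        }
    }
}
```

Calling `make_view(b"hello", 0, 0)` yields an `Inline` view, while any value of 13 bytes or more yields an `Outlined` view whose string bytes stay in a separate data buffer.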
|
||
/// Creates a random (but fixed-seeded) string view array of a given size and null density, with each value's length drawn from `0..max_str_len` | ||
fn create_string_view_array_with_max_len( | ||
size: usize, | ||
null_density: f32, | ||
max_str_len: usize, | ||
) -> StringViewArray { | ||
let rng = &mut seedable_rng(); | ||
(0..size) | ||
.map(|_| { | ||
if rng.gen::<f32>() < null_density { | ||
None | ||
} else { | ||
let str_len = rng.gen_range(0..max_str_len); | ||
let value = rng.sample_iter(&Alphanumeric).take(str_len).collect(); | ||
let value = String::from_utf8(value).unwrap(); | ||
Some(value) | ||
} | ||
}) | ||
.collect() | ||
} | ||
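Because the generator above draws `str_len` uniformly from `0..max_str_len`, the share of non-null values short enough to be inlined can be estimated directly. The sketch below is a back-of-the-envelope calculation, not part of the PR; `inlined_fraction` and `MAX_INLINE_LEN` are hypothetical names, and the 12-byte inline limit is taken from the Arrow view layout.

```rust
/// Largest payload (in bytes) stored inline in a view (assumed from the Arrow view layout).
const MAX_INLINE_LEN: usize = 12;

/// Fraction of lengths drawn uniformly from `0..max_str_len` that are
/// at most `MAX_INLINE_LEN`. Returns `None` for an empty range.
fn inlined_fraction(max_str_len: usize) -> Option<f64> {
    if max_str_len == 0 {
        return None;
    }
    // Lengths 0..=12 are inlined, i.e. at most 13 distinct lengths.
    let inlined_lengths = max_str_len.min(MAX_INLINE_LEN + 1);
    Some(inlined_lengths as f64 / max_str_len as f64)
}
```

With a maximum length of 400 this gives 13 / 400, so only about 3% of the non-null benchmark values would be inlined, which matches the observation that most values are "long".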
|
||
/// Creates a random (but fixed-seeded) array of a given size, null density and length | ||
pub fn create_string_view_array_with_len( | ||
size: usize, | ||
|