Implement PoC block allocation for count accumulator #15642
@Rachelint Here is a PoC for accumulators (I also have a PoC for block-allocated GroupValues). I also made some separate changes for emitting the output from the state in batches, but left them out to keep the changes small.
@@ -18,6 +18,7 @@
use ahash::RandomState;
use datafusion_common::stats::Precision;
use datafusion_expr::expr::WindowFunction;
use datafusion_expr::groups_accumulator::BLOCK_SIZE;
The block size could be based on the batch size as well.
// Count is always non null (null inputs just don't contribute to the overall values)
let nulls = None;
let array = PrimitiveArray::<Int64Type>::new(counts.into(), nulls);
// TODO: support emitting batches
evaluate and state could be changed to return Result<Vec<ArrayRef>> and Result<Vec<Vec<ArrayRef>>>, although that would be quite a large breaking change.
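To make the batched-output suggestion concrete, here is a minimal sketch of an evaluate-style function that returns one output column per batch instead of a single column. It only illustrates the shape of the idea: `Column` is a std-only stand-in for `ArrayRef`, and `evaluate_batched` and its `batch_size` parameter are hypothetical names, not part of the DataFusion API.

```rust
/// Stand-in for arrow's `ArrayRef`, so the sketch needs only the standard library.
type Column = Vec<i64>;

/// Hypothetical batched variant of `evaluate`: instead of one large column,
/// return one column per batch of at most `batch_size` groups.
fn evaluate_batched(counts: &[i64], batch_size: usize) -> Result<Vec<Column>, String> {
    // `chunks` panics on a zero chunk size, so reject it up front.
    if batch_size == 0 {
        return Err("batch_size must be non-zero".to_string());
    }
    Ok(counts
        .chunks(batch_size)
        .map(|chunk| chunk.to_vec())
        .collect())
}
```

With five counts and a batch size of 2, this yields three columns, the last holding a single value. A batched `state` would wrap this once more, giving one inner list of columns per batch.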
It seems similar to what was done in #11943? The problem I found after the PoC attempt in #11943 is that we need to introduce a [...] It still works well when we enable [...]
Thanks a lot. I checked the old sketch again and found that it is easy to avoid a regression in the cases where the optimization is disabled. Maybe it is actually too early to consider the cost of [...] I plan to: [...]
The plan sounds good! I think this PoC mainly shows that accumulators / group state can be changed individually (without changing other parts). If part of your code shows an improvement in time or memory, we should take it and create some follow-up tickets!
Let's close it for now. I'll see whether I can contribute part of this later.
Which issue does this PR close?
PoC to show a simple method for allocating in blocks
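As a rough illustration of what allocating in blocks can look like for a count accumulator, the sketch below stores per-group counts in fixed-size blocks rather than in one growing vector, so adding groups never copies existing counts. It is a std-only sketch under assumptions, not the code from this PR: `BlockedCounts`, its methods, and `count_example` are hypothetical names, and the value of `BLOCK_SIZE` is chosen arbitrarily for the example.

```rust
/// Assumed value for illustration only; the PR imports a `BLOCK_SIZE`
/// constant, but its actual value is not shown in the excerpt.
const BLOCK_SIZE: usize = 4;

/// Per-group counts stored in fixed-size blocks (hypothetical type).
struct BlockedCounts {
    blocks: Vec<Vec<i64>>,
    num_groups: usize,
}

impl BlockedCounts {
    fn new() -> Self {
        Self { blocks: Vec::new(), num_groups: 0 }
    }

    /// Make sure a slot exists for every group index in `0..total_num_groups`.
    /// New blocks are appended; existing blocks are never reallocated.
    fn ensure_groups(&mut self, total_num_groups: usize) {
        if total_num_groups <= self.num_groups {
            return;
        }
        let needed_blocks = (total_num_groups + BLOCK_SIZE - 1) / BLOCK_SIZE;
        while self.blocks.len() < needed_blocks {
            self.blocks.push(vec![0; BLOCK_SIZE]);
        }
        self.num_groups = total_num_groups;
    }

    /// Add `delta` to the count of `group_index`.
    fn add(&mut self, group_index: usize, delta: i64) {
        self.blocks[group_index / BLOCK_SIZE][group_index % BLOCK_SIZE] += delta;
    }

    /// Hand the counts out block by block; the last block is truncated
    /// to the slots that are actually in use.
    fn into_blocks(mut self) -> Vec<Vec<i64>> {
        let full_blocks = self.blocks.len().saturating_sub(1);
        let used_in_last = self.num_groups - full_blocks * BLOCK_SIZE;
        if let Some(last) = self.blocks.last_mut() {
            last.truncate(used_in_last);
        }
        self.blocks
    }
}

/// Usage: count seven non-null rows that fall into six groups.
fn count_example() -> Vec<Vec<i64>> {
    let group_indices = [0, 1, 0, 3, 5, 5, 5];
    let mut counts = BlockedCounts::new();
    counts.ensure_groups(6);
    for &group_index in &group_indices {
        counts.add(group_index, 1);
    }
    counts.into_blocks()
}
```

Because each block is already a self-contained vector, blocks can be handed out one at a time, which fits the idea of emitting output in batches.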
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?