Proposed changes for more flexible user defined Aggregate and window functions #12
Changes from all commits: 66e0e78, 6de0cc1, c613434, 000fc3c, dc7edf9, 84bdf28, 828de45
@@ -80,19 +80,19 @@ pub trait PartitionState {
 ///
 /// # Stateless `PartitionEvaluator`
 ///
-/// In this case, [`Self::evaluate`], [`Self::evaluate_with_rank`] or
+/// In this case, [`Self::evaluate_all`], [`Self::evaluate_with_rank`] or
 /// [`Self::evaluate_inside_range`] is called with values for the
 /// entire partition.
 ///
 /// # Stateful `PartitionEvaluator`
 ///
-/// In this case, [`Self::evaluate_stateful`] is called to calculate
+/// In this case, [`Self::evaluate`] is called to calculate
 /// the results of the window function incrementally for each new
 /// batch, saving and restoring any state needed to do so as
 /// [`BuiltinWindowState`].
 ///
 /// For example, when computing `ROW_NUMBER` incrementally,
-/// [`Self::evaluate_stateful`] will be called multiple times with
+/// [`Self::evaluate`] will be called multiple times with
 /// different batches. For all batches after the first, the output
 /// `row_number` must start from the last `row_number` produced for the
 /// previous batch. The previous row number is saved and restored as
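The incremental `ROW_NUMBER` behaviour described in this doc comment can be illustrated without DataFusion. The following is a minimal sketch under stated assumptions: `RowNumberState` and `evaluate_batch` are hypothetical names, and a plain struct stands in for the state that DataFusion keeps in `BuiltinWindowState`.

```rust
/// Hypothetical per-partition state saved between batches
/// (a stand-in for what DataFusion stores as `BuiltinWindowState`).
#[derive(Debug, Default, Clone, Copy)]
struct RowNumberState {
    /// Last row number emitted for the previous batch (0 before any batch).
    last_row_number: u64,
}

/// Hypothetical stateful evaluation of one batch: numbers its rows,
/// continuing from the row number saved after the previous batch.
fn evaluate_batch(state: &mut RowNumberState, num_rows: usize) -> Vec<u64> {
    let start = state.last_row_number;
    let out: Vec<u64> = (1..=num_rows as u64).map(|i| start + i).collect();
    // Save the last emitted row number so the next batch continues from it.
    if let Some(&last) = out.last() {
        state.last_row_number = last;
    }
    out
}

fn main() {
    let mut state = RowNumberState::default();
    let first = evaluate_batch(&mut state, 3);
    let second = evaluate_batch(&mut state, 2);
    println!("{:?} {:?}", first, second); // [1, 2, 3] [4, 5]
}
```

Without the saved state, the second batch would restart at 1; carrying `last_row_number` forward is what makes the per-batch results match a single pass over the whole partition.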
@@ -147,27 +147,46 @@ pub trait PartitionEvaluator: Debug + Send {
     ///
     /// `idx`: is the index of last row for which result is calculated.
     /// `n_rows`: is the number of rows of the input record batch (Used during bounds check)
-    fn get_range(&self, _idx: usize, _n_rows: usize) -> Result<Range<usize>> {
-        Err(DataFusionError::NotImplemented(
-            "get_range is not implemented for this window function".to_string(),
-        ))
+    /// When [`Self::uses_window_frame`] returns `false`, this method computes the range of
+    /// rows the window function needs. By default this is the smallest possible range, the
+    /// current row (`idx..idx + 1`), which is enough for functions such as `ROW_NUMBER` and `RANK`.
+    fn get_range(&self, idx: usize, _n_rows: usize) -> Result<Range<usize>> {
+        if self.uses_window_frame() {
+            Err(DataFusionError::Execution(
+                "Range should be calculated from window frame".to_string(),
+            ))
+        } else {
+            Ok(Range {
+                start: idx,
+                end: idx + 1,
+            })
+        }
     }
 
     /// Called for window functions that *do not use* values from the
     /// window frame, such as `ROW_NUMBER`, `RANK`, `DENSE_RANK`,
     /// `PERCENT_RANK`, `CUME_DIST`, `LEAD`, and `LAG`.
-    fn evaluate(&self, _values: &[ArrayRef], _num_rows: usize) -> Result<ArrayRef> {
-        Err(DataFusionError::NotImplemented(
-            "evaluate is not implemented by default".into(),
-        ))
+    fn evaluate_all(&mut self, values: &[ArrayRef], num_rows: usize) -> Result<ArrayRef> {
+        if !self.uses_window_frame() && self.supports_bounded_execution() {
+            let res = (0..num_rows).map(|idx| self.evaluate(values, &self.get_range(idx, num_rows)?)).collect::<Result<Vec<_>>>()?;
+            ScalarValue::iter_to_array(res.into_iter())
+        } else {
+            Err(DataFusionError::NotImplemented(
+                "evaluate_all is not implemented by default".into(),
+            ))
+        }
     }
 
     /// Evaluate window function result inside given range.
     ///
     /// Only used for stateful evaluation
-    fn evaluate_stateful(&mut self, _values: &[ArrayRef]) -> Result<ScalarValue> {
+    fn evaluate(
+        &mut self,
+        _values: &[ArrayRef],
+        _range: &Range<usize>,
+    ) -> Result<ScalarValue> {
         Err(DataFusionError::NotImplemented(
-            "evaluate_stateful is not implemented by default".into(),
+            "evaluate is not implemented by default".into(),
         ))
     }
 
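The proposed default `evaluate_all` produces a whole-partition result by calling `evaluate` once per row, using the range returned by `get_range` (by default just the current row). The sketch below mirrors that control flow under loud assumptions: `MiniEvaluator` and `RowNumber` are hypothetical names, plain `i64` slices and values replace `ArrayRef`/`ScalarValue`, and `String` replaces `DataFusionError`. It is not DataFusion's API.

```rust
use std::ops::Range;

/// Hypothetical, simplified mirror of the proposed `PartitionEvaluator`
/// defaults (i64 instead of Arrow arrays, String instead of DataFusionError).
trait MiniEvaluator {
    fn uses_window_frame(&self) -> bool {
        false
    }

    fn supports_bounded_execution(&self) -> bool {
        false
    }

    /// Default range when no window frame is used: only the current row.
    fn get_range(&self, idx: usize, _n_rows: usize) -> Result<Range<usize>, String> {
        if self.uses_window_frame() {
            Err("Range should be calculated from window frame".to_string())
        } else {
            Ok(idx..idx + 1)
        }
    }

    fn evaluate(&mut self, _values: &[i64], _range: &Range<usize>) -> Result<i64, String> {
        Err("evaluate is not implemented by default".to_string())
    }

    /// Default: call `evaluate` once per row with that row's range.
    fn evaluate_all(&mut self, values: &[i64], num_rows: usize) -> Result<Vec<i64>, String> {
        if !self.uses_window_frame() && self.supports_bounded_execution() {
            (0..num_rows)
                .map(|idx| {
                    let range = self.get_range(idx, num_rows)?;
                    self.evaluate(values, &range)
                })
                .collect()
        } else {
            Err("evaluate_all is not implemented by default".to_string())
        }
    }
}

/// Hypothetical `ROW_NUMBER`-like evaluator that implements only `evaluate`.
struct RowNumber {
    n: i64,
}

impl MiniEvaluator for RowNumber {
    fn supports_bounded_execution(&self) -> bool {
        true
    }

    fn evaluate(&mut self, _values: &[i64], _range: &Range<usize>) -> Result<i64, String> {
        self.n += 1;
        Ok(self.n)
    }
}

fn main() {
    let mut rn = RowNumber { n: 0 };
    let out = rn.evaluate_all(&[10, 20, 30], 3).unwrap();
    println!("{:?}", out); // [1, 2, 3]
}
```

The design trade-off is that an evaluator implementing only the per-row `evaluate` gets batch evaluation for free, at the cost of one call per row; an evaluator that can do better can still override `evaluate_all`.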
@@ -210,18 +229,20 @@ pub trait PartitionEvaluator: Debug + Send {
         ))
     }
 
-    /// Called for window functions that use values from window frame,
-    /// such as `FIRST_VALUE`, `LAST_VALUE`, `NTH_VALUE` and produce a
-    /// single value for every row in the partition.
+    /// Does the window function use the values from its window frame?
     ///
-    /// Returns a [`ScalarValue`] that is the value of the window function for the entire partition
-    fn evaluate_inside_range(
-        &self,
-        _values: &[ArrayRef],
-        _range: &Range<usize>,
-    ) -> Result<ScalarValue> {
-        Err(DataFusionError::NotImplemented(
-            "evaluate_inside_range is not implemented by default".into(),
-        ))
+    /// If this function returns true, the evaluator must implement
+    /// [`Self::evaluate`], which is called with the frame range of each row.
+    fn uses_window_frame(&self) -> bool {
+        false
     }
+
+    /// Can the window function be incrementally computed using
+    /// bounded memory?
+    ///
+    /// If this function returns true, the evaluator must implement
+    /// [`Self::evaluate`].
+    fn supports_bounded_execution(&self) -> bool {
+        false
+    }
 }

Is the idea that the special case for
When Maybe we can present to the user just a subset of the
For anyone following along, I think we went with the
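For functions whose `uses_window_frame` returns true, such as `FIRST_VALUE`, the range handed to `evaluate` comes from the window frame rather than the default current-row range `idx..idx + 1`. The sketch below is hypothetical throughout: `FirstValue`, `frame_range`, and `run` are illustrative names, the frame is fixed to `ROWS BETWEEN <n> PRECEDING AND CURRENT ROW`, and plain `i64` values replace Arrow arrays.

```rust
use std::ops::Range;

/// Hypothetical frame `ROWS BETWEEN <preceding> PRECEDING AND CURRENT ROW`:
/// the half-open range of row indices in the frame of row `idx`.
fn frame_range(idx: usize, preceding: usize) -> Range<usize> {
    idx.saturating_sub(preceding)..idx + 1
}

/// Hypothetical `FIRST_VALUE`-like evaluator. It uses the window frame,
/// so its result depends on the range it is given.
struct FirstValue;

impl FirstValue {
    fn uses_window_frame(&self) -> bool {
        true
    }

    /// Returns the first value inside `range`, or `None` for an empty frame.
    fn evaluate(&mut self, values: &[i64], range: &Range<usize>) -> Option<i64> {
        if range.is_empty() {
            None
        } else {
            values.get(range.start).copied()
        }
    }
}

/// Hypothetical driver: uses the frame range when the function uses the
/// frame, otherwise only the current row (`idx..idx + 1`).
fn run(eval: &mut FirstValue, values: &[i64], preceding: usize) -> Vec<Option<i64>> {
    (0..values.len())
        .map(|idx| {
            let range = if eval.uses_window_frame() {
                frame_range(idx, preceding)
            } else {
                idx..idx + 1
            };
            eval.evaluate(values, &range)
        })
        .collect()
}

fn main() {
    let values = [5, 7, 9, 11];
    let out = run(&mut FirstValue, &values, 1);
    println!("{:?}", out); // [Some(5), Some(5), Some(7), Some(9)]
}
```

Keeping the frame computation in the caller lets one `evaluate(values, range)` signature serve both frame-based functions and current-row functions, which is the unification this change proposes.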
👍