Skip to content

Conversation

@Jefffrey
Copy link
Contributor

@Jefffrey Jefffrey commented Jan 1, 2026

Which issue does this PR close?

Rationale for this change

Refactor try_sum to reuse existing sum code from native Datafusion version.

What changes are included in this PR?

Add new mode to Sum which allows toggling into try_sum mode, affecting which accumulator is used.

Add new try_sum accumulator for sum which returns null if sum at any point overflows.

Are these changes tested?

Existing tests.

Are there any user-facing changes?

No

@github-actions github-actions bot added functions Changes to functions implementation spark labels Jan 1, 2026
DataType::Int64 => Ok(DataType::Int64),
DataType::UInt64 => Ok(DataType::UInt64),
DataType::Float64 => Ok(DataType::Float64),
DataType::Null => Ok(DataType::Float64),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spark try_sum had test case where null inputs will return a null as double, so adding that here

Comment on lines +256 to +258
if args.expr_fields[0].data_type() == &DataType::Null {
return Ok(Box::new(NoopAccumulator::new(ScalarValue::Float64(None))));
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

// See COMMENTS.md to understand why nullable is set to true
Field::new_list_field(args.return_type().clone(), true),
false,
match (args.is_distinct, self.try_sum_mode) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit ugly here, would be nice if we manage to make accumulators own their state fields, see: #14701 (comment)

Comment on lines +427 to +430
// Can overflow into null
if self.try_sum_mode {
return SetMonotonicity::NotMonotonic;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not certain on this 🤔

Comment on lines +547 to +549
Err(e) => {
return Err(e.into());
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically these errs are unreachable as I think only arithmeticoverflow error variant is returned for add_checked but keeping this in case

Comment on lines +588 to +594
// Only difference from TrySumAccumulator is that it verifies the resulting sum
// can fit within the decimals precision; if Rust had specialization we could unify
// the two types (╥﹏╥)
#[derive(Debug)]
struct TrySumDecimalAccumulator<T: DecimalType + std::fmt::Debug> {
inner: TrySumAccumulator<T>,
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open to any ideas to unify these structs

Comment on lines +26 to +29
/// Thin wrapper over DataFusion native [`Sum`] which is configurable into a try
/// sum mode to return `null` on overflows. We need this thin wrapper to provide
/// the `try_sum` named function for use in Spark.
#[derive(PartialEq, Eq, Hash, Debug)]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice if we had a simpler way to do these types of thin wrappers 🤔

}

#[cfg(test)]
mod tests {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests seemed to already be covered by SLTs so removed them

@Jefffrey Jefffrey marked this pull request as ready for review January 1, 2026 15:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation spark

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant