Skip to content

fix: mark ScalarUDFImpl::invoke_batch as deprecated #15049

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Mar 8, 2025

Conversation

Blizzara
Copy link
Contributor

@Blizzara Blizzara commented Mar 6, 2025

Which issue does this PR close?

See #14123 (comment)

Also (main) part of #14652 and closes #13515

Rationale for this change

People should use invoke_with_args instead. Calling invoke_batch for any DF-builtin function will fail at runtime with something like Function X does not implement invoke but called

We should also consider just removing the old functions (invoke_batch, invoke, invoke_no_args), to clear up the trait.

What changes are included in this PR?

Mark invoke_batch as deprecated. It was already called so in the docstring, but the compiler doesn't read those.

Are these changes tested?

Existing tests

Are there any user-facing changes?

New deprecation

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Mar 6, 2025
@Blizzara Blizzara mentioned this pull request Mar 6, 2025
26 tasks
@github-actions github-actions bot added the functions Changes to functions implementation label Mar 6, 2025
pub fn invoke_batch(
&self,
args: &[ColumnarValue],
number_rows: usize,
) -> Result<ColumnarValue> {
#[allow(deprecated)]
self.inner.invoke_batch(args, number_rows)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we replace this invoke_batch with invoke_with_args? So that the allow(deprecated) can be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, since for invoke_with_args we'd need the return type which isn't provided here. Also I don't see that as very important change anyways, this is also invoke_batch and deprecated so passing directly through makes sense, and both these invoke_batchs can be removed at the same time.

Copy link
Member

@Weijun-H Weijun-H left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice clean up! I think this pr could be merged after fix clippy issue, thanks @Blizzara 👍

@github-actions github-actions bot added optimizer Optimizer rules core Core DataFusion crate labels Mar 7, 2025
@@ -26,6 +28,8 @@ fn criterion_benchmark(c: &mut Criterion) {
// All benches are single batch run with 8192 rows
let character_length = datafusion_functions::unicode::character_length();

let return_type = DataType::Utf8;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to look up what the correct return type should be, but there is nothing validating that I got it correct nor that it won't get broken in the future, e.g. if some of these functions start returning stringview instead. But given these are benches, I guess that's fine? If the type is used and wrong, presumably it'll complain somewhere, if it's not used and it's wrong it's not the end of the world.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree -- this change is fine. Thank you

@Blizzara
Copy link
Contributor Author

Blizzara commented Mar 7, 2025

Nice clean up! I think this pr could be merged after fix clippy issue, thanks @Blizzara 👍

Thanks, clippy pointed out a whole bunch of invoke_batch usage, so I ended up cleaning them all 😅

@alamb alamb added the api change Changes the API exposed to users of the crate label Mar 7, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @Blizzara 🙏

We should also consider just removing the old functions (invoke_batch, invoke, invoke_no_args), to clear up the trait.

Yeah, I have been thinking that too -- maybe just remove them and even though it would be a breaking API change it would make things much less confusing 🤔

@@ -26,6 +28,8 @@ fn criterion_benchmark(c: &mut Criterion) {
// All benches are single batch run with 8192 rows
let character_length = datafusion_functions::unicode::character_length();

let return_type = DataType::Utf8;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree -- this change is fine. Thank you

Copy link
Member

@Weijun-H Weijun-H left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I agree with @alamb; it would be better to remove invole_batch because I was confused when I first reviewed this PR.

@Blizzara
Copy link
Contributor Author

Blizzara commented Mar 8, 2025

Sure, I think let’s merge this in first, then I can followup with the removals

@Weijun-H Weijun-H merged commit f47ea73 into apache:main Mar 8, 2025
24 checks passed
@Weijun-H
Copy link
Member

Weijun-H commented Mar 8, 2025

Thanks @Blizzara and @alamb 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change Changes the API exposed to users of the crate core Core DataFusion crate functions Changes to functions implementation logical-expr Logical plan and expressions optimizer Optimizer rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deprecate ScalarUDFImpl::invoke_batch and move everything to ScalarUDFImpl::invoke_with_args
4 participants