Support custom field metadata in UDF #13458

Closed
wants to merge 2 commits into from
3 changes: 3 additions & 0 deletions datafusion/expr/src/expr_schema.rs
@@ -350,6 +350,9 @@ impl ExprSchemable for Expr {
            Expr::Column(c) => Ok(schema.metadata(c)?.clone()),
            Expr::Alias(Alias { expr, .. }) => expr.metadata(schema),
            Expr::Cast(Cast { expr, .. }) => expr.metadata(schema),
            Expr::ScalarFunction(ScalarFunction { func, args }) => {
                Ok(func.metadata(args, schema))
            }
            _ => Ok(HashMap::new()),
        }
    }
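
The dispatch in this hunk can be sketched in isolation. The block below is a simplified model, not DataFusion code: `Schema`, the reduced `Expr` enum, the `metadata_fn` hook, and the `tagged` function are all hypothetical stand-ins, assumed here only to show how column, alias, and cast expressions pass metadata through while a scalar function supplies its own.

```rust
use std::collections::HashMap;

type Metadata = HashMap<String, String>;

// Hypothetical stand-in for a schema: column name -> field metadata.
struct Schema {
    columns: HashMap<String, Metadata>,
}

// Hypothetical, much-reduced stand-in for DataFusion's `Expr`.
#[allow(dead_code)]
enum Expr {
    Column(String),
    Alias(Box<Expr>, String),
    Cast(Box<Expr>),
    // The function hook plays the role of `func.metadata(args, schema)`.
    ScalarFunction {
        metadata_fn: fn(&[Expr], &Schema) -> Metadata,
        args: Vec<Expr>,
    },
    Literal(i64),
}

// Mirrors the shape of the match in the hunk: columns look up their
// metadata, aliases and casts forward to the inner expression, scalar
// functions decide for themselves, and everything else yields nothing.
fn metadata(expr: &Expr, schema: &Schema) -> Result<Metadata, String> {
    match expr {
        Expr::Column(name) => schema
            .columns
            .get(name)
            .cloned()
            .ok_or_else(|| format!("unknown column {name}")),
        Expr::Alias(inner, _) => metadata(inner, schema),
        Expr::Cast(inner) => metadata(inner, schema),
        Expr::ScalarFunction { metadata_fn, args } => Ok(metadata_fn(args, schema)),
        _ => Ok(Metadata::new()),
    }
}

// Hypothetical hook: tags the output field regardless of its arguments.
fn tagged(_args: &[Expr], _schema: &Schema) -> Metadata {
    Metadata::from([("origin".to_string(), "udf".to_string())])
}
```

In this model, an expression such as `Expr::ScalarFunction { metadata_fn: tagged, args: vec![] }` returns the function's own map instead of the empty map produced by the wildcard arm, which is the behavior change the hunk introduces.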
19 changes: 19 additions & 0 deletions datafusion/expr/src/udf.rs
@@ -28,6 +28,7 @@ use datafusion_common::{not_impl_err, ExprSchema, Result};
use datafusion_expr_common::interval_arithmetic::Interval;
use std::any::Any;
use std::cmp::Ordering;
use std::collections::HashMap;
use std::fmt::Debug;
use std::hash::{DefaultHasher, Hash, Hasher};
use std::sync::Arc;
@@ -216,6 +217,15 @@ impl ScalarUDF {
        self.inner.is_nullable(args, schema)
    }

    /// Returns the field metadata for this function.
    pub fn metadata(
        &self,
        args: &[Expr],
        schema: &dyn ExprSchema,
    ) -> HashMap<String, String> {
        self.inner.metadata(args, schema)
    }

    /// Invoke the function with `args` and number of rows, returning the appropriate result.
    ///
    /// See [`ScalarUDFImpl::invoke_batch`] for more details.

Contributor (review comment on `pub fn metadata(`)

This feels similar in use case to #13290. If we merged #13290, would that solve your use case?

Member Author

It looks like it would not. This PR supports custom metadata when a scalar function expression is converted to a schema field during planning.
@@ -477,6 +487,15 @@ pub trait ScalarUDFImpl: Debug + Send + Sync {
        true
    }

    /// Returns the field metadata for this function.
    fn metadata(
        &self,
        _args: &[Expr],
        _schema: &dyn ExprSchema,
    ) -> HashMap<String, String> {
        HashMap::new()
    }

    /// Invoke the function on `args`, returning the appropriate result
    ///
    /// The function will be invoked passed with the slice of [`ColumnarValue`]

Contributor (review comment on the `metadata` doc comment)

I think a more detailed explanation would help someone adding a new UDF understand why this method exists and what use case it addresses. Without looking at this PR, it's pretty opaque.
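
To illustrate why a UDF author might override the new method, here is a minimal sketch of the default-method pattern the PR relies on. It is not the DataFusion API: `UdfImpl`, `Udf`, `Upper`, `GeoPoint`, and the `logical_type` and `arg_count` keys are hypothetical, and the real `metadata` takes `&[Expr]` and `&dyn ExprSchema` rather than a slice of argument names.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical, reduced stand-in for `ScalarUDFImpl`.
trait UdfImpl {
    // Default: attach no metadata, so existing implementations are unaffected.
    fn metadata(&self, _arg_names: &[&str]) -> HashMap<String, String> {
        HashMap::new()
    }
}

// A function that keeps the default behavior.
struct Upper;
impl UdfImpl for Upper {}

// A function that annotates its output field (keys are made up for this sketch).
struct GeoPoint;
impl UdfImpl for GeoPoint {
    fn metadata(&self, arg_names: &[&str]) -> HashMap<String, String> {
        HashMap::from([
            ("logical_type".to_string(), "geo_point".to_string()),
            ("arg_count".to_string(), arg_names.len().to_string()),
        ])
    }
}

// Hypothetical stand-in for the `ScalarUDF` wrapper, which only delegates
// to the inner implementation.
struct Udf {
    inner: Arc<dyn UdfImpl>,
}

impl Udf {
    fn metadata(&self, arg_names: &[&str]) -> HashMap<String, String> {
        self.inner.metadata(arg_names)
    }
}
```

Because the trait method has a default body, a wrapper built around `Upper` yields an empty map, while one built around `GeoPoint` yields the two entries above. Only functions that need to annotate their output field have to implement anything.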