Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support defining UDFs in RecordMetaData #2995

Open
wants to merge 84 commits into
base: main
Choose a base branch
from

Conversation

pengpeng-lu
Copy link
Contributor

@pengpeng-lu pengpeng-lu commented Dec 5, 2024

No description provided.

This reverts commit 3a5a44b.
@pengpeng-lu pengpeng-lu added the enhancement New feature or request label Feb 21, 2025
@pengpeng-lu pengpeng-lu changed the title Resolves #3013: Support UDF Support UDF in planner Feb 21, 2025
Copy link
Collaborator

@alecgrieser alecgrieser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small things. Most of the "bigger" things are probably more points for discussion/follow-up work as this matures, and then the smaller things should be hopefully pretty straightforward. Sorry for the delay in getting around to this; just been a lot going on trying to get the build up and running and 4.1 validated.

if (proto.hasMacroFunction()) {
return MacroFunction.fromProto(serializationContext, proto);
} else {
return null;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this throw an error instead of returning null if we don't find the correct type? It seems like it would be hard to debug a NullPointerException if we fall into this path, and it also seems like we wouldn't want to just silently ignore an unserializable field here. (It does also mean that we need to be somewhat cautious when we roll out new classes here, or new functions, but there's not a lot we can do about that.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense. return a RuntimeException instead.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually don't see this code any more. Did this become unnecessary after switching over to registering a deserializer with the auto service?

* MacroFunction that expands a body (referring to parameters) into a {@link Value} (through encapsulation) call site.
*/
public class MacroFunction extends UserDefinedFunction {
private static final long serialVersionUID = 1L;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this class Serializable? I would only expect to need the serialVersionUID if it implemented the Serializable interface, which I don't think this class should

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but I keep seeing this weird pmd error: MissingSerialVersionUID: Classes implementing Serializable should set a serialVersionUID, I checked all parent classes but couldn't find where it comes from...

required KeyExpression arguments = 2;
}

message KeyExpression {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do like this change. It will require that any user who directly interacts with this proto file update their code, though it's a pretty straightforward fix. Still, I think it would be good to target getting this into 4.2 rather than 4.1. We should be just about to finish our final 4.1 build, so we can merge this after that is ready. (The only things I'm aware of that we want to target for 4.1 are #3204 and #3207.)

@alecgrieser alecgrieser added this to the 4.2 milestone Mar 6, 2025
Copy link
Collaborator

@alecgrieser alecgrieser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, though one last thing about serialization. Also, I think this should probably update gradle.properties to set the version to 4.2.0.0, which should be the only other thing necessary to bump the version to 4.2

PlanSerializationContext serializationContext = new PlanSerializationContext(DefaultPlanSerializationRegistry.INSTANCE,
PlanHashable.CURRENT_FOR_CONTINUATION);
for (RecordMetaDataProto.UserDefinedFunction function: metaDataProto.getUserDefinedFunctionsList()) {
SerializedUserDefinedFunction func = (SerializedUserDefinedFunction)PlanSerialization.dispatchFromProtoContainer(serializationContext, function);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this is actually something that didn't occur to me at first: is it safe to re-use the same serializationContext for multiple UDFs? Amongst other things, it contains state for values, so that if the same value is used multiple times, we only include it once (in, say, a continuation). I think that means that as written, we could end up including a value by reference across different UDFs in the same meta-data. I don't think we want that, though, because for one thing, it means that you have to be careful to serializea and deserialize these functions in the same order, and removing ones means doing a full deserialization, removing it, and then fully reserializing instead of being able to remove the value at the proto level.

if (proto.hasMacroFunction()) {
return MacroFunction.fromProto(serializationContext, proto);
} else {
return null;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually don't see this code any more. Did this become unnecessary after switching over to registering a deserializer with the auto service?

* Right now we don't have namespacing rules to separate UserDefinedFunction and BuiltInFunction, so theoretically there could be a naming collision.
*/
@API(API.Status.EXPERIMENTAL)
public abstract class SerializedUserDefinedFunction extends CatalogedFunction {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, the fact that PMD didn't like this class with the old name. If we rename the protobuf to something like PUserDefinedFunction, does that fix things? The MetaData proto class doesn't follow that convention, but it would make this more in line with the planner protos where each POJO has a proto equivalent with the P prefix. I might prefer that over to including Serialized in this class's name, though in principle either could work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense. changed.

@pengpeng-lu pengpeng-lu requested a review from alecgrieser March 14, 2025 16:46
@alecgrieser alecgrieser changed the title Support UDF in planner Support defining UDFs in RecordMetaData Mar 14, 2025
@alecgrieser
Copy link
Collaborator

Let's merge this once https://github.com/FoundationDB/fdb-record-layer/actions/runs/13861613630 completes. The release workflow should be robust to things merging causing conflicts, but it would be nice to get a clean run first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants