
[FLINK-34251][core] ClosureCleaner to include reference classes for non-serialization exception #26776

Open: liuml07 wants to merge 1 commit into master

liuml07
Member

@liuml07 liuml07 commented Jul 9, 2025

This is a revised version of the previous (auto-)closed PR #24205
More discussions in the JIRA: https://issues.apache.org/jira/browse/FLINK-34251

What is the purpose of the change

Currently, the ClosureCleaner throws an exception if checkSerializable is enabled and some object is non-serializable. The exception message includes the non-serializable (nested) object.

However, as a user job grows more complex, with multiple operators that each pull in several third-party libraries, it becomes unclear how the non-serializable object is referenced, because such objects can be nested several levels deep. For example, the following exception gives no indication of where to look:

org.apache.flink.api.common.InvalidProgramException: java.lang.Object@528c868 is not serializable. 

It would be nice to include the reference stack in the exception message, as follows:

org.apache.flink.api.common.InvalidProgramException: java.lang.Object@72437d8d is not serializable.
Referenced via [com.mycompany.myapp.ComplexMap -> com.mycompany.myapp.LocalMap -> 
com.yourcompany.yourapp.YourPojo -> com.hercompany.herapp.Random -> java.lang.Object]
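
To illustrate the idea, here is a minimal standalone sketch of how a reference chain could be collected while walking an object graph. It is not the ClosureCleaner code from this PR: the class ReferenceChainSketch, its nested Holder and Leaf types, and the method findNonSerializable are hypothetical names used only for illustration. The sketch assumes it is enough to look at the declared fields of non-JDK classes, and it ignores superclass fields.

```java
import java.io.Serializable;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

class ReferenceChainSketch {

    // Hypothetical sample types: Holder -> Leaf -> java.lang.Object (not serializable).
    static class Leaf implements Serializable {
        Object payload = new Object();
    }

    static class Holder implements Serializable {
        Leaf leaf = new Leaf();
    }

    /** Returns the chain of class names leading to a non-serializable object, or null. */
    static String findNonSerializable(Object root) {
        Deque<String> chain = new ArrayDeque<>();
        Set<Object> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        return walk(root, chain, visited) ? String.join(" -> ", chain) : null;
    }

    private static boolean walk(Object obj, Deque<String> chain, Set<Object> visited) {
        // Skip nulls and objects already seen (guards against cycles).
        if (obj == null || !visited.add(obj)) {
            return false;
        }
        chain.addLast(obj.getClass().getName());
        if (!(obj instanceof Serializable)) {
            return true; // the chain now ends at the offending object
        }
        // Assumption: only descend into non-JDK classes, and only into declared fields.
        if (!obj.getClass().getName().startsWith("java.")) {
            for (Field field : obj.getClass().getDeclaredFields()) {
                int mods = field.getModifiers();
                if (Modifier.isStatic(mods)
                        || Modifier.isTransient(mods)
                        || field.getType().isPrimitive()) {
                    continue;
                }
                field.setAccessible(true);
                Object child;
                try {
                    child = field.get(obj);
                } catch (IllegalAccessException e) {
                    continue; // unreadable field: skip it in this sketch
                }
                if (walk(child, chain, visited)) {
                    return true;
                }
            }
        }
        chain.removeLast(); // nothing found below this object
        return false;
    }

    public static void main(String[] args) {
        System.out.println(findNonSerializable(new Holder()));
    }
}
```

The chain is kept in a deque that is pushed on entry and popped on exit, so when the walk stops it holds exactly the path from the root to the first non-serializable object found.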

Verifying this change

This change is largely covered by existing tests, and a new test case was added to ClosureCleanerTest.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@liuml07
Member Author

liuml07 commented Jul 9, 2025

Could you review this, @davidradl? Thanks

@flinkbot
Collaborator

flinkbot commented Jul 9, 2025

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

final String msg = e.getMessage();
// Verify that the error message contains the reference chain
final String regex =
".*ComplexMap -> .*LocalMap -> .*ClosureCleanerTest.* -> .*Object.*";
Contributor

nits: this looks useful. I have some comments in case they are useful for you. You do not need to act on them.

  • I wonder about the use of -> as the classes could have children so a list does not necessarily represent the branching. Would it be worth using , as the separator?
  • I assume it is not just the class but also the value that could prevent serialization. We might have values of the same type at different locations among the children. I do not know how hard these issues are to debug; could we use the declared fields to serialize the object child by child to find out which one is not serializing?
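
A rough sketch of the child-by-child idea from the second point, assuming plain Java serialization is the check we care about. The class FieldProbeSketch, its nested Sample type, and the methods serializes and failingFields are hypothetical names for illustration and are not part of this PR or of Flink.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

class FieldProbeSketch {

    // Hypothetical sample type: only 'handle' should fail to serialize.
    static class Sample implements Serializable {
        String name = "ok";
        Object handle = new Object();          // plain Object is not Serializable
        transient Object cache = new Object(); // transient, so it is never written
    }

    /** Tries to write the value with Java serialization; true if that succeeds. */
    static boolean serializes(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return true;
        } catch (IOException e) {
            // NotSerializableException is a subclass of IOException.
            return false;
        }
    }

    /** Returns the names of declared, non-static, non-transient fields whose values fail. */
    static List<String> failingFields(Object target) {
        List<String> failing = new ArrayList<>();
        for (Field field : target.getClass().getDeclaredFields()) {
            int mods = field.getModifiers();
            if (Modifier.isStatic(mods) || Modifier.isTransient(mods)) {
                continue;
            }
            field.setAccessible(true);
            try {
                if (!serializes(field.get(target))) {
                    failing.add(field.getName());
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return failing;
    }

    public static void main(String[] args) {
        System.out.println(failingFields(new Sample()));
    }
}
```

Because each field value is probed on its own, two fields of the same type are reported separately by name, which covers the case of same-typed values at different locations. The cost is one serialization attempt per field, so this would fit better as a diagnostic on the failure path than as a routine check.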

@github-actions github-actions bot added community-reviewed PR has been reviewed by the community. and removed community-reviewed PR has been reviewed by the community. labels Jul 15, 2025
@afedulov
Contributor

afedulov commented Jul 22, 2025

@gyfora if I recall correctly we ran into exactly the issue addressed by this PR in the Operator code. Could you please take a look?

@github-actions github-actions bot added community-reviewed PR has been reviewed by the community. and removed community-reviewed PR has been reviewed by the community. labels Jul 22, 2025
Labels
community-reviewed PR has been reviewed by the community.
4 participants