[FLINK-39215] [python] Clean up PythonDriver tmp dir on launch failure #28503

Open
qiuyanjun888 wants to merge 1 commit into apache:master from qiuyanjun888:fix/flink-39215-python-driver-tmpdir

@qiuyanjun888

What is the purpose of the change

This PR fixes FLINK-39215, where PythonDriver could leave behind a generated pyflink/<uuid> temporary directory if PythonEnvUtils.launchPy4jPythonClient(...) failed after preparing the Python environment but before returning the Python process.

Root cause: PythonDriver created the shutdown hook only after launchPy4jPythonClient(...) returned successfully. When preparePythonEnvironment(...) had already populated the temporary directory and startPythonProcess(...) then failed, no shutdown hook existed to clean up that directory.
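The ordering problem described above can be illustrated with a minimal, self-contained sketch. It is not the actual PythonDriver code: the class TmpDirCleanupSketch and the methods prepareAndLaunch, deleteRecursively, and launchWithCleanup are hypothetical stand-ins, and the launch failure is simulated with a flag. The point is only that the directory is populated before the failing step, so the failure path has to delete it explicitly.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.UUID;
import java.util.stream.Stream;

// Hypothetical sketch; none of these names come from the Flink code base.
final class TmpDirCleanupSketch {

    // Stand-in for the launch step: populates the directory first, then may fail.
    static void prepareAndLaunch(Path tmpDir, boolean failLaunch) throws IOException {
        Files.createDirectories(tmpDir);
        Files.write(tmpDir.resolve("env.txt"), "prepared".getBytes(StandardCharsets.UTF_8));
        if (failLaunch) {
            throw new IOException("simulated failure while starting the Python process");
        }
    }

    // Deletes a directory tree, children before parents.
    static void deleteRecursively(Path dir) {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns true when the launch succeeds; removes tmpDir when it fails.
    static boolean launchWithCleanup(Path tmpDir, boolean failLaunch) {
        try {
            prepareAndLaunch(tmpDir, failLaunch);
            // On success, a real driver would register its shutdown hook here.
            return true;
        } catch (IOException e) {
            // Failure happened before any shutdown hook exists, so clean up now.
            deleteRecursively(tmpDir);
            return false;
        }
    }

    public static void main(String[] args) {
        Path tmpDir = Paths.get(System.getProperty("java.io.tmpdir"),
                "pyflink-sketch", UUID.randomUUID().toString());
        boolean launched = launchWithCleanup(tmpDir, true);
        System.out.println("launched=" + launched + ", tmpDirExists=" + Files.exists(tmpDir));
    }
}
```

Without the catch block, the simulated failure would leave the populated directory on disk, which mirrors the leak reported in the issue.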

Brief change log

  • Keep the generated tmpDir and Python process reference visible to the failure path in PythonDriver.
  • Clean up the generated temporary directory and gateway when Python client launch fails before the normal shutdown hook is registered.
  • Add a focused PythonDriverTest regression test that triggers a Python client launch failure with an invalid -pyclientexec and verifies that no per-run pyflink tmp directory remains.
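One way a regression test could check that no per-run directory remains is to snapshot the parent directory before the run and compare it afterwards. The sketch below assumes that approach; it is not the code of PythonDriverTest, and the class TmpDirLeakCheckSketch and its methods listChildren and leakedSince are hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; these helpers are not part of Flink.
final class TmpDirLeakCheckSketch {

    // Names of the direct children of parent, or an empty set if it is not a directory.
    static Set<String> listChildren(Path parent) {
        if (!Files.isDirectory(parent)) {
            return new TreeSet<>();
        }
        try (Stream<Path> children = Files.list(parent)) {
            return children
                    .map(p -> p.getFileName().toString())
                    .collect(Collectors.toCollection(TreeSet::new));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Entries present under parent now that were absent from the earlier snapshot.
    static Set<String> leakedSince(Set<String> before, Path parent) {
        Set<String> after = listChildren(parent);
        after.removeAll(before);
        return after;
    }
}
```

A test built this way would call listChildren on the pyflink parent directory before triggering the failing launch, then assert that leakedSince returns an empty set afterwards. Comparing against a snapshot avoids false failures from directories that other runs left behind earlier.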

Verifying this change

This change added tests and can be verified as follows:

  • Added PythonDriverTest#testCleanupTmpDirWhenPythonClientLaunchFails.
  • Verified the new regression test failed before the production fix because a pyflink/<uuid> temporary directory was left behind.
  • Verified the focused regression test after the fix:
    • ./mvnw -pl flink-python -Dtest=PythonDriverTest#testCleanupTmpDirWhenPythonClientLaunchFails test -DskipITs -Drat.skip=true -Dcheckstyle.skip=true
  • Verified the full PythonDriverTest class:
    • ./mvnw -pl flink-python -Dtest=PythonDriverTest test -DskipITs -Drat.skip=true -Dcheckstyle.skip=true
  • Verified related Python client process tests with a temporary python -> python3 symlink because this environment does not provide a python command by default:
    • ./mvnw -pl flink-python -Dtest=PythonDriverTest,PythonEnvUtilsTest test -DskipITs -Drat.skip=true -Dcheckstyle.skip=true
  • Verified git diff --check.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

Was generative AI tooling used to co-author this PR?
  • Yes: Hermes Agent (OpenAI GPT-5.5)

Generated-by: Hermes Agent (OpenAI GPT-5.5)

@flinkbot

flinkbot commented Jun 22, 2026


CI report:

Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build
