[INLONG-12090][SDK] Build standard Python Dataproxy SDK wheels based on PEP-517 #12091
Open
hzqmwne wants to merge 1 commit into apache:master from
[INLONG-12090][SDK] Build standard Python Dataproxy SDK wheels based on PEP-517
Fixes #12090 (Partially)
Motivation
This PR improves the packaging and distribution story of the InLong DataProxy Python SDK by moving toward a PEP 517 compliant build. The goal is to make it easier for the community to publish and maintain prebuilt manylinux wheels for multiple CPython versions, instead of relying on legacy/local installation patterns.
In addition, this PR is written with the expectation that the project maintainers will eventually publish the SDK on PyPI and keep releases up-to-date with both InLong and CPython evolution (including timely rebuilds when Python or manylinux baselines move forward).
Modifications
- Added PEP 517 build metadata for the Python SDK: `inlong-sdk/dataproxy-sdk-twins/dataproxy-sdk-python/pyproject.toml`
  Note: the metadata in this file is illustrative. Maintainers should adjust it as needed (package name, description, URLs, classifiers, etc.). The `version` field is expected to be continuously updated according to the chosen release strategy (e.g., aligned with the InLong release train, independently versioned, etc.).
- Updated CMake integration to support both legacy and PEP 517 builds: `inlong-sdk/dataproxy-sdk-twins/dataproxy-sdk-python/CMakeLists.txt`
  Changes include: prefer the vendored `pybind11/` directory when present (only for compatibility with the legacy `build.sh`), otherwise locate `pybind11` from the Python build environment; and install artifacts into `SKBUILD_PLATLIB_DIR` for wheel builds (including the `.pyi`).
- Added a Python type stub file to improve typing support: `inlong-sdk/dataproxy-sdk-twins/dataproxy-sdk-python/inlong_dataproxy.pyi`
  This stub will require ongoing maintenance as the binding surface evolves.
- Added a manylinux build container example for producing wheels across multiple CPython versions: `inlong-sdk/dataproxy-sdk-twins/dataproxy-sdk-docker/Dockerfile_python`
  This Dockerfile demonstrates building an sdist and multiple wheels, followed by `auditwheel repair`. It is intended for local testing only. The project's official maintainers should publish the Python SDK to pypi.org.
- Documented the wheel build entry point: `inlong-sdk/dataproxy-sdk-twins/dataproxy-sdk-docker/README.md`
  Includes a short snippet on how to run the Docker build to produce wheels.
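Since the `.pyi` stub is maintained by hand, one possible way to notice drift is to compare the names it declares with the names the built extension actually exposes. The following is a minimal sketch and is not part of this PR: `stub_public_names` and `stub_drift` are hypothetical helper names, and the check only covers top-level public names, not signatures or class members.

```python
from __future__ import annotations

import ast
import importlib


def stub_public_names(stub_source: str) -> set[str]:
    """Collect top-level public class, function, and annotated variable names from stub text."""
    names = set()
    for node in ast.parse(stub_source).body:
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            names.add(node.name)
        elif isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
            names.add(node.target.id)
    # Names starting with an underscore are treated as private and ignored.
    return {name for name in names if not name.startswith("_")}


def stub_drift(module_name: str, stub_source: str) -> tuple[set[str], set[str]]:
    """Return (names missing from the stub, names in the stub but absent at runtime)."""
    module = importlib.import_module(module_name)
    runtime = {name for name in dir(module) if not name.startswith("_")}
    declared = stub_public_names(stub_source)
    return runtime - declared, declared - runtime
```

With the wheel installed, calling `stub_drift("inlong_dataproxy", stub_text)` on the contents of `inlong_dataproxy.pyi` would return two sets; both being empty suggests the stub and the binding agree at the top level.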
Verifying this change
This PR does not add new automated tests. It can be verified via packaging/build smoke checks:
- Build wheels in manylinux via Docker:
    cd inlong-sdk/dataproxy-sdk-twins
    docker build -f dataproxy-sdk-docker/Dockerfile_python .
- Validate produced artifacts: the `auditwheel` output (typically under `wheelhouse/` in the image).
    python -c "import inlong_dataproxy; from inlong_dataproxy import InLongApi"
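The one-line import check can also be written as a small reusable function, which may be easier to run across several interpreters. This is only a sketch: `smoke_check` is a hypothetical helper name, and the only names taken from this PR are the module `inlong_dataproxy` and the class `InLongApi`.

```python
from __future__ import annotations

import importlib


def smoke_check(module_name: str, required_attrs: list[str]) -> list[str]:
    """Import module_name and return the entries of required_attrs it does not expose.

    Raises ImportError if the module cannot be imported at all, for example
    when the wheel is not installed for the running interpreter.
    """
    module = importlib.import_module(module_name)
    return [name for name in required_attrs if not hasattr(module, name)]


# Intended use inside an environment where a built wheel is installed:
#     missing = smoke_check("inlong_dataproxy", ["InLongApi"])
#     an empty list means the import and the attribute lookup both succeeded
```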
Additional Notes / Follow-ups for Maintainers
- Static linking & licensing: Because several `third_party` libraries are linked statically, the resulting Python extension `.so` has no external third-party runtime dependencies, which is beneficial for distribution and manylinux wheel packaging. However, this also means we must pay close attention to the licenses of those bundled third-party libraries. I did not audit them in this PR, so maintainers should review licensing/compliance carefully before merging.
- PyPI publishing expectation: Maintainers need to create the project on PyPI and publish manylinux wheels for multiple CPython versions, with timely rebuilds as InLong/CPython releases progress.
- Platform & Python baseline & sdist: The current Python SDK effectively targets Linux only. This PR sets the baseline to Python >= 3.8; since Python 3.8 is approaching EOL and the manylinux ecosystem evolves, the baseline may need to be raised again soon. It is recommended to reflect these constraints clearly in the PyPI project description/classifiers. Given this, whether to ship an sdist in addition to wheels is still an open decision. Because the current sdist bundles prebuilt artifacts (e.g., `.a` archives of third-party libraries and the DataProxy C++ SDK), it is not a “source” distribution in the usual sense and is effectively Linux-bound. This conflicts with common expectations for sdists, and shipping binaries inside an sdist is itself a debated practice. The upside is that when a prebuilt wheel is not available, users on a compatible Linux system still have a good chance of building the package locally. On unsupported platforms (e.g., Windows), however, the build will inevitably fail, and the resulting error messages are likely to be confusing and unfriendly. Alternatively, we could reshape the sdist so that it becomes a true source-only distribution and enables cross-platform builds, but handling the `third_party` dependency chain is non-trivial and would require maintainers to evaluate the best approach.
- Type stubs maintenance: The added `.pyi` improves usability but should be kept in sync with future API changes.
- Long-term direction (optional): If maintainers want to reduce per-version wheel builds (e.g., via `abi3`) and/or improve automated `.pyi` type stub generation, migrating the bindings from `pybind11` to `nanobind` could be considered. This is optional and maintainer-driven.
- Cross-platform ambition: Most `third_party` dependencies of the DataProxy C++ SDK are cross-platform. If the DataProxy C++ SDK itself gains Windows/macOS support, the Python SDK can follow naturally. A pragmatic first step could be supporting MSYS2 ucrt64 on Windows. If maintainers decide to pursue cross-platform support in the C++ layer, follow-up work for Python packaging can be proposed accordingly.
- Release workflow: Publishing to PyPI typically benefits from an automated pipeline (PyPI “Trusted Publishing” is commonly recommended; build matrices are often handled by `cibuildwheel`). However, the final approach should match the project’s preferred release infrastructure and governance.
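Whatever release pipeline is chosen, a pre-publish step could confirm that the built wheels cover the intended CPython versions. The sketch below parses wheel filenames using the standard wheel naming convention (`name-version[-build]-python-abi-platform.whl`). The helper names, the wheel filenames, and the expected tag list are all hypothetical, and the check assumes one wheel per CPython version, so it would need adjusting if `abi3` wheels were adopted.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class WheelTags:
    distribution: str
    version: str
    python_tags: tuple[str, ...]
    abi_tags: tuple[str, ...]
    platform_tags: tuple[str, ...]


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its name, version, and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # Five components, or six when an optional build tag is present.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename!r}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    # Each tag may be a dot-separated set, e.g. after auditwheel repair.
    return WheelTags(
        distribution=parts[0],
        version=parts[1],
        python_tags=tuple(python_tag.split(".")),
        abi_tags=tuple(abi_tag.split(".")),
        platform_tags=tuple(platform_tag.split(".")),
    )


def missing_python_tags(filenames: list[str], expected: list[str]) -> list[str]:
    """Return the expected Python tags that no wheel in filenames provides."""
    built = {tag for name in filenames for tag in parse_wheel_filename(name).python_tags}
    return sorted(set(expected) - built)
```

A release job could fail early when `missing_python_tags` returns a non-empty list for the contents of the `wheelhouse/` directory.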