feat: add public C++ operator API#618

Merged: voltjia merged 4 commits into master from feat/cpp-operator-api on May 26, 2026

@voltjia voltjia commented May 20, 2026

Summary

  • Add the unified public entry header include/infini/ops.h for C++ consumers.
  • Generate public one-shot operator declarations under infini::ops::functional in generated/include/infini/functional_ops.h.
  • Compile generated functional operator definitions into libinfiniops.
  • Route pybind one-shot operator calls through infini::ops::functional::<OpName>.
  • Install public headers, libinfiniops, pkg-config metadata, and CMake package metadata.

Motivation

This is the first PR in the public operator API split. It adds the C++ source/link convenience layer first so the later stable C ABI can be implemented as a thin adapter over public C++ operator entrypoints instead of including backend kernel headers directly.
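To illustrate the layering described in the Motivation, here is a minimal sketch of a C ABI function written as a thin adapter over a C++ one-shot entrypoint. Everything in the `demo` namespace, the `demo_add` function, and the `DemoStatus` codes are hypothetical stand-ins invented for this example. The real entrypoints take `Handle`, `Config`, and `Tensor` arguments, and the follow-up C ABI PR may look quite different.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a public C++ one-shot entrypoint. The real
// infini::ops::functional functions use Handle, Config, and Tensor instead.
namespace demo {
namespace functional {

void Add(const std::vector<float>& input, const std::vector<float>& other,
         std::vector<float>& out) {
  if (input.size() != other.size()) {
    throw std::invalid_argument("Input sizes do not match.");
  }
  out.resize(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    out[i] = input[i] + other[i];
  }
}

}  // namespace functional
}  // namespace demo

// Hypothetical C ABI surface: plain C types only, status codes instead of
// exceptions, and no backend kernel headers.
extern "C" {

enum DemoStatus {
  DEMO_STATUS_SUCCESS = 0,
  DEMO_STATUS_BAD_ARGUMENT = 1,
  DEMO_STATUS_INTERNAL_ERROR = 2
};

int demo_add(const float* input, std::size_t input_size, const float* other,
             std::size_t other_size, float* out) {
  if (input == nullptr || other == nullptr || out == nullptr) {
    return DEMO_STATUS_BAD_ARGUMENT;
  }
  try {
    // The adapter only converts arguments and forwards to the C++ layer.
    const std::vector<float> lhs(input, input + input_size);
    const std::vector<float> rhs(other, other + other_size);
    std::vector<float> result;
    demo::functional::Add(lhs, rhs, result);
    std::copy(result.begin(), result.end(), out);
    return DEMO_STATUS_SUCCESS;
  } catch (const std::invalid_argument&) {
    // Exceptions must not cross the C boundary, so map them to status codes.
    return DEMO_STATUS_BAD_ARGUMENT;
  } catch (...) {
    return DEMO_STATUS_INTERNAL_ERROR;
  }
}

}  // extern "C"
```

The point of the sketch is that the adapter contains no operator logic of its own: it validates pointers, forwards the call, and translates failures into return codes.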

Closes #N/A

Type of Change

  • feat - new feature / new operator / new platform
  • N/A - fix - bug fix
  • N/A - perf - performance improvement (no behavioral change)
  • N/A - refactor - code restructuring without behavior change
  • N/A - test - adding or fixing tests only
  • N/A - docs - documentation only
  • build / ci - build system or CI configuration
  • N/A - chore - tooling, formatting, or other non-code changes
  • N/A - Breaking change (requires a ! in the Conventional Commits prefix or a BREAKING CHANGE: footer)

Platforms Affected

  • CPU (WITH_CPU)
  • NVIDIA (WITH_NVIDIA)
  • Iluvatar (WITH_ILUVATAR)
  • MetaX (WITH_METAX)
  • Cambricon (WITH_CAMBRICON)
  • Moore (WITH_MOORE)
  • Ascend (WITH_ASCEND)
  • PyTorch C++ bindings (WITH_TORCH)
  • Build system / CMake / CI
  • Python bindings / user-facing API

Test Results on Supported Platforms

All platform tests below were run remotely with unqualified pytest and the platform bound to physical card 6.

| Platform | Built | pytest Result | Notes / Hardware |
| --- | --- | --- | --- |
| NVIDIA | Yes | 9206 passed, 8665 skipped, 81 warnings in 362.25s | ssh nvidia, CUDA_VISIBLE_DEVICES=0 inside Docker with --gpus device=6. |
| Iluvatar | Yes | 7704 passed, 8649 skipped, 81 warnings in 575.63s | ssh iluvatar, CUDA_VISIBLE_DEVICES=6; CoreX compiler bin mounted at /usr/local/corex-4.3.0.20250624/bin. |
| MetaX | Yes | 8698 passed, 7655 skipped, 81 warnings in 401.38s | ssh metax, CUDA_VISIBLE_DEVICES=6, MACA_VISIBLE_DEVICES=6. |
| Cambricon | Yes | 5899 passed, 10070 skipped, 172 warnings in 991.75s | ssh cambricon, MLU_VISIBLE_DEVICES=6. |
| Moore | Yes | 8471 passed, 7900 skipped, 99 warnings in 584.02s | ssh moore, MUSA_VISIBLE_DEVICES=6; no _musa_smoke_passes guard. |
| Ascend | Yes | 7405 passed, 8906 skipped, 98 warnings in 552.77s | ssh ascend, ASCEND_RT_VISIBLE_DEVICES=6; pytest inner return code was 0. |
Full `pytest` output (optional)
# NVIDIA
========= 9206 passed, 8665 skipped, 81 warnings in 362.25s (0:06:02) ==========

# Iluvatar
========= 7704 passed, 8649 skipped, 81 warnings in 575.63s (0:09:35) ==========

# MetaX
========= 8698 passed, 7655 skipped, 81 warnings in 401.38s (0:06:41) ==========

# Cambricon
======== 5899 passed, 10070 skipped, 172 warnings in 991.75s (0:16:31) =========

# Moore
========= 8471 passed, 7900 skipped, 99 warnings in 584.02s (0:09:44) ==========

# Ascend
========= 7405 passed, 8906 skipped, 98 warnings in 552.77s (0:09:12) ==========
PYTEST_INNER_RC=0

Additional checks:

git diff --check
python3 -m py_compile tests/utils.py scripts/generate_wrappers.py
ruff format --check tests/utils.py scripts/generate_wrappers.py
ruff check tests/utils.py scripts/generate_wrappers.py

Benchmark / Performance Impact

N/A. This PR adds public API wrappers and build/install plumbing. It does not change operator kernels or dispatch selection.

Notes for Reviewers

  • The branch was rebased onto origin/master at 1400daff; current head is 4f451af0.
  • The C++ API is intentionally a source/link convenience layer, not a stable C++ ABI commitment.
  • Public one-shot calls live in infini::ops::functional to avoid colliding with existing operator classes such as infini::ops::Add.
  • Backend implementation headers remain in generated source files; the public header does not include backend kernel headers.
  • The wrapper generator now has a constrained fallback parser for the current base operator headers when Python clang.cindex is unavailable in direct CMake builds.
  • The WITH_TORCH generated include directory is build-interface-only, which fixes the CMake install/export failure seen in wheel builds.
  • _musa_smoke_passes was removed. The low Moore count was caused by filtering out musa; selecting physical card 6 with MUSA_VISIBLE_DEVICES=6 makes the normal torch.musa.is_available() path work.
  • Ascend is no longer CPU-only in this result; ASCEND_RT_VISIBLE_DEVICES=6 produced ('cpu', 'npu') device discovery before the full pytest run.
  • This PR should land before the C ABI PR. The follow-up C ABI PR should target this branch.
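
The namespace choice in the notes above can be pictured with a small sketch. The `demo::ops` names below are hypothetical stand-ins, and forwarding from the free function to the class is an assumption made only to keep the example short, not a description of how the generated wrappers are implemented.

```cpp
#include <cassert>

namespace demo {
namespace ops {

// Stand-in for an existing operator class such as infini::ops::Add:
// constructed once, then invoked like a function object.
class Add {
 public:
  explicit Add(int size) : size_(size) { assert(size >= 0); }

  void operator()(const float* input, const float* other, float* out) const {
    for (int i = 0; i < size_; ++i) {
      out[i] = input[i] + other[i];
    }
  }

 private:
  int size_;
};

namespace functional {

// One-shot free function with the same short name. Placing it in a nested
// namespace keeps it from clashing with the class declared above.
void Add(const float* input, const float* other, float* out, int size) {
  // Inside this namespace the unqualified name Add refers to this function,
  // so the class has to be named as ops::Add.
  const ops::Add op(size);
  op(input, other, out);
}

}  // namespace functional
}  // namespace ops
}  // namespace demo
```

Callers can then pick either style without ambiguity: `demo::ops::Add op(n); op(a, b, out);` for the class, or `demo::ops::functional::Add(a, b, out, n);` for the one-shot call.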

Generated public C++ signature examples:

void infini::ops::functional::Add(const Handle& handle, const Config& config,
                                  const Tensor input, const Tensor other,
                                  Tensor out);

void infini::ops::functional::Matmul(const Handle& handle, const Config& config,
                                     const Tensor a, const Tensor b, Tensor c,
                                     bool trans_a, bool trans_b);

void infini::ops::functional::Linear(const Handle& handle, const Config& config,
                                     const Tensor a, const Tensor b,
                                     std::optional<Tensor> bias, bool trans_a,
                                     bool trans_b, Tensor out);
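
As a rough guide to how a call with the `Linear` parameter order might look from consumer code, here is a sketch built on stand-in types. `demo::Handle`, `demo::Config`, and `demo::Tensor` are hypothetical and are not the types from include/infini/ops.h. The non-owning view layout of `Tensor` and the `op(a) * op(b) + bias` semantics are assumptions chosen so the example is self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

namespace demo {

// Hypothetical stand-ins for the public Handle and Config types.
struct Handle {};
struct Config {};

// Hypothetical non-owning 2-D row-major view. Copying a Tensor copies the
// view, not the data, which is why an output can be passed by value.
struct Tensor {
  float* data;
  std::size_t rows;
  std::size_t cols;
};

namespace functional {

// Mirrors the parameter order of the Linear signature listed above.
// Assumed semantics: out = op(a) * op(b) + bias, with bias broadcast per row.
void Linear(const Handle& handle, const Config& config, const Tensor a,
            const Tensor b, std::optional<Tensor> bias, bool trans_a,
            bool trans_b, Tensor out) {
  (void)handle;
  (void)config;
  const std::size_t m = trans_a ? a.cols : a.rows;
  const std::size_t k = trans_a ? a.rows : a.cols;
  const std::size_t n = trans_b ? b.rows : b.cols;
  assert(out.rows == m && out.cols == n);
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      // Start from the bias value when one is supplied, else from zero.
      float acc = bias ? bias->data[j] : 0.0f;
      for (std::size_t p = 0; p < k; ++p) {
        const float lhs =
            trans_a ? a.data[p * a.cols + i] : a.data[i * a.cols + p];
        const float rhs =
            trans_b ? b.data[j * b.cols + p] : b.data[p * b.cols + j];
        acc += lhs * rhs;
      }
      out.data[i * out.cols + j] = acc;
    }
  }
}

}  // namespace functional
}  // namespace demo
```

With this shape, omitting the bias is spelled `std::nullopt` at the call site, and a `Tensor` argument converts implicitly to `std::optional<Tensor>` when a bias is present.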

Checklist

Title, Branch, and Commits

  • PR title follows Conventional Commits: feat: add public C++ operator API.
  • Branch name follows the expected <type>/... form: feat/cpp-operator-api.
  • Commit messages follow Conventional Commits.
  • The PR is small and forms a single squashable change set.
  • Branch was rebased onto current origin/master.
  • No stray merge commits from master.
  • No fixup! / squash! / wip commits remain.

Scope and Design

  • Changes are minimal for the stated motivation.
  • No dead code, commented-out blocks, debug prints, or ownerless TODOs were added.
  • No unrelated formatting churn was added.
  • Public API changes are intentional and covered by smoke tests.

General Code Hygiene

  • Comments were added only where they explain non-obvious behavior.
  • Modified and added files end with a single trailing newline.
  • git diff --check passed.
  • Comments and error messages are in English.
  • Comments and error messages are complete sentences where applicable.

C++ Specific

  • Code follows the repository's Google C++ style conventions to the extent verified manually.
  • clang-format version 21.1.0 was run remotely and the final dry-run passed.
  • N/A - clang-tidy was not run in this session; this PR does not add kernel code or template-heavy operator implementation code.
  • Operator parameter order is preserved from the generated base operator signatures.
  • No explicit exceptions are thrown by the added code.
  • N/A - no kernel files or launchers were added.
  • N/A - no new operator implementation classes were added.
  • No raw new/delete was added by this PR.

Python Specific

  • Python changes are limited to tests and generator code.
  • ruff format and ruff check were run remotely and passed.
  • Python smoke/full tests passed remotely.

Testing

  • Relevant remote NVIDIA build and pytest results are recorded above.
  • All supported remote platforms affected by this PR were tested and recorded in the table above.
  • New public C++ API has a smoke test under tests/test_cpp_api.py.
  • N/A - no new operator numerical tests were added.
  • N/A - no bug-fix regression test is required.

Build, CI, and Tooling

  • N/A - full pip install .[dev] was not used for the final full-platform runs; each container installed the required build/test packages and then used pip install . --no-build-isolation --no-deps.
  • CMake configure/build/install paths were verified remotely.
  • GENERATE_PYTHON_BINDINGS=ON pybind target still builds remotely.
  • No new runtime dependency was added.

Documentation

  • N/A - README-level public API documentation is deferred; this PR exposes the API through installed headers and smoke tests.
  • Public API surface is exposed through installed headers and covered by smoke tests.
  • N/A - no user-visible breaking change is introduced.

Security and Safety

  • No secrets, access tokens, internal URLs, customer data, or personal hardware identifiers were committed.
  • No third-party code was added.
  • No unsafe pointer arithmetic, uninitialized reads, or missing bounds checks were introduced.

voltjia commented May 21, 2026

Please list a few function signatures of the APIs exposed this way.

Comment thread src/common/constexpr_map.h Outdated
Comment thread src/common/traits.h Outdated
Comment thread src/data_type.h Outdated
Comment thread src/data_type.h Outdated
Comment thread src/hash.h Outdated
Comment thread src/tensor.h Outdated
@voltjia voltjia force-pushed the feat/cpp-operator-api branch from 4b54fc5 to e63f0b0 Compare May 25, 2026 08:12
@voltjia voltjia force-pushed the feat/cpp-operator-api branch from e63f0b0 to b904d28 Compare May 25, 2026 09:20
Comment thread tests/utils.py Outdated

voltjia commented May 26, 2026

Addressed the review concern around _musa_smoke_passes.

Root cause: the earlier Moore run was not pinned to a working card, so torch.musa.synchronize() hung and the added smoke guard filtered out musa, reducing coverage to CPU-only scale. With physical card 6 selected via MUSA_VISIBLE_DEVICES=6, torch.musa.is_available() reports true and the normal device discovery path works. I removed _musa_smoke_passes in 4f451af0.

I also reran unqualified pytest on all six platforms pinned to card 6 and updated the PR description with the full results. Moore is now 8471 passed, 7900 skipped; Ascend is 7405 passed, 8906 skipped with ('cpu', 'npu') discovered before the run.

@voltjia voltjia marked this pull request as ready for review May 26, 2026 07:51
@voltjia voltjia requested review from a team, Ziminli and crapromer May 26, 2026 07:51

voltjia commented May 26, 2026

@crapromer @wooway777 for initial review, @Ziminli for final review.

@voltjia voltjia requested a review from wooway777 May 26, 2026 10:36

@wooway777 wooway777 left a comment

Please get CI passing later; the CI server is down. Lao Huang says it's fine anyway.

@voltjia voltjia merged commit d6804bf into master May 26, 2026
37 of 52 checks passed
@voltjia voltjia deleted the feat/cpp-operator-api branch May 26, 2026 12:14