Exact FLOP costs for batched/broadcast contractions; unify on FMA=2 #114

Merged
spMohanty merged 9 commits into main from dev/contraction-cost-fma2
Jun 6, 2026

@spMohanty
Collaborator

Summary

Routes the binary-contraction family (matmul, dot, inner, vecmat, matvec, vecdot)
through a single shared cost path built on the symmetry-aware einsum accumulation model, so per-op
FLOP counts are exact for all operand layouts (batched, broadcast, mixed-rank, 1-D-promoted),
not just the 2-D case. Previously these ops used per-op fallback formulas for non-2-D inputs that
did not account for all batch/broadcast axes; counting now derives from the operation's einsum
contraction structure, so it matches fnp.einsum(<equivalent subscripts>) exactly. The
contraction-order path search is unified onto the same FMA=2 convention used for billing,
and the legacy cost fallback is removed — one cost model end-to-end.
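
For readers who want a feel for what "derives from the einsum contraction structure" means, here is a minimal sketch of a dense FMA=2 cost for a binary contraction. It assumes the dense formula implied by the `2*M*K*N - M*N` and `2*N - 1` figures quoted for dot/matmul and inner in this PR. `dense_fma2_cost` is a hypothetical name for illustration, not a flopscope API, and the sketch ignores symmetry savings and size-1 broadcasting.

```python
from math import prod


def dense_fma2_cost(subscripts, shape_a, shape_b):
    """Dense FMA=2 cost of an explicit binary einsum such as "bij,bjk->bik".

    Hypothetical helper for illustration only: multiplies and adds are
    counted separately, and no symmetry savings are applied.
    """
    inputs, output = subscripts.split("->")
    terms = inputs.split(",")
    if len(terms) != 2:
        raise ValueError("this sketch covers binary contractions only")

    sizes = {}
    for term, shape in zip(terms, (shape_a, shape_b)):
        if len(term) != len(shape):
            raise ValueError(f"rank mismatch for operand {term!r}")
        for label, size in zip(term, shape):
            if sizes.setdefault(label, size) != size:
                raise ValueError(f"inconsistent size for label {label!r}")

    # One multiply per point of the full index space (batch, free, contracted).
    multiplies = prod(sizes.values())
    # Each output element sums its products, which takes one add fewer
    # than the number of products.
    outputs = prod(sizes[label] for label in output)
    adds = multiplies - outputs
    return multiplies + adds


M, K, N, B = 3, 4, 5, 7
matmul_2d = dense_fma2_cost("ij,jk->ik", (M, K), (K, N))
matmul_batched = dense_fma2_cost("bij,bjk->bik", (B, M, K), (B, K, N))
inner_1d = dense_fma2_cost("i,i->", (N,), (N,))
```

Because the batch label `b` is part of the index space, the batched cost is the 2-D cost multiplied by the batch size.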

What changes

  • New shared helper _einsum_routed_binary — builds the op's einsum subscripts and routes cost +
    output-symmetry inference through _resolve_cost_and_output_symmetry (the path matmul/dot 2-D
    already used). All six wrappers use it.
  • vecmat/matvec/vecdot now count batch/broadcast axes exactly and bill in FMA=2.
  • matmul/dot N-D & mixed-rank and inner N-D route through einsum instead of the
    a.size * b.size fallback.
  • Path search (≥3-operand einsum) uses the FMA=2 accumulation cost — identical to billing — and
    the legacy FMA=1 fallback is removed.
  • linalg.lstsq uses matmul_cost directly (its 2-D×1-D workaround is no longer needed).
  • Docs: corrected stale "FMA=1" labels and dropped references to the removed fma_cost setting.

Gaming-resistance

Complements the existing no-gaming property (symmetric cost ≤ dense) with the dual guard: contraction
cost cannot be under-counted by re-expressing a matmul as a batched vector op. New parity tests
assert every op equals its einsum equivalent across batched/broadcast/mixed-rank shapes and that
cost scales with the batch dimension.
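
The parity property can be illustrated with a small dense model. This is a sketch under assumptions, not the project's test code: `fma2_cost` is a hypothetical stand-in that uses the dense `2 * multiplies - outputs` formula and ignores symmetry.

```python
from math import prod


def fma2_cost(inputs, output, sizes):
    """Dense FMA=2 cost: one multiply per index point, one add fewer per output."""
    labels = set("".join(inputs))
    multiplies = prod(sizes[label] for label in labels)
    outputs = prod(sizes[label] for label in output)
    return 2 * multiplies - outputs


sizes = {"m": 8, "k": 16, "n": 32}

# Plain 2-D matmul: (M, K) @ (K, N).
as_matmul = fma2_cost(["mk", "kn"], "mn", sizes)

# The same arithmetic phrased as a broadcast matvec: one (M, K) matrix
# applied to N vectors of length K, i.e. "mk,nk->nm".
as_batched_matvec = fma2_cost(["mk", "nk"], "nm", sizes)


def batched_vecdot_cost(batch):
    """Cost of `batch` independent length-16 dot products ("bk,bk->b")."""
    return fma2_cost(["bk", "bk"], "b", {"b": batch, "k": 16})
```

Under this model the two spellings of the matmul cost the same, and doubling the batch axis of the vecdot doubles its cost, so rewriting a contraction as a batched vector op buys nothing.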

Breaking change

FLOP costs change for the affected ops (exactness + FMA=2 unification). Consumers that pin or budget
on absolute FLOP counts should re-baseline. A @ A symmetric behavior (symmetry-aware cost +
SymmetricTensor output) and all 2-D costs already routed through einsum are unchanged.

Out of scope

  • tensordot partial-contraction symmetry path (keeps direct_product_groups).
  • A @ A.T symmetry detection for non-symmetric A (write as einsum).
  • outer/kron/vdot (no contracted axis; already exact).

Test plan

  • New parity + batch-scaling tests for all six ops; A@A symmetry preserved.
  • Multi-operand einsum path-selection regression (chosen paths + billed totals).
  • Updated snapshots: test_cost_formula_vs_code.py, test_issue_69_cost_parity.py[lstsq],
    test_fma_unification.py, PathInfo snapshots.
  • Full suite, ruff, pyright, coverage ≥90%, client-server-sync, registry-conformance.

Test-suite robustness (no product code change)

  • tests/accumulation/test_deletion_safety.py now restores the package's lazy upstream
    __getattr__ shim after it imports the local _paths/_path_random submodules. Without this,
    those imports leaked into the package __dict__ and shadowed the shim for the rest of the
    process, causing the custom/random-optimizer tests in test_opt_einsum_paths.py to fail under
    serial (-n 0) execution (they passed under xdist, which distributes the tests across workers).
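
The shadowing mechanism can be reproduced without the real package. The sketch below uses throwaway `types.ModuleType` objects as stand-ins (the names `pkg` and `upstream_paths` are hypothetical) to show that a module-level `__getattr__` is only a fallback and is bypassed once the name exists in the module's `__dict__`.

```python
import types

# Stand-in for the upstream module the shim is supposed to expose.
upstream = types.ModuleType("upstream_paths")

# Stand-in package with a PEP 562 module-level __getattr__ shim.
pkg = types.ModuleType("pkg")


def _lazy_getattr(name):
    if name == "_paths":
        return upstream
    raise AttributeError(name)


pkg.__getattr__ = _lazy_getattr

# "_paths" is not in pkg.__dict__, so the shim answers the lookup.
before = pkg._paths

# Importing a local submodule binds it on the parent package, which has
# the same effect as this assignment. The shim is now bypassed.
local = types.ModuleType("pkg._paths")
pkg._paths = local
shadowed = pkg._paths

# Teardown: drop the shadow from the package namespace only, leaving the
# module object itself alone.
pkg.__dict__.pop("_paths", None)
restored = pkg._paths
```

After the teardown step the lookup falls through to the shim again, which is the invariant the restored test order relies on.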

spMohanty added 9 commits June 6, 2026 03:04
…elper

Route matmul/dot/inner's 2-D/1-D contraction paths through one shared
_einsum_routed_binary helper that charges the FMA=2 symmetry-aware einsum
accumulation cost, preserves operand aliasing, and wraps symmetric results
as SymmetricTensor. Behavior-preserving: no FLOP-cost changes. The size*size
fallback for batched/mixed-ndim cases is left unchanged.

Add the helper to the test_overhead_coverage AST-lint exemption set, since it
is a shared cost-routing helper invoked by already-decorated wrappers and must
not itself carry @_counted_wrapper.
…l N-D

Route the broadcast contraction ops through the shared einsum cost path so
batch/broadcast axes on either operand are counted exactly, matching the
equivalent fnp.einsum. Also unifies these ops on the FMA=2 convention.

BREAKING CHANGE: FLOP costs change for vecmat, matvec, vecdot, and N-D/mixed
matmul. Consumers that pin or budget on absolute FLOP counts should re-baseline.

N-D dot/inner contract one axis and outer-product the rest; replace the
a.size*b.size / a.size*b.shape[-1] fallbacks with generated distinct-label
einsum subscripts so the cost is exact. inner now wraps tracked inputs
consistently with dot/matmul.
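
One way to picture the generated distinct-label subscripts is the sketch below. It follows NumPy's documented semantics (dot contracts the last axis of `a` with the second-to-last axis of `b`; inner contracts the last axes of both). The function names are hypothetical and the actual generator in this change may differ.

```python
from string import ascii_lowercase


def dot_subscripts(a_ndim, b_ndim):
    """Einsum subscripts for np.dot semantics, for a_ndim >= 1 and b_ndim >= 2."""
    if a_ndim < 1 or b_ndim < 2:
        raise ValueError("sketch covers a_ndim >= 1 and b_ndim >= 2 only")
    labels = iter(ascii_lowercase)
    contracted = next(labels)
    # Every non-contracted axis gets its own label (outer product).
    a_free = "".join(next(labels) for _ in range(a_ndim - 1))
    b_free = "".join(next(labels) for _ in range(b_ndim - 1))
    a_term = a_free + contracted
    # The contracted axis sits second-to-last in b.
    b_term = b_free[:-1] + contracted + b_free[-1]
    return f"{a_term},{b_term}->{a_free}{b_free}"


def inner_subscripts(a_ndim, b_ndim):
    """Einsum subscripts for np.inner semantics, for a_ndim, b_ndim >= 1."""
    if a_ndim < 1 or b_ndim < 1:
        raise ValueError("sketch covers operands with at least one axis")
    labels = iter(ascii_lowercase)
    contracted = next(labels)
    a_free = "".join(next(labels) for _ in range(a_ndim - 1))
    b_free = "".join(next(labels) for _ in range(b_ndim - 1))
    # The contracted axis is last in both operands.
    return f"{a_free}{contracted},{b_free}{contracted}->{a_free}{b_free}"


dot_3d = dot_subscripts(3, 3)
inner_3d = inner_subscripts(3, 3)
```

For two 3-D operands this yields `bca,dae->bcde` for dot and `bca,dea->bcde` for inner: one shared label, all others distinct.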

BREAKING CHANGE: FLOP costs change for dot/inner with >2-D operands.

The contraction-order search called flop_count with the legacy index-set
signature (FMA=1); thread per-step subscripts/shapes so it uses the same FMA=2
accumulation cost as billing, and remove the dead fallback. One cost model
end-to-end. Binary einsums are unaffected (single step).

BREAKING CHANGE: multi-operand einsum path selection and billed totals may
change where FMA=2 vs FMA=1 flips the cheapest order.

The deletion-safety tests intentionally import the local _paths/_path_random
submodules to assert they exist. That registers them in the
flopscope._opt_einsum package __dict__, permanently shadowing __init__.py's
lazy __getattr__ hook that maps oe._paths/oe._path_random to the upstream
opt_einsum modules. The leak broke tests run later in the same process — the
custom-optimizer tests in test_opt_einsum_paths.py rely on oe._path_random
being upstream (otherwise isinstance(optimize, PathOptimizer) flips to False,
the optimizer is forwarded to upstream, and it raises
"TypeError: 'RandomGreedy' object is not iterable").

xdist masked this (tests land on different workers); a serial run (-n 0)
deterministically exposed 5 failures. Add an autouse teardown that drops the
_paths/_path_random shadows after each test (sys.modules left intact to
preserve class identity; _helpers is intentionally kept local), plus a
regression test for the restoration invariant.

These were committed in the FMA=2 path-search change without ruff format;
no logic changes (whitespace only). Restores a clean `ruff format --check`.

Billing has used the FMA=2 textbook convention (multiplies and adds counted
separately) for a while; these labels still said FMA=1. Correct them and drop
references to the removed fma_cost setting. No FLOP numbers change.

- _flops.py matmul_cost, _polynomial.py polyval, _pointwise.py convolve/correlate
  docstrings: FMA=1 -> FMA=2.
- _opt_einsum __init__/NOTICE/_contract docstrings: FMA=2 via the accumulation
  model; state there is no fma_cost setting (opt_cost labelled as the upstream
  opt_einsum convention).
- data/weights.csv contraction notes: FMA=2 with the exact billed formulas
  (2*M*K*N - M*N for dot/matmul, 2*N - 1 for inner/vdot).

Routing matmul/dot/inner/vecmat/matvec/vecdot through the einsum
accumulation cost surfaced that _build_size_map rejected a shared label
appearing with sizes {1, N} — but that is NumPy broadcasting (the size-1
axis broadcasts to N). The same gap affected fnp.einsum with an explicit
size-1 broadcast batch axis.

Treat a size-1 axis as broadcastable: a label's size is the broadcast
extent (the non-1 value); only a mismatch where neither size is 1 is a
genuine inconsistency. The change is additive — it only converts
previously-raised broadcast errors into the correct (broadcasted) cost;
inputs whose label sizes already agreed are unaffected, and the
off-by-one output-orbit credit is applied per broadcasted output.
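
The size-1 rule described above can be sketched as follows. `build_size_map` here is a hypothetical stand-in written for illustration. The real `_build_size_map` signature is not shown in this PR, and the output-orbit credit is not modelled.

```python
def build_size_map(terms, shapes):
    """Map each einsum label to its broadcast extent.

    A size-1 axis is treated as broadcastable: the label takes the non-1
    size. Only two different sizes that are both greater than 1 conflict.
    """
    size_map = {}
    for term, shape in zip(terms, shapes):
        if len(term) != len(shape):
            raise ValueError(f"rank mismatch for operand {term!r}")
        for label, size in zip(term, shape):
            known = size_map.get(label, 1)
            if known == 1:
                # First sighting, or only size-1 seen so far: adopt this size.
                size_map[label] = size
            elif size != 1 and size != known:
                raise ValueError(
                    f"label {label!r} has incompatible sizes {known} and {size}"
                )
    return size_map


# vecdot-style "bk,bk->b" where one operand carries a size-1 batch axis.
broadcast_left = build_size_map(["bk", "bk"], [(1, 4), (6, 4)])
broadcast_right = build_size_map(["bk", "bk"], [(6, 4), (1, 4)])
```

Both orderings resolve `b` to 6, while shapes such as `(3, 4)` and `(6, 4)` still raise, matching the "only a mismatch where neither size is 1" rule.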

Fixes numpy-compat failures: numpy's own test_ufunc::test_output_argument
and ::test_axis_argument exercise np.vecdot with a size-1 batch operand.
@spMohanty spMohanty merged commit f6b8075 into main Jun 6, 2026
22 checks passed
@spMohanty spMohanty deleted the dev/contraction-cost-fma2 branch June 6, 2026 09:41