feat(timing): precise client/server timing split — pure-kernel backend, full-dispatch overhead, participant-only residual by spMohanty · Pull Request #115 · AIcrowd/flopscope

spMohanty · 2026-06-06T14:15:37Z

Summary

The per-MLP timing split (wall = backend + overhead + residual) feeds the
leaderboard — residual is the billed bucket (C_m = F_m + λ·R_m). On the
server-backed path the client proxy reported it imprecisely: the framework's own
cost (request encode/decode, the ZMQ round-trip, and result reconstruction
such as .tolist()) leaked into the billed residual. Concretely, the grading
harness serializes the participant's predictions with preds.tolist() inside
the budget context — so participants were billed for the harness materializing
their own output.

This PR makes the decomposition precise and physically grounded:

| bucket | definition | measured |
| --- | --- | --- |
| backend | the pure numpy kernel: the actual numerical computation | server: times only the numpy call (`_run_kernel`); reports it as `compute_time_ns` |
| overhead | all flopscope machinery: client encode/decode/reconstruction + the wire + server-side marshaling/storage; not billed | client: `dispatch − backend` |
| residual | the participant's own Python, outside any flopscope call (the sandbox has no numpy); the billed bucket | client: `wall − dispatch` |
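The arithmetic in the table can be sketched as a small pure function. This is an illustration only: the name `decompose_timing_sketch`, the `TimingSplit` type, the nanosecond units, and the exact clamping rules are assumptions, not the actual `_decompose_timing` in flopscope-client.

```python
from typing import NamedTuple


class TimingSplit(NamedTuple):
    """Hypothetical result type; all fields are in nanoseconds."""
    wall_ns: int
    backend_ns: int
    overhead_ns: int
    residual_ns: int


def decompose_timing_sketch(wall_ns: int, dispatch_ns: int, kernel_ns: int) -> TimingSplit:
    """Split wall time into backend + overhead + residual (illustrative only)."""
    wall_ns = max(wall_ns, 0)
    # Assumed clamp: dispatch time cannot exceed the wall time that contains it.
    dispatch_ns = min(max(dispatch_ns, 0), wall_ns)
    # Assumed clamp: kernel time cannot exceed the dispatch time that contains it.
    backend_ns = min(max(kernel_ns, 0), dispatch_ns)
    overhead_ns = dispatch_ns - backend_ns   # flopscope machinery, not billed
    residual_ns = wall_ns - dispatch_ns      # participant Python, billed
    return TimingSplit(wall_ns, backend_ns, overhead_ns, residual_ns)


# 250 ms of wall time, 50 ms inside flopscope calls, 30 ms of that in the kernel.
split = decompose_timing_sketch(250_000_000, 50_000_000, 30_000_000)
print(split)
```

With these inputs the split is 30 ms backend, 20 ms overhead, and 200 ms residual, and the three buckets sum back to the wall time by construction.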

What changed

- flopscope-server: a `_run_kernel` chokepoint times only the numpy call; `compute_time_ns` now reports kernel-only time (arg marshaling, cost model, result storage, and fetch/serialize contribute 0). No new wire field: the existing `total_compute_time_ns` simply narrows in meaning.
- flopscope-client: a new `_dispatch.py` accumulator (`dispatch_span` / `timed_dispatch`, with baseline/delta nesting so each op's wall is counted exactly once) wraps every op-dispatch entry point, including the data-materialization methods (`tolist`, `__repr__`, `__str__`, `__float__`, `__int__`, `__bool__`), so result reconstruction lands in overhead, not residual. `BudgetContext` computes `overhead = dispatch − kernel` and `residual = wall − dispatch`.
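One way baseline/delta nesting can avoid double-counting is sketched below. The class `DispatchAccumulator`, its `span` and `timed` methods, and the injectable clock are hypothetical names chosen for illustration; the real `dispatch_span` / `timed_dispatch` in `_dispatch.py` may be structured differently.

```python
import time
from contextlib import contextmanager
from functools import wraps


class DispatchAccumulator:
    """Hypothetical accumulator; not the actual flopscope-client module."""

    def __init__(self, clock=time.perf_counter_ns):
        self._clock = clock
        self.total_ns = 0

    @contextmanager
    def span(self):
        start = self._clock()
        baseline = self.total_ns  # time already credited before this span opened
        try:
            yield
        finally:
            elapsed = self._clock() - start
            credited_inside = self.total_ns - baseline  # added by nested spans
            # Credit only the part of this span not already counted by children.
            self.total_ns += max(elapsed - credited_inside, 0)

    def timed(self, fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            with self.span():
                return fn(*args, **kwargs)
        return wrapper


def demo() -> int:
    # Fake clock ticks: outer start, inner start, inner end, outer end.
    ticks = iter([0, 10, 40, 100])
    acc = DispatchAccumulator(clock=lambda: next(ticks))

    @acc.timed
    def inner():
        return "fetched"

    @acc.timed
    def outer():
        return inner()

    outer()
    return acc.total_ns


print(demo())
```

In this sketch the inner span credits its own 30 ticks and the outer span then credits only the remaining 70, so the accumulator ends at the outer span's wall time rather than the sum of both spans.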

Test plan

- Server: `test_compute_time_is_kernel_only`, `test_fetch_contributes_no_kernel`; full server suite green (206).
- Client: `_dispatch` nesting unit tests (no double-count); decomposition unit tests (identity + both clamp branches); real client↔server integration suite.
- Acceptance criteria (all green):
  - `test_tolist_is_overhead_not_residual`: 10× `.tolist()` of a 128² array with no participant Python yields residual ≈ 0.66 ms (reconstruction correctly lands in overhead)
  - `test_residual_is_only_python`: a `sleep(0.2)` is the only thing in residual
  - `test_worker_tolist_not_billed`
  - the no-double-count identity test
  - a coverage test asserting every op family increments the dispatch accumulator
- Zero regressions: full client suite failure/error count unchanged from the pre-existing baseline.
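The shape of the "residual is only participant Python" criterion can be illustrated without a server. `MiniBudget` and `fake_op` below are hypothetical stand-ins for `BudgetContext` and a flopscope call; the sleeps simulate framework time and participant time respectively.

```python
import time


class MiniBudget:
    """Hypothetical stand-in for BudgetContext; no server involved."""

    def __enter__(self):
        self.dispatch_ns = 0
        self._t0 = time.perf_counter_ns()
        return self

    def fake_op(self, seconds: float) -> None:
        # Stands in for a flopscope call: its whole duration counts as dispatch.
        start = time.perf_counter_ns()
        time.sleep(seconds)
        self.dispatch_ns += time.perf_counter_ns() - start

    def __exit__(self, *exc):
        self.wall_ns = time.perf_counter_ns() - self._t0
        self.residual_ns = self.wall_ns - self.dispatch_ns
        return False


def run_check() -> MiniBudget:
    with MiniBudget() as budget:
        budget.fake_op(0.02)   # framework time: lands in dispatch, not billed
        time.sleep(0.05)       # participant Python: lands in residual, billed
    return budget


b = run_check()
print(f"dispatch={b.dispatch_ns / 1e6:.1f} ms residual={b.residual_ns / 1e6:.1f} ms")
```

Only the participant's sleep should show up in the residual; the time spent inside `fake_op` is excluded because it was credited to dispatch.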

Rollout (not in this PR)

- Matched-version release: `cz bump` of flopscope / flopscope-client / flopscope-server together. The hello handshake enforces exact version match, so the narrowed `compute_time_ns` meaning can't drift across a version mismatch.
- The consuming evaluator re-pins both `flopscope[server]` and `flopscope-client` to the new version.
- This turns on real residual billing; re-scoring is a separate, owned workstream.
- The participant-facing contract docs (whestbench-public) are updated separately.

… + round-trip

Wire `__enter__` to snapshot wall-clock and round-trip baseline before the `budget_open` send, and `__exit__` to read server compute time from the close response's `comms_summary`, then call `_decompose_timing` to fill `_wall_time_s` / `_flopscope_backend_time` / `_flopscope_overhead_time` / `_residual_wall_time`. Add integration regression suite (6 tests) that spawns a real `FlopscopeServer` and asserts the split is non-zero, decomposes wall, and correctly assigns participant sleep to residual. Update unit test mocks to configure `comms_tracker.total_round_trip_ns` as an int so `__enter__`/`__exit__` arithmetic does not raise `TypeError`.

Apply @timed_dispatch / timed_dispatch(proxy) at all client op-dispatch entry points so every flopscope op family increments the dispatch accumulator: _fetch_data, __getitem__, _dispatch_op, RemoteGenerator._call, _make_proxy, _make_linalg_proxy, _make_random_proxy, _DistributionProxy.{pdf,cdf,ppf}, flops.{einsum,svd}_cost, and the special-cased array()/einsum(). Coverage test (test_every_op_family_increments_dispatch) verifies each family before and after with a real server subprocess.

…antics

Replace Option-1 body tests with Option-3 equivalents: backend = pure kernel, overhead = all flopscope machinery (incl. `.tolist()` and implicit fetches), residual = participant Python only. Adds `test_tolist_is_overhead_not_residual` as the Option-3 acceptance criterion and keeps `test_every_op_family_increments_dispatch`.

spMohanty added 17 commits June 6, 2026 12:20

feat(client): accumulate per-connection round-trip time in send_recv

de85427

feat(client): add pure timing-decomposition helpers

c5e1622

Add _extract_compute_ns and _decompose_timing to flopscope-client/_budget.py. Both are pure functions (no I/O) that carry the close-response parsing and wall/backend/overhead/residual decomposition math needed by __exit__ wiring.

feat(client): expose timing-split properties on BudgetContext

7a67e80

test(server): pin that close() surfaces recorded compute time

221e87f

docs: changelog for client timing split

2590e8b

feat(server): report compute_time as pure numpy-kernel time

bc0d5d5

feat(client): add dispatch-time accumulator with baseline/delta nesting

d12c65b

feat(client): decompose timing from full dispatch (overhead = dispatch - kernel)

c9f4a71

fix(client): count fetch reconstruction (tolist/repr/scalar) as overhead, not residual

ff376f0

docs: changelog for precise timing split

830a484

fix(client): count budget-response parsing as overhead, not residual

9f42ada

fix(build): emit timed_dispatch in generated stats/linalg client proxies

06857ed

test: preload client _dispatch module before _remote_array in parity tests

2e32237

style: ruff-format and import-sort the new timing tests

43cdd36

spMohanty force-pushed the feat/client-timing-split branch from fc16292 to 43cdd36 · June 6, 2026 14:36

spMohanty added 4 commits June 6, 2026 16:51

fix(client): count flops cost-query round-trips as overhead, not residual

6f7c9b0

fix(client): refresh flops_used from server on BudgetContext close

e6ca3b3

feat(client): show backend/overhead/residual timing in budget summary

b0d69b9

test: relax flaky cold-call benchmark budget to 500ms; fix ms formatting

3709dcd

spMohanty merged commit 17e85a3 into main Jun 6, 2026
22 checks passed

spMohanty deleted the feat/client-timing-split branch June 6, 2026 16:32

spMohanty merged 21 commits into main from feat/client-timing-split