Decision (2026-06-16)
Double down on Nx rather than retreating to scalar math. The small-fixed-size overhead (eager per-op dispatch on 3-element vectors) is an accepted trade-off: machines running BB have ample CPU headroom for the per-op cost. Committing to Nx unlocks the payoff that matters: algorithms expressed as defn can be reused and composed into larger computations (batched FK/IK, multi-target solves, Jacobians, and control laws), rather than being locked inside eager per-call Elixir.
Direction
Push hot paths through defn rather than eager per-op BB.Math calls. FK tree walks and per-iteration IK are the first candidates (bb_ik_dls already uses defn compute_update; extend that boundary outward rather than wrapping it in eager Vec3/Transform accessors).
defn building blocks. The PID-in-defn proposal in bb_pid_controller#43 (PID control loop has no timing/health guarantees: period drift, no dt, no staleness guard, no jitter metrics) is exactly this pattern: a control law that can be vectorised across many loops and composed into larger graphs.
Minimise Nx.to_number/1 round-trips inside hot loops (each forces a device read); keep values as tensors through a computation and extract scalars only at the boundary.
Accepted trade-off
Single-pose/small-vector operations stay relatively expensive versus hand-rolled scalar float math, and EXLA makes tiny-tensor dispatch worse, not better, so the win is realised in batched/composed work, not single ops. Benchmarks (#149) are still worth having to know the actual cost, but they're no longer gating the decision.
Note on bb_estimator_ahrs
bb_estimator_ahrs currently keeps its own scalar Quaternion/Math for its hot per-sample loop, converting to BB.Math only at message boundaries. Under this direction that's the exception, not the model. It is worth revisiting whether its estimators can move to defn (Madgwick/Mahony are very amenable) once the defn patterns settle, but this is not urgent.
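To make the defn direction concrete for an estimator-style loop, here is a minimal sketch under stated assumptions. It is not Madgwick or Mahony: it covers only the gyro integration step both filters share, with no accelerometer or magnetometer correction. The module name GyroIntegrationSketch, the [w, x, y, z] quaternion layout, the {n, 4} / {n, 3} batch shapes, and the body-frame rate convention are all assumptions made for illustration, and nothing here is taken from bb_estimator_ahrs or BB.Math.

```elixir
Mix.install([{:nx, "~> 0.7"}])

defmodule GyroIntegrationSketch do
  # Hypothetical module; not part of bb_estimator_ahrs or BB.Math.
  import Nx.Defn

  # Hamilton product of two quaternion batches of shape {n, 4},
  # with components ordered [w, x, y, z] (an assumed layout).
  defn quat_mul(a, b) do
    aw = Nx.slice_along_axis(a, 0, 1, axis: 1)
    ax = Nx.slice_along_axis(a, 1, 1, axis: 1)
    ay = Nx.slice_along_axis(a, 2, 1, axis: 1)
    az = Nx.slice_along_axis(a, 3, 1, axis: 1)
    bw = Nx.slice_along_axis(b, 0, 1, axis: 1)
    bx = Nx.slice_along_axis(b, 1, 1, axis: 1)
    by = Nx.slice_along_axis(b, 2, 1, axis: 1)
    bz = Nx.slice_along_axis(b, 3, 1, axis: 1)

    w = aw * bw - ax * bx - ay * by - az * bz
    x = aw * bx + ax * bw + ay * bz - az * by
    y = aw * by - ax * bz + ay * bw + az * bx
    z = aw * bz + ax * by - ay * bx + az * bw

    Nx.concatenate([w, x, y, z], axis: 1)
  end

  # One explicit-Euler step of q_dot = 0.5 * q * (0, omega), followed by
  # renormalisation. q has shape {n, 4}, omega has shape {n, 3} in rad/s.
  defn integrate(q, omega, dt) do
    # Prepend a zero scalar part so omega becomes a pure quaternion {n, 4}.
    omega_q = Nx.pad(omega, 0.0, [{0, 0, 0}, {1, 0, 0}])
    q_next = q + 0.5 * dt * quat_mul(q, omega_q)
    q_next / Nx.sqrt(Nx.sum(q_next * q_next, axes: [1], keep_axes: true))
  end
end

# Two estimators advanced together: one at rest, one yawing at 1 rad/s.
q0 = Nx.tensor([[1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]])
omega = Nx.tensor([[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
dt = 0.01

# The state stays a tensor across all 100 steps.
q =
  Enum.reduce(1..100, q0, fn _step, acc ->
    GyroIntegrationSketch.integrate(acc, omega, dt)
  end)

# Boundary: scalars are read out once, after the loop.
w_component = Nx.to_number(q[1][0])
IO.inspect({Nx.shape(q), w_component})
```

The batch axis is the point of the exercise: the same integrate/3 serves one estimator or many, and the only Nx.to_number/1 call sits outside the loop. For a single sample on the default backend this will likely still be slower than scalar float math, which matches the trade-off accepted above.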