[tmva][sofie] Improve AD-friendliness of emitted code for Clad #21896
Merged
guitargeek merged 1 commit into root-project:master (Apr 14, 2026)
Conversation
Test Results: 22 files, 22 suites, 3d 7h 19m 23s ⏱️. For more details on these failures, see this check. Results for commit c558240. ♻️ This comment has been updated with latest results.
lmoneta (Member) approved these changes — Apr 14, 2026

LGTM!
Thank you, Jonas, for these improvements, which help Clad. I have only one question: is it better to implement a Copy function, or to use std::copy directly?
This commit refactors SOFIE-generated inference code to enable correct and efficient reverse-mode automatic differentiation with Clad.

Key changes:

* Introduce explicit primitive operations (`Copy`, `Fill`, `Relu`) in SOFIE_common.hxx and provide corresponding custom pullbacks in CladDerivator.h. This replaces previously inlined loops and allows Clad to generate efficient gradient code without relying on tapes or loop-level differentiation.
* Update Gemm code generation to emit Copy/Fill instead of manually expanding bias-initialization loops. This better exposes the intent and improves AD performance and correctness.
* Replace manual ReLU loops with a dedicated Relu() call, enabling a custom pullback that avoids tape-based condition tracking.
* Generate an additional "unoptimized" model variant in the SOFIE test suite (`OptimizationLevel::kBasic`) and use it for AD tests. This disables memory reuse of intermediate tensors: opaque memory reuse is safe for inference but breaks source-transformation AD.
* Improve gradient test diagnostics in SOFIE Clad tests by reporting mismatched indices instead of only checking a global max difference.

With these changes, Clad-generated gradients for SOFIE models are both correct and significantly faster, reaching performance comparable to frameworks such as PyTorch and JAX on the CPU for the tested cases (fully-connected neural networks with multiple layers).
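As an illustration of the tape-free ReLU pullback described above (a hedged sketch with illustrative names; the real declarations live in SOFIE_common.hxx and CladDerivator.h and may differ):

```cpp
#include <cstddef>

// Hypothetical element-wise ReLU primitive: out[i] = max(in[i], 0).
void Relu(const float *in, float *out, std::size_t n) {
   for (std::size_t i = 0; i < n; ++i)
      out[i] = in[i] > 0.f ? in[i] : 0.f;
}

// Sketch of the custom pullback. The branch condition in[i] > 0 is
// recomputed from the saved forward input, so no per-element tape entry
// is needed to remember which branch each element took.
void Relu_pullback(const float *in, std::size_t n,
                   float *d_in, const float *d_out) {
   for (std::size_t i = 0; i < n; ++i)
      d_in[i] += in[i] > 0.f ? d_out[i] : 0.f;
}
```

Recomputing the condition from forward inputs, instead of replaying a tape of recorded branch decisions, is what lets the generated gradient code stay a plain loop over contiguous buffers.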
Force-pushed from c558240 to b7655c7 (Compare)