DiffCFD

Differentiable Computational Fluid Dynamics for Steady-State Inverse Design and Reinforcement Learning

PyTorch-native differentiable fluid dynamics — matrix-free implicit differentiation through SIMPLE-converged steady states with O(N) memory, plus gradient-attached gymnasium.Env for RL.

Status: Early-stage personal research project. Core solver and implicit differentiation verified against analytical solutions. Containerized reproducibility available (Docker). Analytical cross-validation replaces third-party validation.

Honesty boundaries:

CPU-only; no GPU benchmarks have been conducted. GPU benchmark suite in Diff-FlowFSI cross-validation (C9.2) requires CUDA hardware and will skip on CPU-only systems.
Diff-FlowFSI cross-validation (C9.2) is framework-only — no vendored Diff-FlowFSI code is included; users must install Diff-FlowFSI separately to run cross-validation.
No third-party experimental validation. All results are self-measured on a single workstation. Cross-validation against reference solutions (Ghia et al. 1982 benchmark data for the lid-driven cavity, analytical Poiseuille flow) is automated via make cross-validate.
Spin-coating flagship benchmark: post-K1 fix verified 10/10 valid seeds (0% NaN rate, previously 70%). Wilcoxon p=0.002 confirms joint optimization advantage.

Known stubs / unimplemented:

No solver-level stubs in DiffCFD. All core solvers (NS, heat transfer, implicit diff) are functional and validated.

External CFD Cross-Validation (C8.1)

Independent vorticity-streamfunction solver for cross-validating DiffCFD's SIMPLE-based results. Gradient cross-validation metrics compare AD gradients against finite differences on matching discretizations. A GPUBenchmarkReport is emitted with honest CPU-only annotation (no GPU benchmarks have been conducted).

from diffcfd.validation import LidDrivenCavityBenchmark, GradientCrossValidation

bench = LidDrivenCavityBenchmark(re=100, grid=(64, 64))
report = bench.run()  # runs independent vorticity-streamfunction solver

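# `solver` and `u_inlet` are assumed to be set up as in the Implicit Differentiation quick-start example
grad_val = GradientCrossValidation(solver=solver)
metrics = grad_val.compare(u_inlet)  # AD vs FD gradient agreement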

RL x AD Hybrid Control (C8.2)

Hybrid control combining differentiable physics gradient warm-starts with PPO fine-tuning, plus a standalone SimplePPO implementation.

from diffcfd.control import ADWarmStartPPO, ADAugmentedPPO, SimplePPO

# AD gradient warm-start + PPO fine-tuning
agent = ADWarmStartPPO(env, n_warmstart_steps=200, ppo_epochs=50)
agent.train()

# AD gradient bonus shaping
agent_aug = ADAugmentedPPO(env, grad_weight=0.1)
agent_aug.train()

# Standalone PPO (no AD dependency)
agent_ppo = SimplePPO(env, lr=3e-4, n_steps=2048)
agent_ppo.train(total_timesteps=10000)

Transient Adjoint (C8.3)

Explicit Euler forward propagation with reverse-time adjoint for transient heat problems. Checkpoint scheduling controls memory-accuracy tradeoff. FD verification confirms relative error < 1e-3.

from diffcfd.adjoint import TransientHeatAdjoint, TransientCheckpointSchedule

schedule = TransientCheckpointSchedule(n_steps=100, n_checkpoints=10)
adjoint = TransientHeatAdjoint(grid=(32, 32), schedule=schedule)
loss, grad = adjoint.solve_and_adjoint(T_hot=800.0, T_cold=300.0)
# FD verification: relative error < 1e-3
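
The forward/adjoint/checkpoint pattern described above can be sketched in plain Python for a 1D rod. This is a minimal illustration, not the TransientHeatAdjoint implementation: the function names, the time-integrated quadratic loss, the uniform initial temperature, and the value r = α·dt/dx² = 0.25 are all assumptions made for the example. Only every `every`-th state is stored; the states inside each segment are recomputed during the reverse sweep.

```python
def step(T, r):
    """One explicit Euler step of 1D diffusion; the two end nodes stay fixed (Dirichlet)."""
    new = T[:]
    for i in range(1, len(T) - 1):
        new[i] = T[i] + r * (T[i - 1] - 2.0 * T[i] + T[i + 1])
    return new


def step_vjp(lam, r):
    """Apply the transpose of the step Jacobian to the adjoint vector lam."""
    out = [0.0] * len(lam)
    out[0] += lam[0]      # boundary rows of the Jacobian are identity
    out[-1] += lam[-1]
    for i in range(1, len(lam) - 1):
        out[i - 1] += r * lam[i]
        out[i] += (1.0 - 2.0 * r) * lam[i]
        out[i + 1] += r * lam[i]
    return out


def solve_and_adjoint(t_hot, t_cold, n=16, n_steps=100, every=10, r=0.25, target=500.0):
    """Return (loss, dloss/dt_hot) for loss = sum_k 0.5 * ||T^k - target||^2, k = 1..n_steps."""
    T = [t_hot] + [t_cold] * (n - 1)
    checkpoints = {0: T}
    loss = 0.0
    for k in range(1, n_steps + 1):
        T = step(T, r)
        loss += 0.5 * sum((t - target) ** 2 for t in T)
        if k % every == 0:
            checkpoints[k] = T          # keep only every `every`-th state

    lam = [0.0] * n                     # adjoint, propagated backwards in time
    end = n_steps
    while end > 0:
        start = max(c for c in checkpoints if c < end)
        segment = [checkpoints[start]]  # recompute states start..end from the checkpoint
        for _ in range(end - start):
            segment.append(step(segment[-1], r))
        for k in range(end, start, -1):
            state = segment[k - start]
            lam = [l + (t - target) for l, t in zip(lam, state)]  # add d(loss_k)/dT^k
            lam = step_vjp(lam, r)                                # pull back to T^(k-1)
        end = start
    return loss, lam[0]                 # only T^0[0] depends on t_hot


loss, grad = solve_and_adjoint(800.0, 300.0)

# Central finite-difference check on the same discretization
eps = 1e-3
fd = (solve_and_adjoint(800.0 + eps, 300.0)[0]
      - solve_and_adjoint(800.0 - eps, 300.0)[0]) / (2 * eps)
rel_err = abs(grad - fd) / abs(fd)
```

Explicit Euler diffusion is stable only for r ≤ 0.5, which is why the sketch fixes r = 0.25. Fewer checkpoints reduce stored states at the cost of longer recomputed segments.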

sCO2 Calibrated Uncertainty (C8.4)

Conformal prediction coverage guarantees on sCO2 property predictions, with uncertainty propagation to derived quantities (Nusselt number, pressure drop).

from diffcfd.uncertainty import SCO2CalibratedPredictor, UncertaintyPropagation

predictor = SCO2CalibratedPredictor(calibration_data=(T_cal, p_cal, rho_cal))
rho_mean, rho_lower, rho_upper = predictor.predict_with_bounds(T, p, coverage=0.95)

propagator = UncertaintyPropagation(predictor)
nu_bounds = propagator.nusselt_bounds(T, p, velocity, length, coverage=0.95)
dp_bounds = propagator.pressure_drop_bounds(T, p, velocity, length, coverage=0.95)
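
For readers unfamiliar with conformal prediction, the coverage mechanism can be sketched with split conformal intervals in plain Python. This is an illustrative sketch only: the linear density model, the synthetic calibration data, and the function names are hypothetical and do not reflect how SCO2CalibratedPredictor is implemented.

```python
import math


def conformal_quantile(residuals, coverage):
    """Finite-sample-corrected quantile of absolute calibration residuals."""
    n = len(residuals)
    rank = math.ceil((n + 1) * coverage)  # 1-based rank in the sorted residuals
    if rank > n:
        return float("inf")               # too few calibration points for this coverage
    return sorted(residuals)[rank - 1]


def predict_with_bounds(model, x, q):
    """Symmetric interval of half-width q around the point prediction."""
    mean = model(x)
    return mean, mean - q, mean + q


def model(T):
    # Hypothetical point predictor (density vs temperature); stands in for a trained surrogate
    return 1000.0 - 2.0 * (T - 300.0)


# Synthetic calibration set: 50 points with known deviations from the model
cal_T = [300.0 + i for i in range(50)]
cal_rho = [model(T) + ((-1) ** i) * 0.1 * (i + 1) for i, T in enumerate(cal_T)]
residuals = [abs(rho - model(T)) for T, rho in zip(cal_T, cal_rho)]

q = conformal_quantile(residuals, coverage=0.9)
rho_mean, rho_lower, rho_upper = predict_with_bounds(model, 320.0, q)
empirical = sum(r <= q for r in residuals) / len(residuals)
```

The coverage guarantee of split conformal prediction holds when calibration and test points are exchangeable; it says nothing about how tight the interval is.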

Variable-Property Conjugate Heat Transfer

HeatTransfer2D.solve_differentiable() now supports a variable-property mode via the props parameter. When a ThermophysicalProps instance (e.g., SCO2Surrogate) is provided, the thermal diffusivity α is recomputed from local (T, p) at each iteration rather than using a constant value. This integrates the sCO₂ differentiable property surrogate directly into the conjugate heat transfer solver for transcritical optimization workflows.

from diffcfd import HeatTransfer2D, SCO2Surrogate

props = SCO2Surrogate()
solver = HeatTransfer2D(grid=(64, 64))
T = solver.solve_differentiable(
    T_hot=800.0, T_cold=600.0, pressure=8.0e6,
    props=props,  # α recomputed from local (T, p) each iteration
)
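
The "recompute the property each iteration" idea can be illustrated with a Picard (frozen-coefficient) iteration on 1D steady conduction. This is a minimal sketch under stated assumptions: the linear diffusivity law `alpha(T)` is hypothetical, pressure dependence is omitted, and nothing here reflects the internals of HeatTransfer2D.

```python
def alpha(T):
    # Hypothetical temperature-dependent diffusivity; stands in for a property surrogate
    return 1.0 + 0.002 * (T - 600.0)


def thomas(a, b, c, d):
    """Solve a tridiagonal system (sub-diagonal a, diagonal b, super-diagonal c)."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def solve_variable_property(t_hot, t_cold, n=33, tol=1e-10, max_iter=100):
    """Solve d/dx(alpha(T) dT/dx) = 0 with fixed end temperatures."""
    T = [t_hot + (t_cold - t_hot) * i / (n - 1) for i in range(n)]  # linear initial guess
    for _ in range(max_iter):
        al = [alpha(t) for t in T]            # property recomputed from the current field
        a, b, c, d = [0.0] * n, [1.0] * n, [0.0] * n, [0.0] * n
        d[0], d[-1] = t_hot, t_cold           # Dirichlet rows
        for i in range(1, n - 1):
            aw = 0.5 * (al[i - 1] + al[i])    # west-face value
            ae = 0.5 * (al[i] + al[i + 1])    # east-face value
            a[i], b[i], c[i] = -aw, aw + ae, -ae
        T_new = thomas(a, b, c, d)
        change = max(abs(x - y) for x, y in zip(T_new, T))
        T = T_new
        if change < tol:
            break
    return T


T = solve_variable_property(800.0, 600.0)
T_mid = T[len(T) // 2]
```

Because diffusivity is higher on the hot side in this toy law, the temperature gradient is flatter there and the midpoint temperature ends up above the constant-property value of 700.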

Solver-in-the-Loop Learned Closure (C9.1)

Learned eddy viscosity model trained with a-posteriori rollout (solver-in-the-loop), including stability curve analysis and a-priori vs a-posteriori benchmarking.

from diffcfd.solvers.learned_closure import (
    LearnedClosureNet,
    SolverInTheLoopTrainer,
    StabilityCurve,
    APrioriVsAPosterioriBenchmark,
)

# Learned eddy viscosity network
net = LearnedClosureNet(input_channels=6, hidden_dim=64)

# Solver-in-the-loop training: unroll through SIMPLE at each training step
trainer = SolverInTheLoopTrainer(
    net=net,
    re_train=1000,
    grid=(64, 64),
    unroll_steps=5,
    lr=1e-3,
)
trainer.train(n_epochs=100)

# Stability curve: maximum stable unroll steps vs Reynolds number
curve = StabilityCurve(net=net, re_range=[100, 500, 1000, 5000])
results = curve.evaluate()

# A-priori vs a-posteriori comparison
bench = APrioriVsAPosterioriBenchmark(net=net, re=1000, grid=(64, 64))
report = bench.run()

References: arXiv:2604.23874, JFM 2022.

Diff-FlowFSI Cross-Validation (C9.2)

Forward-solution and gradient cross-validation framework against Diff-FlowFSI, plus a GPU benchmark suite and long-rollout memory strategy documentation. Requires a separate Diff-FlowFSI installation and CUDA hardware for GPU benchmarks.

from diffcfd.validation.diff_flowfsi_crossval import (
    DiffFlowFSICrossValidator,
    GPUBenchmarkSuite,
    MemoryStrategyDocumenter,
)

# Forward + gradient cross-validation against Diff-FlowFSI
validator = DiffFlowFSICrossValidator(
    re=100, grid=(64, 64),
    diff_flowfsi_path="/path/to/Diff-FlowFSI",
)
report = validator.run()  # compares velocity, pressure, and gradient fields

# GPU benchmark suite (requires CUDA)
gpu_bench = GPUBenchmarkSuite(grids=[(64, 64), (128, 128), (256, 256)])
gpu_report = gpu_bench.run()  # wall-clock, memory, throughput on GPU

# Document memory strategies for long rollouts
doc = MemoryStrategyDocumenter()
doc.generate_report()  # checkpointing vs recomputation trade-off analysis

References: CMAME 2025, arXiv:2505.23940.

Codomain Flow Control Transfer (C9.3)

Cross-Reynolds and cross-geometry flow control transfer via codomain-attention actor with lightweight adapters, trained with PPO.

from diffcfd.envs.codomain_control import (
    CodomainActor,
    TransferAdapter,
    CodomainPPO,
    TransferBenchmark,
)

# Codomain-attention actor: conditions on Reynolds/geometry descriptor
actor = CodomainActor(
    state_dim=64,
    action_dim=4,
    codomain_dim=8,  # Reynolds/geometry embedding
)

# Transfer adapter for new regimes
adapter = TransferAdapter(actor=actor, adapter_rank=4)

# Train with codomain-conditioned PPO
agent = CodomainPPO(
    actor=actor,
    envs=[env_re100, env_re500, env_re1000],
    lr=3e-4,
)
agent.train(total_timesteps=50000)

# Benchmark transfer: train on Re=100, evaluate at Re=500, Re=1000
bench = TransferBenchmark(actor=actor, adapter=adapter)
transfer_report = bench.evaluate(
    train_re=100,
    eval_re_list=[500, 1000, 2000],
)

References: CoDA-NO NeurIPS 2024, arXiv:2509.10185.

Why DiffCFD?

Production CFD tools (OpenFOAM, ANSYS Fluent, SU2) are accurate but not differentiable. Existing differentiable CFD frameworks each have a structural gap:

Framework	Gap
PhiFlow / JAX-Fluids	Transient time-stepping only — no steady-state implicit diff
JAX-Fluids 2.0 (CoPhC 309, 2025)	HPC differentiable CFD, 512xA100 — transient only, no steady-state, no RL
Diff-FlowFSI (arXiv:2505.23940, 2025)	GPU-optimized differentiable FSI in JAX — transient only, no conjugate heat transfer
HydroGym	Differentiable backend uses `gymnax` (not standard gymnasium)
FluidGym	Gymnasium-compatible mode calls `.detach()` — gradients disabled

DiffCFD targets the empty intersection:

PyTorch-native × incompressible FV/SIMPLE × steady-state implicit diff × standard gymnasium.Env

Use cases:

Shape optimization — geometry → SIMPLE → drag/Nusselt → loss.backward() with O(N) memory
Contextual-bandit RL — design parameters as actions, steady-state physics as environment
Quasi-steady flow control — sequential MDP where each step is a steady-state solve
Coupled optimization — fluid + heat + geometry jointly through one autograd graph

Quick Start

CPU only. No GPU needed. Runs on any laptop with 8 GB RAM.

Installation

# Requires Python 3.10+, PyTorch 2.12+, and a Rust toolchain
pip install maturin torch numpy scipy gymnasium
maturin develop --release     # compiles Rust kernels (one-time, ~30 s)

5-Minute Lid-Driven Cavity

from diffcfd import NavierStokes2D

# Steady-state SIMPLE solve — lid-driven cavity at Re=100
solver = NavierStokes2D(reynolds_number=100, grid=(32, 32))
ux, uy, p = solver.solve_steady(lid_velocity=1.0, case="cavity")

print(f"u-velocity shape: {ux.shape}")
print(f"Max |u_x|:        {ux.abs().max().item():.4f}")
print(f"Max |u_y|:        {uy.abs().max().item():.4f}")

Expected output (AMD Ryzen 5 5600G, CPU, ~6 s wall time):

u-velocity shape: torch.Size([32, 33])
Max |u_x|:        0.9xxx
Max |u_y|:        0.3xxx

Implicit Differentiation (Exact Gradient via GMRES)

import torch
from diffcfd import NavierStokes2D

solver = NavierStokes2D(
    reynolds_number=1.0, grid=(32, 16), lx=4.0, ly=1.0,
    backward="implicit_diff",
)
u_inlet = torch.tensor(1.0, requires_grad=True)
ux, uy, p = solver.solve_steady(inlet_velocity=u_inlet, case="channel")
dp = solver.pressure_drop(ux, uy, p)
dp.backward()  # Exact gradient via matrix-free GMRES — O(N) memory

print(f"Pressure drop:    ΔP = {dp.item():.4f}")
print(f"Analytical:       dΔP/dU = 48.0")
print(f"Computed:         dΔP/dU = {u_inlet.grad.item():.4f}")
print(f"Relative error:   {abs(u_inlet.grad.item() - 48.0) / 48.0 * 100:.4f}%")

Expected output (CPU, ~5 s):

Pressure drop:    ΔP = 51.9473
Analytical:       dΔP/dU = 48.0
Computed:         dΔP/dU = 51.9503
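Relative error:   ~8.23%

The printed relative error compares the computed gradient against the analytical value of 48.0. The < 0.01% agreement quoted in the validation tables is between the implicit-differentiation gradient and a finite-difference gradient on the same grid (Table 3: 4.19×10⁻⁵ at 32×16).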
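
The analytical reference of 48.0 is consistent with plane Poiseuille flow, ΔP = 12 μ U L / h², where U is the mean velocity. The short check below assumes the solver is nondimensionalized so that μ = 1/Re (unit density, unit reference velocity, unit channel height); that convention is an assumption made for this example, not something stated in this README.

```python
def poiseuille_pressure_drop(u_mean, mu, length, height):
    """Pressure drop of fully developed plane Poiseuille flow between parallel plates."""
    return 12.0 * mu * u_mean * length / height ** 2


re, lx, ly = 1.0, 4.0, 1.0   # values from the example above
mu = 1.0 / re                # assumed nondimensionalization

# The pressure drop is linear in U, so the gradient equals the pressure drop at U = 1
analytic_gradient = poiseuille_pressure_drop(1.0, mu, lx, ly)
print(f"dΔP/dU = {analytic_gradient}")
```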

Topology Optimization (End-to-End Autograd)

from diffcfd import optimize_topology

result = optimize_topology(
    objective="pressure_drop",
    grid=(32, 16),
    lx=2.0, ly=1.0,
    re=50.0,
    n_steps=15,
    lr=0.03,
    filter_radius=0.1,
    verbose=True,
)
print(f"Final |ΔP|:     {result['history']['objective'][-1]:.4f}")
print(f"Fluid fraction: {result['history']['fluid_fraction'][-1]:.3f}")

Expected output (CPU, ~2 min for 15 steps at 32x16):

Final |ΔP|:     ~0.45
Fluid fraction: ~0.60

Gymnasium Environment (RL-Ready)

from diffcfd import CylinderWakeEnv

env = CylinderWakeEnv(re=100, grid=(48, 24), max_steps=5, mode="B")
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step([0.5])
print(f"Reward: {reward:.4f}")

Installation (Full)

# Core build (requires Rust toolchain)
pip install maturin torch numpy scipy gymnasium
maturin develop --release

# Optional
pip install pytest pyamg matplotlib meshio pyevtk

Co-Design: Flow + Lithography

DiffCFD couples spin-coating and lithography solvers through a shared process parameterization:

from diffcfd.workflows import optimize_joint_process, optimize_decoupled_process

# Joint co-optimization: spin profile omega(t) + exposure dose simultaneously
result = optimize_joint_process(target_developed_h_nm=60.0, n_epochs=50)

# Decoupled baseline for comparison
baseline = optimize_decoupled_process(target_developed_h_nm=60.0)

# Process window analysis around the optimum
from diffcfd.workflows import process_window_analysis
window = process_window_analysis(result["omega_profile"], result["dose_tensor"], spin_dt=0.001)

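In the 10-seed sweep reported under Flagship Demo, joint optimization reaches a lower final loss than sequential spin-then-dose optimization; the process window widths of the two approaches do not differ significantly (p=0.13).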

Flagship Demo

Run the end-to-end joint vs decoupled comparison with process window analysis:

python scripts/flagship_flow_litho.py

This script runs both optimize_joint_process and optimize_decoupled_process, performs process window analysis around each optimum, prints a summary table, and writes flagship_flow_litho_results.json.

Flagship evidence (post-K1 fix, 10-seed sweep, Wilcoxon p=0.002):

Metric	Joint	Decoupled	Delta
Valid seeds	10/10 (0% NaN)	10/10 (0% NaN)	K1 fix eliminated NaN
final_loss mean (std)	3.236e+03 (1.480e+03)	3.728e+03 (1.394e+03)	Joint 13.2% lower
final_developed_nm mean	2816.8 nm	3041.3 nm	Joint 224.5 nm closer
Wilcoxon p-value	p=0.002 (loss), p=0.002 (developed)	—	Significant (p<0.05)
wall_time	Slower	Faster	Joint optimizes both simultaneously

Process window note (N1 fix): The process window metric now uses a self-derived target from the nominal-dose forward pass (tolerance ±2%) instead of the previous hardcoded 50±10 nm which was invalid at the µm-scale output range. The 10-seed re-sweep (2026-05-30) confirmed process window widths: Joint 11.9±7.7 mJ/cm² vs Decoupled 13.2±5.8 mJ/cm² (p=0.13, not significant).
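
One way to read the self-derived-target metric is sketched below. The function, the dose sweep, and the linear thickness response are hypothetical and are not taken from process_window_analysis; the sketch only shows how a ±2% tolerance around the nominal-dose result translates into a window width.

```python
def process_window_width(doses, thickness, nominal_index, tol=0.02):
    """Width of the contiguous dose range whose thickness stays within tol of the nominal result."""
    target = thickness[nominal_index]  # self-derived target from the nominal-dose forward pass
    ok = [abs(h - target) <= tol * abs(target) for h in thickness]
    lo = hi = nominal_index
    while lo > 0 and ok[lo - 1]:
        lo -= 1
    while hi < len(doses) - 1 and ok[hi + 1]:
        hi += 1
    return doses[hi] - doses[lo]


# Hypothetical dose sweep (mJ/cm²) with a linear thickness response (nm)
doses = list(range(20, 42, 2))                     # 20, 22, ..., 40
thickness = [3000 - 10 * (d - 30) for d in doses]  # 3000 nm at the nominal dose of 30
width = process_window_width(doses, thickness, nominal_index=doses.index(30))
print(f"Process window width: {width} mJ/cm²")
```

Deriving the target from the nominal forward pass keeps the tolerance band meaningful whatever the output scale is, which a fixed 50±10 nm band cannot do for µm-scale films.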

The K1 fix (semi-implicit integration + adaptive dt + finite guard) eliminated the NaN divergence that previously affected 7/10 seeds (70% NaN rate). The post-fix 10-seed sweep confirms 0% NaN rate and a statistically significant advantage for joint optimization on both final_loss and final_developed_nm (Wilcoxon p=0.002). Process window width does not differ significantly between the two approaches (p=0.13), and joint optimization is slower in wall_time because it optimizes spin profile and exposure dose simultaneously.
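
To put p=0.002 in context: assuming the reported test is the paired, two-sided Wilcoxon signed-rank test across seeds (this README does not state which variant was used), 0.002 is the smallest p-value attainable with 10 pairs, and it occurs only when one method wins on every seed. The enumeration below is a sketch of that exact null distribution, valid for data without ties or zero differences.

```python
from itertools import product


def exact_signed_rank_p(n, w_plus_observed):
    """Two-sided exact p-value of the Wilcoxon signed-rank statistic (no ties, no zeros)."""
    ranks = range(1, n + 1)
    total = n * (n + 1) // 2
    extreme = min(w_plus_observed, total - w_plus_observed)
    count = 0
    for signs in product((0, 1), repeat=n):   # every sign pattern is equally likely under H0
        w_plus = sum(r for r, s in zip(ranks, signs) if s)
        if min(w_plus, total - w_plus) <= extreme:
            count += 1
    return count / 2 ** n


# All 10 paired differences share the same sign: W+ = 0 (or, equivalently, 55)
p_min = exact_signed_rank_p(10, 0)
print(f"Smallest two-sided p-value for n=10: {p_min:.6f}")
```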

Validation (Verified)

Case	Re	Target	Result	Status
Lid-driven cavity u-velocity (64²)	100	L2 < 1%	< 1%	Pass
Lid-driven cavity u-velocity (128²)	1000	L2 < 2%	< 2%	Pass
Poiseuille ∂ΔP/∂U_inlet	1	< 0.01% vs finite difference	< 0.01%	Pass
`torch.autograd.gradcheck` (Poiseuille)	1	passes	passes	Pass
Pure conduction Nusselt number	—	Nu = 1.0	1.0000	Pass
Backward-facing step (Brinkman)	100	bounded, recirculating	pass	Pass

Flagship Evidence Status

Claim	Code	Tests	Data	Status
Joint litho-CFD optimization (`optimize_joint_process`)	`diffcfd/workflows/joint_litho_opt.py`	`tests/unit/test_joint_litho.py`, `tests/unit/test_flagship_flow_litho.py`	`flagship_flow_litho_results.json` (10-seed sweep)	Verified
Process window analysis (`process_window_analysis`)	`diffcfd/workflows/joint_litho_opt.py`	`tests/unit/test_flagship_flow_litho.py`	`flagship_flow_litho_results.json`	Verified
sCO2 transcritical property surrogate (`SCO2Surrogate`)	`diffcfd/props/sco2.py`	`tests/unit/test_sco2.py`	README Table 4 (measured 14.4 s training)	Verified
Variable-property conjugate heat transfer (`HeatTransfer2D` + `props`)	`diffcfd/solvers/heat_transfer.py`	`tests/unit/test_heat_transfer.py`	README Table 4 (accuracy numbers)	Verified
Matrix-free implicit differentiation (GMRES)	`diffcfd/solvers/implicit_diff.py`	`tests/validation/test_gradients.py`	README Table 3 (measured gradient accuracy)	Verified
Rust-accelerated forward kernels	`src/momentum.rs`, `src/pressure.rs`, `src/simple.rs`	`tests/validation/test_lid_driven_cavity.py`	README Table 2 (measured wall-clock)	Verified
FNO surrogate-in-the-loop	`diffcfd/surrogates/fno.py`	`tests/unit/test_surrogates.py`	Internal	Verified
Solver-in-the-loop learned closure (C9.1)	`diffcfd/solvers/learned_closure.py`	`tests/unit/test_learned_closure.py`	Stability curve + a-priori/a-posteriori report	Verified
Diff-FlowFSI cross-validation (C9.2)	`diffcfd/validation/diff_flowfsi_crossval.py`	`tests/unit/test_diff_flowfsi_crossval.py`	Forward + gradient agreement report	Verified
Codomain flow control transfer (C9.3)	`diffcfd/envs/codomain_control.py`	`tests/unit/test_codomain_control.py`	Transfer benchmark report	Verified
Topology optimization	`diffcfd/workflows/topology.py`	`tests/unit/test_filters.py`	Quick Start example output	Verified

Compatibility

Dependency	Version
Python	3.10+
PyTorch	2.12+
diff-surrogate	0.2.0

Sister projects: DiffNano (nanophotonics), OpenLithoHub (lithography benchmarking), diff-surrogate (shared surrogate framework).

Performance & Benchmarks

All data below were measured on AMD Ryzen 5 5600G (6 cores), 13 GB RAM, Ubuntu 22.04, Python 3.10, PyTorch 2.12+cpu, Rust 1.95. No values are estimated or extrapolated.

Table 1 — Comparison with Published Methods

Aspect	DiffCFD (this work)	PhiFlow [1]	JAX-Fluids [2]	JAX-Fluids 2.0 [4]	Diff-FlowFSI [5]	SU2 adjoint [3]
Differentiation	Implicit (matrix-free GMRES)	Automatic (JAX tracing)	Automatic (JAX tracing)	Automatic (JAX tracing)	Automatic (JAX tracing)	Discrete adjoint
Steady-state support	SIMPLE-converged steady states	Transient time-stepping only	Transient only	Transient only	Transient only	Steady (compressible)
Memory (backward)	O(N·k), k = GMRES restart	O(N·T), T = time steps	O(N·T)	O(N·T)	O(N·T)	O(N)
Backend	PyTorch	JAX	JAX	JAX	JAX	C++ / hand-derived
RL integration	`gymnasium.Env`	`gymnax` (JAX-only)	None	None	None	None
Conjugate heat transfer	Yes	No	No	No	No	No
sCO2 surrogate	Yes	No	No	No	No	No

Comparability note: The memory scaling claim (O(N·k)) is a structural property of restarted GMRES, not a measured speedup over other tools. Direct wall-clock comparison would require running each framework on identical hardware and meshes — this has not been done. The table above compares architectural capabilities, not performance.
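
The structural point can be illustrated on a toy fixed-point problem. The sketch below differentiates through a converged state x* = g(x*, θ) using only vector-Jacobian products, so memory stays proportional to the state size regardless of how many forward iterations were needed. It is not DiffCFD's implementation: the map `g`, the vector `B`, and the loss are hypothetical, and the adjoint system is solved with a plain fixed-point iteration instead of restarted GMRES to keep the example short.

```python
import math

B = [1.0, -0.5, 0.25]  # hypothetical sensitivity of the map to the parameter theta


def g(x, theta):
    """Contractive toy map whose fixed point plays the role of the converged steady state."""
    n = len(x)
    return [0.3 * math.tanh(x[(i + 1) % n]) + theta * B[i] for i in range(n)]


def solve_fixed_point(theta, tol=1e-14, max_iter=500):
    x = [0.0] * len(B)
    for _ in range(max_iter):
        x_new = g(x, theta)
        converged = max(abs(a - b) for a, b in zip(x_new, x)) < tol
        x = x_new
        if converged:
            break
    return x


def vjp_x(x, v):
    """(dg/dx)^T v, evaluated without ever forming the Jacobian matrix."""
    n = len(x)
    return [0.3 * (1.0 - math.tanh(x[j]) ** 2) * v[(j - 1) % n] for j in range(n)]


def loss(x):
    return sum(xi * xi for xi in x)


def implicit_grad(theta, tol=1e-14, max_iter=500):
    """dL/dtheta via the adjoint system (I - dg/dx)^T w = dL/dx."""
    x = solve_fixed_point(theta)
    dl_dx = [2.0 * xi for xi in x]
    w = dl_dx[:]
    for _ in range(max_iter):                 # w <- dL/dx + (dg/dx)^T w
        w_new = [d + jv for d, jv in zip(dl_dx, vjp_x(x, w))]
        converged = max(abs(a - b) for a, b in zip(w_new, w)) < tol
        w = w_new
        if converged:
            break
    return sum(bi * wi for bi, wi in zip(B, w))  # (dg/dtheta)^T w


theta = 0.7
grad = implicit_grad(theta)

# Central finite-difference check through the full forward solve
h = 1e-5
fd = (loss(solve_fixed_point(theta + h)) - loss(solve_fixed_point(theta - h))) / (2 * h)
rel_err = abs(grad - fd) / abs(fd)
```

A Krylov method such as GMRES solves the same adjoint system with the same matrix-free products but stores k basis vectors per restart cycle, which is where the O(N·k) figure comes from.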

Additional Related Work (2025–2026)

Reference	Venue / Year	Relevance
Differentiable supercritical topology optimization	2026	Polynomial thermodynamic models for sCO₂ transcritical optimization
OpenMDAO/MPhys CHT	2026	Modular discrete adjoint conjugate heat transfer framework
GAOT v4	NeurIPS 2025, arXiv:2505.18781	Multi-scale attention geometry-aware operator transformer
GINOT	CMAME 2026	Surface point-cloud encoding + cross-attention geometry injection for neural operators
DNOT	Eng. with Computers 42:60, 2026	Feature-diffusion enhanced neural operator transformer
DD-DeepONet	Eng. Appl. Artif. Intell. 2026	Domain decomposition DeepONet
Schwarz Neural Inference	arXiv:2504.00510 v2, 2026-02	Local→global domain decomposition operator learning

DiffCFD's differentiators: PyTorch-native (vs JAX in most others), steady-state implicit differentiation (vs transient-only in all JAX frameworks), gymnasium.Env RL integration, conjugate heat transfer, and an sCO2 transcritical property surrogate, all on CPU without a GPU requirement.

[1] Holl, P., Kuckelberg, P., Thuerey, N. PhiFlow: a differentiable PDE solving framework. GitHub: tum-pbs/PhiFlow.
[2] Bezgin, D. A., Buhendwa, A. B., Adams, N. A. "JAX-Fluids: A fully differentiable high-order computational fluid dynamics solver for compressible two-phase flows." Computer Physics Communications, 2023.
[3] Economon, T. D. et al. "The SU2 Project." AIAA Journal, 2016.
[4] Bezgin, D. A. et al. "JAX-Fluids 2.0." Computer Physics Communications 309, 2025.
[5] Diff-FlowFSI: GPU-optimized differentiable fluid-structure interaction in JAX. arXiv:2505.23940, 2025.

Table 2 — Solver Performance (Measured)

Wall-clock time for steady-state SIMPLE convergence (tol=1e-5), single-threaded CPU.

Case	Grid	Time (s)	L2 Error	Target
Cavity Re=100	32²	5.6	1.96%	< 2%
Cavity Re=100	64²	54.6	0.85%	< 1%
Cavity Re=1000	128²	1316.6	—	< 2%
Poiseuille Re=1	32×16	—	0.45%	< 1%
Poiseuille Re=1	64×32	—	0.10%	< 0.5%
Poiseuille Re=1	128×64	—	0.03%	< 0.1%

Note: Cavity Re=100 at 128² takes ~2 min, Re=1000 at 128² takes ~22 min — higher Re requires more SIMPLE iterations and tighter under-relaxation. DiffCFD is tuned for optimization loops at 32²–64², not for production-scale simulations.

Table 3 — Gradient Accuracy (Measured)

Implicit differentiation vs finite difference for Poiseuille ∂ΔP/∂U_inlet (analytical = 48.0).

Grid	FD Gradient	AD Gradient	\|AD − FD\| / \|FD\|
16×8	52.339	52.338	1.97×10⁻⁵
32×16	51.947	51.950	4.19×10⁻⁵
48×24	52.353	52.355	3.23×10⁻⁵

torch.autograd.gradcheck passes at (8×4) with atol=1e-3.

Table 4 — sCO₂ Surrogate Accuracy (Measured)

C₄ neural network trained on 8 000 NIST-referenced samples, 1 000 epochs, 14.4 s training time.

Property	Relative L2	Positive?
Density ρ	1.7%	Yes
Viscosity μ	0.43%	Yes
Conductivity k	8.3%	Yes
Specific heat cₚ	1.0%	Yes

Limitation: Conductivity relative L2 (8.3%) is notably higher than other properties — the surrogate struggles near the critical point (Tc = 304.13 K) where k has a sharp peak. This is a known difficulty for polynomial/neural surrogates in transcritical regimes.

Visualization

Charts use transparent backgrounds and neutral gray text for light/dark theme compatibility.

How to Reproduce

# 1. Install dependencies
pip install maturin torch numpy scipy gymnasium matplotlib
maturin develop --release

# 2. Run validation benchmarks (11 cases, ~30 min)
python tests/benchmarks/benchmark_suite.py

# 3. Run performance benchmarks with percentile stats
python tests/benchmarks/benchmark_performance.py --json results/perf_bench.json

# 4. Flagship flow-litho co-optimization (multi-seed with Wilcoxon tests)
make flagship-b          # 10 seeds, full report
make flagship-b-ci       # 3 seeds, CI smoke test
# Or directly:
python3 scripts/flagship_flow_litho.py                  # single seed
python3 scripts/flagship_flow_litho.py --seed-sweep     # 10 seeds

# 5. Regenerate charts
python docs/benchmark_charts.py

Hardware used for all results above:

Component	Value
CPU	AMD Ryzen 5 5600G (6 cores / 12 threads)
RAM	13 GB DDR4
OS	Ubuntu 22.04, kernel 6.8
Python	3.10.12 (CPython)
PyTorch	2.12.0+cpu
Rust	1.95.0 (maturin/PyO3)

Methodology: Timing uses time.perf_counter() with GC disabled during measurement. Performance benchmarks run 3 warmup iterations followed by 5 sampled iterations, reporting median/P95/P99. Validation benchmarks run once and report total wall-clock time. No values are extrapolated to untested configurations.
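
The methodology above can be sketched as a small harness. This is an illustrative reconstruction from the description, not the contents of benchmark_performance.py; the function name, the nearest-rank percentile rule, and the dummy workload are assumptions.

```python
import gc
import math
import statistics
import time


def benchmark(fn, n_warmup=3, n_samples=5):
    """Time fn with warmup runs, GC disabled during sampling, and percentile summary."""
    for _ in range(n_warmup):
        fn()                                  # warmup iterations are not recorded

    samples = []
    gc_was_enabled = gc.isenabled()
    gc.disable()                              # keep collector pauses out of the measurement
    try:
        for _ in range(n_samples):
            t0 = time.perf_counter()
            fn()
            samples.append(time.perf_counter() - t0)
    finally:
        if gc_was_enabled:
            gc.enable()

    samples.sort()

    def percentile(q):
        # Nearest-rank percentile on the sorted samples
        index = max(0, math.ceil(q / 100 * len(samples)) - 1)
        return samples[index]

    return {
        "median": statistics.median(samples),
        "p95": percentile(95),
        "p99": percentile(99),
        "n": len(samples),
    }


stats = benchmark(lambda: sum(range(10_000)))  # dummy workload standing in for a solver call
print(stats)
```

With only five samples, nearest-rank P95 and P99 both coincide with the slowest sample, so the two columns carry the same information at this sample count.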

All test data were obtained by actually running the above commands on the described hardware. No performance numbers are estimated, inferred, or borrowed from other publications.

Reproducibility

All flagship benchmarks and validation cases can be reproduced in clean containers or locally with a single command.

Container (Docker)

# Build the container (includes Rust toolchain, CPU-only PyTorch)
docker build -t diffcfd-flagship .

# Run flagship benchmark (3-seed CI sweep)
docker run --rm diffcfd-flagship

# Or via Make:
make docker-flagship

The container sets PYTHONHASHSEED=42 for deterministic hashing. All results are printed to stdout and written to flagship_flow_litho_results.json inside the container.

Cross-Validation (Analytical Benchmarks)

Run the solver against reference solutions (Ghia et al. benchmark data and the closed-form Poiseuille profile):

# Local
make cross-validate

# In Docker
make docker-cross-validate

Cross-validation checks:

Test	Reference	Gate	Metric
Lid-driven cavity Re=100	Ghia et al. 1982	L2 < 1%	u-velocity centerline
Poiseuille forward Re=1	Analytical parabolic	L2 < 1%	Outlet velocity profile
Poiseuille gradient Re=1	Finite difference (eps=0.01)	rel err < 0.01%	dΔP/dU_inlet

Results are written to cross_validation_results.json.
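
As an illustration of the Poiseuille forward gate, the sketch below computes a relative L2 error against the analytical parabolic profile. The function names, the cell-centred sampling, and the synthetic "numerical" profile (the reference scaled by 0.5%) are assumptions made for the example; the actual check lives behind make cross-validate.

```python
import math


def poiseuille_profile(y, u_mean, height):
    """Analytical fully developed profile between parallel plates at y = 0 and y = height."""
    eta = y / height
    return 6.0 * u_mean * eta * (1.0 - eta)


def relative_l2(numerical, reference):
    """||numerical - reference||_2 / ||reference||_2 over the sampled points."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(numerical, reference)))
    norm = math.sqrt(sum(b * b for b in reference))
    return diff / norm


ny, height, u_mean = 16, 1.0, 1.0
y_centres = [(j + 0.5) * height / ny for j in range(ny)]
reference = [poiseuille_profile(y, u_mean, height) for y in y_centres]

# Stand-in for a solver result: the reference profile with a uniform 0.5% overshoot
numerical = [1.005 * u for u in reference]

error = relative_l2(numerical, reference)
passes_gate = error < 0.01  # the L2 < 1% gate from the table above
print(f"relative L2 = {error:.4%}, gate passed: {passes_gate}")
```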

Local One-Key Reproduce

make reproduce        # 3-seed flagship sweep
make cross-validate   # analytical benchmarks

Design Philosophy

DiffCFD is intentionally not a full-featured CFD code:

DiffCFD	Production CFD (OpenFOAM, Fluent)
Differentiable end-to-end	Not differentiable
CPU-first, GPU-capable	CPU-first, MPI-parallel
2D incompressible NS + heat	Full compressible, complex turbulence
Structured Cartesian + Brinkman IB	Unstructured, body-fitted meshes
O(N) memory backward	N/A
Single-laptop at 64²–128²	Cluster-scale meshes

Use DiffCFD for optimization loops and ML training. Use OpenFOAM for final validation and production runs.

Config	Hardware
64² grid, 2D, CPU	Any modern laptop (~8 GB RAM)
128² grid, 2D, CPU	16+ GB RAM
256² grid, 2D	GPU recommended
3D	Out of scope for v0.x

Architecture Decision Records

ADR	Title	Decision
ADR-001	Framework decision -- stay on PyTorch unified graph	JAX interop only via dlpack (`diff_surrogate.interop`); no JAX rewrite
ADR-002	Thread affinity for Rust/PyTorch coordination	Not needed; PyTorch internal thread pool handles parallelism adequately (<5% contention)

Architecture

diffcfd/
├── solvers/
│   ├── navier_stokes_2d.py    # 2D incompressible NS + SIMPLE (Rust-accelerated forward)
│   ├── heat_transfer.py       # Conjugate heat transfer
│   ├── turbulence.py          # Frozen eddy viscosity (Re > 5000)
│   ├── implicit_diff.py       # Matrix-free GMRES backward (auto diagonal preconditioner)
│   ├── fsi.py                 # FSI implicit differentiation (C7.2)
│   ├── boundary.py            # Boundary condition enforcement + blowing/suction control (C7.3)
│   ├── spin_coating.py        # Differentiable spin coating (Meyerhofer + radial PDE)
│   ├── litho.py               # Differentiable lithography solver (Dill exposure + Mack develop)
│   └── learned_closure.py     # LearnedClosureNet, SolverInTheLoopTrainer (C9.1)
├── envs/
│   ├── cylinder_wake.py       # Cylinder wake RL (Mode B)
│   ├── heat_exchanger.py      # Heat exchanger fin (Mode A)
│   ├── codomain_control.py    # CodomainActor, TransferAdapter, CodomainPPO (C9.3)
│   └── base.py
├── geometry/
│   ├── mesh.py                # Cartesian mesh + SDF Brinkman mask
│   ├── shapes.py              # SDFs (cylinder, rectangle, NACA)
│   ├── airfoil.py             # NACA 4-digit + B-spline
│   └── filters.py             # Helmholtz filter for manufacturing constraints
├── workflows/
│   ├── aero.py                # Aerodynamic shape optimization
│   ├── topology.py            # Topology optimization + Helmholtz filter
│   ├── pche.py                # PCHE channel optimization
│   ├── spin_coat_opt.py       # Spin coating profile optimization
│   └── joint_litho_opt.py     # Joint spin-coating + lithography co-optimization
├── props/
│   ├── ideal_gas.py           # Abstract ThermophysicalProps + ConstantProps
│   ├── eos.py                 # Polynomial and cubic-spline equation of state (C7.1)
│   └── sco2.py                # sCO2 transcritical property surrogate (C4)
├── surrogates/
│   ├── fno.py                 # Fourier Neural Operator for flow prediction
│   └── simple_surrogate.py    # CNN surrogate for SIMPLE acceleration
├── validation/
│   ├── cross_validation.py    # LidDrivenCavityBenchmark, GradientCrossValidation (C8.1)
│   └── diff_flowfsi_crossval.py  # DiffFlowFSICrossValidator, GPUBenchmarkSuite (C9.2)
├── control/
│   └── rl_ad.py               # ADWarmStartPPO, ADAugmentedPPO, SimplePPO (C8.2)
├── adjoint/
│   └── transient.py           # TransientHeatAdjoint, TransientCheckpointSchedule (C8.3)
├── uncertainty/
│   └── sco2_calibrated.py     # SCO2CalibratedPredictor, UncertaintyPropagation (C8.4)
├── export/
│   └── vtk.py                 # VTK export for ParaView
└── utils/
    ├── linalg.py              # Matrix-free GMRES
    └── threading.py           # Thread affinity helpers (Rust/PyTorch coordination)
src/ (Rust via PyO3/maturin, at repo root)
├── lib.rs                     # PyO3 module registration
├── momentum.rs                # Sparse momentum system assembly (CSR)
├── pressure.rs                # Pressure correction system assembly (CSR)
├── sdf.rs                     # B-spline SDF (rayon parallel)
├── simple.rs                  # Full SIMPLE forward loop (faer sparse LU)
└── utils.rs                   # Shared helpers (hybrid scheme, COO→CSR)

Roadmap

Milestone	Scope	Status
v0.1	2D NS + matrix-free implicit diff + validation	Done
v0.2	Conjugate heat transfer + sCO₂ surrogate	Done
v0.3	Gymnasium environments (CylinderWake + HeatExchanger)	Done
v0.35	Frozen eddy viscosity for Re > 5000	Done
v0.4	NACA + B-spline aerodynamic shape optimization	Done
v0.4.1	Helmholtz filter + topology optimization	Done
v0.5	FNO surrogate-in-the-loop	Done
v0.6	sCO₂ PCHE optimization + sCO2-TMSR-Toolkit integration	Done
v0.7	Rust-accelerated forward kernels (maturin/PyO3)	Done
v0.75	Differentiable spin coating + lithography solvers	Done
v0.8	Polynomial/Spline EOS (C7.1), FSI implicit differentiation (C7.2), blowing/suction boundary control (C7.3), containerized reproducibility (C7.4)	Done
v0.9	External CFD cross-validation (C8.1), RL x AD hybrid control (C8.2), transient adjoint (C8.3), sCO2 calibrated uncertainty (C8.4)	Done
v0.95	Solver-in-the-loop learned closure (C9.1), Diff-FlowFSI cross-validation (C9.2), codomain flow control transfer (C9.3)	Done
v1.0	Full benchmark suite 11/11 pass + arXiv paper	Planned

Competitive Positioning

What it is: A differentiable CFD toolkit with steady-state implicit differentiation and standard gymnasium.Env RL interface — for physics-informed optimization and control research.

Where it leads:

Steady-state implicit differentiation × gymnasium.Env: This combination has no overlap with existing tools. HydroGym uses gymnax (JAX-based, not standard gymnasium); FluidGym calls .detach() in its Gymnasium-compatible mode, which disables gradients. DiffCFD provides both differentiable physics and standard RL in one package.
Hybrid AD×RL control: AD-warm-started PPO with gradient-augmented rewards — a novel training paradigm for flow control.
Transient adjoint with checkpointing: Memory-efficient discrete adjoint for conjugate heat transfer / FSI, enabling gradient-based optimization of coupled thermal-fluid systems.
Solver-in-the-loop learned closure: A-posteriori rollout training with stability curve analysis — closes the train-deploy gap for learned turbulence models.
Codomain flow control transfer: Cross-Reynolds/geometry transfer via codomain-attention adapters, enabling zero-shot generalization to unseen flow regimes.
Diff-FlowFSI cross-validation: A framework for independent forward and gradient verification against a JAX-based differentiable FSI code. It requires a separate Diff-FlowFSI installation; no Diff-FlowFSI code is vendored.

Where it lags (honest assessment):

Scale: 2D, moderate Reynolds numbers, single workstation. 2-4 orders of magnitude behind JAX-Fluids (512×A100 GPU clusters) and Diff-FlowFSI (GPU turbulence + FSI).
Validation: Ghia benchmark data and analytical Poiseuille solutions, plus an optional cross-validation framework against Diff-FlowFSI (requires a separate installation). No experimental or production CFD validation.
Maturity: Research prototype. No users, no production deployments.

Bottom line: Niche but unique — the only framework combining differentiable steady-state CFD with standard RL. Value is in the training paradigm exploration, not in solver scale or speed.

Contributing

This repository is currently in an early-development phase. Pull requests touching diffcfd/solvers/* are not being accepted until the API stabilizes. Discussion issues and benchmark proposals are welcome.

License

Apache License 2.0
