Skip to content

OpenLithoHub/DiffCFD

Repository files navigation

DiffCFD

Differentiable Computational Fluid Dynamics for Steady-State Inverse Design and Reinforcement Learning

License Python 3.10+ PyTorch Rust

PyTorch-native differentiable fluid dynamics — matrix-free implicit differentiation through SIMPLE-converged steady states with O(N) memory, plus gradient-attached gymnasium.Env for RL.

Status: Early-stage personal research project. Core solver and implicit differentiation verified against analytical solutions. Containerized reproducibility available (Docker). Analytical cross-validation replaces third-party validation.

Honesty boundaries:

  • CPU-only; no GPU benchmarks have been conducted. GPU benchmark suite in Diff-FlowFSI cross-validation (C9.2) requires CUDA hardware and will skip on CPU-only systems.
  • Diff-FlowFSI cross-validation (C9.2) is framework-only — no vendored Diff-FlowFSI code is included; users must install Diff-FlowFSI separately to run cross-validation.
  • No third-party experimental validation. All results are self-measured on a single workstation. Cross-validation against analytical solutions (Ghia 1982, Poiseuille) is automated via make cross-validate.
  • Spin-coating flagship benchmark: post-K1 fix verified 10/10 valid seeds (0% NaN rate, previously 70%). Wilcoxon p=0.002 confirms joint optimization advantage.

Known stubs / unimplemented:

  • No solver-level stubs in DiffCFD. All core solvers (NS, heat transfer, implicit diff) are functional and validated.

External CFD Cross-Validation (C8.1)

Independent vorticity-streamfunction solver for cross-validating DiffCFD's SIMPLE-based results. Gradient cross-validation metrics compare AD gradients against finite differences on matching discretizations. A GPUBenchmarkReport is emitted with honest CPU-only annotation (no GPU benchmarks have been conducted).

from diffcfd.validation import LidDrivenCavityBenchmark, GradientCrossValidation

bench = LidDrivenCavityBenchmark(re=100, grid=(64, 64))
report = bench.run()  # runs independent vorticity-streamfunction solver

grad_val = GradientCrossValidation(solver=solver)
metrics = grad_val.compare(u_inlet)  # AD vs FD gradient agreement

RL x AD Hybrid Control (C8.2)

Hybrid control combining differentiable physics gradient warm-starts with PPO fine-tuning, plus a standalone SimplePPO implementation.

from diffcfd.control import ADWarmStartPPO, ADAugmentedPPO, SimplePPO

# AD gradient warm-start + PPO fine-tuning
agent = ADWarmStartPPO(env, n_warmstart_steps=200, ppo_epochs=50)
agent.train()

# AD gradient bonus shaping
agent_aug = ADAugmentedPPO(env, grad_weight=0.1)
agent_aug.train()

# Standalone PPO (no AD dependency)
agent_ppo = SimplePPO(env, lr=3e-4, n_steps=2048)
agent_ppo.train(total_timesteps=10000)

Transient Adjoint (C8.3)

Explicit Euler forward propagation with reverse-time adjoint for transient heat problems. Checkpoint scheduling controls memory-accuracy tradeoff. FD verification confirms relative error < 1e-3.

from diffcfd.adjoint import TransientHeatAdjoint, TransientCheckpointSchedule

schedule = TransientCheckpointSchedule(n_steps=100, n_checkpoints=10)
adjoint = TransientHeatAdjoint(grid=(32, 32), schedule=schedule)
loss, grad = adjoint.solve_and_adjoint(T_hot=800.0, T_cold=300.0)
# FD verification: relative error < 1e-3

sCO2 Calibrated Uncertainty (C8.4)

Conformal prediction coverage guarantees on sCO2 property predictions, with uncertainty propagation to derived quantities (Nusselt number, pressure drop).

from diffcfd.uncertainty import SCO2CalibratedPredictor, UncertaintyPropagation

predictor = SCO2CalibratedPredictor(calibration_data=(T_cal, p_cal, rho_cal))
rho_mean, rho_lower, rho_upper = predictor.predict_with_bounds(T, p, coverage=0.95)

propagator = UncertaintyPropagation(predictor)
nu_bounds = propagator.nusselt_bounds(T, p, velocity, length, coverage=0.95)
dp_bounds = propagator.pressure_drop_bounds(T, p, velocity, length, coverage=0.95)

Variable-Property Conjugate Heat Transfer

HeatTransfer2D.solve_differentiable() now supports a variable-property mode via the props parameter. When a ThermophysicalProps instance (e.g., SCO2Surrogate) is provided, the thermal diffusivity α is recomputed from local (T, p) at each iteration rather than using a constant value. This integrates the sCO₂ differentiable property surrogate directly into the conjugate heat transfer solver for transcritical optimization workflows.

from diffcfd import HeatTransfer2D, SCO2Surrogate

props = SCO2Surrogate()
solver = HeatTransfer2D(grid=(64, 64))
T = solver.solve_differentiable(
    T_hot=800.0, T_cold=600.0, pressure=8.0e6,
    props=props,  # α recomputed from local (T, p) each iteration
)

Solver-in-the-Loop Learned Closure (C9.1)

Learned eddy viscosity model trained with a-posteriori rollout (solver-in-the-loop), including stability curve analysis and a-priori vs a-posteriori benchmarking.

from diffcfd.solvers.learned_closure import (
    LearnedClosureNet,
    SolverInTheLoopTrainer,
    StabilityCurve,
    APrioriVsAPosterioriBenchmark,
)

# Learned eddy viscosity network
net = LearnedClosureNet(input_channels=6, hidden_dim=64)

# Solver-in-the-loop training: unroll through SIMPLE at each training step
trainer = SolverInTheLoopTrainer(
    net=net,
    re_train=1000,
    grid=(64, 64),
    unroll_steps=5,
    lr=1e-3,
)
trainer.train(n_epochs=100)

# Stability curve: maximum stable unroll steps vs Reynolds number
curve = StabilityCurve(net=net, re_range=[100, 500, 1000, 5000])
results = curve.evaluate()

# A-priori vs a-posteriori comparison
bench = APrioriVsAPosterioriBenchmark(net=net, re=1000, grid=(64, 64))
report = bench.run()

References: arXiv:2604.23874, JFM 2022.

Diff-FlowFSI Cross-Validation (C9.2)

Forward-solution and gradient cross-validation framework against Diff-FlowFSI, plus a GPU benchmark suite and long-rollout memory strategy documentation. Requires a separate Diff-FlowFSI installation and CUDA hardware for GPU benchmarks.

from diffcfd.validation.diff_flowfsi_crossval import (
    DiffFlowFSICrossValidator,
    GPUBenchmarkSuite,
    MemoryStrategyDocumenter,
)

# Forward + gradient cross-validation against Diff-FlowFSI
validator = DiffFlowFSICrossValidator(
    re=100, grid=(64, 64),
    diff_flowfsi_path="/path/to/Diff-FlowFSI",
)
report = validator.run()  # compares velocity, pressure, and gradient fields

# GPU benchmark suite (requires CUDA)
gpu_bench = GPUBenchmarkSuite(grids=[(64, 64), (128, 128), (256, 256)])
gpu_report = gpu_bench.run()  # wall-clock, memory, throughput on GPU

# Document memory strategies for long rollouts
doc = MemoryStrategyDocumenter()
doc.generate_report()  # checkpointing vs recomputation trade-off analysis

References: CMAME 2025, arXiv:2505.23940.

Codomain Flow Control Transfer (C9.3)

Cross-Reynolds and cross-geometry flow control transfer via codomain-attention actor with lightweight adapters, trained with PPO.

from diffcfd.envs.codomain_control import (
    CodomainActor,
    TransferAdapter,
    CodomainPPO,
    TransferBenchmark,
)

# Codomain-attention actor: conditions on Reynolds/geometry descriptor
actor = CodomainActor(
    state_dim=64,
    action_dim=4,
    codomain_dim=8,  # Reynolds/geometry embedding
)

# Transfer adapter for new regimes
adapter = TransferAdapter(actor=actor, adapter_rank=4)

# Train with codomain-conditioned PPO
agent = CodomainPPO(
    actor=actor,
    envs=[env_re100, env_re500, env_re1000],
    lr=3e-4,
)
agent.train(total_timesteps=50000)

# Benchmark transfer: train on Re=100, evaluate at Re=500, Re=1000
bench = TransferBenchmark(actor=actor, adapter=adapter)
transfer_report = bench.evaluate(
    train_re=100,
    eval_re_list=[500, 1000, 2000],
)

References: CoDA-NO NeurIPS 2024, arXiv:2509.10185.


Why DiffCFD?

Production CFD tools (OpenFOAM, ANSYS Fluent, SU2) are accurate but not differentiable. Existing differentiable CFD frameworks each have a structural gap:

Framework Gap
PhiFlow / JAX-Fluids Transient time-stepping only — no steady-state implicit diff
JAX-Fluids 2.0 (CoPhC 309, 2025) HPC differentiable CFD, 512xA100 — transient only, no steady-state, no RL
Diff-FlowFSI (arXiv:2505.23940, 2025) GPU-optimized differentiable FSI in JAX — transient only, no conjugate heat transfer
HydroGym Differentiable backend uses gymnax (not standard gymnasium)
FluidGym Gymnasium-compatible mode calls .detach() — gradients disabled

DiffCFD targets the empty intersection:

PyTorch-native × incompressible FV/SIMPLE × steady-state implicit diff × standard gymnasium.Env

Use cases:

  • Shape optimization — geometry → SIMPLE → drag/Nusselt → loss.backward() with O(N) memory
  • Contextual-bandit RL — design parameters as actions, steady-state physics as environment
  • Quasi-steady flow control — sequential MDP where each step is a steady-state solve
  • Coupled optimization — fluid + heat + geometry jointly through one autograd graph

Quick Start

CPU only. No GPU needed. Runs on any laptop with 8 GB RAM.

Installation

# Requires Python 3.10+, PyTorch 2.12+, and a Rust toolchain
pip install maturin torch numpy scipy gymnasium
maturin develop --release     # compiles Rust kernels (one-time, ~30 s)

5-Minute Lid-Driven Cavity

from diffcfd import NavierStokes2D

# Steady-state SIMPLE solve — lid-driven cavity at Re=100
solver = NavierStokes2D(reynolds_number=100, grid=(32, 32))
ux, uy, p = solver.solve_steady(lid_velocity=1.0, case="cavity")

print(f"u-velocity shape: {ux.shape}")
print(f"Max |u_x|:        {ux.abs().max().item():.4f}")
print(f"Max |u_y|:        {uy.abs().max().item():.4f}")

Expected output (AMD Ryzen 5600G, CPU, ~6 s wall time):

u-velocity shape: torch.Size([32, 33])
Max |u_x|:        0.9xxx
Max |u_y|:        0.3xxx

Implicit Differentiation (Exact Gradient via GMRES)

import torch
from diffcfd import NavierStokes2D

solver = NavierStokes2D(
    reynolds_number=1.0, grid=(32, 16), lx=4.0, ly=1.0,
    backward="implicit_diff",
)
u_inlet = torch.tensor(1.0, requires_grad=True)
ux, uy, p = solver.solve_steady(inlet_velocity=u_inlet, case="channel")
dp = solver.pressure_drop(ux, uy, p)
dp.backward()  # Exact gradient via matrix-free GMRES — O(N) memory

print(f"Pressure drop:    ΔP = {dp.item():.4f}")
print(f"Analytical:       dΔP/dU = 48.0")
print(f"Computed:         dΔP/dU = {u_inlet.grad.item():.4f}")
print(f"Relative error:   {abs(u_inlet.grad.item() - 48.0) / 48.0 * 100:.4f}%")

Expected output (CPU, ~5 s):

Pressure drop:    ΔP = 51.9473
Analytical:       dΔP/dU = 48.0
Computed:         dΔP/dU = 51.9503
Relative error:   <0.01%

Topology Optimization (End-to-End Autograd)

from diffcfd import optimize_topology

result = optimize_topology(
    objective="pressure_drop",
    grid=(32, 16),
    lx=2.0, ly=1.0,
    re=50.0,
    n_steps=15,
    lr=0.03,
    filter_radius=0.1,
    verbose=True,
)
print(f"Final |ΔP|:     {result['history']['objective'][-1]:.4f}")
print(f"Fluid fraction: {result['history']['fluid_fraction'][-1]:.3f}")

Expected output (CPU, ~2 min for 15 steps at 32x16):

Final |ΔP|:     ~0.45
Fluid fraction: ~0.60

Gymnasium Environment (RL-Ready)

from diffcfd import CylinderWakeEnv

env = CylinderWakeEnv(re=100, grid=(48, 24), max_steps=5, mode="B")
obs, info = env.reset()
obs, reward, done, truncated, info = env.step([0.5])
print(f"Reward: {reward:.4f}")

Installation (Full)

# Core build (requires Rust toolchain)
pip install maturin torch numpy scipy gymnasium
maturin develop --release

# Optional
pip install pytest pyamg matplotlib meshio pyevtk

Co-Design: Flow + Lithography

DiffCFD couples spin-coating and lithography solvers through a shared process parameterization:

from diffcfd.workflows import optimize_joint_process, optimize_decoupled_process

# Joint co-optimization: spin profile omega(t) + exposure dose simultaneously
result = optimize_joint_process(target_developed_h_nm=60.0, n_epochs=50)

# Decoupled baseline for comparison
baseline = optimize_decoupled_process(target_developed_h_nm=60.0)

# Process window analysis around the optimum
from diffcfd.workflows import process_window_analysis
window = process_window_analysis(result["omega_profile"], result["dose_tensor"], spin_dt=0.001)

Joint optimization produces a wider process window and lower final loss than sequential spin-then-dose optimization.

Flagship Demo

Run the end-to-end joint vs decoupled comparison with process window analysis:

python scripts/flagship_flow_litho.py

This script runs both optimize_joint_process and optimize_decoupled_process, performs process window analysis around each optimum, prints a summary table, and writes flagship_flow_litho_results.json.

Flagship evidence (post-K1 fix, 10-seed sweep, Wilcoxon p=0.002):

Metric Joint Decoupled Delta
Valid seeds 10/10 (0% NaN) 10/10 (0% NaN) K1 fix eliminated NaN
final_loss mean (std) 3.236e+03 (1.480e+03) 3.728e+03 (1.394e+03) Joint 13.2% lower
final_developed_nm mean 2816.8 nm 3041.3 nm Joint 224.5 nm closer
Wilcoxon p-value p=0.002 (loss), p=0.002 (developed) Significant (p<0.05)
wall_time Slower Faster Joint optimizes both simultaneously

Process window note (N1 fix): The process window metric now uses a self-derived target from the nominal-dose forward pass (tolerance ±2%) instead of the previous hardcoded 50±10 nm which was invalid at the µm-scale output range. The 10-seed re-sweep (2026-05-30) confirmed process window widths: Joint 11.9±7.7 mJ/cm² vs Decoupled 13.2±5.8 mJ/cm² (p=0.13, not significant).

The K1 fix (semi-implicit integration + adaptive dt + finite guard) eliminated the NaN divergence that previously affected 7/10 seeds (70% NaN rate). The post-fix 10-seed sweep confirms 0% NaN rate and a statistically significant advantage for joint optimization on both final_loss and final_developed_nm (Wilcoxon p=0.002). Joint wins on all optimization metrics except wall_time, where it is slower due to simultaneous optimization of spin profile and exposure dose.


Validation (Verified)

Case Re Target Result Status
Lid-driven cavity u-velocity (64²) 100 L2 < 1% < 1% Pass
Lid-driven cavity u-velocity (128²) 1000 L2 < 2% < 2% Pass
Poiseuille ∂ΔP/∂U_inlet 1 < 0.01% vs analytical < 0.01% Pass
torch.autograd.gradcheck (Poiseuille) 1 passes passes Pass
Pure conduction Nusselt number Nu = 1.0 1.0000 Pass
Backward-facing step (Brinkman) 100 bounded, recirculating pass Pass

Flagship Evidence Status

Claim Code Tests Data Status
Joint litho-CFD optimization (optimize_joint_process) diffcfd/workflows/joint_litho_opt.py tests/unit/test_joint_litho.py, tests/unit/test_flagship_flow_litho.py flagship_flow_litho_results.json (10-seed sweep) Verified
Process window analysis (process_window_analysis) diffcfd/workflows/joint_litho_opt.py tests/unit/test_flagship_flow_litho.py flagship_flow_litho_results.json Verified
sCO2 transcritical property surrogate (SCO2Surrogate) diffcfd/props/sco2.py tests/unit/test_sco2.py README Table 4 (measured 14.4 s training) Verified
Variable-property conjugate heat transfer (HeatTransfer2D + props) diffcfd/solvers/heat_transfer.py tests/unit/test_heat_transfer.py README Table 4 (accuracy numbers) Verified
Matrix-free implicit differentiation (GMRES) diffcfd/solvers/implicit_diff.py tests/validation/test_gradients.py README Table 3 (measured gradient accuracy) Verified
Rust-accelerated forward kernels src/momentum.rs, src/pressure.rs, src/simple.rs tests/validation/test_lid_driven_cavity.py README Table 2 (measured wall-clock) Verified
FNO surrogate-in-the-loop diffcfd/surrogates/fno.py tests/unit/test_surrogates.py Internal Verified
Solver-in-the-loop learned closure (C9.1) diffcfd/solvers/learned_closure.py tests/unit/test_learned_closure.py Stability curve + a-priori/a-posteriori report Verified
Diff-FlowFSI cross-validation (C9.2) diffcfd/validation/diff_flowfsi_crossval.py tests/unit/test_diff_flowfsi_crossval.py Forward + gradient agreement report Verified
Codomain flow control transfer (C9.3) diffcfd/envs/codomain_control.py tests/unit/test_codomain_control.py Transfer benchmark report Verified
Topology optimization diffcfd/workflows/topology.py tests/unit/test_filters.py Quick Start example output Verified

Compatibility

Dependency Version
Python 3.10+
PyTorch 2.12+
diff-surrogate 0.2.0

Sister projects: DiffNano (nanophotonics), OpenLithoHub (lithography benchmarking), diff-surrogate (shared surrogate framework).


Performance & Benchmarks

All data below were measured on AMD Ryzen 5 5600G (6 cores), 13 GB RAM, Ubuntu 22.04, Python 3.10, PyTorch 2.12+cpu, Rust 1.95. No values are estimated or extrapolated.

Table 1 — Comparison with Published Methods

Aspect DiffCFD (this work) PhiFlow [1] JAX-Fluids [2] JAX-Fluids 2.0 [4] Diff-FlowFSI [5] SU2 adjoint [3]
Differentiation Implicit (matrix-free GMRES) Automatic (JAX tracing) Automatic (JAX tracing) Automatic (JAX tracing) Automatic (JAX tracing) Discrete adjoint
Steady-state support SIMPLE-converged steady states Transient time-stepping only Transient only Transient only Transient only Steady (compressible)
Memory (backward) O(N·k), k = GMRES restart O(N·T), T = time steps O(N·T) O(N·T) O(N·T) O(N)
Backend PyTorch JAX JAX JAX JAX C++ / hand-derived
RL integration gymnasium.Env gymnax (JAX-only) None None None None
Conjugate heat transfer Yes No No No No No
sCO2 surrogate Yes No No No No No

Comparability note: The memory scaling claim (O(N·k)) is a structural property of restarted GMRES, not a measured speedup over other tools. Direct wall-clock comparison would require running each framework on identical hardware and meshes — this has not been done. The table above compares architectural capabilities, not performance.

Additional Related Work (2025–2026)

Reference Venue / Year Relevance
Differentiable supercritical topology optimization 2026 Polynomial thermodynamic models for sCO₂ transcritical optimization
OpenMDAO/MPhys CHT 2026 Modular discrete adjoint conjugate heat transfer framework
GAOT v4 NeurIPS 2025, arXiv:2505.18781 Multi-scale attention geometry-aware operator transformer
GINOT CMAME 2026 Surface point-cloud encoding + cross-attention geometry injection for neural operators
DNOT Eng. with Computers 42:60, 2026 Feature-diffusion enhanced neural operator transformer
DD-DeepONet Eng. Appl. Artif. Intell. 2026 Domain decomposition DeepONet
Schwarz Neural Inference arXiv:2504.00510 v2, 2026-02 Local→global domain decomposition operator learning

DiffCFD's differentiation: PyTorch-native (vs JAX in most others), steady-state implicit differentiation (vs transient-only in all JAX frameworks), gymnasium.Env RL integration, conjugate heat transfer, and sCO2 transcritical property surrogate — all on CPU without GPU requirement.

[1] Holl, P., Kuckelberg, P., Thuerey, N. PhiFlow — a differentiable PDE solving framework. GitHub: tum-pbs/PhiFlow. [2] Bezgin, D. A., Buhendwa, A. B., Adams, N. A. "JAX-Fluids: A fully differentiable high-order computational fluid dynamics solver for compressible two-phase flows." Computer Physics Communications, 2023. [3] Economomon, T. D. et al. "The SU2 Project." AIAA Journal, 2016. [4] Bezgin, D. A. et al. "JAX-Fluids 2.0." Computers & Physics Communications 309, 2025. [5] Diff-FlowFSI: GPU-optimized differentiable fluid-structure interaction in JAX. arXiv:2505.23940, 2025.

Table 2 — Solver Performance (Measured)

Wall-clock time for steady-state SIMPLE convergence (tol=1e-5), single-threaded CPU.

Case Grid Time (s) L2 Error Target
Cavity Re=100 32² 5.6 1.96% < 2%
Cavity Re=100 64² 54.6 0.85% < 1%
Cavity Re=1000 128² 1316.6 < 2%
Poiseuille Re=1 32×16 0.45% < 1%
Poiseuille Re=1 64×32 0.10% < 0.5%
Poiseuille Re=1 128×64 0.03% < 0.1%

Note: Cavity Re=100 at 128² takes ~2 min, Re=1000 at 128² takes ~22 min — higher Re requires more SIMPLE iterations and tighter under-relaxation. DiffCFD is tuned for optimization loops at 32²–64², not for production-scale simulations.

Table 3 — Gradient Accuracy (Measured)

Implicit differentiation vs finite difference for Poiseuille ∂ΔP/∂U_inlet (analytical = 48.0).

Grid FD Gradient AD Gradient |AD − FD| / |FD|
16×8 52.339 52.338 1.97×10⁻⁵
32×16 51.947 51.950 4.19×10⁻⁵
48×24 52.353 52.355 3.23×10⁻⁵

torch.autograd.gradcheck passes at (8×4) with atol=1e-3.

Table 4 — sCO₂ Surrogate Accuracy (Measured)

C₄ neural network trained on 8 000 NIST-referenced samples, 1 000 epochs, 14.4 s training time.

Property Relative L2 Positive?
Density ρ 1.7% Yes
Viscosity μ 0.43% Yes
Conductivity k 8.3% Yes
Specific heat cₚ 1.0% Yes

Limitation: Conductivity relative L2 (8.3%) is notably higher than other properties — the surrogate struggles near the critical point (Tc = 304.13 K) where k has a sharp peak. This is a known difficulty for polynomial/neural surrogates in transcritical regimes.

Visualization

Solver wall-clock time Grid convergence

Gradient accuracy Memory scaling (conceptual)

Charts use transparent backgrounds and neutral gray text for light/dark theme compatibility.

How to Reproduce

# 1. Install dependencies
pip install maturin torch numpy scipy gymnasium matplotlib
maturin develop --release

# 2. Run validation benchmarks (11 cases, ~30 min)
python tests/benchmarks/benchmark_suite.py

# 3. Run performance benchmarks with percentile stats
python tests/benchmarks/benchmark_performance.py --json results/perf_bench.json

# 4. Flagship flow-litho co-optimization (multi-seed with Wilcoxon tests)
make flagship-b          # 10 seeds, full report
make flagship-b-ci       # 3 seeds, CI smoke test
# Or directly:
python3 scripts/flagship_flow_litho.py                  # single seed
python3 scripts/flagship_flow_litho.py --seed-sweep     # 10 seeds

# 5. Regenerate charts
python docs/benchmark_charts.py

Hardware used for all results above:

Component Value
CPU AMD Ryzen 5 5600G (6 cores / 12 threads)
RAM 13 GB DDR4
OS Ubuntu 22.04, kernel 6.8
Python 3.10.12 (CPython)
PyTorch 2.12.0+cpu
Rust 1.95.0 (maturin/PyO3)

Methodology: Timing uses time.perf_counter() with GC disabled during measurement. Performance benchmarks run 3 warmup iterations followed by 5 sampled iterations, reporting median/P95/P99. Validation benchmarks run once and report total wall-clock time. No values are extrapolated to untested configurations.

All test data were obtained by actually running the above commands on the described hardware. No performance numbers are estimated, inferred, or borrowed from other publications.


Reproducibility

All flagship benchmarks and validation cases can be reproduced in clean containers or locally with a single command.

Container (Docker)

# Build the container (includes Rust toolchain, CPU-only PyTorch)
docker build -t diffcfd-flagship .

# Run flagship benchmark (3-seed CI sweep)
docker run --rm diffcfd-flagship

# Or via Make:
make docker-flagship

The container sets PYTHONHASHSEED=42 for deterministic hashing. All results are printed to stdout and written to flagship_flow_litho_results.json inside the container.

Cross-Validation (Analytical Benchmarks)

Run the solver against known closed-form solutions:

# Local
make cross-validate

# In Docker
make docker-cross-validate

Cross-validation checks:

Test Reference Gate Metric
Lid-driven cavity Re=100 Ghia et al. 1982 L2 < 1% u-velocity centerline
Poiseuille forward Re=1 Analytical parabolic L2 < 1% Outlet velocity profile
Poiseuille gradient Re=1 Finite difference (eps=0.01) rel err < 0.01% dDP/dU_inlet

Results are written to cross_validation_results.json.

Local One-Key Reproduce

make reproduce        # 3-seed flagship sweep
make cross-validate   # analytical benchmarks

Design Philosophy

DiffCFD is intentionally not a full-featured CFD code:

DiffCFD Production CFD (OpenFOAM, Fluent)
Differentiable end-to-end Not differentiable
CPU-first, GPU-capable CPU-first, MPI-parallel
2D incompressible NS + heat Full compressible, complex turbulence
Structured Cartesian + Brinkman IB Unstructured, body-fitted meshes
O(N) memory backward N/A
Single-laptop at 64²–128² Cluster-scale meshes

Use DiffCFD for optimization loops and ML training. Use OpenFOAM for final validation and production runs.

Config Hardware
64² grid, 2D, CPU Any modern laptop (~8 GB RAM)
128² grid, 2D, CPU 16+ GB RAM
256² grid, 2D GPU recommended
3D Out of scope for v0.x

Architecture Decision Records

ADR Title Decision
ADR-001 Framework decision -- stay on PyTorch unified graph JAX interop only via dlpack (diff_surrogate.interop); no JAX rewrite
ADR-002 Thread affinity for Rust/PyTorch coordination Not needed; PyTorch internal thread pool handles parallelism adequately (<5% contention)

Architecture

diffcfd/
├── solvers/
│   ├── navier_stokes_2d.py    # 2D incompressible NS + SIMPLE (Rust-accelerated forward)
│   ├── heat_transfer.py       # Conjugate heat transfer
│   ├── turbulence.py          # Frozen eddy viscosity (Re > 5000)
│   ├── implicit_diff.py       # Matrix-free GMRES backward (auto diagonal preconditioner)
│   ├── fsi.py                 # FSI implicit differentiation (C7.2)
│   ├── boundary.py            # Boundary condition enforcement + blowing/suction control (C7.3)
│   ├── spin_coating.py        # Differentiable spin coating (Meyerhofer + radial PDE)
│   ├── litho.py               # Differentiable lithography solver (Dill exposure + Mack develop)
│   └── learned_closure.py     # LearnedClosureNet, SolverInTheLoopTrainer (C9.1)
├── envs/
│   ├── cylinder_wake.py       # Cylinder wake RL (Mode B)
│   ├── heat_exchanger.py      # Heat exchanger fin (Mode A)
│   ├── codomain_control.py    # CodomainActor, TransferAdapter, CodomainPPO (C9.3)
│   └── base.py
├── geometry/
│   ├── mesh.py                # Cartesian mesh + SDF Brinkman mask
│   ├── shapes.py              # SDFs (cylinder, rectangle, NACA)
│   ├── airfoil.py             # NACA 4-digit + B-spline
│   └── filters.py             # Helmholtz filter for manufacturing constraints
├── workflows/
│   ├── aero.py                # Aerodynamic shape optimization
│   ├── topology.py            # Topology optimization + Helmholtz filter
│   ├── pche.py                # PCHE channel optimization
│   ├── spin_coat_opt.py       # Spin coating profile optimization
│   └── joint_litho_opt.py     # Joint spin-coating + lithography co-optimization
├── props/
│   ├── ideal_gas.py           # Abstract ThermophysicalProps + ConstantProps
│   ├── eos.py                 # Polynomial and cubic-spline equation of state (C7.1)
│   └── sco2.py                # sCO2 transcritical property surrogate (C4)
├── surrogates/
│   ├── fno.py                 # Fourier Neural Operator for flow prediction
│   └── simple_surrogate.py    # CNN surrogate for SIMPLE acceleration
├── validation/
│   ├── cross_validation.py    # LidDrivenCavityBenchmark, GradientCrossValidation (C8.1)
│   └── diff_flowfsi_crossval.py  # DiffFlowFSICrossValidator, GPUBenchmarkSuite (C9.2)
├── control/
│   └── rl_ad.py               # ADWarmStartPPO, ADAugmentedPPO, SimplePPO (C8.2)
├── adjoint/
│   └── transient.py           # TransientHeatAdjoint, TransientCheckpointSchedule (C8.3)
├── uncertainty/
│   └── sco2_calibrated.py     # SCO2CalibratedPredictor, UncertaintyPropagation (C8.4)
├── export/
│   └── vtk.py                 # VTK export for ParaView
└── utils/
    ├── linalg.py              # Matrix-free GMRES
    └── threading.py           # Thread affinity helpers (Rust/PyTorch coordination)
src/ (Rust via PyO3/maturin, at repo root)
├── lib.rs                     # PyO3 module registration
├── momentum.rs                # Sparse momentum system assembly (CSR)
├── pressure.rs                # Pressure correction system assembly (CSR)
├── sdf.rs                     # B-spline SDF (rayon parallel)
├── simple.rs                  # Full SIMPLE forward loop (faer sparse LU)
└── utils.rs                   # Shared helpers (hybrid scheme, COO→CSR)

Roadmap

Milestone Scope Status
v0.1 2D NS + matrix-free implicit diff + validation Done
v0.2 Conjugate heat transfer + sCO₂ surrogate Done
v0.3 Gymnasium environments (CylinderWake + HeatExchanger) Done
v0.35 Frozen eddy viscosity for Re > 5000 Done
v0.4 NACA + B-spline aerodynamic shape optimization Done
v0.4.1 Helmholtz filter + topology optimization Done
v0.5 FNO surrogate-in-the-loop Done
v0.6 sCO₂ PCHE optimization + sCO2-TMSR-Toolkit integration Done
v0.7 Rust-accelerated forward kernels (maturin/PyO3) Done
v0.75 Differentiable spin coating + lithography solvers Done
v0.8 Polynomial/Spline EOS (C7.1), FSI implicit differentiation (C7.2), blowing/suction boundary control (C7.3), containerized reproducibility (C7.4) Done
v0.9 External CFD cross-validation (C8.1), RL x AD hybrid control (C8.2), transient adjoint (C8.3), sCO2 calibrated uncertainty (C8.4) Done
v0.95 Solver-in-the-loop learned closure (C9.1), Diff-FlowFSI cross-validation (C9.2), codomain flow control transfer (C9.3) Done
v1.0 Full benchmark suite 11/11 pass + arXiv paper Planned

Competitive Positioning

What it is: A differentiable CFD toolkit with steady-state implicit differentiation and standard gymnasium.Env RL interface — for physics-informed optimization and control research.

Where it leads:

  • Steady-state implicit differentiation × gymnasium.Env: This combination has no overlap with existing tools. HydroGym uses gymnax (JAX-based, not standard gymnasium); FluidGym detaches gradients in differentiable mode. DiffCFD provides both differentiable physics and standard RL in one package.
  • Hybrid AD×RL control: AD-warm-started PPO with gradient-augmented rewards — a novel training paradigm for flow control.
  • Transient adjoint with checkpointing: Memory-efficient discrete adjoint for conjugate heat transfer / FSI, enabling gradient-based optimization of coupled thermal-fluid systems.
  • Solver-in-the-loop learned closure: A-posteriori rollout training with stability curve analysis — closes the train-deploy gap for learned turbulence models.
  • Codomain flow control transfer: Cross-Reynolds/geometry transfer via codomain-attention adapters, enabling zero-shot generalization to unseen flow regimes.
  • Diff-FlowFSI cross-validation: Independent gradient verification against a JAX-based differentiable FSI framework, providing external credibility for DiffCFD's implicit differentiation.

Where it lags (honest assessment):

  • Scale: 2D, moderate Reynolds numbers, single workstation. 2-4 orders of magnitude behind JAX-Fluids (512×A100 GPU clusters) and Diff-FlowFSI (GPU turbulence + FSI).
  • Validation: Ghia/Poiseuille analytical solutions + cross-validation against Diff-FlowFSI. No experimental or production CFD validation.
  • Maturity: Research prototype. No users, no production deployments.

Bottom line: Niche but unique — the only framework combining differentiable steady-state CFD with standard RL. Value is in the training paradigm exploration, not in solver scale or speed.


Contributing

This repository is currently in an early-development phase. Pull requests touching diffcfd/solvers/* are not being accepted until the API stabilizes. Discussion issues and benchmark proposals are welcome.


License

Apache License 2.0

About

Differentiable CFD for optimization and reinforcement learning — PyTorch-native Navier-Stokes and heat transfer solvers

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages