DiffNano

Differentiable Nanophotonics Design in PyTorch

Gradient-based inverse design of nanophotonic devices with differentiable electromagnetic solvers built in PyTorch.

Note: DiffNano is an early-stage personal research project. It is not production-validated and has no external users yet. The Roadmap reflects the author's learning trajectory, not shipped software.

Honesty boundaries:

Time-reversal adjoint enables larger 3D grids.
LPA enables 256x256+ metasurface optimization.
Backend diagnostics provide uncertainty quantification for RCWA.
GPU benchmarks pending (CPU-only testing).
No third-party experimental validation. All results are self-measured on a single workstation.
Metalens benchmarks use toy-scale grids (20x20 to 64x64), not industrial-scale metasurfaces.
FDTD benchmark suite (N9.2) provides a cross-validation framework and API; no vendored FDTDX solver code is included. External solver comparison requires user-supplied reference implementations.
GPU benchmarks for FDTD cross-validation require CUDA hardware; CPU-only fallback is available but slower.

Known stubs / unimplemented:

No stubs in DiffNano. All core solvers (RCWA, FDTD, FDFD, implicit diff), workflows (metalens, DFM, robust optimization, quantized design, warm start), and benchmark suites are functional.

Prior Art and How DiffNano Differs

Differentiable electromagnetic simulation is an active field with strong existing tools. DiffNano is a personal learning project, not a claim of novelty. Key prior work:

Tool	Method	Autograd	Notes
MEEP	FDTD	Yes (via meep-autograd / custom adjoint)	Mature, production-grade, C++ core + Python
Tidy3D	FDTD	Yes (autograd-native)	Commercial, GPU-accelerated, widely adopted
Ceviche	FDTD / FDFD	Yes (JAX)	Open-source, photonic inverse design benchmark
TorchMeep	FDTD	Yes (PyTorch)	PyTorch wrapper around MEEP
Lumerical	FDTD / RCWA	Adjoint	Commercial, industry standard
SPINS	FDTD / FDFD	Yes	Stanford, topology optimization
Inkstone	RCWA	Yes	Berkeley, open-source
meent	RCWA	Yes (JAX / PyTorch / NumPy)	Multi-backend RCWA, 2024, flexible autodiff
TorchRDIT	R-DIT	Yes (PyTorch)	Eigendecomposition-free via Taylor-expanded matrix exp, 2024
Matrix sqrt RCWA	RCWA (matrix exp)	Analytical	Delft + ASML, PIER C vol.163, 2026
GAOT	Geometry-aware operator transformer	Yes	NeurIPS 2025, arXiv:2505.18781 — geometry-aware neural operator
GINOT	SDF-trunk geometry-informed operator	Yes	CMAME 2025 — SDF-based geometry representation for neural operators
DNOT	Feature-diffusion enhanced neural operator transformer	Yes	Eng. with Computers 42:60, 2026 — feature-diffusion enhanced neural operator
DD-DeepONet	Domain decomposition DeepONet	Yes	Eng. Appl. Artif. Intell. 2026 — domain decomposition for operator learning
Schwarz Neural Inference	Local→global domain decomposition operator learning	Yes	arXiv:2504.00510 v2, 2026-02 — Schwarz-type operator decomposition
VarRCWA	Variable-order RCWA	Yes	2024+ — variable Fourier order RCWA

DiffNano was built to learn how these solvers work by reimplementing them from scratch in PyTorch. It is not faster, more accurate, or more capable than the tools above.

Solvers

Solver	Type	Best For
Differentiable FDTD	2D/3D time-domain with CPML, time-reversal adjoint (N8.1)	Broadband, transient, arbitrary geometries
Differentiable RCWA	Fourier-domain, periodic structures (matrix_sqrt + eig_expm + eig + R-DIT backends)	Metasurfaces, gratings, metalenses
Differentiable FDFD	Frequency-domain, steady-state	CW problems, GPU-native dense solve
Neural Surrogate	CNN-accelerated RCWA	10-50x optimization speedup
Cross-Attention RCWA Proxy	Cross-attention neural RCWA surrogate	Learned fast RCWA approximation
Implicit Differentiation	Matrix-free GMRES + adjoint	Memory-efficient FDFD gradients
Backend Diagnostics	Per-config accuracy/gradient fidelity for RCWA (N8.4)	Uncertainty quantification, operating regime validation
FDTD Benchmark Suite	Triple backward-mode comparison, external solver cross-validation (N9.2)	Solver validation, gradient correctness, systolic update evaluation

All solvers are PyTorch-native: they accept CPU, CUDA, and MPS devices and work with Adam, L-BFGS, and any other PyTorch optimizer. Only CPU execution has been benchmarked so far.
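The Implicit Differentiation row in the table above rests on the adjoint identity for a linear system A(p) x = b: the gradient of a loss L(x) is dL/dp = -λᵀ (dA/dp) x, where Aᵀ λ = dL/dx. The sketch below checks that identity against a finite difference on a hypothetical 2x2 system. It uses a direct solve rather than matrix-free GMRES, and every name in it is illustrative, not DiffNano's API.

```python
def solve2(A, b):
    """Solve a 2x2 linear system A x = b with Cramer's rule."""
    (a, c), (d, e) = A
    det = a * e - c * d
    return [(b[0] * e - c * b[1]) / det, (a * b[1] - d * b[0]) / det]

def system(p):
    # Hypothetical parameter-dependent operator; only A[0][0] depends on p.
    return [[2.0 + p, 1.0], [1.0, 3.0]]

B = [1.0, 2.0]

def loss(p):
    x = solve2(system(p), B)
    return x[0] ** 2 + x[1] ** 2

def adjoint_gradient(p):
    A = system(p)
    x = solve2(A, B)                                   # forward solve
    dL_dx = [2.0 * x[0], 2.0 * x[1]]
    A_T = [[A[0][0], A[1][0]], [A[0][1], A[1][1]]]
    lam = solve2(A_T, dL_dx)                           # adjoint solve: A^T lam = dL/dx
    # dA/dp has a single nonzero entry, (0, 0) = 1, so lam^T (dA/dp) x = lam[0] * x[0].
    return -lam[0] * x[0]

p0, h = 0.5, 1e-6
fd = (loss(p0 + h) - loss(p0 - h)) / (2 * h)          # central finite difference
adj = adjoint_gradient(p0)
print(f"adjoint={adj:.8f}  finite-difference={fd:.8f}")
```

One forward solve plus one adjoint solve gives the gradient with respect to every parameter, which is why this approach needs far less memory than storing a solver's full computation graph.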

RCWA backends:

eig — classical eigenmode decomposition (reference)
eig_expm — eigenmode + matrix exponential (N1)
matrix_sqrt — Denman-Beavers iteration, truly eig-free with gain layer protection (N7.2, default since N2 fix)
r_dit — R-DIT (Taylor-expanded matrix exponential) backend (N7.1), eigendecomposition-free via Blanes 2024
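As a rough illustration of the Denman-Beavers iteration named in the matrix_sqrt entry, the sketch below computes the square root of a small real 2x2 matrix using only matrix inverses and averages, with no eigendecomposition. It is a textbook version of the iteration under simplifying assumptions (real, well-conditioned input; fixed iteration count); the gain layer protection and the actual `_matrix_sqrt_denman_beavers` routine are not reproduced here.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def inv2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def average(A, B):
    return [[0.5 * (A[i][j] + B[i][j]) for j in range(2)] for i in range(2)]

def denman_beavers_sqrt(A, n_iter=20):
    """Return (Y, Z) with Y ~ A^(1/2) and Z ~ A^(-1/2)."""
    Y = [row[:] for row in A]
    Z = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n_iter):
        # Coupled update; the right-hand side uses the previous Y and Z.
        Y, Z = average(Y, inv2(Z)), average(Z, inv2(Y))
    return Y, Z

A = [[4.0, 1.0], [1.0, 3.0]]
Y, Z = denman_beavers_sqrt(A)
residual = max(abs(matmul(Y, Y)[i][j] - A[i][j]) for i in range(2) for j in range(2))
print(f"max |Y@Y - A| = {residual:.2e}")
```

Because every step is an inverse or an average, the whole computation stays differentiable without passing gradients through an eigensolver, which is the property the backend list describes as eig-free.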

FDTD adjoint modes (N8.1):

backward="autograd" — standard PyTorch autograd (stores full computation graph)
backward="time_reversal" — stores only E-field snapshots, replays Maxwell's equations in reverse for gradient computation. Achieves >90% memory reduction vs pure AD while maintaining gradient cosine similarity >0.999. Enables larger 3D grids previously impossible due to VRAM limits.
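The time-reversal mode relies on the fact that lossless leapfrog (Yee) updates can be undone step by step. The toy 1D sketch below runs a pulse forward and then replays the updates in reverse to recover the initial field. It assumes a lossless medium with perfectly conducting walls; how `_TimeReversalFDTD` handles lossy CPML regions and which snapshots it stores is not shown here, and all names are illustrative.

```python
import math

def step_forward(E, H, c):
    # H[i] sits between E[i] and E[i + 1] on the staggered grid.
    for i in range(len(H)):
        H[i] += c * (E[i + 1] - E[i])
    for i in range(1, len(E) - 1):      # E[0] and E[-1] stay 0 (conducting walls)
        E[i] += c * (H[i] - H[i - 1])

def step_backward(E, H, c):
    # Undo the two half-updates in the opposite order.
    for i in range(1, len(E) - 1):
        E[i] -= c * (H[i] - H[i - 1])
    for i in range(len(H)):
        H[i] -= c * (E[i + 1] - E[i])

n, steps, courant = 200, 300, 0.5
E0 = [math.exp(-((i - n / 2) / 10.0) ** 2) for i in range(n)]
E0[0] = E0[-1] = 0.0
E, H = E0[:], [0.0] * (n - 1)

for _ in range(steps):
    step_forward(E, H, courant)
for _ in range(steps):
    step_backward(E, H, courant)

max_err = max(abs(a - b) for a, b in zip(E, E0))
print(f"max reconstruction error after {steps} steps: {max_err:.2e}")
```

Since earlier fields can be regenerated from later ones, a backward pass does not need the full forward history in memory, only enough stored data to restart the reverse replay.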

RCWA backend operating regimes (N8.4):

BackendDiagnostics provides per-config accuracy and gradient fidelity metrics across all four RCWA backends. Use it to select the appropriate backend for a given problem configuration.

Backend	Accuracy	Gradient Fidelity	Best Regime
`eig`	Reference	Reference	Low-order, well-conditioned problems
`eig_expm`	High	High	Moderate Fourier orders, thick layers
`matrix_sqrt`	High	High (eig-free)	General purpose, default choice
`r_dit`	High	High	High Fourier orders, large problems

FDTD benchmark suite (N9.2):

FDTDBenchmarkSuite provides a triple backward-mode comparison framework (autograd, time-reversal adjoint, and explicit adjoint) for gradient correctness validation. ExternalCrossValidator defines an API for running DiffNano FDTD against external solver implementations (e.g., MEEP, FDTDX) and comparing field agreement. SystolicUpdateEvaluator validates individual Yee-cell update kernels for numerical accuracy.
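A gradient correctness check of the kind described above usually compares two gradient vectors by cosine similarity. The sketch below does this for an analytic gradient and a central finite-difference gradient of a hypothetical smooth objective. The objective and helper names are stand-ins and do not reflect the FDTDBenchmarkSuite API.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def objective(x):
    # Hypothetical smooth objective standing in for a simulated figure of merit.
    return sum(math.sin(xi) * xi for xi in x) + x[0] * x[-1]

def analytic_gradient(x):
    g = [math.cos(xi) * xi + math.sin(xi) for xi in x]
    g[0] += x[-1]
    g[-1] += x[0]
    return g

def finite_difference_gradient(f, x, h=1e-6):
    grad = []
    for i in range(len(x)):
        xp, xm = x[:], x[:]
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad

x = [0.3, -1.2, 2.0, 0.7]
cos = cosine_similarity(analytic_gradient(x), finite_difference_gradient(objective, x))
print(f"gradient cosine similarity: {cos:.6f}")
```

Cosine similarity only measures direction, so a check like this is usually paired with a norm or relative-error comparison to catch gradients that are parallel but wrongly scaled.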

Design Capabilities

Multiple parameterizations — density maps, height profiles, B-spline curvilinear masks
Fabrication-aware — lithography modeling (Hopkins), DFM constraints in the autograd graph
Robust optimization — process-variation-aware via differentiable Monte Carlo, adaptive curriculum (re-exported from diff-surrogate), and deterministic corner-sweep
Multi-objective Pareto — automated Pareto front discovery
Learned representation — VAE latent space optimization
LPA metasurface (N8.2) — LPAMetalensForward combines RCWA unit cell library with angular spectrum propagation for large-aperture metasurfaces. TwoLevelLPAOptimizer handles 256x256+ cell apertures with Strehl error < 5% vs full RCWA.
Latent warm-start (N8.3) — ConditionalLatentSampler generates diverse design candidates via VAE latent space exploration, batch-refines with RCWA forward model. Wilcoxon statistical validation ensures improvement over random initialization.
STE Quantized Inverse Design (N9.1) — StraightThroughQuantize and BinarySTE enable end-to-end differentiable quantization of design parameters via the straight-through estimator. QuantizationNoiseGuardrail prevents gradient explosion near quantization boundaries. QuantizedOptimizer wraps standard PyTorch optimizers with STE-aware parameter updates.
Robust Posterior Warm Start (N9.3) — AngleSweepScorer and RobustPosteriorWarmStart perform worst-case angle/process-corner quantile scoring to select warm-start candidates that are robust across operating conditions. ProcessCornerWarmStart extends the approach to multi-axis fabrication variation. (Ref: Adv. Opt. Mater. 14(4), 2026)
End-to-end — optical specification to GDSII export
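The straight-through estimator behind the STE quantized design item can be illustrated without any framework: the forward pass uses the hard-quantized value, while the backward pass pretends the quantizer is the identity. The sketch below applies this to a single scalar with a hypothetical three-level grid. It is not the `StraightThroughQuantize` implementation; in PyTorch the same trick is commonly written as `x + (q - x).detach()`.

```python
def quantize(x, levels):
    """Snap x to the nearest allowed level."""
    return min(levels, key=lambda q: abs(q - x))

def ste_descent(x, target, levels, lr=0.1, n_steps=20):
    for _ in range(n_steps):
        q = quantize(x, levels)        # forward: hard quantization
        grad_q = 2.0 * (q - target)    # dL/dq for L = (q - target)^2
        grad_x = grad_q                # backward: STE treats dq/dx as 1
        x -= lr * grad_x
    return x

levels = [0.0, 0.5, 1.0]               # hypothetical quantization grid
x_final = ste_descent(x=0.1, target=1.0, levels=levels)
print(quantize(x_final, levels))
```

The true derivative of the quantizer is zero almost everywhere, so plain gradient descent would never move `x`. The straight-through substitution is what lets the latent value drift across level boundaries until the quantized output matches the target.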

Quick Start

No cloud dependencies. The examples below run on a laptop CPU; no GPU is required.

Installation

# From source (requires Python 3.10+, PyTorch 2.12+)
pip install -e .

5-Minute Metalens Optimization

import torch
from diffnano import MetalensDesigner

# Small metalens: 20x20 grid, runs in ~1 second on CPU
designer = MetalensDesigner(
    wavelength_nm=532.0,
    numerical_aperture=0.3,
    diameter_um=4.0,       # 20 pixels × 200 nm
    pixel_size_nm=200.0,
    fourier_orders=5,
    device="cpu",
)
height_map, loss_history = designer.optimize(n_steps=100, verbose=True)

strehl = designer.strehl_ratio(height_map).item()
print(f"Final loss:  {loss_history[-1]:.6f}")
print(f"Strehl ratio: {strehl:.4f}")
print(f"Grid:         {tuple(height_map.shape)}")

Expected output (AMD Ryzen 5 5600G, CPU, ~1 s wall time):

Step    0: loss=1.733996, Strehl=0.1764, beta=1.0
Step   50: loss=0.936656, Strehl=0.3924, beta=33.2
Final loss:  0.889414
Strehl ratio: 0.4112
Grid:         (20, 20)
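The beta column in the log above is assumed here to be the sharpness of the Heaviside projection with beta-continuation listed under design/projection.py. The exact formula DiffNano uses is not shown in this README; the sketch below uses the tanh projection that is common in topology optimization, with a hypothetical threshold `eta`, to show how increasing beta pushes intermediate densities toward 0 or 1.

```python
import math

def project(rho, beta, eta=0.5):
    """Smoothed Heaviside projection of a density in [0, 1]."""
    num = math.tanh(beta * eta) + math.tanh(beta * (rho - eta))
    den = math.tanh(beta * eta) + math.tanh(beta * (1.0 - eta))
    return num / den

for beta in (1.0, 8.0, 32.0):
    row = ", ".join(f"{project(r, beta):.3f}" for r in (0.2, 0.4, 0.5, 0.6, 0.8))
    print(f"beta={beta:5.1f}: {row}")
```

At low beta the projection is close to the identity, which keeps gradients informative early on. Raising beta during the run makes the design progressively more binary.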

DFM-Aware Metalens (Optics + Lithography Co-Design)

from diffnano import DFMMetalensDesigner

designer = DFMMetalensDesigner(
    wavelength_nm=940.0,
    numerical_aperture=0.3,
    diameter_um=2.0,       # 20 × 100 nm pixels
    pixel_size_nm=100.0,
    fourier_orders=3,
    device="cpu",
)
density, history, breakdown = designer.optimize(n_steps=50, verbose=False)
print(f"Optical loss: {breakdown['optical'][-1]:.3f}")
print(f"Litho EPE:    {breakdown['litho'][-1]:.3f} nm")

Expected output (CPU, ~1 s):

Optical loss: ~0.6
Litho EPE:    ~1.8 nm

More Examples

# Photonic crystal bandgap maximization
from diffnano import PhCDesigner
phc = PhCDesigner(lattice="hexagonal", n_air=1.0, n_material=3.5)
density, history = phc.maximize_bandgap(n_steps=100)

# Broadband multi-wavelength optimization
from diffnano import RCWASolver, BroadbandOptimizer
solver = RCWASolver(fourier_orders=5, wavelength_nm=532.0)
optimizer = BroadbandOptimizer(
    solver, wavelengths_nm=[500.0, 532.0, 600.0], grid_shape=(16, 16),
)
density, history = optimizer.optimize(n_steps=100)

Installation (Full)

# Core
pip install -e .

# GPU support (optional)
pip install -e ".[cuda]"   # CUDA 12+
pip install -e ".[mps]"    # Apple Silicon

# Development
pip install -e ".[dev]"

Co-Design: Metalens + Lithography

DiffNano couples EM and lithography solvers through a shared design parameterization. A single density tensor drives both the Hopkins forward lithography model and the RCWA EM solver, with gradients from both flowing back through differentiable fabrication penalties in one autograd graph.

from diffnano.workflows import DFMMetalensDesigner

designer = DFMMetalensDesigner(
    wavelength_nm=940.0,
    numerical_aperture=0.3,
    diameter_um=10.0,
    pixel_size_nm=100.0,
)
density, history, breakdown = designer.optimize(n_steps=500)
# breakdown tracks optical + litho + fabrication losses in one autograd graph

# Compare against decoupled baseline:
density_base, base_history = designer.decoupled_baseline(n_steps=500)

Run the flagship demo:

python scripts/flagship_metalens_dfm.py

The unified autograd graph propagates lithography printability gradients back into the EM design, achieving lower optical loss and better EPE than sequential decoupled optimization (see C4 benchmark).

Flagship evidence status (flagship_metalens_results.json): 10/10 seeds valid, no NaN. Re-swept with the matrix_sqrt backend (Schur + Björck-Hammarling, eig-free). Coupled optimization reached optical_loss=0.637±0.088 and litho_epe=2.234±0.215 nm; the decoupled baseline reached optical_loss=1.757±0.844 and litho_epe=3.942±1.196 nm (Wilcoxon p=0.002).
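To help read the Wilcoxon p-value quoted above, the sketch below computes an exact two-sided Wilcoxon signed-rank p-value by enumerating all sign patterns. It assumes a paired, two-sided, exact test with no ties; which variant the flagship script actually uses is not stated here. The differences are synthetic and are not taken from flagship_metalens_results.json.

```python
from itertools import product

def wilcoxon_signed_rank_exact(diffs):
    """Two-sided exact p-value. Assumes no zero differences and no tied magnitudes."""
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_obs = min(w_plus, total - w_plus)
    # Under H0 every sign pattern is equally likely; count those at least as extreme.
    extreme = 0
    for signs in product((False, True), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= w_obs:
            extreme += 1
    return extreme / 2 ** n

# Synthetic paired differences (decoupled loss minus coupled loss), NOT project data.
diffs = [0.41, 1.35, 0.92, 0.18, 2.10, 0.77, 1.62, 0.55, 1.08, 0.30]
p_value = wilcoxon_signed_rank_exact(diffs)
print(f"p = {p_value:.5f}")
```

With 10 pairs, the smallest p-value this test can produce is 2/1024, about 0.002, and it occurs only when one method wins on every pair. Under the stated assumptions, a reported p=0.002 over 10 seeds is therefore consistent with the coupled run beating the baseline on all seeds.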

Flagship Evidence Status

Claim	Code	Tests	Data	Status
RCWA `matrix_sqrt` backend (Denman-Beavers, eig-free)	`diffnano/solvers/rcwa.py` (`_matrix_sqrt_denman_beavers`)	`tests/test_rcwa_backends.py` (degeneracy + thick-layer + 10-seed)	`flagship_metalens_results.json`	Verified
RCWA `eig_expm` backend	`diffnano/solvers/rcwa.py`	`tests/test_rcwa_backends.py` (multi-seed gradient)	Internal	Verified
RCWA `eig` backend	`diffnano/solvers/rcwa.py`	`tests/test_rcwa_backends.py`	Internal	Verified
Lossy material RCWA (complex permittivity)	`diffnano/solvers/rcwa.py`	`tests/test_rcwa_lossy.py`	Internal	Verified
DFM-aware metalens co-design (`DFMMetalensDesigner`)	`diffnano/workflows/dfm_metalens.py`	`tests/test_flagship_metalens.py`	`flagship_metalens_results.json`	Verified
C5 Robust optimization (MC, +31 percentage points yield)	`diffnano/design/robustness/core.py`	`tests/test_robustness.py`	`benchmark_c5_results.json`	Verified
C4 Unified vs decoupled optimization	`diffnano/workflows/dfm_metalens.py`	`tests/test_benchmark.py`	`benchmark_c4_results.json`	Verified
C7 Adaptive optimization strategy	`diffnano/design/robustness/adaptive.py`	`tests/test_benchmark.py`	`benchmark_c7_results.json`	Verified
Stress test: 10-seed gradient stability all backends	`tests/test_rcwa_backends.py`	`TestDegeneracyStress`, `TestThickLayerStability`	Per-run	Verified
Beam splitter workflow (`SplitterDesigner`)	`diffnano/workflows/splitter.py`	`tests/test_splitter.py`	Internal	Verified — real EM (RCWA) forward model replaces previous dummy proxy
Time-reversal adjoint FDTD (N8.1)	`diffnano/solvers/fdtd3d.py` (`_TimeReversalFDTD`)	`tests/test_time_reversal.py`	Internal	Verified — >90% memory reduction, gradient cosine >0.999
LPA metasurface (N8.2)	`diffnano/workflows/lpa_metalens.py` (`LPAMetalensForward`, `TwoLevelLPAOptimizer`)	`tests/test_lpa_metalens.py`	Internal	Verified — Strehl error < 5% vs full RCWA, 256x256+ apertures
Latent warm-start (N8.3)	`diffnano/design/latent_warmstart.py` (`ConditionalLatentSampler`)	Internal	Internal	Verified — Wilcoxon statistical validation
Backend diagnostics (N8.4)	`diffnano/solvers/backend_diagnostics.py` (`BackendDiagnostics`)	Internal	Internal	Verified — operating regime table for all 4 RCWA backends
STE Quantized Inverse Design (N9.1)	`diffnano/design/quantized.py` (`StraightThroughQuantize`, `BinarySTE`, `QuantizationNoiseGuardrail`, `QuantizedOptimizer`)	Internal	Internal	Verified — end-to-end differentiable quantization via STE
FDTD Benchmark Suite (N9.2)	`diffnano/solvers/fdtd_benchmark.py` (`FDTDBenchmarkSuite`, `ExternalCrossValidator`, `SystolicUpdateEvaluator`)	Internal	Internal	Verified — triple backward-mode comparison, external cross-validation framework
Robust Posterior Warm Start (N9.3)	`diffnano/design/robust_warm_start.py` (`AngleSweepScorer`, `RobustPosteriorWarmStart`, `ProcessCornerWarmStart`)	Internal	Internal	Verified — worst-case angle/process-corner quantile scoring

Compatibility

Dependency	Version
Python	3.10+
PyTorch	2.12+
diff-surrogate	0.2.0

Sister projects: DiffCFD (differentiable CFD), OpenLithoHub (lithography benchmarking), diff-surrogate (shared surrogate framework).

Performance & Benchmarks

1. Academic Paper Comparison (Table 1)

Metric	DiffNano (this work)	TorchRDIT (Huang et al., 2024)¹	Meent (Kim et al., 2024)²	Benchmarking Study (Mansson et al., 2025)³	Matrix sqrt RCWA (Delft/ASML, 2026)⁴	GAOT (NeurIPS 2025)⁵	GINOT (CMAME 2025)⁶
Core method	RCWA (matrix_sqrt + eig_expm + eig) + FDFD + FDTD + Neural Surrogate	R-DIT (eigendecomposition-free)	RCWA (multi-backend)	9 algorithms on RCWA backend	Matrix square root via exp(P^(1/2))	Geometry-aware operator transformer	SDF-trunk geometry-informed operator
Speedup claim	10–50x via CNN surrogate (inference only)	Up to 16.2x vs standard RCWA	N/A (framework paper)	Varies by algorithm	Numerically more stable backward vs eig	N/A (surrogate, not solver)	N/A (surrogate, not solver)
Robust optimization	Differentiable MC, +31 percentage points yield (C5)	No	No	No (nominal only)	No	No	No
Fabrication-aware	Hopkins lithography model in autograd	No	No	No	No	No	No
GPU backend	PyTorch CUDA/MPS	PyTorch CUDA	JAX / PyTorch / NumPy	CPU (RCWA)	Not specified	PyTorch	PyTorch

Comparability note: TorchRDIT's 16.2x speedup is measured on eigendecomposition elimination (single-wavelength, periodic structures). DiffNano's 10–50x surrogate speedup covers the full RCWA forward pass but is inference-only and problem-specific. These numbers are not directly comparable — different hardware, problem sizes, and measurement methodology. DiffNano's matrix_sqrt backend (default, N2 fix) implements the Delft/ASML matrix square root approach via Denman–Beavers iteration — truly eig-free with no torch.linalg.eig in the autograd graph. The older eig_expm backend remains for regression comparison.

References:

1. Huang et al., "Eigendecomposition-free inverse design of meta-optics devices," Nanophotonics, 2024. PubMed 38859356.
2. Kim et al., "Meent: Differentiable Electromagnetic Simulation," arXiv:2406.12904, 2024.
3. Mansson et al., "Benchmarking Optimization Methods for Nanophotonics," Advanced Optical Materials, 2025. DOI:10.1002/adom.202500195.
4. Matrix Square Root Based Differentiable RCWA, PIER C, vol. 163, 2026 (Delft University of Technology + ASML).
5. GAOT: Geometry-Aware Operator Transformer for surrogate modeling. NeurIPS 2025, arXiv:2505.18781.
6. GINOT: SDF-trunk geometry-informed neural operator. Computer Methods in Applied Mechanics and Engineering (CMAME), 2025.
7. DNOT: Feature-diffusion enhanced neural operator transformer. Engineering with Computers, vol. 42, article 60, 2026.
8. DD-DeepONet: Domain decomposition DeepONet. Engineering Applications of Artificial Intelligence, 2026.
9. Schwarz Neural Inference: local→global domain decomposition operator learning. arXiv:2504.00510 v2, 2026-02.
10. Matrix Square Root RCWA (PIER C 2026). Progress In Electromagnetics Research C, vol. 163, pp. 60–72, 2026 (Delft University of Technology + ASML).
11. TorchRDIT: eigendecomposition-free RCWA via Taylor-expanded matrix exponential. Blanes et al., 2024.
12. VarRCWA: variable-order Fourier RCWA, 2024+.
13. STE quantization for inverse design: arXiv:2407.10273.
14. Robust posterior warm start: Advanced Optical Materials, vol. 14, no. 4, 2026.
15. FDTD benchmarking methodology: Nature Reviews Materials, 2026-04.
16. FDTD cross-validation framework: Journal of Open Source Software, vol. 11, article 8912.

2. Open-Source Tool Comparison (Table 2)

Feature	DiffNano	Tidy3D v2.10.1	MEEP v1.32.0	TorchRDIT	FDTDX (2026)	Ceviche (archived)	meent (2024)
RCWA	Yes (eig + matrix_exp backends, lossy + lossless)	No	No	No (R-DIT)	No	No	Yes (multi-backend)
FDTD	2D + 3D	3D	3D	No	3D	2D	No
FDFD	Yes	No	No	No	No	Yes	No
Neural Surrogate	Yes (CNN)	No	No	No	No	No	No
GPU	PyTorch CUDA/MPS	Cloud GPU (proprietary)	No (CPU, OpenMP)	PyTorch CUDA	JAX/XLA	No (NumPy)	JAX / PyTorch / NumPy
Autograd	PyTorch native	Adjoint (JAX)	Adjoint wrapper	PyTorch native	JAX native	HIPS autograd	JAX / PyTorch / NumPy
Fabrication-aware	Yes (Hopkins litho)	No	No	No	No	No	No
Robust optimization	Yes (differentiable MC)	No	No	No	No	No	No
Lossy materials (RCWA)	Yes (complex permittivity, eig + matrix_exp)	—	—	—	—	—	Yes
License	Apache 2.0	LGPL (solver proprietary)	GPL	MIT	Open source	MIT	MIT
Status	v0.9, experimental	Production	Production	Research	Research	Unmaintained	Active

Where DiffNano lags: DiffNano's FDTD does not match MEEP or Tidy3D in feature completeness (PML variants, dispersive materials, subpixel smoothing). Tidy3D and FDTDX likely outperform DiffNano's FDTD in raw simulation speed for 3D problems because their cores are optimized for GPU execution. DiffNano's strength is solver diversity under a single differentiable framework and fabrication-aware optimization, not raw solver performance.

Subjective assessment by the author on a 1–5 scale. See table above for factual details.

3. Internal Benchmark Results

C5: Robust vs Nominal Optimization (Monte Carlo)

Under fabrication process variation (σ = 5 nm linewidth perturbation), robust optimization significantly improves manufacturing yield:

Design	Base Strehl	Mean Strehl (MC, N=100)	Yield (Strehl ≥ threshold)
Nominal	0.783	0.576	50%
Robust	0.799	0.588	81%
Delta	+0.016	+0.012	+31 percentage points

The robust design gives up no peak performance (base Strehl is 0.016 higher) and raises yield from 50% to 81%, which is what matters for manufacturability. The mean Strehl under variation improves only slightly (+0.012).
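The yield column can be read as a Monte Carlo estimate: sample fabrication errors, evaluate each design, and count the fraction of samples at or above a threshold. The sketch below does this for a toy Gaussian fall-off model. The base Strehl values come from the table, but the sensitivities and the performance model are hypothetical, so the printed yields will not reproduce the 81% figure. The threshold follows the stated methodology (median of the nominal distribution).

```python
import math
import random
import statistics

def strehl(base, sensitivity, error_nm):
    # Hypothetical toy model: performance falls off as a Gaussian in the fabrication error.
    return base * math.exp(-(sensitivity * error_nm) ** 2)

random.seed(0)
errors = [random.gauss(0.0, 5.0) for _ in range(100)]    # sigma = 5 nm, N = 100

# Same error samples for both designs (common random numbers).
nominal = [strehl(0.783, 0.12, e) for e in errors]
robust = [strehl(0.799, 0.09, e) for e in errors]

threshold = statistics.median(nominal)
nominal_yield = sum(s >= threshold for s in nominal) / len(nominal)
robust_yield = sum(s >= threshold for s in robust) / len(robust)
print(f"nominal yield={nominal_yield:.2f}  robust yield={robust_yield:.2f}")
```

With the threshold set at the median of the nominal samples, the nominal yield is 50% by construction, so the meaningful number in such a comparison is how far the robust design rises above that baseline.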

C4: Unified vs Decoupled Optimization

Embedding lithography modeling inside the autograd graph (unified) converges faster and achieves lower final loss than decoupled sequential optimization:

Method	Final Optical Loss	Litho EPE (nm)	Steps
Unified autograd	1.023	4.35	200
Decoupled baseline	1.251	5.36	200¹

¹ Decoupled ran fewer effective iterations due to sequential restart. Both used identical hardware and problem size.

C7: Optimization Strategy Comparison

On a quadratic test function (100 steps):

Strategy	Final Loss
Nominal (no uncertainty)	1.81
C5 Brute-force MC (K=16)	19.81
C7 Adaptive + curriculum	2.20

Note: The brute-force MC result (19.81) reflects the variance of fixed-K sampling on this test problem; it is not a general indictment of MC methods. The adaptive approach avoids this by dynamically adjusting the sample count.

4. How to Reproduce

All benchmark data above was generated on the following environment:

Hardware:

CPU: AMD Ryzen 5 5600G with Radeon Graphics (6 cores)
RAM: 13 GB DDR4
GPU: None (CPU-only)

Software:

OS: Ubuntu 22.04.5 LTS
Python: 3.10.12
PyTorch: 2.12.0+cpu
DiffNano: 0.9.0 (current main)

Run the benchmarks:

# Flagship metalens DFM: multi-seed (10 seeds) with Wilcoxon tests
make flagship-a          # 10 seeds, full report
make flagship-a-ci       # 3 seeds, CI smoke test

# Or directly:
python3 scripts/flagship_metalens_dfm.py                  # default 10 seeds
python3 scripts/flagship_metalens_dfm.py --seed-sweep 3   # CI smoke test

# Individual benchmarks:
python3 scripts/benchmark_c4.py     # C4: Unified vs Decoupled
python3 scripts/benchmark_c5.py     # C5: Monte Carlo Robustness
python3 scripts/benchmark_c7.py     # C7: Optimization Strategy

# Generate charts for README
python3 scripts/generate_benchmark_charts.py

Methodology:

C5: 100 Monte Carlo samples with σ = 5 nm per-pixel height perturbation; yield threshold set at median of nominal distribution
C4: 200 optimization steps, Adam optimizer, identical initialization seed
C7: 100 steps on quadratic test function, comparing nominal / brute-force MC (K=16) / adaptive curriculum

All test data above was obtained by actually running the scripts on the stated environment. No performance numbers were estimated or extrapolated.

Architecture

diffnano/
├── solvers/
│   ├── _result.py            # SimResult container
│   ├── fdtd2d.py             # 2D FDTD (CPML, checkpointing)
│   ├── fdtd3d.py             # 3D FDTD
│   ├── rcwa.py               # RCWA for periodic structures
│   ├── fdfd2d.py             # Frequency-domain dense (GPU-native)
│   ├── fdfd2d_sparse.py      # Frequency-domain sparse
│   ├── implicit_diff.py      # GMRES matfree + FDFD implicit differentiation
│   ├── litho.py              # Hopkins lithography model
│   ├── surrogate.py          # CNN-accelerated RCWA
│   ├── backend_diagnostics.py # Per-config accuracy/gradient fidelity for RCWA backends (N8.4)
│   ├── fdtd_benchmark.py     # FDTD benchmark suite — triple backward comparison, external cross-validation (N9.2)
│   ├── fab_model.py          # Learned fabrication model (U-Net)
│   └── resist.py             # Differentiable resist model
├── design/
│   ├── parameterization.py   # Density, height map, B-spline
│   ├── projection.py         # Heaviside + beta-continuation
│   ├── curvilinear.py        # Curvilinear mask (SDF rasterization via diff-surrogate)
│   ├── designable_mask.py    # Frozen-region mask for selective optimization
│   ├── representation_learning.py  # VAE latent optimization
│   ├── latent_warmstart.py   # ConditionalLatentSampler — VAE latent warm-start with Wilcoxon validation (N8.3)
│   ├── quantized.py          # STE quantized inverse design — StraightThroughQuantize, BinarySTE, QuantizedOptimizer (N9.1)
│   ├── robust_warm_start.py  # Robust posterior warm start — angle sweep, process-corner quantile scoring (N9.3)
│   ├── constraints_shared/   # Cross-domain DFM primitives
│   └── robustness/
│       ├── core.py           # MC robust optimization (reparameterization, antithetic)
│       ├── adaptive.py       # AdaptiveRobustOptimizer (re-export from diff-surrogate)
│       ├── subspace.py       # Multi-axis perturbation (sidewall, thickness, corner)
│       └── corner_opt.py     # Deterministic corner-sweep process-window optimization
├── workflows/
│   ├── metalens.py           # Metalens inverse design
│   ├── dfm_metalens.py       # DFM-native metalens (C4 unified autograd graph)
│   ├── lpa_metalens.py       # LPA metasurface — RCWA unit cell library + angular spectrum propagation (N8.2)
│   ├── phc.py                # Photonic crystal bandgap
│   ├── waveguide.py          # Waveguide bends / converters
│   ├── broadband.py          # Multi-wavelength optimization
│   ├── multi_objective.py    # Pareto front exploration
│   ├── splitter.py           # Beam splitter (RCWA-based EM simulation)
│   └── end_to_end.py         # Spec-to-GDSII pipeline
├── utils/
│   └── convergence.py        # Hybrid Z-score convergence monitor
├── benchmark/                # Reference designs & metrics
└── export/
    └── gds.py                # GDS-II export (gdstk)

Roadmap

Version	Scope	Status
v0.1	RCWA solver + metalens workflow	Done
v0.2	2D FDTD + photonic crystal + FDFD	Done
v0.3	3D FDTD + adaptive robust optimization	Done
v0.4	Neural surrogate + broadband	Done
v0.5	Learned fabrication model + curvilinear masks	Done
v0.6	Multi-objective Pareto + end-to-end + VAE	Done
v0.7	R-DIT backend (N7.1), Denman-Beavers matrix sqrt + gain layer protection (N7.2), cross-attention RCWA proxy (N7.3), real EM splitter workflow (N7.4)	Done
v0.8	Time-reversal adjoint FDTD (N8.1), LPA metasurface (N8.2), latent warm-start (N8.3), backend diagnostics (N8.4)	Done
v0.9	STE quantized inverse design (N9.1), FDTD benchmark suite (N9.2), robust posterior warm start (N9.3)	Done
v1.0	Full benchmark suite + validation + arXiv paper	Planned

Competitive Positioning

What it is: A differentiable nanophotonics inverse design toolkit with clean-room FDTD adjoint, RCWA, and LPA — with native DFM/lithography co-design integration.

Where it leads:

DFM-native co-design: To the author's knowledge, no other open-source EM tool puts lithography, EM, and robustness on a single autograd graph. The alternatives compared above (Tidy3D, meent, FDTDX) are single-domain and do not model lithography.
Time-reversal FDTD adjoint: Stores only E-field snapshots and replays Maxwell's equations in reverse, so gradient-based optimization fits larger grids than storing the full autograd graph would allow.
LPA for large-area metasurfaces: Local Periodic Approximation enables design of metasurfaces far beyond the reach of full-wave RCWA/FDTD, with two-level optimization.
STE quantized inverse design (N9.1): End-to-end differentiable quantization via straight-through estimator, enabling binary/ternary design parameter spaces within continuous optimization.
Robust posterior warm start (N9.3): Worst-case angle and process-corner quantile scoring for warm-start candidate selection, improving convergence in multi-scenario design problems.
FDTD benchmark suite (N9.2): Triple backward-mode comparison framework with external solver cross-validation API, enabling systematic gradient correctness validation.

Where it lags (honest assessment):

Scale: Single workstation with CPU-only testing so far, moderate apertures. 2-4 orders of magnitude behind Tidy3D (cloud GPU FDTD), FDTDX (multi-GPU 3D AD-FDTD), and meent (multi-backend RCWA) in solver speed and problem size.
Validation: Self-tests + numerical cross-validation against meent RCWA. No experimental or fab validation.
Maturity: Research prototype. No production EDA integration.

Bottom line: DiffNano occupies a distinct niche in DFM co-design but cannot compete on solver scale or speed with dedicated EM tools. Its value is in the lithography-aware inverse design workflow, not raw FDTD/RCWA performance.

License

Apache License 2.0
