Differentiable Nanophotonics Design in PyTorch
Gradient-based inverse design of nanophotonic devices with differentiable electromagnetic solvers built in PyTorch.
Note: DiffNano is an early-stage personal research project. It is not production-validated and has no external users yet. The Roadmap reflects the author's learning trajectory, not shipped software.
Honesty boundaries:
- Time-reversal adjoint enables larger 3D grids.
- LPA enables 256x256+ metasurface optimization.
- Backend diagnostics provide uncertainty quantification for RCWA.
- GPU benchmarks pending (CPU-only testing).
- No third-party experimental validation. All results are self-measured on a single workstation.
- Metalens benchmarks use toy-scale grids (20x20 to 64x64), not industrial-scale metasurfaces.
- FDTD benchmark suite (N9.2) provides a cross-validation framework and API; no vendored FDTDX solver code is included. External solver comparison requires user-supplied reference implementations.
- GPU benchmarks for FDTD cross-validation require CUDA hardware; CPU-only fallback is available but slower.
Known stubs / unimplemented:
- No stubs in DiffNano. All core solvers (RCWA, FDTD, FDFD, implicit diff), workflows (metalens, DFM, robust optimization, quantized design, warm start), and benchmark suites are functional.
Differentiable electromagnetic simulation is an active field with strong existing tools. DiffNano is a personal learning project, not a claim of novelty. Key prior work:
| Tool | Method | Autograd | Notes |
|---|---|---|---|
| MEEP | FDTD | Yes (via meep-autograd / custom adjoint) | Mature, production-grade, C++ core + Python |
| Tidy3D | FDTD | Yes (autograd-native) | Commercial, GPU-accelerated, widely adopted |
| Ceviche | FDTD / FDFD | Yes (HIPS autograd) | Open-source, photonic inverse design benchmark |
| TorchMeep | FDTD | Yes (PyTorch) | PyTorch wrapper around MEEP |
| Lumerical | FDTD / RCWA | Adjoint | Commercial, industry standard |
| SPINS | FDTD / FDFD | Yes | Stanford, topology optimization |
| Inkstone | RCWA | Yes | Berkeley, open-source |
| meent | RCWA | Yes (JAX / PyTorch / NumPy) | Multi-backend RCWA, 2024, flexible autodiff |
| TorchRDIT | R-DIT | Yes (PyTorch) | Eigendecomposition-free via Taylor-expanded matrix exp, 2024 |
| Matrix sqrt RCWA | RCWA (matrix exp) | Analytical | Delft + ASML, PIER C vol.163, 2026 |
| GAOT | Geometry-aware operator transformer | Yes | NeurIPS 2025, arXiv:2505.18781 — geometry-aware neural operator |
| GINOT | SDF-trunk geometry-informed operator | Yes | CMAME 2025 — SDF-based geometry representation for neural operators |
| DNOT | Feature-diffusion enhanced neural operator transformer | Yes | Eng. with Computers 42:60, 2026 — feature-diffusion enhanced neural operator |
| DD-DeepONet | Domain decomposition DeepONet | Yes | Eng. Appl. Artif. Intell. 2026 — domain decomposition for operator learning |
| Schwarz Neural Inference | Local→global domain decomposition operator learning | Yes | arXiv:2504.00510 v2, 2026-02 — Schwarz-type operator decomposition |
| VarRCWA | Variable-order RCWA | Yes | 2024+ — variable Fourier order RCWA |
DiffNano was built to learn how these solvers work by reimplementing them from scratch in PyTorch. It is not faster, more accurate, or more capable than the tools above.
| Solver | Type | Best For |
|---|---|---|
| Differentiable FDTD | 2D/3D time-domain with CPML, time-reversal adjoint (N8.1) | Broadband, transient, arbitrary geometries |
| Differentiable RCWA | Fourier-domain, periodic structures (matrix_sqrt + eig_expm + eig + R-DIT backends) | Metasurfaces, gratings, metalenses |
| Differentiable FDFD | Frequency-domain, steady-state | CW problems, GPU-native dense solve |
| Neural Surrogate | CNN-accelerated RCWA | 10-50x inference speedup (problem-specific) |
| Cross-Attention RCWA Proxy | Cross-attention neural RCWA surrogate | Learned fast RCWA approximation |
| Implicit Differentiation | Matrix-free GMRES + adjoint | Memory-efficient FDFD gradients |
| Backend Diagnostics | Per-config accuracy/gradient fidelity for RCWA (N8.4) | Uncertainty quantification, operating regime validation |
| FDTD Benchmark Suite | Triple backward-mode comparison, external solver cross-validation (N9.2) | Solver validation, gradient correctness, systolic update evaluation |
All solvers are PyTorch-native: they target CPU, CUDA, and MPS devices (only CPU has been benchmarked so far) and work with Adam, L-BFGS, and any other PyTorch optimizer.
RCWA backends:
- eig: classical eigenmode decomposition (reference)
- eig_expm: eigenmode + matrix exponential (N1)
- matrix_sqrt: Denman-Beavers iteration, truly eig-free with gain layer protection (N7.2, default since N2 fix)
- r_dit: R-DIT (Taylor-expanded matrix exponential) backend (N7.1), eigendecomposition-free via Blanes 2024
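The Denman-Beavers iteration named for the matrix_sqrt backend can be illustrated on a tiny example. The sketch below is plain Python on a 2x2 real matrix and is not DiffNano's `_matrix_sqrt_denman_beavers`; the helper names (`mat_inv`, `mat_avg`, `mat_mul`, `denman_beavers_sqrt`) and the test matrix are assumptions made for illustration. It shows why the method needs no eigendecomposition: each step uses only matrix inverses and averages.

```python
def mat_inv(m):
    """Inverse of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def mat_avg(p, q):
    """Element-wise average of two 2x2 matrices."""
    return tuple(
        tuple(0.5 * (p[i][j] + q[i][j]) for j in range(2)) for i in range(2)
    )


def mat_mul(p, q):
    """Product of two 2x2 matrices."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def denman_beavers_sqrt(a, n_iter=20):
    """Return (Y, Z) with Y approaching A^(1/2) and Z approaching A^(-1/2)."""
    y = a
    z = ((1.0, 0.0), (0.0, 1.0))
    for _ in range(n_iter):
        # Both updates use the previous iterates, so invert before assigning.
        y_inv, z_inv = mat_inv(y), mat_inv(z)
        y, z = mat_avg(y, z_inv), mat_avg(z, y_inv)
    return y, z


# Hypothetical symmetric positive-definite test matrix.
A = ((4.0, 1.0), (1.0, 3.0))
Y, Z = denman_beavers_sqrt(A)
residual = max(
    abs(mat_mul(Y, Y)[i][j] - A[i][j]) for i in range(2) for j in range(2)
)
print(f"max |Y @ Y - A| = {residual:.2e}")
```

In a PyTorch setting the same recurrence would presumably use batched inverses or linear solves on complex matrices, so autograd differentiates through the iteration itself rather than through an eigensolver.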
FDTD adjoint modes (N8.1):
- backward="autograd": standard PyTorch autograd (stores full computation graph)
- backward="time_reversal": stores only E-field snapshots, replays Maxwell's equations in reverse for gradient computation. Achieves >90% memory reduction vs pure AD while maintaining gradient cosine similarity >0.999. Enables larger 3D grids previously impossible due to VRAM limits.
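The idea behind replaying the update equations in reverse can be seen in a lossless 1D leapfrog (Yee) scheme, where every update is exactly invertible up to floating-point rounding. The sketch below is a plain-Python illustration under simplifying assumptions (normalized units, PEC walls, no CPML, no gradient computation); `step_forward` and `step_backward` are hypothetical names, not DiffNano's `_TimeReversalFDTD`.

```python
import math


def step_forward(e, h, c):
    """One leapfrog step: update H from E, then E from the new H."""
    n = len(e)
    for i in range(n - 1):
        h[i] += c * (e[i + 1] - e[i])
    for i in range(1, n - 1):  # e[0] and e[-1] stay 0 (PEC walls)
        e[i] += c * (h[i] - h[i - 1])


def step_backward(e, h, c):
    """Undo one leapfrog step by reversing the order and the signs."""
    n = len(e)
    for i in range(1, n - 1):
        e[i] -= c * (h[i] - h[i - 1])
    for i in range(n - 1):
        h[i] -= c * (e[i + 1] - e[i])


n, c, n_steps = 64, 0.5, 200  # grid size, Courant number, time steps
e0 = [math.exp(-((i - 32) / 4.0) ** 2) for i in range(n)]
e0[0] = e0[-1] = 0.0

e, h = list(e0), [0.0] * (n - 1)
for _ in range(n_steps):
    step_forward(e, h, c)
e_final = list(e)  # the only state kept from the forward pass

for _ in range(n_steps):
    step_backward(e, h, c)
max_err = max(abs(a - b) for a, b in zip(e, e0))
print(f"max reconstruction error after {n_steps} steps: {max_err:.2e}")
```

Because the intermediate fields are recomputed rather than stored, memory does not grow with the number of time steps. Absorbing boundaries such as CPML are lossy and not exactly reversible, which this sketch does not attempt to handle.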
RCWA backend operating regimes (N8.4):
BackendDiagnostics provides per-config accuracy and gradient fidelity metrics across all four RCWA backends. Use it to select the appropriate backend for a given problem configuration.
| Backend | Accuracy | Gradient Fidelity | Best Regime |
|---|---|---|---|
| eig | Reference | Reference | Low-order, well-conditioned problems |
| eig_expm | High | High | Moderate Fourier orders, thick layers |
| matrix_sqrt | High | High (eig-free) | General purpose, default choice |
| r_dit | High | High | High Fourier orders, large problems |
FDTD benchmark suite (N9.2):
FDTDBenchmarkSuite provides a triple backward-mode comparison framework (autograd, time-reversal adjoint, and explicit adjoint) for gradient correctness validation. ExternalCrossValidator defines an API for running DiffNano FDTD against external solver implementations (e.g., MEEP, FDTDX) and comparing field agreement. SystolicUpdateEvaluator validates individual Yee-cell update kernels for numerical accuracy.
- Multiple parameterizations: density maps, height profiles, B-spline curvilinear masks
- Fabrication-aware: lithography modeling (Hopkins), DFM constraints in the autograd graph
- Robust optimization: process-variation-aware via differentiable Monte Carlo, adaptive curriculum (re-exported from diff-surrogate), and deterministic corner-sweep
- Multi-objective Pareto: automated Pareto front discovery
- Learned representation: VAE latent space optimization
- LPA metasurface (N8.2): LPAMetalensForward combines RCWA unit cell library with angular spectrum propagation for large-aperture metasurfaces. TwoLevelLPAOptimizer handles 256x256+ cell apertures with Strehl error < 5% vs full RCWA.
- Latent warm-start (N8.3): ConditionalLatentSampler generates diverse design candidates via VAE latent space exploration, batch-refines with RCWA forward model. Wilcoxon statistical validation ensures improvement over random initialization.
- STE Quantized Inverse Design (N9.1): StraightThroughQuantize and BinarySTE enable end-to-end differentiable quantization of design parameters via the straight-through estimator. QuantizationNoiseGuardrail prevents gradient explosion near quantization boundaries. QuantizedOptimizer wraps standard PyTorch optimizers with STE-aware parameter updates.
- Robust Posterior Warm Start (N9.3): AngleSweepScorer and RobustPosteriorWarmStart perform worst-case angle/process-corner quantile scoring to select warm-start candidates that are robust across operating conditions. ProcessCornerWarmStart extends the approach to multi-axis fabrication variation. (Ref: Adv. Opt. Mater. 14(4), 2026)
- End-to-end: optical specification to GDSII export
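The straight-through estimator mentioned in the list above can be shown without any framework. The sketch below is a hand-written gradient-descent loop on a single scalar, not DiffNano's StraightThroughQuantize; `quantize`, `ste_descent`, the 0.25 step, and the target value are all assumptions for illustration. The forward pass uses the rounded value, while the backward pass treats rounding as the identity so the latent parameter still receives a gradient.

```python
import math


def quantize(w, step=0.25):
    """Round w to the nearest multiple of step (zero derivative almost everywhere)."""
    return math.floor(w / step + 0.5) * step


def ste_descent(w, target, lr=0.1, n_steps=20, step=0.25):
    """Minimize (quantize(w) - target)^2 with a straight-through gradient."""
    for _ in range(n_steps):
        q = quantize(w, step)        # forward pass sees the quantized value
        grad_q = 2.0 * (q - target)  # dL/dq for L = (q - target)^2
        grad_w = grad_q              # straight-through: pretend dq/dw = 1
        w -= lr * grad_w
    return w, quantize(w, step)


w_final, q_final = ste_descent(w=0.1, target=0.5)
print(f"latent w = {w_final:.3f}, quantized value = {q_final}")
```

With the true derivative of rounding (zero), `w` would never move from 0.1. In PyTorch the same trick is commonly written as `w + (torch.round(w) - w).detach()`, which returns the rounded value in the forward pass and passes gradients straight through to `w`.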
Zero cloud dependencies. Runs on your laptop. CPU only.
# From source (requires Python 3.10+, PyTorch 2.12+)
pip install -e .

import torch
from diffnano import MetalensDesigner
# Small metalens: 20x20 grid, runs in ~1 second on CPU
designer = MetalensDesigner(
wavelength_nm=532.0,
numerical_aperture=0.3,
diameter_um=4.0, # 20 pixels × 200 nm
pixel_size_nm=200.0,
fourier_orders=5,
device="cpu",
)
height_map, loss_history = designer.optimize(n_steps=100, verbose=True)
strehl = designer.strehl_ratio(height_map).item()
print(f"Final loss: {loss_history[-1]:.6f}")
print(f"Strehl ratio: {strehl:.4f}")
print(f"Grid: {tuple(height_map.shape)}")

Expected output (AMD Ryzen 5 5600G, CPU, ~1 s wall time):
Step 0: loss=1.733996, Strehl=0.1764, beta=1.0
Step 50: loss=0.936656, Strehl=0.3924, beta=33.2
Final loss: 0.889414
Strehl ratio: 0.4112
Grid: (20, 20)
from diffnano import DFMMetalensDesigner
designer = DFMMetalensDesigner(
wavelength_nm=940.0,
numerical_aperture=0.3,
diameter_um=2.0, # 20 × 100 nm pixels
pixel_size_nm=100.0,
fourier_orders=3,
device="cpu",
)
density, history, breakdown = designer.optimize(n_steps=50, verbose=False)
print(f"Optical loss: {breakdown['optical'][-1]:.3f}")
print(f"Litho EPE: {breakdown['litho'][-1]:.3f} nm")

Expected output (CPU, ~1 s):
Optical loss: ~0.6
Litho EPE: ~1.8 nm
# Photonic crystal bandgap maximization
from diffnano import PhCDesigner
phc = PhCDesigner(lattice="hexagonal", n_air=1.0, n_material=3.5)
density, history = phc.maximize_bandgap(n_steps=100)
# Broadband multi-wavelength optimization
from diffnano import RCWASolver, BroadbandOptimizer
solver = RCWASolver(fourier_orders=5, wavelength_nm=532.0)
optimizer = BroadbandOptimizer(
solver, wavelengths_nm=[500.0, 532.0, 600.0], grid_shape=(16, 16),
)
density, history = optimizer.optimize(n_steps=100)

# Core
pip install -e .
# GPU support (optional)
pip install -e ".[cuda]" # CUDA 12+
pip install -e ".[mps]" # Apple Silicon
# Development
pip install -e ".[dev]"

DiffNano couples EM and lithography solvers through a shared design parameterization. A single density tensor drives both the Hopkins forward lithography model and the RCWA EM solver, with gradients from both flowing back through differentiable fabrication penalties in one autograd graph.
from diffnano.workflows import DFMMetalensDesigner
designer = DFMMetalensDesigner(
wavelength_nm=940.0,
numerical_aperture=0.3,
diameter_um=10.0,
pixel_size_nm=100.0,
)
density, history, breakdown = designer.optimize(n_steps=500)
# breakdown tracks optical + litho + fabrication losses in one autograd graph
# Compare against decoupled baseline:
density_base, base_history = designer.decoupled_baseline(n_steps=500)

Run the flagship demo:

python scripts/flagship_metalens_dfm.py

The unified autograd graph propagates lithography printability gradients back into the EM design, achieving lower optical loss and better EPE than sequential decoupled optimization (see C4 benchmark).
Flagship evidence status: flagship_metalens_results.json — 10/10 seeds valid, no NaN. Re-swept with matrix_sqrt backend (Schur + Björck-Hammarling, eig-free). Coupled: optical_loss=0.637±0.088, litho_epe=2.234±0.215 vs Decoupled: optical_loss=1.757±0.844, litho_epe=3.942±1.196; Wilcoxon p=0.002.
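For context on the reported Wilcoxon p=0.002: if the test is the paired signed-rank test with an exact two-sided p-value (an assumption, since the source does not say which variant was used), then with 10 seeds the smallest achievable p-value is 2/2^10, which rounds to 0.002. The sketch below computes the exact null distribution by dynamic programming; the function names are hypothetical.

```python
def signed_rank_null_counts(n):
    """counts[s] = number of sign assignments whose positive-rank sum equals s."""
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        # Iterate downwards so each rank is used at most once (subset-sum DP).
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    return counts


def exact_two_sided_p(w, n):
    """Exact two-sided p-value for the smaller rank sum w with n untied pairs."""
    counts = signed_rank_null_counts(n)
    tail = sum(counts[: w + 1])
    return min(1.0, 2.0 * tail / 2 ** n)


# w = 0 means every one of the 10 paired differences has the same sign.
p_min = exact_two_sided_p(w=0, n=10)
print(f"{p_min:.6f}")  # 0.001953
```

Under that assumption, p=0.002 would be consistent with all 10 seeds favoring the same method, and it is also the floor for this sample size: more seeds would be needed to report a smaller p-value.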
| Claim | Code | Tests | Data | Status |
|---|---|---|---|---|
| RCWA matrix_sqrt backend (Denman-Beavers, eig-free) | diffnano/solvers/rcwa.py (_matrix_sqrt_denman_beavers) | tests/test_rcwa_backends.py (degeneracy + thick-layer + 10-seed) | flagship_metalens_results.json | Verified |
| RCWA eig_expm backend | diffnano/solvers/rcwa.py | tests/test_rcwa_backends.py (multi-seed gradient) | Internal | Verified |
| RCWA eig backend | diffnano/solvers/rcwa.py | tests/test_rcwa_backends.py | Internal | Verified |
| Lossy material RCWA (complex permittivity) | diffnano/solvers/rcwa.py | tests/test_rcwa_lossy.py | Internal | Verified |
| DFM-aware metalens co-design (DFMMetalensDesigner) | diffnano/workflows/dfm_metalens.py | tests/test_flagship_metalens.py | flagship_metalens_results.json | Verified |
| C5 Robust optimization (MC, +31 percentage points yield) | diffnano/design/robustness/core.py | tests/test_robustness.py | benchmark_c5_results.json | Verified |
| C4 Unified vs decoupled optimization | diffnano/workflows/dfm_metalens.py | tests/test_benchmark.py | benchmark_c4_results.json | Verified |
| C7 Adaptive optimization strategy | diffnano/design/robustness/adaptive.py | tests/test_benchmark.py | benchmark_c7_results.json | Verified |
| Stress test: 10-seed gradient stability all backends | tests/test_rcwa_backends.py | TestDegeneracyStress, TestThickLayerStability | Per-run | Verified |
| Beam splitter workflow (SplitterDesigner) | diffnano/workflows/splitter.py | tests/test_splitter.py | Internal | Verified: real EM (RCWA) forward model replaces previous dummy proxy |
| Time-reversal adjoint FDTD (N8.1) | diffnano/solvers/fdtd3d.py (_TimeReversalFDTD) | tests/test_time_reversal.py | Internal | Verified: >90% memory reduction, gradient cosine >0.999 |
| LPA metasurface (N8.2) | diffnano/workflows/lpa_metalens.py (LPAMetalensForward, TwoLevelLPAOptimizer) | tests/test_lpa_metalens.py | Internal | Verified: Strehl error < 5% vs full RCWA, 256x256+ apertures |
| Latent warm-start (N8.3) | diffnano/design/latent_warmstart.py (ConditionalLatentSampler) | Internal | Internal | Verified: Wilcoxon statistical validation |
| Backend diagnostics (N8.4) | diffnano/solvers/backend_diagnostics.py (BackendDiagnostics) | Internal | Internal | Verified: operating regime table for all 4 RCWA backends |
| STE Quantized Inverse Design (N9.1) | diffnano/design/quantized.py (StraightThroughQuantize, BinarySTE, QuantizationNoiseGuardrail, QuantizedOptimizer) | Internal | Internal | Verified: end-to-end differentiable quantization via STE |
| FDTD Benchmark Suite (N9.2) | diffnano/solvers/fdtd_benchmark.py (FDTDBenchmarkSuite, ExternalCrossValidator, SystolicUpdateEvaluator) | Internal | Internal | Verified: triple backward-mode comparison, external cross-validation framework |
| Robust Posterior Warm Start (N9.3) | diffnano/design/robust_warm_start.py (AngleSweepScorer, RobustPosteriorWarmStart, ProcessCornerWarmStart) | Internal | Internal | Verified: worst-case angle/process-corner quantile scoring |
| Dependency | Version |
|---|---|
| Python | 3.10+ |
| PyTorch | 2.12+ |
| diff-surrogate | 0.2.0 |
Sister projects: DiffCFD (differentiable CFD), OpenLithoHub (lithography benchmarking), diff-surrogate (shared surrogate framework).
| Metric | DiffNano (this work) | TorchRDIT (Huang et al., 2024)¹ | Meent (Kim et al., 2024)² | Benchmarking Study (Mansson et al., 2025)³ | Matrix sqrt RCWA (Delft/ASML, 2026)⁴ | GAOT (NeurIPS 2025)⁵ | GINOT (CMAME 2025)⁶ |
|---|---|---|---|---|---|---|---|
| Core method | RCWA (matrix_sqrt + eig_expm + eig + r_dit) + FDFD + FDTD + Neural Surrogate | R-DIT (eigendecomposition-free) | RCWA (multi-backend) | 9 algorithms on RCWA backend | Matrix square root via exp(P^(1/2)) | Geometry-aware operator transformer | SDF-trunk geometry-informed operator |
| Speedup claim | 10–50x via CNN surrogate (inference only) | Up to 16.2x vs standard RCWA | N/A (framework paper) | Varies by algorithm | Numerically more stable backward vs eig | N/A (surrogate, not solver) | N/A (surrogate, not solver) |
| Robust optimization | Differentiable MC, +31 percentage points yield (C5) | No | No | No (nominal only) | No | No | No |
| Fabrication-aware | Hopkins lithography model in autograd | No | No | No | No | No | No |
| GPU backend | PyTorch CUDA/MPS | PyTorch CUDA | JAX / PyTorch / NumPy | CPU (RCWA) | Not specified | PyTorch | PyTorch |
Comparability note: TorchRDIT's 16.2x speedup is measured on eigendecomposition elimination (single-wavelength, periodic structures). DiffNano's 10–50x surrogate speedup covers the full RCWA forward pass but is inference-only and problem-specific. These numbers are not directly comparable: they come from different hardware, problem sizes, and measurement methodology. DiffNano's matrix_sqrt backend (default, N2 fix) implements the Delft/ASML matrix square root approach via Denman–Beavers iteration and is truly eig-free, with no torch.linalg.eig in the autograd graph. The older eig_expm backend remains for regression comparison.
References:
1. Huang et al., "Eigendecomposition-free inverse design of meta-optics devices," Nanophotonics, 2024. PubMed 38859356.
2. Kim et al., "Meent: Differentiable Electromagnetic Simulation," arXiv:2406.12904, 2024.
3. Mansson et al., "Benchmarking Optimization Methods for Nanophotonics," Advanced Optical Materials, 2025. DOI:10.1002/adom.202500195.
4. Matrix Square Root Based Differentiable RCWA. Progress In Electromagnetics Research C (PIER C), vol. 163, pp. 60–72, 2026 (Delft University of Technology + ASML).
5. GAOT: Geometry-Aware Operator Transformer for surrogate modeling. NeurIPS 2025, arXiv:2505.18781.
6. GINOT: SDF-trunk geometry-informed neural operator. Computer Methods in Applied Mechanics and Engineering (CMAME), 2025.
7. DNOT: Feature-diffusion enhanced neural operator transformer. Engineering with Computers, vol. 42, article 60, 2026.
8. DD-DeepONet: Domain decomposition DeepONet. Engineering Applications of Artificial Intelligence, 2026.
9. Schwarz Neural Inference: local→global domain decomposition operator learning. arXiv:2504.00510 v2, 2026-02.
10. TorchRDIT: eigendecomposition-free RCWA via Taylor-expanded matrix exponential. Blanes et al., 2024.
11. VarRCWA: variable-order Fourier RCWA, 2024+.
12. STE quantization for inverse design: arXiv:2407.10273.
13. Robust posterior warm start: Advanced Optical Materials, vol. 14, no. 4, 2026.
14. FDTD benchmarking methodology: Nature Reviews Materials, 2026-04.
15. FDTD cross-validation framework: Journal of Open Source Software, vol. 11, article 8912.
| Feature | DiffNano | Tidy3D v2.10.1 | MEEP v1.32.0 | TorchRDIT | FDTDX (2026) | Ceviche (archived) | meent (2024) |
|---|---|---|---|---|---|---|---|
| RCWA | Yes (eig + matrix_exp backends, lossy + lossless) | No | No | No (R-DIT) | No | No | Yes (multi-backend) |
| FDTD | 2D + 3D | 3D | 3D | No | 3D | 2D | No |
| FDFD | Yes | No | No | No | No | Yes | No |
| Neural Surrogate | Yes (CNN) | No | No | No | No | No | No |
| GPU | PyTorch CUDA/MPS | Cloud GPU (proprietary) | No (CPU, OpenMP) | PyTorch CUDA | JAX/XLA | No (NumPy) | JAX / PyTorch / NumPy |
| Autograd | PyTorch native | Adjoint (JAX) | Adjoint wrapper | PyTorch native | JAX native | HIPS autograd | JAX / PyTorch / NumPy |
| Fabrication-aware | Yes (Hopkins litho) | No | No | No | No | No | No |
| Robust optimization | Yes (differentiable MC) | No | No | No | No | No | No |
| Lossy materials (RCWA) | Yes (complex permittivity, eig + matrix_exp) | — | — | — | — | — | Yes |
| License | Apache 2.0 | LGPL (solver proprietary) | GPL | MIT | Open source | MIT | MIT |
| Status | v0.9, experimental | Production | Production | Research | Research | Unmaintained | Active |
Where DiffNano lags: DiffNano's FDTD does not match MEEP or Tidy3D in feature completeness (PML variants, dispersive materials, subpixel smoothing). Tidy3D and FDTDX likely outperform DiffNano's FDTD in raw simulation speed for 3D problems thanks to their optimized, GPU-accelerated solver backends. DiffNano's strength is its solver diversity under a single differentiable framework and its fabrication-aware optimization, not raw solver performance.
Subjective assessment by the author on a 1–5 scale. See table above for factual details.
Under fabrication process variation (σ = 5 nm linewidth perturbation), robust optimization significantly improves manufacturing yield:
| Design | Base Strehl | Mean Strehl (MC, N=100) | Yield (Strehl ≥ threshold) |
|---|---|---|---|
| Nominal | 0.783 | 0.576 | 50% |
| Robust | 0.799 | 0.588 | 81% |
| Delta | +0.016 | +0.012 | +31 percentage points |
In this run the robust design gives up no peak performance (base Strehl 0.799 vs 0.783) and raises yield from 50% to 81%, which is the figure that matters for manufacturability.
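The yield column can be reproduced from Monte Carlo samples in a few lines. The sketch below uses synthetic Gaussian Strehl samples (assumed means and spreads, not the benchmark data) and sets the threshold at the median of the nominal distribution, as described in the C5 methodology; `yield_fraction` is a hypothetical helper name.

```python
import random
import statistics


def yield_fraction(samples, threshold):
    """Fraction of Monte Carlo samples that meet or exceed the threshold."""
    return sum(s >= threshold for s in samples) / len(samples)


rng = random.Random(0)
# Synthetic stand-ins for per-sample Strehl ratios under process variation.
nominal_mc = [rng.gauss(0.60, 0.08) for _ in range(100)]
robust_mc = [rng.gauss(0.65, 0.05) for _ in range(100)]

# Threshold at the median of the nominal distribution.
threshold = statistics.median(nominal_mc)
nominal_yield = yield_fraction(nominal_mc, threshold)
robust_yield = yield_fraction(robust_mc, threshold)
delta_pp = 100.0 * (robust_yield - nominal_yield)

print(f"nominal yield: {nominal_yield:.0%}")
print(f"robust yield:  {robust_yield:.0%} ({delta_pp:+.0f} percentage points)")
```

Note that with this threshold choice the nominal yield is 50% by construction, so the informative number is the robust design's yield measured against the same threshold.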
Embedding lithography modeling inside the autograd graph (unified) converges faster and achieves lower final loss than decoupled sequential optimization:
| Method | Final Optical Loss | Litho EPE (nm) | Steps |
|---|---|---|---|
| Unified autograd | 1.023 | 4.35 | 200 |
| Decoupled baseline | 1.251 | 5.36 | 200¹ |
¹ Decoupled ran fewer effective iterations due to sequential restart. Both used identical hardware and problem size.
On a quadratic test function (100 steps):
| Strategy | Final Loss |
|---|---|
| Nominal (no uncertainty) | 1.81 |
| C5 Brute-force MC (K=16) | 19.81 |
| C7 Adaptive + curriculum | 2.20 |
Note: The brute-force MC result (19.81) reflects variance from fixed-K sampling on this test problem; it is not a general indictment of MC methods. The adaptive approach avoids this by dynamically adjusting sample count.
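One simple way to adjust the sample count dynamically is to keep doubling K until the standard error of the Monte Carlo estimate drops below a tolerance. The sketch below does this for a quadratic objective under Gaussian perturbation; it is an illustrative assumption, not the C7 algorithm (which is re-exported from diff-surrogate and also uses a curriculum), and `mc_estimate` and `adaptive_mc` are hypothetical names.

```python
import math
import random


def mc_estimate(x, sigma, k, rng):
    """Mean and standard error of f(x + delta) = (x + delta)^2 over k draws."""
    vals = [(x + rng.gauss(0.0, sigma)) ** 2 for _ in range(k)]
    mean = sum(vals) / k
    var = sum((v - mean) ** 2 for v in vals) / (k - 1)
    return mean, math.sqrt(var / k)


def adaptive_mc(x, sigma, tol, k0=16, k_max=65536, seed=0):
    """Double the sample count until the standard error is at most tol."""
    rng = random.Random(seed)
    k = k0
    while True:
        mean, stderr = mc_estimate(x, sigma, k, rng)
        if stderr <= tol or k >= k_max:
            return mean, stderr, k
        k *= 2


mean, stderr, k = adaptive_mc(x=1.0, sigma=0.5, tol=0.01)
exact = 1.0 ** 2 + 0.5 ** 2  # E[(x + delta)^2] = x^2 + sigma^2
print(f"K = {k}, estimate = {mean:.3f} +/- {stderr:.3f}, exact = {exact:.3f}")
```

A fixed K=16 estimate of the same quantity has a standard error of roughly 0.27 here, which gives a sense of how noisy a small fixed sample can be.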
All benchmark data above was generated on the following environment:
Hardware:
- CPU: AMD Ryzen 5 5600G with Radeon Graphics (6 cores)
- RAM: 13 GB DDR4
- GPU: None (CPU-only)
Software:
- OS: Ubuntu 22.04.5 LTS
- Python: 3.10.12
- PyTorch: 2.12.0+cpu
- DiffNano: 0.9.0 (current main)
Run the benchmarks:
# Flagship metalens DFM: multi-seed (10 seeds) with Wilcoxon tests
make flagship-a # 10 seeds, full report
make flagship-a-ci # 3 seeds, CI smoke test
# Or directly:
python3 scripts/flagship_metalens_dfm.py # default 10 seeds
python3 scripts/flagship_metalens_dfm.py --seed-sweep 3 # CI smoke test
# Individual benchmarks:
python3 scripts/benchmark_c4.py # C4: Unified vs Decoupled
python3 scripts/benchmark_c5.py # C5: Monte Carlo Robustness
python3 scripts/benchmark_c7.py # C7: Optimization Strategy
# Generate charts for README
python3 scripts/generate_benchmark_charts.py

Methodology:
- C5: 100 Monte Carlo samples with σ = 5 nm per-pixel height perturbation; yield threshold set at median of nominal distribution
- C4: 200 optimization steps, Adam optimizer, identical initialization seed
- C7: 100 steps on quadratic test function, comparing nominal / brute-force MC (K=16) / adaptive curriculum
All test data above was obtained by actually running the scripts on the stated environment. No performance numbers were estimated or extrapolated.
diffnano/
├── solvers/
│ ├── _result.py # SimResult container
│ ├── fdtd2d.py # 2D FDTD (CPML, checkpointing)
│ ├── fdtd3d.py # 3D FDTD
│ ├── rcwa.py # RCWA for periodic structures
│ ├── fdfd2d.py # Frequency-domain dense (GPU-native)
│ ├── fdfd2d_sparse.py # Frequency-domain sparse
│ ├── implicit_diff.py # GMRES matfree + FDFD implicit differentiation
│ ├── litho.py # Hopkins lithography model
│ ├── surrogate.py # CNN-accelerated RCWA
│ ├── backend_diagnostics.py # Per-config accuracy/gradient fidelity for RCWA backends (N8.4)
│ ├── fdtd_benchmark.py # FDTD benchmark suite — triple backward comparison, external cross-validation (N9.2)
│ ├── fab_model.py # Learned fabrication model (U-Net)
│ └── resist.py # Differentiable resist model
├── design/
│ ├── parameterization.py # Density, height map, B-spline
│ ├── projection.py # Heaviside + beta-continuation
│ ├── curvilinear.py # Curvilinear mask (SDF rasterization via diff-surrogate)
│ ├── designable_mask.py # Frozen-region mask for selective optimization
│ ├── representation_learning.py # VAE latent optimization
│ ├── latent_warmstart.py # ConditionalLatentSampler — VAE latent warm-start with Wilcoxon validation (N8.3)
│ ├── quantized.py # STE quantized inverse design — StraightThroughQuantize, BinarySTE, QuantizedOptimizer (N9.1)
│ ├── robust_warm_start.py # Robust posterior warm start — angle sweep, process-corner quantile scoring (N9.3)
│ ├── constraints_shared/ # Cross-domain DFM primitives
│ └── robustness/
│ ├── core.py # MC robust optimization (reparameterization, antithetic)
│ ├── adaptive.py # AdaptiveRobustOptimizer (re-export from diff-surrogate)
│ ├── subspace.py # Multi-axis perturbation (sidewall, thickness, corner)
│ └── corner_opt.py # Deterministic corner-sweep process-window optimization
├── workflows/
│ ├── metalens.py # Metalens inverse design
│ ├── dfm_metalens.py # DFM-native metalens (C4 unified autograd graph)
│ ├── lpa_metalens.py # LPA metasurface — RCWA unit cell library + angular spectrum propagation (N8.2)
│ ├── phc.py # Photonic crystal bandgap
│ ├── waveguide.py # Waveguide bends / converters
│ ├── broadband.py # Multi-wavelength optimization
│ ├── multi_objective.py # Pareto front exploration
│ ├── splitter.py # Beam splitter (RCWA-based EM simulation)
│ └── end_to_end.py # Spec-to-GDSII pipeline
├── utils/
│ └── convergence.py # Hybrid Z-score convergence monitor
├── benchmark/ # Reference designs & metrics
└── export/
└── gds.py # GDS-II export (gdstk)
| Version | Scope | Status |
|---|---|---|
| v0.1 | RCWA solver + metalens workflow | Done |
| v0.2 | 2D FDTD + photonic crystal + FDFD | Done |
| v0.3 | 3D FDTD + adaptive robust optimization | Done |
| v0.4 | Neural surrogate + broadband | Done |
| v0.5 | Learned fabrication model + curvilinear masks | Done |
| v0.6 | Multi-objective Pareto + end-to-end + VAE | Done |
| v0.7 | R-DIT backend (N7.1), Denman-Beavers matrix sqrt + gain layer protection (N7.2), cross-attention RCWA proxy (N7.3), real EM splitter workflow (N7.4) | Done |
| v0.8 | Time-reversal adjoint FDTD (N8.1), LPA metasurface (N8.2), latent warm-start (N8.3), backend diagnostics (N8.4) | Done |
| v0.9 | STE quantized inverse design (N9.1), FDTD benchmark suite (N9.2), robust posterior warm start (N9.3) | Done |
| v1.0 | Full benchmark suite + validation + arXiv paper | Planned |
What it is: A differentiable nanophotonics inverse design toolkit with clean-room FDTD adjoint, RCWA, and LPA — with native DFM/lithography co-design integration.
Where it leads:
- DFM-native co-design: To the author's knowledge, the only open-source EM tool that puts lithography + EM + robustness on a single autograd graph. Most alternatives (Tidy3D, meent, FDTDX) are single-domain and do not model lithography at all.
- Time-reversal FDTD adjoint: Memory-efficient adjoint via time-reversal (no need to store all forward fields), enabling gradient-based optimization for larger grids than conventional adjoint methods.
- LPA for large-area metasurfaces: Local Periodic Approximation enables design of metasurfaces far beyond the reach of full-wave RCWA/FDTD, with two-level optimization.
- STE quantized inverse design (N9.1): End-to-end differentiable quantization via straight-through estimator, enabling binary/ternary design parameter spaces within continuous optimization.
- Robust posterior warm start (N9.3): Worst-case angle and process-corner quantile scoring for warm-start candidate selection, improving convergence in multi-scenario design problems.
- FDTD benchmark suite (N9.2): Triple backward-mode comparison framework with external solver cross-validation API, enabling systematic gradient correctness validation.
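The worst-case quantile scoring used for warm-start selection in the list above can be sketched as follows. Each candidate is scored by a low quantile of its figure of merit across operating conditions rather than by its mean. The candidate names, the scores, and the helpers `low_quantile` and `pick_robust` are made up for illustration and are not the AngleSweepScorer API.

```python
def low_quantile(values, q):
    """Lower quantile by index truncation; q=0.0 gives the worst case."""
    ordered = sorted(values)
    return ordered[int(q * (len(ordered) - 1))]


def pick_robust(candidates, q=0.0):
    """Name of the candidate with the best low-quantile score."""
    return max(candidates, key=lambda name: low_quantile(candidates[name], q))


# Hypothetical efficiency of three candidates at five incidence angles.
candidates = {
    "A": [0.90, 0.88, 0.91, 0.20, 0.89],  # strong on average, fails at one angle
    "B": [0.70, 0.68, 0.71, 0.66, 0.69],  # lower peak, consistent
    "C": [0.55, 0.50, 0.52, 0.51, 0.53],
}

best_by_mean = max(
    candidates, key=lambda name: sum(candidates[name]) / len(candidates[name])
)
best_robust = pick_robust(candidates, q=0.0)
print(best_by_mean, best_robust)  # A B
```

Mean scoring prefers candidate A despite its failure at one angle, while worst-case scoring selects B. Raising `q` above zero trades some of that conservatism for tolerance of a few outlier conditions.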
Where it lags (honest assessment):
- Scale: Single workstation (benchmarks so far are CPU-only), moderate apertures. By the author's estimate, 2-4 orders of magnitude behind Tidy3D (cloud GPU FDTD), FDTDX (multi-GPU 3D AD-FDTD), and meent (multi-backend RCWA) in solver speed and problem size.
- Validation: Self-tests + numerical cross-validation against meent RCWA. No experimental or fab validation.
- Maturity: Research prototype. No production EDA integration.
Bottom line: Distinctive in the DFM co-design niche, but not competitive with dedicated EM tools on solver scale or speed. The value is in the lithography-aware inverse design workflow, not raw FDTD/RCWA performance.
Apache License 2.0