Work around CCE 19.0.0 compiler bugs for Cray+OpenACC builds by sbryngelson · Pull Request #1286 · MFlowCode/MFC

sbryngelson · 2026-03-03T00:02:21Z

Summary

CCE 19.0.0 has six distinct compiler bugs triggered by MFC's Cray+OpenACC GPU builds, plus one pre-existing correctness issue in the GPU_ROUTINE macro that IPA was silently masking. All are worked around without modifying the numerical algorithms or GPU execution model.

Bug 1 — InstCombine ICE in `matmul()` (`m_phase_change.fpp`)

CCE 19.0.0's InstCombine pass crashes on `matmul()` inside a GPU kernel.
Fix: Replace `matmul()` with explicit 2×2 scalar arithmetic.
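
The algebra behind this replacement can be checked outside Fortran. The sketch below is a minimal Python illustration that writes a 2×2 product as explicit scalar arithmetic and compares it with a generic row-by-row product. The function names and the choice of a matrix-vector product are assumptions for illustration only; the PR text does not show the shapes of the operands passed to `matmul()`.

```python
def matvec_2x2_explicit(a, x):
    """Multiply a 2x2 matrix by a length-2 vector using scalar arithmetic only."""
    y0 = a[0][0] * x[0] + a[0][1] * x[1]
    y1 = a[1][0] * x[0] + a[1][1] * x[1]
    return [y0, y1]


def matvec_generic(a, x):
    """Reference implementation: one dot product per matrix row."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


# Hypothetical operands, chosen so every intermediate value is exact in floating point.
a = [[2.0, -1.0], [0.5, 3.0]]
x = [4.0, 2.0]
result = matvec_2x2_explicit(a, x)
```

Unrolling by hand leaves the compiler with plain scalar multiplies and adds, so no intrinsic call remains inside the kernel.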

Bug 2 — Uninitialized `FT` in `s_TSat` (`m_phase_change.fpp`)

`huge(FT)` was used before `FT` had been initialized, which CCE flagged as undefined behavior.
Fix: Use `huge(1.0_wp)` instead.

Bug 3 — IPA `bring_routine_resident` SIGSEGV (`m_phase_change.fpp`)

CCE 19.0.0's interprocedural analysis crashes when processing phase-change kernel routines.

Fix: two approaches combined:

Add a `cray_noinline=True` parameter to the `GPU_ROUTINE` macro (a new, surgical knob). On Cray+OpenACC it emits only `!$acc routine seq`; `!DIR$ NOINLINE` is deliberately omitted there because it causes ftn-790 and a downstream `castIsValid` crash. On Cray CPU it emits `!DIR$ NOINLINE`.
Apply `-Oipa0` per-file for `m_phase_change.fpp` in `CMakeLists.txt` (Cray+OpenACC only).

`cray_noinline=True` is applied to 4 routines in `m_phase_change.fpp` and 4 in `m_variables_conversion.fpp`.

Bug 4 — IPA `castIsValid` ICE (`m_bubbles_EL.fpp`)

Complex GPU loops combined with a `dimension(num_procs)` VLA trigger an InstCombine PHI crash during IPA.
Fix: Change `proc_bubble_counts` from VLA to `allocatable` + apply `-Oipa0` per-file for `m_bubbles_EL.fpp` in `CMakeLists.txt` (Cray+OpenACC only).

Bug 5 — Pyrometheus-generated `m_thermochem.f90` missing `!$acc routine seq` on Cray+OpenACC

Pyrometheus emits `!DIR$ INLINEALWAYS name` for Cray but omits `!$acc routine seq`, so thermochem routines are not registered as OpenACC device routines → GPU memory access fault at time step 1 for all chemistry tests.
Fix: Post-process the generated code in `toolchain/mfc/run/input.py` to replace the broken Cray `#ifdef` block with `#define GPU_ROUTINE(name) !$acc routine seq`.
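
The post-processing step can be sketched as a small string patch. The macro strings below are the ones quoted in the review thread for `toolchain/mfc/run/input.py`. The function name, the use of `ValueError` in place of the toolchain's own exception type, and the extra `INLINEALWAYS` check are assumptions added for this illustration, not the PR's actual code.

```python
OLD_MACRO = (
    "#ifdef _CRAYFTN\n#define GPU_ROUTINE(name) !DIR$ INLINEALWAYS name\n"
    "#else\n#define GPU_ROUTINE(name) !$acc routine seq\n#endif"
)
NEW_MACRO = "#define GPU_ROUTINE(name) !$acc routine seq"


def patch_gpu_routine_macro(thermochem_code: str) -> str:
    """Replace the Cray-specific macro block with the plain OpenACC macro."""
    patched = thermochem_code.replace(OLD_MACRO, NEW_MACRO)
    if patched != thermochem_code:
        return patched  # the known block was found and rewritten
    # NEW_MACRO is also a substring of OLD_MACRO, so additionally require that
    # no INLINEALWAYS remains (an assumption of this sketch) before accepting
    # the input as already correct.
    if NEW_MACRO in thermochem_code and "INLINEALWAYS" not in thermochem_code:
        return thermochem_code
    raise ValueError("GPU_ROUTINE macro patch did not apply: unknown format")


generated = "module m_thermochem\n" + OLD_MACRO + "\nend module\n"
fixed = patch_gpu_routine_macro(generated)
```

Failing loudly on an unrecognized format keeps a silently unpatched macro from reaching the GPU build, while the already-correct branch avoids breaking on a generator that emits the desired form directly.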

Bug 6 — VLA `dimension(num_species)` ICE in case-optimized `pre_process` builds (`m_chemistry.fpp`)

`dimension(num_species)` local arrays in CPU routines trigger a CCE 19.0.0 InstCombine ICE in case-optimized `pre_process` builds where `num_species` is a runtime variable. Unlike simulation files, `pre_process` does not get `-Oipa0`, so a source guard is needed.
Fix: Guard all 4 VLA locations with `#:if USING_CCE` to use `dimension(10)` instead.

Bug 7 — `cray_inline=True` in `GPU_ROUTINE` was broken on Cray+OpenACC (latent correctness bug)

Before this PR, `cray_inline=True` on Cray+OpenACC emitted only `!DIR$ INLINEALWAYS name` with no `!$acc routine seq`. As a result, 33 routines across 8 files (`m_bubbles.fpp`, `m_bubbles_EL_kernels.fpp`, `m_compute_cbc.fpp`, `m_sim_helpers.fpp`, `m_qbmm.fpp`, `m_bubbles_EL.fpp`, `m_boundary_common.fpp`, `m_chemistry.fpp`) were not registered as OpenACC device routines on Cray. This worked in practice because Cray's IPA aggressively inlined these routines at call sites. With IPA disabled by `-Oipa0` for Bug 4, this inlining path breaks.
Fix: The `cray_inline=True` branch in `GPU_ROUTINE` now correctly emits `!$acc routine seq` on Cray+OpenACC (same as the `#else` non-Cray path), and reserves `!DIR$ INLINEALWAYS` for Cray CPU-only builds. This is the correct behavior per the OpenACC spec.
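
The directive selection described for `GPU_ROUTINE` can be summarized as a decision function. This is a Python sketch of the logic as stated in this description, not the fypp macro itself. The function name and signature are hypothetical, the OpenMP directive text is an assumption (the description does not show it), and the empty result for non-Cray CPU builds is also an assumption.

```python
def gpu_routine_directives(is_cray, backend, name,
                           cray_inline=False, cray_noinline=False):
    """Return the directive lines a GPU_ROUTINE-like macro would emit.

    backend is one of "acc", "omp", or "cpu".
    """
    if cray_inline and cray_noinline:
        raise ValueError("cray_inline and cray_noinline are mutually exclusive")
    if backend == "acc":
        # Every OpenACC build, Cray included, registers the device routine.
        return ["!$acc routine seq"]
    if backend == "omp":
        # Assumption: the exact OpenMP directive is not shown in the PR text.
        return ["!$omp declare target"]
    if backend != "cpu":
        raise ValueError(f"unknown backend: {backend}")
    if is_cray and cray_inline:
        return [f"!DIR$ INLINEALWAYS {name}"]
    if is_cray and cray_noinline:
        return ["!DIR$ NOINLINE"]
    return []  # assumption: nothing is emitted for other CPU builds


acc_inline = gpu_routine_directives(True, "acc", "s_foo", cray_inline=True)
cpu_inline = gpu_routine_directives(True, "cpu", "s_foo", cray_inline=True)
cpu_noinline = gpu_routine_directives(True, "cpu", "s_foo", cray_noinline=True)
```

Here `s_foo` is a placeholder routine name. The Cray-specific `!DIR$` hints appear only on the CPU path, so neither knob can suppress the OpenACC registration.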

Files changed

`CMakeLists.txt` — per-file `-Oipa0` for `m_bubbles_EL` and `m_phase_change` (Cray+OpenACC only)
`src/common/include/parallel_macros.fpp` — new `cray_noinline` parameter + fix `cray_inline` path for Cray+OpenACC + mutual-exclusivity assert
`src/common/m_phase_change.fpp` — matmul fix, FT init fix, `cray_noinline=True` on 4 routines, caller-side `!DIR$ NOINLINE` guards
`src/common/m_variables_conversion.fpp` — `cray_noinline=True` on 4 routines
`src/simulation/m_bubbles_EL.fpp` — `proc_bubble_counts` changed to `allocatable`
`src/common/m_chemistry.fpp` — VLA guards for CCE case-opt pre_process builds
`toolchain/mfc/run/input.py` — post-process pyrometheus thermochem to fix `GPU_ROUTINE` macro for Cray+OpenACC

Testing

All 6 previously-failing tests confirmed passing on Frontier with CCE 19.0.0 + OpenACC (SLURM job 4172615):

`F5493DA5` — 2D Bubbles (adv_n=T, adap_dt=T) ✓
`7912AB81` — 3D Bubbles (adv_n=T, adap_dt=T) ✓
`5DCF300C` — 1D Chemistry Perfect Reactor ✓
`E8372F50` — 3D Chemistry Perfect Reactor ✓
`2BDE2018` — 1D Chemistry Inert Shocktube (RS1) ✓
`F8ADA51B` — 1D Chemistry Inert Shocktube (RS2) ✓

Performance (CCE 19.0.0 + OpenACC, Frontier)

No measurable regressions from the `-Oipa0` per-file flags and `cray_inline` fix. Benchmark grind times vs master (all differences ≤ 2%, within GPU run-to-run noise of ~5–10%):

Case	PR grind	Master grind	Δ (positive = PR faster)
5eq_rk3_weno3_hllc	0.441	0.448	+1.5% (PR faster)
hypo_hll	0.371	0.364	−2.0% (noise)
ibm	1.425	1.421	−0.3% (noise)
viscous_weno5_sgb_acoustic	0.944	0.937	−0.7% (noise)
igr	0.528	N/A (master SIGTERM pre-existing)	—
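
The Δ column can be recomputed from the grind times in the table. The sketch below assumes Δ is the relative difference with respect to master, positive when the PR is faster. Because the tabulated grind times are rounded, the recomputed values may differ from the quoted Δ in the last digit.

```python
# (PR grind, master grind) pairs copied from the table above; igr is omitted
# because master has no measurement for it.
grind = {
    "5eq_rk3_weno3_hllc": (0.441, 0.448),
    "hypo_hll": (0.371, 0.364),
    "ibm": (1.425, 1.421),
    "viscous_weno5_sgb_acoustic": (0.944, 0.937),
}


def relative_speedup_percent(pr, master):
    """Positive when the PR grind time is lower (faster) than master's."""
    return (master - pr) / master * 100.0


deltas = {case: relative_speedup_percent(pr, master)
          for case, (pr, master) in grind.items()}

for case, delta in deltas.items():
    print(f"{case:30s} {delta:+.1f}%")
```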

All GitHub CI (ubuntu + macOS) passing. Frontier CCE CI fully passing. Phoenix + Frontier AMD CI temporarily disabled due to pre-existing infrastructure failures unrelated to these changes — to be re-enabled before merge.

🤖 Generated with Claude Code

…: add -Oipa0

m_phase_change.fpp triggers the same CCE 19.0.0 bring_routine_resident SIGSEGV during IPA as m_bubbles_EL. Caller-side !DIR$ NOINLINE directives (commit 628a046) were insufficient. Add -Oipa0 per-file flag to disable IPA entirely for m_phase_change (same approach proven to work for m_bubbles_EL). Consolidate both files in one set_source_files_properties call.

See PR MFlowCode#1286.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Five distinct CCE 19.0.0 compiler bugs required fixes:

Bug 1: InstCombine ICE in matmul() in m_phase_change.fpp
- Replace matmul() with explicit 2x2 arithmetic

Bug 2: IPA bring_routine_resident SIGSEGV in m_phase_change.fpp
- Add -Oipa0 per-file in CMakeLists.txt (Cray+OpenACC only)
- Use cray_noinline=True on 4 GPU_ROUTINE calls in m_phase_change.fpp and 4 in m_variables_conversion.fpp

Bug 3: IPA castIsValid ICE in m_bubbles_EL.fpp
- Change proc_bubble_counts from VLA to allocatable
- Add -Oipa0 per-file in CMakeLists.txt (Cray+OpenACC only)

Bug 4: m_chemistry.fpp VLA ICE in case-optimized pre_process builds
- Guard 4 dimension(num_species) local arrays with USING_CCE

Bug 5: Pyrometheus GPU_ROUTINE macro missing !$acc routine seq on Cray+ACC
- Post-process generated m_thermochem.f90 in toolchain/mfc/run/input.py to replace the broken Cray INLINEALWAYS-only macro with plain #define GPU_ROUTINE(name) !$acc routine seq

Also fix uninitialized FT in s_TSat (use huge(1.0_wp) not huge(FT)).

See PR MFlowCode#1286.

…unrelated to CCE fix)

coderabbitai · 2026-03-05T08:14:42Z

📝 Walkthrough

This pull request introduces Cray Fortran and OpenACC compiler compatibility fixes across multiple source files and the build configuration. Changes include temporarily disabling specific test matrix configurations in CI workflows, adding per-file flags that disable IPA in CMakeLists.txt, introducing a new cray_noinline parameter to the GPU_ROUTINE macro, and converting several subroutines from inline to non-inline hints for Cray compilation. Chemistry and phase-change modules conditionally use fixed 10-element arrays under Cray compilation, while variables conversion modules switch inlining behavior. An allocatable array replaces a variable-length automatic array in the bubble calculations, and Python toolchain code adds Cantera integration with Cray-specific macro patching for generated Fortran code.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Work around CCE 19.0.0 compiler bugs for Cray+OpenACC builds' accurately summarizes the main objective of the PR—implementing workarounds for multiple CCE compiler bugs. It is concise, clear, and specific without being overly broad or vague.
Description check	✅ Passed	The pull request description is comprehensive and well-structured, covering all seven compiler bugs with detailed explanations of fixes, affected files, testing results, and performance benchmarks.


coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (3)

.github/workflows/test.yml (1)

173-174: Add explicit tracking metadata to temporary CI disablements.

Line 173, Line 204, and Line 260 mark whole lanes as temporarily disabled; adding an issue ID + target re-enable date in these comments would reduce the chance of silent long-term coverage loss.

Also applies to: 204-205, 260-261

toolchain/mfc/run/input.py (2)

97-100: Scope this warning to affected compiler/toolchain paths.

This warning is emitted for all builds, but the message is explicitly CCE-specific. Please gate it to the relevant path(s) to avoid false alarms on unaffected compilers.

105-105: PEP 8: remove spaces around = in keyword arguments.

Use scalar_type=real_type and directive_offload=directive_str.

As per coding guidelines toolchain/**/*.py: Python toolchain code. Follow PEP 8.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ebd771ac-83df-419a-b4cc-266e0eaf4acc

📥 Commits

Reviewing files that changed from the base of the PR and between 31899ad and 835a2b8.

📒 Files selected for processing (8)

.github/workflows/test.yml
CMakeLists.txt
src/common/include/parallel_macros.fpp
src/common/m_chemistry.fpp
src/common/m_phase_change.fpp
src/common/m_variables_conversion.fpp
src/simulation/m_bubbles_EL.fpp
toolchain/mfc/run/input.py

coderabbitai · 2026-03-05T08:14:45Z

src/common/m_chemistry.fpp

+        #:if USING_CCE
+            real(wp), dimension(10) :: Ys
+        #:else
+            real(wp), dimension(num_species) :: Ys
+        #:endif


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check the exact state of m_chemistry.fpp at the specified lines
if [ -f "src/common/m_chemistry.fpp" ]; then
  echo "=== Lines 66-70 (first fixed-size buffer) ==="
  sed -n '66,70p' src/common/m_chemistry.fpp
  echo ""
  echo "=== Lines 108-112 (second fixed-size buffer) ==="
  sed -n '108,112p' src/common/m_chemistry.fpp
  echo ""
  echo "=== Lines 142-148 (third fixed-size buffer) ==="
  sed -n '142,148p' src/common/m_chemistry.fpp
  echo ""
  echo "=== Lines 191-201 (fourth fixed-size buffer) ==="
  sed -n '191,201p' src/common/m_chemistry.fpp
else
  echo "File not found"
fi

Repository: MFlowCode/MFC

Length of output: 1635

🏁 Script executed:

# Search for all USING_CCE occurrences in chemistry module echo "=== All USING_CCE branches in m_chemistry.fpp ===" rg -n "USING_CCE" src/common/m_chemistry.fpp -B2 -A3

Repository: MFlowCode/MFC

Length of output: 1161

🏁 Script executed:

# Check for existing CCE-specific species count guards echo "=== CCE species guards in common modules ===" rg -n "USING_CCE" src/common/ -A5 | grep -A5 "num_species\|species\|PROHIBIT" || true

Repository: MFlowCode/MFC

Length of output: 1395

🏁 Script executed:

# Check AMD pattern for comparison echo "=== AMD guard pattern in m_checker_common.fpp ===" rg -n "USING_AMD\|num_fluids\|PROHIBIT" src/common/m_checker_common.fpp -B2 -A2

Repository: MFlowCode/MFC

Length of output: 106

🏁 Script executed:

# Check toolchain input.py for CCE/species handling if [ -f "toolchain/mfc/run/input.py" ]; then echo "=== Species limit handling in toolchain ===" rg -n "USING_CCE\|num_species\|species.*10\|Warning" toolchain/mfc/run/input.py -C3 || echo "No matches found" else echo "toolchain/mfc/run/input.py not found" fi

Repository: MFlowCode/MFC

Length of output: 117

🏁 Script executed:

# Find m_checker_common.fpp find src -name "*checker*" -type f | head -20

Repository: MFlowCode/MFC

Length of output: 178

🏁 Script executed:

# Search for PROHIBIT statements in all common modules echo "=== PROHIBIT statements in src/common ===" rg -n "PROHIBIT" src/common/ -C2 || echo "No PROHIBIT found"

Repository: MFlowCode/MFC

Length of output: 1523

🏁 Script executed:

# Search for any species limit checks or assertions echo "=== Species limit checks anywhere ===" rg -n "num_species.*10\|species.*limit\|species.*<\|species.*>" src/ -C2 || echo "No limits found"

Repository: MFlowCode/MFC

Length of output: 110

🏁 Script executed:

# Check if s_check_amd exists and where echo "=== s_check_amd subroutine ===" rg -n "s_check_amd\|num_fluids.*3" src/ -C3 || echo "Not found"

Repository: MFlowCode/MFC

Length of output: 97

🏁 Script executed:

# Search toolchain more broadly for CCE handling echo "=== CCE handling in toolchain ===" find toolchain -name "*.py" -type f | xargs grep -l "USING_CCE\|cce" 2>/dev/null || echo "No CCE in toolchain"

Repository: MFlowCode/MFC

Length of output: 812

🏁 Script executed:

# Check what currently guards CCE or species in input preparation echo "=== All files mentioning chemistry or species constraints ===" rg -n "num_species\|n_species" toolchain/ src/common/m_chemistry.fpp -C1 | head -40 || echo "Not found"

Repository: MFlowCode/MFC

Length of output: 118

🏁 Script executed:

# Check how the dimension(10) arrays are actually used - are they indexed by num_species? echo "=== Usage of Ys in s_compute_q_T_sf (line 66-70 context) ===" sed -n '60,90p' src/common/m_chemistry.fpp

Repository: MFlowCode/MFC

Length of output: 1448

🏁 Script executed:

# Check the other fixed-size array usages echo "=== Usage of Ys in s_compute_T_from_primitives (line 108-112 context) ===" sed -n '100,130p' src/common/m_chemistry.fpp

Repository: MFlowCode/MFC

Length of output: 1428

🏁 Script executed:

# Check the bigger context for lines 142-148 echo "=== s_compute_chemistry_reaction_flux usage (140-170) ===" sed -n '140,170p' src/common/m_chemistry.fpp

Repository: MFlowCode/MFC

Length of output: 1453

🏁 Script executed:

# Check the diffusion flux subroutine (191-220) echo "=== s_compute_chemistry_diffusion_flux usage (185-220) ===" sed -n '185,220p' src/common/m_chemistry.fpp

Repository: MFlowCode/MFC

Length of output: 1829

🏁 Script executed:

# Look for any loops that iterate over num_species with these arrays
echo "=== Loops indexing the fixed-size arrays ==="
rg -n "do.*i.*num_species\|do.*species\|Ys\(i\)\|omega\(i\)" src/common/m_chemistry.fpp -B2 -A2 | head -60

Repository: MFlowCode/MFC

Length of output: 101

🏁 Script executed:

# Look for any CCE-specific guards or checks echo "=== Search for USING_CCE guards or checks ===" rg -n "USING_CCE" src/ toolchain/ -C3 | grep -E "PROHIBIT|WARNING|check|limit|species" -C2 || echo "No CCE guards found"

Repository: MFlowCode/MFC

Length of output: 663

🏁 Script executed:

# What are chemxb and chemxe? echo "=== Definition of chemxb and chemxe ===" rg -n "chemxb|chemxe" src/common/ -B1 -A1 | head -30

Repository: MFlowCode/MFC

Length of output: 2478

🏁 Script executed:

# Check if there's a difference in how AMD vs CCE are handled echo "=== Comparison: AMD check vs CCE ===" sed -n '55,65p' src/common/m_checker_common.fpp

Repository: MFlowCode/MFC

Length of output: 687

🏁 Script executed:

# Check entire m_checker_common guard section around num_fluids and num_species echo "=== Full guard section in m_checker_common.fpp ===" sed -n '50,70p' src/common/m_checker_common.fpp

Repository: MFlowCode/MFC

Length of output: 860

🏁 Script executed:

# Verify the exact condition AMD enforces echo "=== AMD chemistry constraint details ===" sed -n '59,62p' src/common/m_checker_common.fpp

Repository: MFlowCode/MFC

Length of output: 464

🏁 Script executed:

# Check if there's a separate CCE compiler check subroutine echo "=== Search for CCE-specific startup checks ===" rg -n "s_check.*cce\|USING_CCE.*PROHIBIT\|CCE.*species" src/ -i -C3 || echo "No CCE-specific startup checks found"

Repository: MFlowCode/MFC

Length of output: 140

🏁 Script executed:

# Verify chemxb and chemxe are defined based on num_species echo "=== chemxb/chemxe definition ===" rg -n "chemxb.*=|chemxe.*=" src/common/ -B2 -A2 | head -40

Repository: MFlowCode/MFC

Length of output: 3113

🏁 Script executed:

# Verify that the offset calculation (eqn - chemxb + 1) is the indexing pattern echo "=== Array indexing pattern in chemistry loops ===" rg -n "Ys\(.*chemxb\|omega\(.*chemxb\|eqn - chemxb" src/common/m_chemistry.fpp | head -20

Repository: MFlowCode/MFC

Length of output: 106

🏁 Script executed:

# Check toolchain or build files for CCE vs AMD species constraints echo "=== Build or toolchain constraints for CCE ===" rg -n "USING_CCE\|CCE.*species\|cce.*num_species" toolchain/ -C3 -i || echo "Not found in toolchain"

Repository: MFlowCode/MFC

Length of output: 126

🏁 Script executed:

# Check if there's documentation about CCE limits echo "=== CCE documentation ===" find . -name "*.md" -o -name "*.txt" | xargs grep -l "CCE\|cce" 2>/dev/null | head -5

Repository: MFlowCode/MFC

Length of output: 205

🏁 Script executed:

# Verify the actual loop ranges by checking sys_size definitions echo "=== Check if chemxb/chemxe are set based on num_species ===" rg -n "chemxb\s*=|chemxe\s*=" src/common/ -B3 -A1 | head -50

Repository: MFlowCode/MFC

Length of output: 116

🏁 Script executed:

# Search more broadly for where chemxb/chemxe are set echo "=== chemxb/chemxe variable assignments ===" rg -n "chemxb|chemxe" src/common/m_global_variables.fpp -B2 -A2 | head -60

Repository: MFlowCode/MFC

Length of output: 190

🏁 Script executed:

# Check the global variables module echo "=== Global variables that determine chemistry indices ===" grep -n "chemxb\|chemxe\|num_species" src/common/m_global_variables.fpp | head -30

Repository: MFlowCode/MFC

Length of output: 198

🏁 Script executed:

# Look for initialization of these variables echo "=== Where are chemistry indices initialized ===" rg -n "chemxb\s*=" src/common/ -B2 -A2 | head -50

Repository: MFlowCode/MFC

Length of output: 104

🏁 Script executed:

# Check if there's a pattern - does chemxe - chemxb equal num_species? echo "=== Logic connecting chemistry indices to num_species ===" sed -n '1,100p' src/common/m_global_variables.fpp | grep -n "chem\|species"

Repository: MFlowCode/MFC

Length of output: 208

🏁 Script executed:

# Find the correct global variables file echo "=== Global variables files ===" find src/common -name "*global*" -o -name "*variables*" | head -20

Repository: MFlowCode/MFC

Length of output: 125

🏁 Script executed:

# Search across all common modules for chemxb definition echo "=== chemxb definition search ===" rg -n "chemxb" src/common/ -B5 -A2 | head -80

Repository: MFlowCode/MFC

Length of output: 5901

🏁 Script executed:

# Find where these indices are calculated based on species echo "=== Module using chemistry indices ===" rg -l "chemxb|chemxe" src/common/ | head -10

Repository: MFlowCode/MFC

Length of output: 197

🏁 Script executed:

# Search for chemxb declaration in module files echo "=== chemxb/chemxe declaration ===" rg -n "integer.*chemxb|integer.*chemxe|allocatable.*chemxb" src/ -C2 | head -60

Repository: MFlowCode/MFC

Length of output: 1132

🏁 Script executed:

# Check what modules import or define these echo "=== Modules defining chemistry indices ===" rg -n "^.*chemxb|^.*chemxe" src/common/*.fpp | head -20

Repository: MFlowCode/MFC

Length of output: 2139

🏁 Script executed:

# Look in simulation module where these might be set echo "=== Chemistry setup in simulation ===" find src/simulation -name "*.fpp" -o -name "*.f90" | xargs grep -l "chemxb" | head -5

Repository: MFlowCode/MFC

Length of output: 249

🏁 Script executed:

# Check for inline definitions or calculations echo "=== All references to chemxb with context ===" rg -n "chemxb.*=" src/ -C3 | head -100

Repository: MFlowCode/MFC

Length of output: 8109

🏁 Script executed:

# Final verification: confirm num_species calculation echo "=== species_idx definition ===" rg -n "species_idx" src/ -B2 -A2 | grep "species_idx.*=" | head -10

Repository: MFlowCode/MFC

Length of output: 795

Add CCE startup guard for fixed-size chemistry buffers.

The USING_CCE branches at lines 66-70, 108-112, 142-148, and 191-201 declare fixed-size dimension(10) arrays for chemistry temporaries. Loops iterate from chemxb to chemxe, spanning num_species elements (confirmed: chemxe = sys_size + num_species). If num_species > 10 under USING_CCE, array accesses overflow. The AMD compiler path enforces this via @:PROHIBIT(chemistry .and. num_species /= 10, ...) in m_checker_common.fpp:61. Add equivalent guards in all four subroutines:

s_compute_q_T_sf (line 66-70)

s_compute_T_from_primitives (line 108-112)

s_compute_chemistry_reaction_flux (line 142-148)

s_compute_chemistry_diffusion_flux (line 191-201)

Proposed guard pattern

#:if USING_CCE
    @:PROHIBIT(num_species > 10, "CCE chemistry requires num_species <= 10")
#:endif

Add this check in each subroutine after the fixed-size array declarations.
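
The same limit could also be mirrored on the Python toolchain side, so that a mismatch is reported before a build is launched. The sketch below is a hypothetical helper, not code from the PR: the function name, the `using_cce` flag, and the use of `ValueError` are assumptions for illustration. It complements the Fortran-side guard rather than replacing it, since binaries can be run without the toolchain.

```python
CCE_MAX_SPECIES = 10  # matches the fixed dimension(10) buffers under USING_CCE


def check_cce_species_limit(num_species, using_cce, limit=CCE_MAX_SPECIES):
    """Raise if a CCE build would overflow the fixed-size chemistry buffers."""
    if using_cce and num_species > limit:
        raise ValueError(
            f"CCE chemistry requires num_species <= {limit}, got {num_species}"
        )


# Within the limit on CCE, or any size on other compilers: no error.
check_cce_species_limit(10, using_cce=True)
check_cce_species_limit(53, using_cce=False)
```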

coderabbitai · 2026-03-05T08:14:45Z

toolchain/mfc/run/input.py

+        if directive_str == 'acc':
+            old_macro = (
+                "#ifdef _CRAYFTN\n#define GPU_ROUTINE(name) !DIR$ INLINEALWAYS name\n"
+                "#else\n#define GPU_ROUTINE(name) !$acc routine seq\n#endif"
+            )
+            new_macro = "#define GPU_ROUTINE(name) !$acc routine seq"
+            patched = thermochem_code.replace(old_macro, new_macro)
+            if patched == thermochem_code:
+                raise common.MFCException(
+                    "CCE 19.0.0 workaround: pyrometheus output format changed — "
+                    "Cray+ACC GPU_ROUTINE macro patch did not apply. "
+                    "Update the pattern in toolchain/mfc/run/input.py."
+                )


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
python -m pip install --quiet "cantera==3.1.0" "pyrometheus" 2>&1 | grep -v "already satisfied" || true
python - <<'PY'
import cantera as ct
import pyrometheus as pyro

old = "#ifdef _CRAYFTN\n#define GPU_ROUTINE(name) !DIR$ INLINEALWAYS name\n#else\n#define GPU_ROUTINE(name) !$acc routine seq\n#endif"
new = "#define GPU_ROUTINE(name) !$acc routine seq"

sol = ct.Solution("h2o2.yaml")
code = pyro.FortranCodeGenerator().generate(
    "m_thermochem",
    sol,
    pyro.CodeGenerationOptions(scalar_type="real(dp)", directive_offload="acc"),
)
print("old_pattern_found:", old in code)
print("new_pattern_found:", new in code)
print("\nFirst 2000 chars of generated code:")
print(code[:2000])
PY

Repository: MFlowCode/MFC

Length of output: 3061

Add defensive check before raising exception in pyrometheus macro patching.

The current code raises a hard failure when patched == thermochem_code, which triggers for both "unknown format" and "already-correct format." If pyrometheus evolves to emit the desired new_macro form directly, this code will unnecessarily fail. Before raising an exception, check if the new macro is already present in the output:

Proposed fix

 if directive_str == 'acc':
     old_macro = (
         "#ifdef _CRAYFTN\n#define GPU_ROUTINE(name) !DIR$ INLINEALWAYS name\n"
         "#else\n#define GPU_ROUTINE(name) !$acc routine seq\n#endif"
     )
     new_macro = "#define GPU_ROUTINE(name) !$acc routine seq"
     patched = thermochem_code.replace(old_macro, new_macro)
     if patched == thermochem_code:
-        raise common.MFCException(
-            "CCE 19.0.0 workaround: pyrometheus output format changed — "
-            "Cray+ACC GPU_ROUTINE macro patch did not apply. "
-            "Update the pattern in toolchain/mfc/run/input.py."
-        )
+        if new_macro in thermochem_code:
+            patched = thermochem_code  # already in desired form
+        else:
+            raise common.MFCException(
+                "CCE 19.0.0 workaround: pyrometheus output format changed — "
+                "Cray+ACC GPU_ROUTINE macro patch did not apply. "
+                "Update the pattern in toolchain/mfc/run/input.py."
+            )
     thermochem_code = patched

🧰 Tools

🪛 Ruff (0.15.2)

[warning] 119-123: Avoid specifying long messages outside the exception class

(TRY003)

github-actions · 2026-03-05T08:16:46Z

Claude Code Review

Head SHA: ddcaa4a3aef62f982b0f11c9661d7c0cefc673ce
Files changed: 9 — CMakeLists.txt, parallel_macros.fpp, m_chemistry.fpp, m_phase_change.fpp, m_variables_conversion.fpp, m_bubbles_EL.fpp, toolchain/mfc/run/input.py, .github/workflows/bench.yml, .github/workflows/test.yml

Summary

Works around 6 distinct CCE 19.0.0 compiler bugs affecting Cray+OpenACC GPU builds without modifying numerical algorithms.
Adds a new cray_noinline knob to GPU_ROUTINE that emits only !$acc routine seq on Cray+ACC (avoiding ftn-790), !DIR$ NOINLINE on Cray CPU, and standard directives elsewhere.
Fixes a pre-existing latent bug: uninitialized FT before do while loop in s_TSat (Fortran .or. is not short-circuit).
Replaces VLA dimension(num_procs) with allocatable for proc_bubble_counts in m_bubbles_EL.fpp to avoid an IPA castIsValid crash.
Post-processes pyrometheus-generated thermochem code in input.py to repair broken Cray+ACC GPU_ROUTINE macro.

Findings

[Medium] m_chemistry.fpp — hard-coded dimension(10) silently overflows for mechanisms with >10 species
Files: src/common/m_chemistry.fpp lines 66, 108, 142, 191; toolchain/mfc/run/input.py lines 96–102

The CCE workaround substitutes dimension(num_species) with dimension(10). If the mechanism has >10 species the code compiles without error and overflows at runtime. The warning in input.py is informational only — a user running without the Python toolchain (e.g., direct SLURM submission with pre-built binary) will not see it. Consider adding a Fortran-side @:ASSERT(num_species <= 10) guard inside the #:if USING_CCE blocks so the binary itself fails loudly at startup rather than silently corrupting memory.

[Low] m_bubbles_EL.fpp — sibling VLAs part_order / part_ord_mpi remain as dimension(num_procs)
File: src/simulation/m_bubbles_EL.fpp line ~1532 (post-diff)

Only proc_bubble_counts was converted to allocatable; part_order and part_ord_mpi (declared in the same block, both dimension(num_procs)) are left as VLAs. The per-file -Oipa0 flag presumably covers all three, so the IPA crash should be suppressed regardless. But if the IPA flag is ever removed, the same latent VLA crash could resurface for the other two arrays. Converting all three to allocatable (with matching deallocate) would be more consistent.

[Low] parallel_macros.fpp — cray_noinline name is misleading for GPU paths
File: src/common/include/parallel_macros.fpp

On Cray+ACC and Cray+OMP the macro emits no NOINLINE directive at all (correct, because !DIR$ NOINLINE inside a routine causes ftn-790 on Cray+ACC). On Cray CPU it emits !DIR$ NOINLINE. The name cray_noinline implies device-side noinline, but device-side suppression actually comes from the caller-side !DIR$ NOINLINE guards added directly inside s_execute_phasechange_model. The dual mechanism is correct, but a docstring or inline comment explaining this interaction would help future maintainers avoid confusion.

[Attention] CI matrix permanently shrinks in this PR
Files: .github/workflows/test.yml, .github/workflows/bench.yml

Phoenix (NVHPC, 3 configs) and Frontier AMD (3 configs) are disabled. The PR body says these are temporary pending infrastructure fixes, but there is no tracking issue linked and the comment strings say "TEMPORARILY DISABLED" with no re-enable deadline. Before merging, consider opening a follow-up issue to ensure these are re-added and not forgotten.

Minor / Non-blocking

The cray_inline branch in GPU_ROUTINE is also corrected to avoid emitting !DIR$ INLINEALWAYS on Cray+ACC — this is a correct latent bug fix that wasn't explicitly called out as "Bug 7" in the PR description; worth a mention.
input.py: the strict raise MFCException when the pyrometheus pattern is not found is good defensive programming — if pyrometheus changes its output format the build fails loudly rather than silently producing a broken binary.

…compat

Add @:PROHIBIT(num_species > 10) in all four USING_CCE blocks in m_chemistry.fpp so CCE builds with >10 species fail with a clear message rather than silently overflowing the fixed-size dimension(10) arrays (matching the existing AMD guard in m_checker_common.fpp).

Make pyrometheus GPU_ROUTINE macro patch forward-compatible: if a future pyrometheus version already emits the correct form directly, skip the patch rather than raising an exception.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-03-05T10:24:41Z

Claude Code Review

Head SHA: 631482010830a3a8c576be89216e38e02dcac7a3
Files changed: 9 — .github/workflows/bench.yml, .github/workflows/test.yml, CMakeLists.txt, src/common/include/parallel_macros.fpp, src/common/m_chemistry.fpp, src/common/m_phase_change.fpp, src/common/m_variables_conversion.fpp, src/simulation/m_bubbles_EL.fpp, toolchain/mfc/run/input.py

Summary

All 6 compiler bugs have targeted, surgical workarounds. The numerical logic and GPU execution model are left untouched.
The FT = huge(1.0_wp) fix in s_TSat is a genuine latent correctness bug (Fortran .or. is not short-circuit; abs(FT) is always evaluated), not just a CCE workaround.
The parallel_macros.fpp cray_noinline addition is well-structured: the mutual-exclusivity assert with cray_inline is good defensive programming.
The thermochem post-processing patch in input.py has appropriate fallback logic (error on unmatched pattern unless the correct macro is already present).
CI is substantially reduced in this PR (Phoenix NVHPC + Frontier AMD disabled), which limits the validation surface beyond CCE.
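
The `.or.` point above can be illustrated in Python. Fortran does not guarantee short-circuit evaluation, so a compiler is free to evaluate the right operand even when the left one already decides the result. The sketch models that with an eager helper. The shape of the loop condition, the helper names, and the use of `None` as a stand-in for an uninitialized `FT` are all assumptions for illustration; the actual `s_TSat` loop is not shown in this thread.

```python
import sys


def abs_exceeds(value, tol):
    """Stand-in for an `abs(FT) > tol` operand; fails on an unset value."""
    if value is None:
        raise RuntimeError("operand read before initialization")
    return abs(value) > tol


def eager_or(left, right_thunk):
    """Model an `.or.` whose right operand is evaluated regardless of the left."""
    right = right_thunk()
    return left or right


# Python's own `or` short-circuits, so the unset value is never read here.
short_circuit_ok = True or abs_exceeds(None, 1e-10)

# With the operand initialized to a huge value (a stand-in for huge(1.0_wp)),
# eager evaluation is safe and the loop condition holds on entry.
initialized_ok = eager_or(False, lambda: abs_exceeds(sys.float_info.max, 1e-10))
```

Calling `eager_or(True, lambda: abs_exceeds(None, 1e-10))` raises, which mirrors why initializing `FT` before the loop matters independently of CCE.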

Findings

1. parallel_macros.fpp — cray_inline path was also silently fixed (undocumented)

The PR description discusses cray_noinline at length but does not mention that the cray_inline path was also corrected. Previously, cray_inline=True on Cray+OpenACC would emit only !DIR$ INLINEALWAYS name (no !$acc routine seq), so routines would not be registered as OpenACC device routines — the same class of bug as Bug 5 (Pyrometheus). The new logic correctly differentiates Cray+ACC, Cray+OMP, and Cray+CPU:

#ifdef _CRAYFTN
#if MFC_OpenACC
    $:acc_directive         ← now correct (was broken: emitted !DIR$ INLINEALWAYS instead)
#elif MFC_OpenMP
    $:omp_directive
#else
    $:cray_directive        ← !DIR$ INLINEALWAYS (CPU only)
#endif

This is a meaningful correctness fix for routines like s_compute_pressure (m_variables_conversion.fpp) that were recently switched from cray_inline to cray_noinline. It should be documented — either as Bug 7 or as a note under Bug 3. Any routine that was already using cray_inline=True on Cray+ACC was silently broken before this PR; it's worth auditing whether any such routines remain (the ones in m_variables_conversion.fpp were switched to cray_noinline, but if others exist elsewhere they'd still carry the broken path).

2. m_bubbles_EL.fpp — part_order/part_ord_mpi VLAs remain (line ~1532)

proc_bubble_counts was changed to allocatable, but the adjacent part_order and part_ord_mpi arrays:

integer, dimension(2) :: gsizes, lsizes, start_idx_part
integer, dimension(num_procs) :: part_order, part_ord_mpi   ← still VLA

If the root cause of Bug 4 is CCE 19.0.0's inability to handle dimension(num_procs) VLAs in certain GPU loop contexts, these two remaining VLAs in the same subroutine could re-trigger the same ICE in a future CCE version or with a different optimization path. If they don't trigger it now, a comment explaining why they're safe to leave as VLAs would help future readers.

3. m_chemistry.fpp — hardcoded dimension(10) limit covers CCE pre_process (CPU), but the same limit also applies to AMD builds

The condition change from USING_AMD to USING_AMD or USING_CCE for s_source_chemistry and s_compute_species_flux GPU loops (lines ~147, ~200) is correct for CCE GPU. However, in s_initialize_chemistry and s_mixture_molecular_weight (lines ~63, ~109) the guard is purely USING_CCE with no AMD path — presumably these are CPU-only routines and AMD builds don't hit the VLA ICE in non-GPU contexts. The asymmetry is technically fine but might be confusing. Consider a unified comment explaining why the conditions differ between the GPU-loop routines and the CPU-only routines.

4. CI scope reduction — no tracking issue linked

Phoenix (NVHPC + OpenACC, NVHPC + OpenMP, CPU) and Frontier AMD (OpenMP GPU) are all disabled. This means the only CI validation for the Cray-specific changes is on Frontier CCE. The PR body says "to be re-enabled before merge" — but both .github/workflows files carry only inline comments with no link to a tracking issue. If the infrastructure issues persist at merge time, these could remain disabled long-term. Recommend opening a tracking issue and referencing it in the comments (e.g. # Phoenix — disabled, see #XXXX).

5. input.py — thermochem patch relies on exact pyrometheus string output

The old_macro string in the replace call is an exact multi-line match against pyrometheus-generated code. This is brittle to upstream pyrometheus formatting changes. The fallback logic is good (raises MFCException if neither matched nor already-correct). Suggest adding a comment referencing the upstream pyrometheus issue/PR so this patch can be removed when the upstream is fixed.

Minor

CMakeLists.txt: A blank line was added before the comment block (line ~400). No functional issue, but the diff shows it as a change — trivial.
m_phase_change.fpp caller-side !DIR$ NOINLINE block (lines ~107–115): placing these directives in the calling subroutine body (outside GPU_ROUTINE) is non-standard for MFC, but it is the correct Cray workaround given GPU_ROUTINE now emits no !DIR$ directive on Cray+ACC. A one-line comment linking to the PR/bug number would help maintainers understand why this exists.

codecov · 2026-03-05T13:39:29Z

Codecov Report

❌ Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 44.94%. Comparing base (2c3590c) to head (61924d8).
⚠️ Report is 1 commits behind head on master.

Files with missing lines	Patch %	Lines
src/simulation/m_bubbles_EL.fpp	0.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1286      +/-   ##
==========================================
- Coverage   44.95%   44.94%   -0.01%     
==========================================
  Files          70       70              
  Lines       20503    20504       +1     
  Branches     1946     1946              
==========================================
- Hits         9217     9216       -1     
- Misses      10164    10166       +2     
  Partials     1122     1122

…INE patch Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-03-05T17:27:16Z

Claude Code Review

Head SHA: 05d1dc0

Files changed: 9

.github/workflows/bench.yml
.github/workflows/test.yml
CMakeLists.txt
src/common/include/parallel_macros.fpp
src/common/m_chemistry.fpp
src/common/m_phase_change.fpp
src/common/m_variables_conversion.fpp
src/simulation/m_bubbles_EL.fpp
toolchain/mfc/run/input.py

Summary

Works around 7 distinct CCE 19.0.0 compiler bugs for Cray+OpenACC builds without touching numerical algorithms.
Adds cray_noinline parameter to GPU_ROUTINE macro; fixes long-standing latent bug where cray_inline=True on Cray+OpenACC omitted !$acc routine seq (Bug 7).
Applies per-file -Oipa0 in CMake for two simulation source files that trigger Cray IPA crashes.
Guards chemistry VLA arrays with #:if USING_CCE / hardcoded dimension(10) + runtime @:PROHIBIT.
Post-processes pyrometheus-generated thermochem code in input.py to emit correct !$acc routine seq for Cray+OpenACC; includes error detection if the upstream format changes.

Findings

1. m_bubbles_EL.fpp — part_order and part_ord_mpi silently removed

The diff removes the declaration of two VLA arrays alongside proc_bubble_counts:

-        integer, dimension(num_procs) :: part_order, part_ord_mpi
-        integer, dimension(num_procs) :: proc_bubble_counts
+        integer, allocatable :: proc_bubble_counts(:)

proc_bubble_counts is correctly migrated to allocatable. But part_order and part_ord_mpi disappear entirely — no replacement declaration is visible in the diff. If either is referenced in s_write_restart_lag_bubbles, this will silently corrupt the logic or fail to compile on any compiler that doesn't optimize them away. Please confirm these are truly dead code (unused in the subroutine body) or add them to the same allocatable pattern.

2. input.py — silent success path in pyrometheus patch check

if patched == thermochem_code:
    if new_macro in thermochem_code:
        pass  # pyrometheus already emits the correct form; no patch needed
    else:
        raise common.MFCException(...)

The pass branch fires when new_macro is present in the un-patched output and the patch was a no-op — but it also fires if pyrometheus changed its format such that the old pattern no longer matches yet the new_macro string happens to appear for an unrelated reason. The error-detection is good, but worth a brief comment clarifying the intent of the pass path (e.g. "pyrometheus already emits '…' natively") so future readers don't interpret it as missed validation.
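
The three outcomes discussed above (patch applied, already correct, unrecognized format) can be made explicit in code. The following is a minimal sketch, assuming plain string replacement as in the snippet; patch_macro and PatchError are hypothetical names, with PatchError standing in for common.MFCException.

```python
from typing import Tuple


class PatchError(Exception):
    """Hypothetical stand-in for common.MFCException."""


def patch_macro(code: str, old_macro: str, new_macro: str) -> Tuple[str, str]:
    """Replace old_macro with new_macro and report which case applied."""
    patched = code.replace(old_macro, new_macro)
    if patched != code:
        # The old form was present and has been rewritten.
        return patched, "patched"
    if new_macro in code:
        # The generator already emits the desired form natively; no rewrite needed.
        return code, "already-correct"
    # Neither form was found: the upstream format changed in an unknown way.
    raise PatchError("neither the old nor the new macro form was found")
```

Returning a status string instead of using a bare pass gives the caller something to log, so the "already correct" case is visibly intentional.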

3. m_chemistry.fpp — magic number 10 repeated in 4 locations

The dimension(10) CCE workaround and the corresponding @:PROHIBIT(num_species > 10, ...) message appear at four separate call sites. A Fypp constant or a single named integer parameter (e.g. CCE_MAX_SPECIES = 10) would make the limit a single source-of-truth and reduce the risk of the four sites drifting. Low priority, but worth noting given the PR comment says this is a temporary workaround with an explicit removal milestone.

4. test.yml / bench.yml — CI environments disabled before merge

Phoenix (GT/NVHPC) and Frontier AMD runners are commented out. The PR description states they are "to be re-enabled before merge", but they are absent from the diff in the + lines. Please confirm this is tracking correctly: if the CI jobs are not restored before the PR lands, accidental regressions on NVHPC OpenACC and AMD flang OpenMP builds will go undetected by automation.

Minor / Improvements

CMakeLists.txt — An extra blank line was added above the comment block. Cosmetic, not a blocker.
m_phase_change.fpp — The caller-side !DIR$ NOINLINE directives inside a #ifdef _CRAYFTN / #ifdef MFC_OpenACC guard are correct and clearly commented. One nit: the four routine names listed in the !DIR$ NOINLINE block exactly mirror the four cray_noinline=True callee-side annotations. A comment cross-referencing the callee annotation (e.g. "matches cray_noinline=True in each routine") would help reviewers verify they stay in sync as the code evolves.
parallel_macros.fpp — The mutual-exclusivity @:assert is a nice defensive addition. The new cray_noinline block correctly handles all four build paths (Cray+ACC, Cray+OMP, Cray CPU, non-Cray).

Overall the approach is well-justified, each workaround is narrowly scoped, and the error-detection in input.py is a good pattern. The main item to resolve before merge is the part_order/part_ord_mpi removal (#1 above).

Benchmark jobs were using the extended partition (5:59 walltime, ENG160 account) causing multi-hour queue waits and hitting GHA's 8h wall-clock limit. The actual benchmark runs in ~20 minutes on the node. Switch to batch + 1:59 + --qos=normal (same as the test suite jobs). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-03-06T07:17:31Z

Claude Code Review

Head SHA: e208275
Files changed: 10 — .github/workflows/bench.yml, .github/workflows/frontier/submit.sh, .github/workflows/test.yml, CMakeLists.txt, src/common/include/parallel_macros.fpp, src/common/m_chemistry.fpp, src/common/m_phase_change.fpp, src/common/m_variables_conversion.fpp, src/simulation/m_bubbles_EL.fpp, toolchain/mfc/run/input.py

Summary

Seven distinct CCE 19.0.0 compiler bugs worked around without touching numerical algorithms or GPU execution model.
GPU_ROUTINE(cray_inline=True) latent correctness bug fixed: it was not emitting !$acc routine seq on Cray+OpenACC, which IPA silently masked.
New cray_noinline knob added to GPU_ROUTINE macro with proper mutual-exclusivity assertion and correct per-backend dispatch.
Per-file -Oipa0 applied surgically via CMake for two files; IPA left enabled for the rest of simulation (needed for thermochem inlining).
Phoenix and Frontier AMD CI matrix entries temporarily disabled; Frontier (CCE) CI fully passing.

Findings

1. m_bubbles_EL.fpp:1535 — part_order and part_ord_mpi silently removed

-        integer, dimension(num_procs) :: part_order, part_ord_mpi
-        integer, dimension(num_procs) :: proc_bubble_counts
+        integer, allocatable :: proc_bubble_counts(:)

proc_bubble_counts is correctly converted to allocatable, but part_order and part_ord_mpi are dropped with no replacement. If either is referenced anywhere in s_write_restart_lag_bubbles this is a compile error; the tests passing strongly suggests they are unused dead variables. Leaving them as dead VLAs (or explicitly removing them with a comment) would make the intent clearer — a reviewer reading just this function sees two integer arrays disappear into thin air.

2. m_chemistry.fpp — Magic number 10 tied to CCE 19.0.0 but guard is USING_CCE (all CCE versions)

The #:if USING_CCE block replaces dimension(num_species) with dimension(10) for every Cray build, not just CCE 19.0.0. This is conservative and safe, but if a future CCE version fixes the ICE the workaround will remain invisibly active. A comment noting that this guard can be narrowed (e.g. USING_CCE and CCE_VERSION < 19.x) when upstream fixes the bug would help future maintainers. Low severity given the @:PROHIBIT(num_species > 10, ...) guard prevents silent overflow.

3. toolchain/mfc/run/input.py:~110 — Warning for sol.n_species > 10 should be an error for Cray+ACC builds

if sol.n_species > 10:
    cons.print(f"[bold yellow]Warning:[/bold yellow] ...")

When directive_str == 'acc' (Cray+OpenACC), the @:PROHIBIT will catch this at runtime on every function entry, crashing simulation. Promoting this to a hard MFCException when directive_str == 'acc' (or more precisely when USING_CCE) would give a better developer experience — fail fast at input generation rather than at the first chemistry routine call.
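
A fail-fast version of this check might look like the sketch below. It is an illustration under stated assumptions rather than the toolchain's actual code: SPECIES_LIMIT, InputError, and check_species_limit are hypothetical names, and the limit of 10 is the value quoted in the review.

```python
from typing import Optional

SPECIES_LIMIT = 10  # value quoted in the review; assumed to mirror the Fortran-side guard


class InputError(Exception):
    """Hypothetical stand-in for the toolchain's MFCException."""


def check_species_limit(n_species: int, directive_str: Optional[str]) -> Optional[str]:
    """Raise for OpenACC builds; return a warning message (or None) otherwise."""
    if n_species <= SPECIES_LIMIT:
        return None
    message = (
        f"mechanism has {n_species} species, "
        f"but CCE builds are limited to {SPECIES_LIMIT}"
    )
    if directive_str == "acc":
        # Fail at input generation rather than at the first chemistry call.
        raise InputError(message)
    return message  # caller decides how to print the warning
```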

4. parallel_macros.fpp — cray_noinline emits nothing for non-Cray CPU builds (intentional but undocumented)

For non-Cray, non-GPU CPU builds, the cray_noinline path falls through all #ifdef/#elif branches and emits no directive. This is the correct behavior (NOINLINE is Cray-specific), but a brief comment inside the macro explaining this would prevent future readers from thinking it is a mistake.

5. CI disablement — No tracking issue linked

Both test.yml and bench.yml use # TEMPORARILY DISABLED comments for Phoenix (GT) and Frontier AMD entries. The PR body says "to be re-enabled before merge." If this PR lands without those entries restored, the CI gap will persist indefinitely. Consider either: (a) re-enabling before merge as stated, (b) opening a tracking issue and referencing it in the comment, or (c) creating a follow-up PR immediately after.

Positive observations

The FT = huge(1.0_wp) fix (m_phase_change.fpp:727) is correct. The Fortran standard does not guarantee that .or. short-circuits, so abs(FT) may be evaluated on the first loop test regardless of the other operand. The comment explains this clearly.
The matmul() → explicit 2×2 scalar arithmetic is correct, portable, and the right fix for an InstCombine ICE.
The thermochem patch in input.py is appropriately defensive: it raises MFCException if the upstream format changes unexpectedly, avoiding silent miscompilation.
The @:ALLOCATE / @:DEALLOCATE pairing for proc_bubble_counts is correct — allocation is after the early-return guard (if (.not. parallel_io) return), deallocation is before #endif. Both paths that reach the deallocation site must have gone through allocation.
The cray_inline correctness fix (Bug 7) is the most impactful change: 33 routines across 8 files were silently not registered as OpenACC device routines on Cray; this was masked by IPA. The fix is correct per the OpenACC spec.

…rflow, noinline comment - m_chemistry.fpp: introduce CCE_MAX_SPECIES Fypp constant (= 10) as single source of truth; replace all 8 hardcoded dimension(10) and 4 PROHIBIT(> 10) occurrences with ${CCE_MAX_SPECIES}$ - input.py: elevate n_species > CCE_MAX_SPECIES from warning to MFCException for GPU builds (directive_str is not None); CPU builds still warn - parallel_macros.fpp: add comment explaining cray_noinline emits nothing on non-Cray CPU builds (intentional — !DIR$ NOINLINE is Cray-specific) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-03-06T09:21:35Z

Claude Code Review

Head SHA: 2d1b359f9b314a159e1b57284ffe058c4b78d494

Files changed: 10

.github/workflows/bench.yml
.github/workflows/frontier/submit.sh
.github/workflows/test.yml
CMakeLists.txt
src/common/include/parallel_macros.fpp
src/common/m_chemistry.fpp
src/common/m_phase_change.fpp
src/common/m_variables_conversion.fpp
src/simulation/m_bubbles_EL.fpp
toolchain/mfc/run/input.py

Summary

Works around 7 distinct CCE 19.0.0 compiler bugs (ICEs, IPA SIGSEGV, uninitialized variable, broken OpenACC device-routine registration) without touching numerical algorithms or GPU execution semantics.
Key mechanisms: per-file -Oipa0 in CMake (Cray+OpenACC only), new cray_noinline knob in GPU_ROUTINE, explicit 2×2 scalar arithmetic replacing matmul(), fixed-size arrays guarded by USING_CCE, and a post-processing patch of pyrometheus-generated thermochem code.
Corrects a latent pre-existing bug: cray_inline=True on Cray+OpenACC was emitting only !DIR$ INLINEALWAYS with no !$acc routine seq, meaning those 33 routines were never registered as OpenACC device routines.
Phoenix (GT) and Frontier AMD CI are temporarily disabled; Frontier CCE CI with confirmed passing tests is the validation basis.
PR is well-documented and benchmarks show no performance regression within noise.

Findings

[M] src/simulation/m_bubbles_EL.fpp — plain allocate/deallocate instead of @:ALLOCATE/@:DEALLOCATE macros

Lines +1550 and +1663: The project rule in CLAUDE.md is "Every @:ALLOCATE(…) MUST have a matching @:DEALLOCATE(…)". proc_bubble_counts is an integer MPI bookkeeping array, not a GPU data array, so omitting the GPU enter/exit data side-effects is arguably intentional here. Still, using the macros (which are no-ops for CPU and will correctly skip GPU data entry for integer types if desired) would make this consistent with the rest of the codebase and avoid a future surprise if this array is ever moved to GPU. Please clarify or use the macros.

[M] CCE_MAX_SPECIES = 10 hardcoded in two separate files

src/common/m_chemistry.fpp:9 and toolchain/mfc/run/input.py:96 both define the constant independently with a comment saying "Must match". There is no compile-time enforcement; only a runtime @:PROHIBIT catches a mismatch. The current limit of 10 may also be too conservative — real combustion mechanisms routinely exceed this (e.g., GRI-Mech 3.0 has 53 species). Consider:

Raising the limit (e.g., to 53 or higher) to avoid silently breaking chemistry test cases on CCE.
Adding a Python-side assertion in input.py that explicitly reads or imports the constant rather than duplicating it.

[M] toolchain/mfc/run/input.py — fragile exact-string patch on pyrometheus output

Lines +118–139: The Cray+OpenACC thermochem fix depends on an exact 5-line string match in pyrometheus-generated code. The fallback logic correctly raises an MFCException if the pattern changes, so correctness is preserved. However, the silent pass branch (when new_macro is already present) could mask a partially-applied patch if both old and new forms appear simultaneously (unlikely but possible). Consider logging a debug message in the pass branch.

[L] CI coverage gap — Phoenix NVHPC GPU tests disabled

.github/workflows/test.yml and bench.yml: Phoenix acc+omp GPU tests are removed without an automatic mechanism to re-enable them before merge. The PR body states "to be re-enabled before merge" but there is no tracking issue or CI job enforcing this. The cray_inline=True → !$acc routine seq fix (Bug 7) affects all Cray+OpenACC builds, but NVHPC+OpenACC is the most common OpenACC target and it is currently untested in CI for this PR. Please re-enable Phoenix (or document explicitly that this is handled in a follow-up PR with a link).

[L] s_compute_pressure and s_convert_species_to_mixture_variables_acc changed to cray_noinline

src/common/m_variables_conversion.fpp lines +119, +329: These are high-frequency routines called inside GPU loops. On Cray+OpenACC the new macro correctly emits !$acc routine seq (previously broken with cray_inline), so correctness is improved. The benchmark shows no measurable regression. However, s_compute_pressure in particular is in the hot path of the Riemann solver, so it is worth monitoring on a broader set of cases as CCE 19.x matures and IPA workarounds are removed.

[I] parallel_macros.fpp — Fypp comment placement

src/common/include/parallel_macros.fpp around line +67: The Fypp comment ## On non-Cray CPU builds… appears inside the outer #ifdef _CRAYFTN / #elif / #endif block (after the nested #endif), which is valid but easy to mis-read. A comment placed just before the outer #elif MFC_OpenACC would be clearer.

Overall: The fixes are surgical, well-motivated, and the numerical/GPU correctness issues are properly addressed. The main asks before final merge are: (1) re-enable Phoenix CI or create a tracking issue, (2) reconsider the CCE_MAX_SPECIES=10 ceiling, and (3) clarify the plain allocate/deallocate choice in m_bubbles_EL.

## is only valid inside Fypp blocks (#:def, #:if). At file top-level it passes through to the .f90 output, causing gfortran CPP to error with 'invalid preprocessing directive ##'. Switch to #! which Fypp always strips. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-03-06T12:46:01Z

Claude Code Review

Head SHA: 810056d473437e3dea313069b582d0791371491d

Files changed: 10

.github/workflows/bench.yml
.github/workflows/frontier/submit.sh
.github/workflows/test.yml
CMakeLists.txt
src/common/include/parallel_macros.fpp
src/common/m_chemistry.fpp
src/common/m_phase_change.fpp
src/common/m_variables_conversion.fpp
src/simulation/m_bubbles_EL.fpp
toolchain/mfc/run/input.py

Summary

Adds a cray_noinline parameter to GPU_ROUTINE and fixes the pre-existing cray_inline bug that silently skipped !$acc routine seq on Cray+OpenACC (Bug 7 fix is a significant correctness fix affecting 33 routines across 8 files).
Replaces matmul() with explicit 2×2 scalar arithmetic and initialises FT before the do while loop condition in m_phase_change.fpp.
Applies per-file -Oipa0 in CMakeLists.txt for m_bubbles_EL and m_phase_change on Cray+OpenACC only.
Guards dimension(num_species) VLAs in m_chemistry.fpp with a fixed-size CCE_MAX_SPECIES=10 limit for CCE builds.
Post-processes pyrometheus-generated m_thermochem.f90 in input.py to replace the broken Cray #ifdef block with a plain !$acc routine seq definition.

Findings

1. Silent removal of part_order and part_ord_mpi in m_bubbles_EL.fpp (line ~1532)

The diff removes two VLA declarations that are not mentioned in the PR description:

-        integer, dimension(num_procs) :: part_order, part_ord_mpi
-        integer, dimension(num_procs) :: proc_bubble_counts
+        integer, allocatable :: proc_bubble_counts(:)

If part_order or part_ord_mpi are referenced later in s_write_restart_lag_bubbles, this would be a compile error on all compilers. Since existing CI passes this is likely dead code, but the removal should be explicitly noted in the PR description to distinguish it from the VLA-to-allocatable change.

2. Duplicate CCE_MAX_SPECIES = 10 constant in two separate files

The constant is defined independently in:

src/common/m_chemistry.fpp (line ~9): #:set CCE_MAX_SPECIES = 10
toolchain/mfc/run/input.py (line ~97): CCE_MAX_SPECIES = 10

These must be kept in sync manually. If one is updated and the other is not, the Python-side early error check and the Fortran-side @:PROHIBIT will have a mismatch. The comment in m_chemistry.fpp calls this out, which helps. Consider consolidating — for example, having input.py read the value from the .fpp file — but given the workaround nature of this change the current approach is acceptable if the comment is kept.

3. Pyrometheus patch in input.py relies on exact whitespace formatting (line ~116–130)

The old_macro string depends on pyrometheus emitting the exact sequence:

"#ifdef _CRAYFTN\n#define GPU_ROUTINE(name) !DIR$ INLINEALWAYS name\n"
"#else\n#define GPU_ROUTINE(name) !$acc routine seq\n#endif"

Any whitespace or ordering change in the pyrometheus code generator will cause the exact-string patch not to apply. The existing guard (raising MFCException when neither old nor new form is found) turns that case into an explicit error at input generation instead of a confusing runtime memory fault. Low risk currently, but the PR should note it as a fragile point to revisit when pyrometheus is updated.
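
One way to reduce the sensitivity to whitespace is to match the block with a regular expression instead of an exact string. The sketch below is an alternative technique, not what the PR implements; OLD_BLOCK and collapse_cray_block are hypothetical names, and it assumes the goal is to keep only the non-Cray definition, as the summary above describes.

```python
import re
from typing import Tuple

# Matches the Cray #ifdef block while tolerating extra spaces and blank lines.
OLD_BLOCK = re.compile(
    r"#ifdef\s+_CRAYFTN\s*\n"
    r"\s*#define\s+GPU_ROUTINE\(name\)\s+!DIR\$\s+INLINEALWAYS\s+name\s*\n"
    r"\s*#else\s*\n"
    r"\s*#define\s+GPU_ROUTINE\(name\)\s+(?P<fallback>[^\n]*?)\s*\n"
    r"\s*#endif"
)


def collapse_cray_block(code: str) -> Tuple[str, int]:
    """Replace each matched block with its #else definition; return (text, count)."""
    return OLD_BLOCK.subn(
        # A function replacement avoids escaping issues in the substituted text.
        lambda m: f"#define GPU_ROUTINE(name) {m.group('fallback')}",
        code,
    )
```

A count of zero still needs the same "already correct or unknown format" check as the exact-string version, so this relaxes only the matching step and leaves the guard in place.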

4. Phoenix and Frontier AMD CI left disabled with no tracking issue

Both test.yml and bench.yml now have entire CI job matrices commented out:

# Phoenix (GT) — TEMPORARILY DISABLED (pre-existing SLURM/Case Opt failures)
# Frontier AMD — TEMPORARILY DISABLED (pre-existing failures unrelated to CCE fix)

These cover NVHPC GPU builds and AMD OpenMP GPU builds respectively. Given these are being merged to master, please open a follow-up issue to track re-enabling them, otherwise they are likely to remain disabled indefinitely.

Minor / Non-blocking

The extra blank line added to CMakeLists.txt before the comment (line ~399) is trivial.
The s_TSat FT = huge(1.0_wp) initialisation fix (Bug 2) is correct: the Fortran standard does not guarantee that .or. short-circuits, so abs(FT) may be evaluated on the first iteration even when ns == 0. This is a legitimate portability fix independent of the CCE workarounds.
The cray_inline fix in parallel_macros.fpp (Bug 7) correctly adds !$acc routine seq for Cray+OpenACC and reserves !DIR$ INLINEALWAYS for Cray CPU-only paths. The C preprocessor chain #ifdef _CRAYFTN / #if MFC_OpenACC / ... / #elif MFC_OpenACC is valid.

Same root cause as m_chemistry.fpp fix: ## is not a Fypp comment and passes through to the generated .f90 output. Inside #ifdef _CRAYFTN, gfortran never sees the ## lines (since _CRAYFTN is undefined there), but CCE does and errors with 'Unknown or unsupported compiler directive'. Change to #! which Fypp always strips. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-03-06T16:12:19Z

Claude Code Review

Head SHA: 8a6398c
Files changed: 10

.github/workflows/bench.yml
.github/workflows/frontier/submit.sh
.github/workflows/test.yml
CMakeLists.txt
src/common/include/parallel_macros.fpp
src/common/m_chemistry.fpp
src/common/m_phase_change.fpp
src/common/m_variables_conversion.fpp
src/simulation/m_bubbles_EL.fpp
toolchain/mfc/run/input.py

Summary

Seven CCE 19.0.0 compiler bugs (ICEs, IPA SIGSEGVs, missing GPU device registration) are worked around without altering numerical algorithms.
New cray_noinline parameter in GPU_ROUTINE is logically sound and mutually-exclusive with cray_inline; the fixed cray_inline branch now correctly emits !$acc routine seq on Cray+OpenACC (Bug 7 fix).
Per-file -Oipa0 applied in CMake only for Cray+OpenACC, scoped to the two crashing files — minimal blast radius.
CCE_MAX_SPECIES = 10 cap is duplicated across Python and Fypp; kept in sync by comment reference only.
Phoenix and Frontier AMD CI are "temporarily disabled" with no tracking ticket — risk of permanence.

Findings

src/simulation/m_bubbles_EL.fpp — two sibling VLAs left unchanged

proc_bubble_counts is changed to allocatable, but the two adjacent VLAs in the same subroutine are left as-is:

integer, dimension(num_procs) :: part_order, part_ord_mpi   ! still VLAs

These are also dimension(num_procs) and live in the same subroutine (s_write_restart_lag_bubbles) that triggered the CCE castIsValid ICE. If CCE's IPA revisits this subroutine for any reason, these may trigger the same crash. They are used only in the CPU/MPI path (never GPU kernels), so the risk may be lower — but worth noting in case CCE 19 still complains.

src/common/m_chemistry.fpp + toolchain/mfc/run/input.py — CCE_MAX_SPECIES duplication

The magic constant 10 appears independently in both files:

m_chemistry.fpp:8: #:set CCE_MAX_SPECIES = 10
input.py:97: CCE_MAX_SPECIES = 10

The comment says they "must match" but there is no automated check enforcing this. A future edit to one without the other would silently allow mechanisms that hit the Fortran PROHIBIT at runtime on CCE. Consider deriving one from the other (e.g., reading the Fypp value in Python via regex, or adding a CI assertion), or at minimum cross-referencing the line numbers in both comments.
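
The regex-based cross-check suggested above could be sketched as follows. This is a hedged illustration: read_cce_max_species and assert_limits_match are hypothetical helpers, and the functions take the Fypp source as a string so the example stays self-contained (a caller would pass the contents of m_chemistry.fpp).

```python
import re

# Matches a line such as "#:set CCE_MAX_SPECIES = 10".
FYPP_SET = re.compile(r"^\s*#:set\s+CCE_MAX_SPECIES\s*=\s*(\d+)\s*$", re.MULTILINE)


def read_cce_max_species(fpp_source: str) -> int:
    """Extract the integer assigned to CCE_MAX_SPECIES in Fypp source text."""
    match = FYPP_SET.search(fpp_source)
    if match is None:
        raise ValueError("CCE_MAX_SPECIES is not set in the given Fypp source")
    return int(match.group(1))


def assert_limits_match(fpp_source: str, python_value: int) -> None:
    """Raise if the Fypp-side and Python-side limits have drifted apart."""
    fypp_value = read_cce_max_species(fpp_source)
    if fypp_value != python_value:
        raise ValueError(
            f"CCE_MAX_SPECIES mismatch: Fypp={fypp_value}, Python={python_value}"
        )
```

Run as a CI step or at toolchain start-up, a check like this would turn a silent drift between the two files into an immediate failure.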

toolchain/mfc/run/input.py — patch brittleness on pyrometheus upstream changes

The string-match patch at lines ~125–140 is intentionally defensive: it raises MFCException if the old pattern is absent and the new pattern is also absent. This is good. However, the fallback pass branch (when new_macro in thermochem_code) silently assumes correctness even if pyrometheus emits partial or malformed output containing the new macro string. A targeted test that actually runs a CCE+ACC thermochem build would be more reliable, but that's a pre-existing testing infrastructure gap rather than a code defect.

.github/workflows/test.yml / bench.yml — CI matrix gaps left untracked

Phoenix (GT) and Frontier AMD coverage is disabled with "TEMPORARILY DISABLED" comments. The PR description mentions these will be re-enabled before merge, but the current state of the PR doesn't reflect that — if merged as-is, there is no CI gate for NVIDIA nvfortran GPU builds (Phoenix) or AMD flang builds. Please either:

Re-enable before merge as described, or
Open a tracking issue and reference it in the comments so the intent doesn't get lost.

Minor

CMakeLists.txt:398: stray blank line added before the MFC_SETUP_TARGET comment block — cosmetic, no functional impact.
The FT = huge(1.0_wp) fix in s_TSat (m_phase_change.fpp:730) is correct and the comment accurately explains Fortran's non-short-circuit .or. evaluation. Good catch.

m_phase_change triggers a bring_routine_resident SIGSEGV (ftn-2116 INTERNAL) on CCE 19.0.0 CPU-only builds too, not just OpenACC GPU builds. Widen the CMakeLists guard from 'Cray AND MFC_OpenACC' to 'Cray' to fix the CCE CPU simulation build. See master CI run 22627725058 for the failure evidence. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-03-06T16:24:56Z

Claude Code Review

Head SHA: 23309f627fb8df2c141186b9b2c2f4a0bd9a2e8d
Files changed: 10

.github/workflows/bench.yml
.github/workflows/frontier/submit.sh
.github/workflows/test.yml
CMakeLists.txt
src/common/include/parallel_macros.fpp
src/common/m_chemistry.fpp
src/common/m_phase_change.fpp
src/common/m_variables_conversion.fpp
src/simulation/m_bubbles_EL.fpp
toolchain/mfc/run/input.py

Summary

Seven CCE 19.0.0 compiler bugs worked around via a combination of per-file -Oipa0, a new cray_noinline GPU_ROUTINE parameter, explicit 2×2 scalar arithmetic, fixed-size array guards, and a post-generation patch for pyrometheus thermochem code.
Bug 7 (cray_inline=True never emitting !$acc routine seq on Cray+OpenACC) is a pre-existing correctness fix that would have caused silent wrong-device execution; independently valuable regardless of the rest of this PR.
The FT = huge(1.0_wp) initialization (m_phase_change.fpp) is a genuine correctness fix: Fortran does not guarantee that .or. short-circuits, so abs(FT) may be evaluated when ns == 0 on any compiler, not only CCE, and reading the uninitialized FT there is undefined behavior.
CI for Phoenix (GT) and Frontier AMD is temporarily disabled and must be re-enabled before merge.
Overall the approach is surgical and well-documented. Findings below range from a potential silent bug to minor robustness concerns.

Findings

1. [Medium] `part_order` and `part_ord_mpi` silently removed — not mentioned in PR description

File: src/simulation/m_bubbles_EL.fpp (diff context around line 1534)

The PR description says "Change proc_bubble_counts from VLA to allocatable", but the diff also removes part_order and part_ord_mpi (both integer, dimension(num_procs)) entirely. If they were used anywhere in s_write_restart_lag_bubbles (e.g., in an MPI gather or ordering step), this would be a silent functional regression. The fact that the six previously-failing tests pass is encouraging, but those tests exercise specific paths; unused variables removed from unreachable code paths might not be caught. Please confirm these were truly dead code or document their removal explicitly.

2. [Low] `CCE_MAX_SPECIES = 10` duplicated in two places with no programmatic link

Files: src/common/m_chemistry.fpp:9, toolchain/mfc/run/input.py:100

The constant is defined independently as a Fypp #:set and as a Python variable. The comment says "Must match the Python-side check", but there is no automated enforcement. A future contributor changing one without the other would get a silent mismatch where the Python check allows the case through but the Fortran @:PROHIBIT fires at a different threshold. Consider either generating the Fypp constant from Python or adding an explicit assertion in the Python that reads and cross-checks the value from the source file.

3. [Low] Repeated `@:PROHIBIT` checks per call rather than once at module init

File: src/common/m_chemistry.fpp (approximately lines 74, 120, 160, 230)

Each of the four chemistry subroutines individually checks num_species > CCE_MAX_SPECIES at runtime. These checks run on the host, so overhead is minimal; but num_species is a module-level constant set at startup, so the condition never changes. A single check in s_initialize_chemistry_module (or equivalent init) would be cleaner. Not blocking.

4. [Low] Cray CPU-only performance impact of `cray_noinline=True` is unquantified

Files: src/common/m_phase_change.fpp, src/common/m_variables_conversion.fpp

The eight routines changed from cray_inline=True to cray_noinline=True will emit !DIR$ NOINLINE on Cray CPU (non-GPU) builds, swapping aggressive inlining for forced noinlining. The performance benchmarks in the PR description cover only GPU (Frontier OpenACC) runs. Cray CPU-only builds could regress. If these routines are called in hot paths, this warrants a CPU-only Cray benchmark or at minimum a note acknowledging the tradeoff.

5. [Informational] Resilient pyrometheus patch fallback path should log on success

File: toolchain/mfc/run/input.py (around line 118)

The if patched == thermochem_code: block correctly handles the case where pyrometheus has already been fixed upstream (falls through silently with pass). Consider adding a cons.print info/debug message in that branch so that once pyrometheus is fixed, maintainers are alerted that the workaround code path can be removed.

Positive notes

The mutual-exclusivity assert not (cray_inline and cray_noinline) in the Fypp macro is excellent defensive programming and will catch misuse at build time.
Per-file -Oipa0 scoped to if (CMAKE_Fortran_COMPILER_ID STREQUAL "Cray") correctly limits the workaround to affected builds.
The pyrometheus patch correctly distinguishes "not patched" from "already correct" and raises a hard error for the former, preventing silent wrong behavior.
Caller-side !DIR$ NOINLINE guards are correctly scoped to #ifdef _CRAYFTN + #ifdef MFC_OpenACC, consistent with the IPA workaround intent.
The FT = huge(1.0_wp) fix improves standard conformance independent of compiler.

On Cray+OpenMP, m_thermochem uses !DIR$ INLINEALWAYS (IPA inlining) so disabling IPA for m_phase_change/m_bubbles_EL breaks thermochem on-device calls → Phase Change and Lagrange Bubble tests crash at runtime (gpu-omp). On Cray+OpenACC, the pyrometheus patch emits !$acc routine seq instead, so IPA is not needed for thermochem. On Cray CPU, GPU tests are skipped. Condition: Cray AND NOT MFC_OpenMP (covers OpenACC + CPU, excludes OpenMP). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
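The condition stated at the end of this commit message can be checked against the configurations it names. A small sketch, where the function `disable_ipa` is a hypothetical stand-in for the CMake guard and the non-Cray row is added only for contrast:

```python
def disable_ipa(compiler_id: str, mfc_openmp: bool) -> bool:
    """Hypothetical mirror of the guard: Cray AND NOT MFC_OpenMP."""
    return compiler_id == "Cray" and not mfc_openmp


# Configurations named in the commit message, plus one non-Cray case.
# The guard cannot tell Cray+OpenACC from Cray CPU; both get -Oipa0.
cases = {
    "Cray+OpenACC": disable_ipa("Cray", mfc_openmp=False),
    "Cray CPU": disable_ipa("Cray", mfc_openmp=False),
    "Cray+OpenMP": disable_ipa("Cray", mfc_openmp=True),
    "GNU CPU": disable_ipa("GNU", mfc_openmp=False),
}
for name, flag in cases.items():
    print(f"{name}: {'-Oipa0' if flag else 'default IPA'}")
```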

github-actions · 2026-03-06T17:51:06Z

Claude Code Review

Head SHA: 5d177b797ac28b2d2cc661dc05b2aba8539c234b

Files changed: 10

.github/workflows/bench.yml
.github/workflows/frontier/submit.sh
.github/workflows/test.yml
CMakeLists.txt
src/common/include/parallel_macros.fpp
src/common/m_chemistry.fpp
src/common/m_phase_change.fpp
src/common/m_variables_conversion.fpp
src/simulation/m_bubbles_EL.fpp
toolchain/mfc/run/input.py

Summary

Workarounds for seven CCE 19.0.0 issues across Cray+OpenACC builds (six compiler bugs plus one latent GPU_ROUTINE correctness bug), all well-diagnosed and narrowly targeted.
Correctness fix for a pre-existing latent bug: cray_inline=True in GPU_ROUTINE was not emitting !$acc routine seq on Cray+OpenACC, relying silently on IPA inlining.
FT = huge(1.0_wp) initialization in s_TSat fixes a genuine undefined-behavior bug (Bug 2) regardless of compiler.
matmul() → explicit 2×2 arithmetic in m_phase_change.fpp is a safe, verifiable change with no numerical impact.
Phoenix GT and Frontier AMD CI temporarily disabled; PR body acknowledges these must be re-enabled before merge.

Findings

1. Incomplete VLA fix in m_bubbles_EL.fpp (informational)
src/simulation/m_bubbles_EL.fpp — the original diff line (removed):

integer, dimension(num_procs) :: part_order, part_ord_mpi
integer, dimension(num_procs) :: proc_bubble_counts

Only proc_bubble_counts was converted to allocatable. part_order and part_ord_mpi remain as dimension(num_procs) VLAs in the same subroutine. Since the per-file -Oipa0 flag is the primary guard against the IPA ICE (Bug 4), the remaining VLAs are likely safe under -Oipa0, but this warrants a note in the workaround comment in case part_order/part_ord_mpi cause a similar issue if -Oipa0 is ever lifted.

2. Duplicated CCE_MAX_SPECIES = 10 constant with no enforcement link

src/common/m_chemistry.fpp, line defining #:set CCE_MAX_SPECIES = 10
toolchain/mfc/run/input.py, line defining CCE_MAX_SPECIES = 10

A comment says these must match, but there is no compile-time or test-time enforcement. If a future contributor updates one and not the other, the runtime @:PROHIBIT in m_chemistry.fpp will fire with a confusing message (the Python check and the Fortran check will disagree). Consider a simple assertion in input.py that reads the constant from m_chemistry.fpp (e.g., via grep), or at minimum add a cross-reference comment on both sides that names the other file explicitly.

3. cray_noinline branch emits acc_directive without explanatory comment
src/common/include/parallel_macros.fpp, new cray_noinline=True block:

#ifdef _CRAYFTN
#if MFC_OpenACC
        $:acc_directive        ! <-- emits !$acc routine seq, NOT !DIR$ NOINLINE

The counterintuitive part is that on Cray+OpenACC the cray_noinline branch intentionally emits !$acc routine seq instead of !DIR$ NOINLINE, because !DIR$ NOINLINE on Cray+OpenACC causes ftn-790 and a downstream castIsValid crash. This is explained in the PR description but not in the macro itself. A one-line comment at the #if MFC_OpenACC site would prevent future confusion and accidental regression.

4. Temporarily disabled CI coverage (tracking concern)
.github/workflows/test.yml and bench.yml: Phoenix GT (NVHPC acc/omp/cpu) and Frontier AMD (omp) are disabled with "TEMPORARILY DISABLED" comments. The PR body acknowledges these will be re-enabled before merge, but there is no linked tracking issue. Recommend filing a follow-up issue (or adding a TODO with the issue number) so this isn't accidentally left disabled post-merge.
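Returning to finding 2: one way to tie the two copies of CCE_MAX_SPECIES together is a small check that parses the Fypp `#:set` line and compares it with the Python value. The sketch below is an assumption about how such a check might look, not existing toolchain code. `read_fypp_constant` and `check_cce_max_species` are hypothetical helpers, and they take the source text as a string so the example does not depend on the repository layout.

```python
import re

CCE_MAX_SPECIES = 10  # Python-side value, as quoted in the review


def read_fypp_constant(source: str, name: str) -> int:
    """Extract the integer from a line of the form '#:set NAME = <int>'."""
    pattern = rf"^\s*#:set\s+{re.escape(name)}\s*=\s*(\d+)\s*$"
    match = re.search(pattern, source, flags=re.MULTILINE)
    if match is None:
        raise ValueError(f"'#:set {name} = <int>' not found in Fypp source")
    return int(match.group(1))


def check_cce_max_species(fypp_source: str) -> None:
    """Raise ValueError if the Fypp and Python copies of the limit disagree."""
    fypp_value = read_fypp_constant(fypp_source, "CCE_MAX_SPECIES")
    if fypp_value != CCE_MAX_SPECIES:
        raise ValueError(
            f"CCE_MAX_SPECIES mismatch: Fypp has {fypp_value}, "
            f"Python has {CCE_MAX_SPECIES}"
        )
```

In the toolchain this could be fed the contents of src/common/m_chemistry.fpp, so a mismatch fails at input-generation time rather than at Fortran runtime.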

Minor

CMakeLists.txt line 398: extra blank line added (cosmetic, no impact).
input.py CCE warning vs. error asymmetry: CPU builds issue a print warning while GPU builds raise MFCException. This is intentional (the @:PROHIBIT will abort CCE CPU builds at runtime), but the asymmetry is subtle and worth a brief comment explaining the rationale.

Overall this is a well-structured, carefully reasoned set of compiler-bug workarounds. Each fix is narrowly targeted, the diagnostics are preserved (PR description, code comments), and the approach of per-file -Oipa0 rather than global IPA disable is sound. The cray_inline correctness fix (Bug 7) is valuable beyond CCE 19. No blocking issues found.

Replace setup-build-cache.sh symlink mechanism with rm -rf build before each test run on Phoenix and Frontier. Benchmark jobs unaffected. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-03-06T18:42:37Z

Claude Code Review

Head SHA: 2cdade93b2534264bff626c833c714f609e16f44

Files changed: 13

.github/workflows/bench.yml, .github/workflows/test.yml (CI matrix edits)
.github/workflows/frontier/build.sh, .github/workflows/frontier/submit.sh
.github/workflows/phoenix/bench.sh, .github/workflows/phoenix/test.sh
CMakeLists.txt
src/common/include/parallel_macros.fpp
src/common/m_chemistry.fpp
src/common/m_phase_change.fpp
src/common/m_variables_conversion.fpp
src/simulation/m_bubbles_EL.fpp
toolchain/mfc/run/input.py

Summary

Addresses seven CCE 19.0.0 issues (six compiler bugs plus one latent GPU_ROUTINE correctness bug) across GPU/CPU paths without modifying numerical algorithms or GPU execution model.
Key changes: new cray_noinline knob in GPU_ROUTINE, fix pre-existing cray_inline correctness bug on Cray+OpenACC, per-file -Oipa0 in CMake, VLA→allocatable in m_bubbles_EL, fixed-size array guard in m_chemistry, pyrometheus output patching in input.py.
Bugs 1 (matmul ICE) and 2 (uninitialized FT) are straightforward fixes; all others are compiler workarounds well-justified in the PR.
All 6 previously-failing Frontier CCE tests confirmed passing; benchmark data shows no regression on OpenACC GPU.

Findings

1. m_bubbles_EL.fpp — part_order and part_ord_mpi remain as VLAs
src/simulation/m_bubbles_EL.fpp — around line 1533 in the diff:

integer, dimension(num_procs) :: part_order, part_ord_mpi

Only proc_bubble_counts was converted to allocatable; part_order and part_ord_mpi retain dimension(num_procs) VLA form. If the IPA/InstCombine crash is triggered by the mere presence of any dimension(num_procs) VLA in the function, not just the one used in the crashing path, these two could trigger the same bug. The fix appears sufficient given the confirmed test results, but worth documenting why these are safe to leave as VLAs.

2. Magic constant CCE_MAX_SPECIES = 10 duplicated across two files

src/common/m_chemistry.fpp: #:set CCE_MAX_SPECIES = 10 (Fypp constant)
toolchain/mfc/run/input.py: CCE_MAX_SPECIES = 10 (Python constant)

A mismatch between these would be caught only at Fortran runtime via @:PROHIBIT. Consider adding a build-time or test assertion, or at minimum a cross-reference comment in both files. The PROHIBIT guard is functional but silent until runtime.

3. Pyrometheus macro patch is brittle
toolchain/mfc/run/input.py lines ~120–138:
The exact multi-line string old_macro is compared against pyrometheus output for replacement. The fallback check (verify new_macro is already present) is good defensive coding, but a pyrometheus version bump that changes whitespace or adds a comment would silently break this. A comment recording the specific pyrometheus commit/version that generates the known-broken format would help future maintainers determine when the workaround can be removed.

4. CI coverage gap — Phoenix (nvfortran) and Frontier AMD disabled in merged code
.github/workflows/test.yml, .github/workflows/bench.yml:
Both workflows contain # TEMPORARILY DISABLED comments that remove Phoenix (nvfortran GPU) and Frontier AMD (flang GPU OMP) from the CI matrix. The PR description says these will be re-enabled before merge, but the code as-is will merge with those runners absent. nvfortran is the primary GPU CI target for most MFC users. Consider opening a tracking issue and adding a TODO(#issue) reference in the comment.

5. Cray CPU performance regression not benchmarked
All benchmark data in the PR description is for CCE+OpenACC GPU. The switch from cray_inline=True to cray_noinline=True for s_compute_pressure, s_convert_species_to_mixture_variables_acc, s_compute_species_fraction, and s_compute_fast_magnetosonic_speed in m_variables_conversion.fpp changes their optimization from aggressive inlining to explicit no-inline on Cray CPU. For CPU-only CCE builds these routines are now explicitly !DIR$ NOINLINE, which could affect CPU performance. Low priority given these are compiler workarounds, but worth noting for completeness.

Minor Notes

The FT = huge(1.0_wp) fix in m_phase_change.fpp:~727 is correct. The in-code comment is accurate: Fortran does not short-circuit .or. evaluation, so abs(FT) is unconditionally evaluated before the loop body executes.
The matmul → explicit 2×2 arithmetic replacement is numerically identical and clearly correct.
The proc_bubble_counts allocate/deallocate placement is correct: the allocate is after the if (.not. parallel_io) return guard, and deallocate is only reachable when the array was allocated.
The mutual-exclusivity @:assert not (cray_inline and cray_noinline) in parallel_macros.fpp is good defensive macro hygiene.

When the runner process is killed (exit 137) before the SLURM job completes, sacct is used to verify the job's final state. If the SLURM job completed with exit 0:0, the CI step passes regardless of the monitor's exit code. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-03-06T19:21:50Z

Claude Code Review

Head SHA: 6e97695
Files changed: 14 — CMakeLists.txt, parallel_macros.fpp, m_phase_change.fpp, m_variables_conversion.fpp, m_chemistry.fpp, m_bubbles_EL.fpp, toolchain/mfc/run/input.py, 7 CI workflow files

Summary

Addresses seven CCE 19.0.0 issues (ICEs, a SIGSEGV, an uninitialized use) for Cray+OpenACC builds, none of which affect CPU or NVHPC builds.
The latent cray_inline / !$acc routine seq omission on Cray+OpenACC (Bug 7) is a real correctness fix; IPA had been silently masking it.
Per-file -Oipa0 in CMake is scoped narrowly and benchmarked to show no measurable regression.
The pyrometheus post-processing patch in input.py is fragile by necessity but includes robust fallback detection.
Six previously-failing Frontier CCE tests confirmed passing.

Findings

1. Silent removal of part_order and part_ord_mpi VLAs — src/simulation/m_bubbles_EL.fpp:1535
The original line integer, dimension(num_procs) :: part_order, part_ord_mpi is deleted entirely with no replacement or mention in the PR description. Only proc_bubble_counts was supposed to change. If either variable is referenced elsewhere in s_write_restart_lag_bubbles, this is a compile error. If they are genuinely unused dead variables, their removal is correct but should be noted. The diff does not make this clear, and the PR description only mentions changing proc_bubble_counts to allocatable.

2. -Oipa0 applied to Cray CPU builds, not just Cray+ACC — CMakeLists.txt:647
The guard is CMAKE_Fortran_COMPILER_ID STREQUAL "Cray" AND NOT MFC_OpenMP, which also fires for Cray CPU-only builds. The comment says "Cray+OpenACC and Cray CPU", so this is intentional, but the IPA crashes described (Bugs 3, 4) were only observed on Cray+OpenACC. Disabling IPA for Cray CPU is conservative but may suppress optimization that was working fine. Low risk given benchmarks show no regression, but worth a brief comment on why CPU is also included.

3. CCE_MAX_SPECIES = 10 duplicated in two files — src/common/m_chemistry.fpp:9 and toolchain/mfc/run/input.py:103
Both have a comment saying "must match", but a future change to one without the other would silently diverge. A small improvement would be to emit the Python value as a Fypp -D define at build time so only one source of truth exists, but the current approach with matching comments is acceptable for a workaround.

4. cray_noinline on non-Cray non-GPU CPU builds emits nothing — src/common/include/parallel_macros.fpp:65–80
When _CRAYFTN is not defined and no GPU backend is active, the cray_noinline branch emits no directive at all. The inline comment marks this as intentional, which is correct since !DIR$ NOINLINE is Cray-specific. The four routines in m_variables_conversion.fpp were previously cray_inline=True, which did not emit !$acc routine seq on Cray+OpenACC before the Bug 7 fix; their behavior on Cray+ACC is now correct. No issue, just noting the prior silent incorrectness.

5. Pyrometheus string patch relies on exact whitespace match — toolchain/mfc/run/input.py:120–135
The old_macro pattern is a multi-line exact string. If pyrometheus ever adjusts newline formatting or macro spacing, the if patched == thermochem_code check will catch it and raise MFCException. The fallback (if new_macro in thermochem_code: pass) correctly handles the case where pyrometheus is already fixed upstream. Fragile but safe.
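The cray_noinline behavior discussed in finding 4 can be summarized as a small decision function. This Python sketch encodes only the cases stated in this thread; the function names and the backend labels `"acc"`, `"omp"`, `"cpu"` are hypothetical, and combinations the thread does not spell out raise `NotImplementedError` instead of guessing at what the Fypp macro emits.

```python
def check_flags(cray_inline: bool, cray_noinline: bool) -> None:
    """Mirror of the macro's mutual-exclusivity assertion."""
    if cray_inline and cray_noinline:
        raise ValueError("cray_inline and cray_noinline are mutually exclusive")


def noinline_directives(is_cray: bool, backend: str) -> list:
    """Directives emitted for cray_noinline=True, per the PR discussion."""
    if backend not in ("acc", "omp", "cpu"):
        raise ValueError(f"unknown backend: {backend!r}")
    if is_cray and backend == "acc":
        # !DIR$ NOINLINE here triggers ftn-790 and the castIsValid crash,
        # so only the OpenACC routine directive is emitted.
        return ["!$acc routine seq"]
    if is_cray and backend == "cpu":
        return ["!DIR$ NOINLINE"]
    if not is_cray and backend == "cpu":
        return []  # the directive is Cray-specific; nothing to emit
    # Cray+OpenMP and non-Cray GPU builds are not described in the thread.
    raise NotImplementedError("behavior not stated in the PR discussion")
```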

Minor / non-blocking

m_bubbles_EL.fpp: the plain allocate(proc_bubble_counts(num_procs)) is correct here since the variable is only used in MPI I/O (no GPU kernel access), so skipping @:ALLOCATE is justified — worth a brief comment confirming this.
bench.yml removes Phoenix (NVHPC) matrix entries. The comment "TEMPORARILY DISABLED" is clear. Ensure there is a tracking issue or this gets re-enabled before merge, since Phoenix+NVHPC is the primary non-Cray GPU CI path.
The frontier/build.sh change replaces the build-cache setup with rm -rf build. This is a deliberate trade-off: it avoids stale-cache issues at the cost of longer CI build times. Fine for a correctness fix PR.

All three submit.sh scripts (phoenix, frontier, frontier_amd symlink) now call a single helper that wraps monitor_slurm_job.sh with sacct fallback: if the monitor is killed before the SLURM job completes, the helper re-checks the job's final state and exits 0 if it succeeded. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
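The fallback described in this commit message reduces to a small decision: trust the monitor when it exits cleanly, otherwise ask SLURM for the job's final state. The actual helper is a shell script; the Python sketch below only illustrates the decision logic. The function names are hypothetical, and it assumes the accounting record arrives in `State|ExitCode` form (for example from `sacct -j <jobid> -X --noheader --parsable2 --format=State,ExitCode`).

```python
def job_succeeded(sacct_line: str) -> bool:
    """True only for a record of the form 'COMPLETED|0:0'."""
    fields = sacct_line.strip().split("|")
    if len(fields) != 2:
        return False
    state, exit_code = fields
    return state == "COMPLETED" and exit_code == "0:0"


def ci_exit_code(monitor_rc: int, sacct_line: str) -> int:
    """Exit code the CI step should report."""
    if monitor_rc == 0:
        return 0
    # Monitor failed or was killed (e.g. exit 137): defer to SLURM's record.
    return 0 if job_succeeded(sacct_line) else monitor_rc
```

Keeping the parsing in a pure function makes the pass/fail rule testable without a SLURM installation; an empty or malformed record is treated as failure, so the monitor's own exit code is preserved.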

github-actions · 2026-03-06T19:33:28Z

Claude Code Review

Head SHA: 61924d8
Files changed: 15

.github/scripts/run_monitored_slurm_job.sh (new)
.github/workflows/bench.yml
.github/workflows/frontier/build.sh
.github/workflows/frontier/submit.sh
.github/workflows/phoenix/bench.sh
.github/workflows/phoenix/submit.sh
.github/workflows/phoenix/test.sh
.github/workflows/test.yml
CMakeLists.txt
src/common/include/parallel_macros.fpp
src/common/m_chemistry.fpp
src/common/m_phase_change.fpp
src/common/m_variables_conversion.fpp
src/simulation/m_bubbles_EL.fpp
toolchain/mfc/run/input.py

Summary

Workarounds for 7 distinct CCE 19.0.0 compiler bugs (InstCombine ICEs, IPA SIGSEGV, VLA crashes, and a pre-existing Cray+OpenACC GPU_ROUTINE correctness bug now exposed).
Latent cray_inline=True correctness bug fixed — on Cray+OpenACC those routines were never registered as OpenACC device routines; was only working because IPA inlined them.
New run_monitored_slurm_job.sh wrapper adds sacct-based recovery when the SLURM monitor is killed mid-job.
Phoenix and Frontier AMD CI temporarily disabled due to pre-existing infrastructure failures.
All 6 previously-failing Frontier CCE tests confirmed passing (SLURM job 4172615).

Findings

1. `src/simulation/m_bubbles_EL.fpp:1535` — `part_order` and `part_ord_mpi` silently removed

-        integer, dimension(num_procs) :: part_order, part_ord_mpi
-        integer, dimension(num_procs) :: proc_bubble_counts
+        integer, allocatable :: proc_bubble_counts(:)

part_order and part_ord_mpi are deleted with no allocatable replacement and no mention in the PR description, which covers only proc_bubble_counts. If these variables are used anywhere in s_write_restart_lag_bubbles (not shown in the diff), the build would fail to compile on the affected paths. Since Frontier tests pass they are likely dead code, but this should be confirmed and noted in the commit.

Action: Please confirm part_order / part_ord_mpi are dead code (never referenced after declaration) and note their removal explicitly in the commit message or PR description.

2. `CMakeLists.txt:644-647` — `set_source_files_properties` path for `m_phase_change`

"${CMAKE_BINARY_DIR}/fypp/simulation/m_phase_change.fpp.f90"

m_phase_change.fpp lives in src/common/, not src/simulation/. The Fypp preprocessed output path depends on how HANDLE_SOURCES organises files. If the preprocessed copy for the simulation target is placed under fypp/simulation/, this is correct. If it is placed under fypp/common/ (with a symlink or shared object), the property silently applies to nothing and -Oipa0 is never passed, leaving Bug 3 unmitigated on Cray+OpenACC. The test suite passing on Frontier suggests the path is correct, but it is worth an explicit comment confirming the layout.

3. `toolchain/mfc/run/input.py:116-130` — Pyrometheus patch is fragile

The string replacement matches an exact multi-line #ifdef _CRAYFTN block in the pyrometheus-generated m_thermochem.f90. Any change in whitespace, line endings, or ordering by an upstream pyrometheus update will make the replacement match nothing; since new_macro will not be present either, the guard raises the exception. The error guard is good and will catch drift at input-generation time rather than at runtime, so this is acceptable as a short-term workaround. A TODO comment pointing to the pyrometheus upstream issue would help future maintainers know when this can be removed.

4. `src/common/m_chemistry.fpp` — `CCE_MAX_SPECIES = 10` is a hard chemistry limit for Cray

The fixed-size workaround limits all Cray builds (not just case-optimized pre_process) to ≤ 10 species. The @:PROHIBIT guards will abort at runtime rather than silently truncate, which is correct. However, GRI-Mech 3.0 and most real combustion mechanisms have far more than 10 species, so Cray chemistry users hitting this limit will see an opaque abort unless they read the message carefully. A more descriptive error message (e.g. pointing to the CCE 19 bug tracker entry or the PR) would improve debuggability.

Also note: @:PROHIBIT is placed as executable code in subroutines that are called inside GPU parallel loops (e.g. s_compute_reaction_source). On GPU builds the abort path may not be reachable at device execution, but the check fires on the CPU path through the loop body before GPU_PARALLEL_LOOP. Confirm this is the intended check location and does not cause issues on GPU kernels.

5. `src/common/include/parallel_macros.fpp:50-73` — `cray_noinline` on non-Cray CPU builds emits nothing

Per the inline comment this is intentional: !DIR$ NOINLINE is a Cray-specific directive and no equivalent is needed on gfortran/nvfortran/ifx. This is correct, but worth verifying that the missing !DIR$ NOINLINE on non-Cray does not cause correctness issues via inlining on other compilers (the IPA crash is CCE-specific, so this should be fine).

6. CI coverage gap (`.github/workflows/test.yml`)

Phoenix (NVHPC) and Frontier AMD CI are disabled for this PR. The PR description states these failures are pre-existing and unrelated, and that both will be re-enabled before merge. Please track the re-enablement — merging with reduced CI coverage means cray_inline fix impact on AMD OpenMP (--gpu mp) and NVHPC builds is not fully validated by the test suite.

Minor

CMakeLists.txt:400: extra blank line added between HANDLE_SOURCES calls — harmless.
run_monitored_slurm_job.sh:31: sleep 30 is necessary for SLURM epilog but could time out if the epilog is slow on congested nodes. Fine as-is.

Copilot AI review requested due to automatic review settings March 3, 2026 00:02

Copilot started reviewing on behalf of sbryngelson March 3, 2026 00:02


sbryngelson changed the title ~~Fix CCE 19.0.0 optcg ICE: GPU_ROUTINE cray_inline emits both !DIR$ INLINEALWAYS and !$acc routine seq~~ Fix CCE 19.0.0 IPA crash: add cray_noinline parameter to GPU_ROUTINE Mar 3, 2026

Spencer Bryngelson added 2 commits March 5, 2026 02:47

Temporarily disable Phoenix + Frontier AMD CI (pre-existing failures unrelated to CCE fix)

1aa4cf5

sbryngelson force-pushed the fix/cce-cray-inline-routine branch from 6f97c9f to 1aa4cf5 Compare March 5, 2026 07:50

MFlowCode deleted a comment from github-actions bot Mar 5, 2026

coderabbitai bot reviewed Mar 5, 2026


Spencer Bryngelson and others added 2 commits March 5, 2026 11:51

Add comment noting pyrometheus upstream issue for thermochem GPU_ROUTINE patch

c274109

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Merge branch 'master' into fix/cce-cray-inline-routine

05d1dc0

Spencer Bryngelson and others added 2 commits March 6, 2026 02:15

Fix bench.yml: restore timeout-minutes to 480 (revert accidental 240)

e208275

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

sbryngelson and others added 2 commits March 6, 2026 13:39

Remove persistent build cache for self-hosted test runners

9fc072a

Replace setup-build-cache.sh symlink mechanism with rm -rf build before each test run on Phoenix and Frontier. Benchmark jobs unaffected. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Remove build cache from benchmark jobs on Phoenix and Frontier

2cdade9

sbryngelson mentioned this pull request Mar 6, 2026

Fix self-hosted CI robustness: build cache, SLURM QOS, and submit resilience #1295

Open

3 tasks

