Work around CCE 19.0.0 compiler bugs for Cray+OpenACC builds#1286
Work around CCE 19.0.0 compiler bugs for Cray+OpenACC builds#1286sbryngelson wants to merge 18 commits intoMFlowCode:masterfrom
Conversation
…: add -Oipa0 m_phase_change.fpp triggers the same CCE 19.0.0 bring_routine_resident SIGSEGV during IPA as m_bubbles_EL. Caller-side !DIR$ NOINLINE directives (commit 628a046) were insufficient. Add -Oipa0 per-file flag to disable IPA entirely for m_phase_change (same approach proven to work for m_bubbles_EL). Consolidate both files in one set_source_files_properties call. See PR MFlowCode#1286. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three distinct CCE 19.0.0 compiler bugs required fixes: Bug 1: InstCombine ICE in matmul() in m_phase_change.fpp - Replace matmul() with explicit 2x2 arithmetic Bug 2: IPA bring_routine_resident SIGSEGV in m_phase_change.fpp - Add -Oipa0 per-file in CMakeLists.txt (Cray+OpenACC only) - Use cray_noinline=True on 4 GPU_ROUTINE calls in m_phase_change.fpp and 4 in m_variables_conversion.fpp Bug 3: IPA castIsValid ICE in m_bubbles_EL.fpp - Change proc_bubble_counts from VLA to allocatable - Add -Oipa0 per-file in CMakeLists.txt (Cray+OpenACC only) Bug 4: m_chemistry.fpp VLA ICE in case-optimized pre_process builds - Guard 4 dimension(num_species) local arrays with USING_CCE Bug 5: Pyrometheus GPU_ROUTINE macro missing !acc routine seq on Cray+ACC - Post-process generated m_thermochem.f90 in toolchain/mfc/run/input.py to replace the broken Cray INLINEALWAYS-only macro with plain #define GPU_ROUTINE(name) !acc routine seq Also fix uninitialized FT in s_TSat (use huge(1.0_wp) not huge(FT)). See PR MFlowCode#1286.
…unrelated to CCE fix)
6f97c9f to
1aa4cf5
Compare
📝 WalkthroughWalkthroughThis pull request introduces Cray Fortran and OpenACC compiler compatibility fixes across multiple source files and build configuration. Changes include temporarily disabling specific test matrix configurations in CI workflows, adding IPA optimization disablement directives in CMakeLists.txt, introducing a new 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (3)
.github/workflows/test.yml (1)
173-174: Add explicit tracking metadata to temporary CI disablements.Line 173, Line 204, and Line 260 mark whole lanes as temporarily disabled; adding an issue ID + target re-enable date in these comments would reduce the chance of silent long-term coverage loss.
Also applies to: 204-205, 260-261
toolchain/mfc/run/input.py (2)
97-100: Scope this warning to affected compiler/toolchain paths.This warning is emitted for all builds, but the message is explicitly CCE-specific. Please gate it to the relevant path(s) to avoid false alarms on unaffected compilers.
105-105: PEP 8: remove spaces around=in keyword arguments.Use
scalar_type=real_typeanddirective_offload=directive_str.As per coding guidelines
toolchain/**/*.py: Python toolchain code. Follow PEP 8.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ebd771ac-83df-419a-b4cc-266e0eaf4acc
📒 Files selected for processing (8)
.github/workflows/test.ymlCMakeLists.txtsrc/common/include/parallel_macros.fppsrc/common/m_chemistry.fppsrc/common/m_phase_change.fppsrc/common/m_variables_conversion.fppsrc/simulation/m_bubbles_EL.fpptoolchain/mfc/run/input.py
| #:if USING_CCE | ||
| real(wp), dimension(10) :: Ys | ||
| #:else | ||
| real(wp), dimension(num_species) :: Ys | ||
| #:endif |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Check the exact state of m_chemistry.fpp at the specified lines
if [ -f "src/common/m_chemistry.fpp" ]; then
echo "=== Lines 66-70 (first fixed-size buffer) ==="
sed -n '66,70p' src/common/m_chemistry.fpp
echo ""
echo "=== Lines 108-112 (second fixed-size buffer) ==="
sed -n '108,112p' src/common/m_chemistry.fpp
echo ""
echo "=== Lines 142-148 (third fixed-size buffer) ==="
sed -n '142,148p' src/common/m_chemistry.fpp
echo ""
echo "=== Lines 191-201 (fourth fixed-size buffer) ==="
sed -n '191,201p' src/common/m_chemistry.fpp
else
echo "File not found"
fiRepository: MFlowCode/MFC
Length of output: 1635
🏁 Script executed:
# Search for all USING_CCE occurrences in chemistry module
echo "=== All USING_CCE branches in m_chemistry.fpp ==="
rg -n "USING_CCE" src/common/m_chemistry.fpp -B2 -A3Repository: MFlowCode/MFC
Length of output: 1161
🏁 Script executed:
# Check for existing CCE-specific species count guards
echo "=== CCE species guards in common modules ==="
rg -n "USING_CCE" src/common/ -A5 | grep -A5 "num_species\|species\|PROHIBIT" || trueRepository: MFlowCode/MFC
Length of output: 1395
🏁 Script executed:
# Check AMD pattern for comparison
echo "=== AMD guard pattern in m_checker_common.fpp ==="
rg -n "USING_AMD\|num_fluids\|PROHIBIT" src/common/m_checker_common.fpp -B2 -A2Repository: MFlowCode/MFC
Length of output: 106
🏁 Script executed:
# Check toolchain input.py for CCE/species handling
if [ -f "toolchain/mfc/run/input.py" ]; then
echo "=== Species limit handling in toolchain ==="
rg -n "USING_CCE\|num_species\|species.*10\|Warning" toolchain/mfc/run/input.py -C3 || echo "No matches found"
else
echo "toolchain/mfc/run/input.py not found"
fiRepository: MFlowCode/MFC
Length of output: 117
🏁 Script executed:
# Find m_checker_common.fpp
find src -name "*checker*" -type f | head -20Repository: MFlowCode/MFC
Length of output: 178
🏁 Script executed:
# Search for PROHIBIT statements in all common modules
echo "=== PROHIBIT statements in src/common ==="
rg -n "PROHIBIT" src/common/ -C2 || echo "No PROHIBIT found"Repository: MFlowCode/MFC
Length of output: 1523
🏁 Script executed:
# Search for any species limit checks or assertions
echo "=== Species limit checks anywhere ==="
rg -n "num_species.*10\|species.*limit\|species.*<\|species.*>" src/ -C2 || echo "No limits found"Repository: MFlowCode/MFC
Length of output: 110
🏁 Script executed:
# Check if s_check_amd exists and where
echo "=== s_check_amd subroutine ==="
rg -n "s_check_amd\|num_fluids.*3" src/ -C3 || echo "Not found"Repository: MFlowCode/MFC
Length of output: 97
🏁 Script executed:
# Search toolchain more broadly for CCE handling
echo "=== CCE handling in toolchain ==="
find toolchain -name "*.py" -type f | xargs grep -l "USING_CCE\|cce" 2>/dev/null || echo "No CCE in toolchain"Repository: MFlowCode/MFC
Length of output: 812
🏁 Script executed:
# Check what currently guards CCE or species in input preparation
echo "=== All files mentioning chemistry or species constraints ==="
rg -n "num_species\|n_species" toolchain/ src/common/m_chemistry.fpp -C1 | head -40 || echo "Not found"Repository: MFlowCode/MFC
Length of output: 118
🏁 Script executed:
# Check how the dimension(10) arrays are actually used - are they indexed by num_species?
echo "=== Usage of Ys in s_compute_q_T_sf (line 66-70 context) ==="
sed -n '60,90p' src/common/m_chemistry.fppRepository: MFlowCode/MFC
Length of output: 1448
🏁 Script executed:
# Check the other fixed-size array usages
echo "=== Usage of Ys in s_compute_T_from_primitives (line 108-112 context) ==="
sed -n '100,130p' src/common/m_chemistry.fppRepository: MFlowCode/MFC
Length of output: 1428
🏁 Script executed:
# Check the bigger context for lines 142-148
echo "=== s_compute_chemistry_reaction_flux usage (140-170) ==="
sed -n '140,170p' src/common/m_chemistry.fppRepository: MFlowCode/MFC
Length of output: 1453
🏁 Script executed:
# Check the diffusion flux subroutine (191-220)
echo "=== s_compute_chemistry_diffusion_flux usage (185-220) ==="
sed -n '185,220p' src/common/m_chemistry.fppRepository: MFlowCode/MFC
Length of output: 1829
🏁 Script executed:
# Look for any loops that iterate over num_species with these arrays
echo "=== Loops indexing the fixed-size arrays ==="
rg -n "do.*i.*num_species\|do.*species\|Ys\(i\)\|omega\(i\)" src/common/m_chemistry.fpp -B2 -A2 | head -60Repository: MFlowCode/MFC
Length of output: 101
🏁 Script executed:
# Look for any CCE-specific guards or checks
echo "=== Search for USING_CCE guards or checks ==="
rg -n "USING_CCE" src/ toolchain/ -C3 | grep -E "PROHIBIT|WARNING|check|limit|species" -C2 || echo "No CCE guards found"Repository: MFlowCode/MFC
Length of output: 663
🏁 Script executed:
# What are chemxb and chemxe?
echo "=== Definition of chemxb and chemxe ==="
rg -n "chemxb|chemxe" src/common/ -B1 -A1 | head -30Repository: MFlowCode/MFC
Length of output: 2478
🏁 Script executed:
# Check if there's a difference in how AMD vs CCE are handled
echo "=== Comparison: AMD check vs CCE ==="
sed -n '55,65p' src/common/m_checker_common.fppRepository: MFlowCode/MFC
Length of output: 687
🏁 Script executed:
# Check entire m_checker_common guard section around num_fluids and num_species
echo "=== Full guard section in m_checker_common.fpp ==="
sed -n '50,70p' src/common/m_checker_common.fppRepository: MFlowCode/MFC
Length of output: 860
🏁 Script executed:
# Verify the exact condition AMD enforces
echo "=== AMD chemistry constraint details ==="
sed -n '59,62p' src/common/m_checker_common.fppRepository: MFlowCode/MFC
Length of output: 464
🏁 Script executed:
# Check if there's a separate CCE compiler check subroutine
echo "=== Search for CCE-specific startup checks ==="
rg -n "s_check.*cce\|USING_CCE.*PROHIBIT\|CCE.*species" src/ -i -C3 || echo "No CCE-specific startup checks found"Repository: MFlowCode/MFC
Length of output: 140
🏁 Script executed:
# Verify chemxb and chemxe are defined based on num_species
echo "=== chemxb/chemxe definition ==="
rg -n "chemxb.*=|chemxe.*=" src/common/ -B2 -A2 | head -40Repository: MFlowCode/MFC
Length of output: 3113
🏁 Script executed:
# Verify that the offset calculation (eqn - chemxb + 1) is the indexing pattern
echo "=== Array indexing pattern in chemistry loops ==="
rg -n "Ys\(.*chemxb\|omega\(.*chemxb\|eqn - chemxb" src/common/m_chemistry.fpp | head -20Repository: MFlowCode/MFC
Length of output: 106
🏁 Script executed:
# Check toolchain or build files for CCE vs AMD species constraints
echo "=== Build or toolchain constraints for CCE ==="
rg -n "USING_CCE\|CCE.*species\|cce.*num_species" toolchain/ -C3 -i || echo "Not found in toolchain"Repository: MFlowCode/MFC
Length of output: 126
🏁 Script executed:
# Check if there's documentation about CCE limits
echo "=== CCE documentation ==="
find . -name "*.md" -o -name "*.txt" | xargs grep -l "CCE\|cce" 2>/dev/null | head -5Repository: MFlowCode/MFC
Length of output: 205
🏁 Script executed:
# Verify the actual loop ranges by checking sys_size definitions
echo "=== Check if chemxb/chemxe are set based on num_species ==="
rg -n "chemxb\s*=|chemxe\s*=" src/common/ -B3 -A1 | head -50Repository: MFlowCode/MFC
Length of output: 116
🏁 Script executed:
# Search more broadly for where chemxb/chemxe are set
echo "=== chemxb/chemxe variable assignments ==="
rg -n "chemxb|chemxe" src/common/m_global_variables.fpp -B2 -A2 | head -60Repository: MFlowCode/MFC
Length of output: 190
🏁 Script executed:
# Check the global variables module
echo "=== Global variables that determine chemistry indices ==="
grep -n "chemxb\|chemxe\|num_species" src/common/m_global_variables.fpp | head -30Repository: MFlowCode/MFC
Length of output: 198
🏁 Script executed:
# Look for initialization of these variables
echo "=== Where are chemistry indices initialized ==="
rg -n "chemxb\s*=" src/common/ -B2 -A2 | head -50Repository: MFlowCode/MFC
Length of output: 104
🏁 Script executed:
# Check if there's a pattern - does chemxe - chemxb equal num_species?
echo "=== Logic connecting chemistry indices to num_species ==="
sed -n '1,100p' src/common/m_global_variables.fpp | grep -n "chem\|species"Repository: MFlowCode/MFC
Length of output: 208
🏁 Script executed:
# Find the correct global variables file
echo "=== Global variables files ==="
find src/common -name "*global*" -o -name "*variables*" | head -20Repository: MFlowCode/MFC
Length of output: 125
🏁 Script executed:
# Search across all common modules for chemxb definition
echo "=== chemxb definition search ==="
rg -n "chemxb" src/common/ -B5 -A2 | head -80Repository: MFlowCode/MFC
Length of output: 5901
🏁 Script executed:
# Find where these indices are calculated based on species
echo "=== Module using chemistry indices ==="
rg -l "chemxb|chemxe" src/common/ | head -10Repository: MFlowCode/MFC
Length of output: 197
🏁 Script executed:
# Search for chemxb declaration in module files
echo "=== chemxb/chemxe declaration ==="
rg -n "integer.*chemxb|integer.*chemxe|allocatable.*chemxb" src/ -C2 | head -60Repository: MFlowCode/MFC
Length of output: 1132
🏁 Script executed:
# Check what modules import or define these
echo "=== Modules defining chemistry indices ==="
rg -n "^.*chemxb|^.*chemxe" src/common/*.fpp | head -20Repository: MFlowCode/MFC
Length of output: 2139
🏁 Script executed:
# Look in simulation module where these might be set
echo "=== Chemistry setup in simulation ==="
find src/simulation -name "*.fpp" -o -name "*.f90" | xargs grep -l "chemxb" | head -5Repository: MFlowCode/MFC
Length of output: 249
🏁 Script executed:
# Check for inline definitions or calculations
echo "=== All references to chemxb with context ==="
rg -n "chemxb.*=" src/ -C3 | head -100Repository: MFlowCode/MFC
Length of output: 8109
🏁 Script executed:
# Final verification: confirm num_species calculation
echo "=== species_idx definition ==="
rg -n "species_idx" src/ -B2 -A2 | grep "species_idx.*=" | head -10Repository: MFlowCode/MFC
Length of output: 795
Add CCE startup guard for fixed-size chemistry buffers.
The USING_CCE branches at lines 66-70, 108-112, 142-148, and 191-201 declare fixed-size dimension(10) arrays for chemistry temporaries. Loops iterate from chemxb to chemxe, spanning num_species elements (confirmed: chemxe = sys_size + num_species). If num_species > 10 under USING_CCE, array accesses overflow. The AMD compiler path enforces this via @:PROHIBIT(chemistry .and. num_species /= 10, ...) in m_checker_common.fpp:61. Add equivalent guards in all four subroutines:
s_compute_q_T_sf(line 66-70)s_compute_T_from_primitives(line 108-112)s_compute_chemistry_reaction_flux(line 142-148)s_compute_chemistry_diffusion_flux(line 191-201)
Proposed guard pattern
#:if USING_CCE
@:PROHIBIT(num_species > 10, "CCE chemistry requires num_species <= 10")
#:endifAdd this check in each subroutine after the fixed-size array declarations.
toolchain/mfc/run/input.py
Outdated
| if directive_str == 'acc': | ||
| old_macro = ( | ||
| "#ifdef _CRAYFTN\n#define GPU_ROUTINE(name) !DIR$ INLINEALWAYS name\n" | ||
| "#else\n#define GPU_ROUTINE(name) !$acc routine seq\n#endif" | ||
| ) | ||
| new_macro = "#define GPU_ROUTINE(name) !$acc routine seq" | ||
| patched = thermochem_code.replace(old_macro, new_macro) | ||
| if patched == thermochem_code: | ||
| raise common.MFCException( | ||
| "CCE 19.0.0 workaround: pyrometheus output format changed — " | ||
| "Cray+ACC GPU_ROUTINE macro patch did not apply. " | ||
| "Update the pattern in toolchain/mfc/run/input.py." | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python -m pip install --quiet "cantera==3.1.0" "pyrometheus" 2>&1 | grep -v "already satisfied" || true
python - <<'PY'
import cantera as ct
import pyrometheus as pyro
old = "#ifdef _CRAYFTN\n#define GPU_ROUTINE(name) !DIR$ INLINEALWAYS name\n#else\n#define GPU_ROUTINE(name) !$acc routine seq\n#endif"
new = "#define GPU_ROUTINE(name) !$acc routine seq"
sol = ct.Solution("h2o2.yaml")
code = pyro.FortranCodeGenerator().generate(
"m_thermochem",
sol,
pyro.CodeGenerationOptions(scalar_type="real(dp)", directive_offload="acc"),
)
print("old_pattern_found:", old in code)
print("new_pattern_found:", new in code)
print("\nFirst 2000 chars of generated code:")
print(code[:2000])
PYRepository: MFlowCode/MFC
Length of output: 3061
Add defensive check before raising exception in pyrometheus macro patching.
The current code raises a hard failure when patched == thermochem_code, which triggers for both "unknown format" and "already-correct format." If pyrometheus evolves to emit the desired new_macro form directly, this code will unnecessarily fail. Before raising an exception, check if the new macro is already present in the output:
Proposed fix
if directive_str == 'acc':
old_macro = (
"#ifdef _CRAYFTN\n#define GPU_ROUTINE(name) !DIR$ INLINEALWAYS name\n"
"#else\n#define GPU_ROUTINE(name) !$acc routine seq\n#endif"
)
new_macro = "#define GPU_ROUTINE(name) !$acc routine seq"
patched = thermochem_code.replace(old_macro, new_macro)
if patched == thermochem_code:
- raise common.MFCException(
- "CCE 19.0.0 workaround: pyrometheus output format changed — "
- "Cray+ACC GPU_ROUTINE macro patch did not apply. "
- "Update the pattern in toolchain/mfc/run/input.py."
- )
+ if new_macro in thermochem_code:
+ patched = thermochem_code # already in desired form
+ else:
+ raise common.MFCException(
+ "CCE 19.0.0 workaround: pyrometheus output format changed — "
+ "Cray+ACC GPU_ROUTINE macro patch did not apply. "
+ "Update the pattern in toolchain/mfc/run/input.py."
+ )
thermochem_code = patched🧰 Tools
🪛 Ruff (0.15.2)
[warning] 119-123: Avoid specifying long messages outside the exception class
(TRY003)
Claude Code ReviewHead SHA: Summary
Findings[Medium] The CCE workaround substitutes [Low] Only [Low] On Cray+ACC and Cray+OMP the macro emits no [Attention] CI matrix permanently shrinks in this PR Phoenix (NVHPC, 3 configs) and Frontier AMD (3 configs) are disabled. The PR body says these are temporary pending infrastructure fixes, but there is no tracking issue linked and the comment strings say "TEMPORARILY DISABLED" with no re-enable deadline. Before merging, consider opening a follow-up issue to ensure these are re-added and not forgotten. Minor / Non-blocking
|
…compat Add @:PROHIBIT(num_species > 10) in all four USING_CCE blocks in m_chemistry.fpp so CCE builds with >10 species fail with a clear message rather than silently overflowing the fixed-size dimension(10) arrays (matching the existing AMD guard in m_checker_common.fpp). Make pyrometheus GPU_ROUTINE macro patch forward-compatible: if a future pyrometheus version already emits the correct form directly, skip the patch rather than raising an exception. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude Code ReviewHead SHA: Summary
Findings1. The PR description discusses This is a meaningful correctness fix for routines like 2.
integer, dimension(2) :: gsizes, lsizes, start_idx_part
integer, dimension(num_procs) :: part_order, part_ord_mpi ← still VLAIf the root cause of Bug 4 is CCE 19.0.0's inability to handle 3. The condition change from 4. CI scope reduction — no tracking issue linked Phoenix (NVHPC + OpenACC, NVHPC + OpenMP, CPU) and Frontier AMD (OpenMP GPU) are all disabled. This means the only CI validation for the Cray-specific changes is on Frontier CCE. The PR body says "to be re-enabled before merge" — but both 5. The Minor
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1286 +/- ##
==========================================
- Coverage 44.95% 44.94% -0.01%
==========================================
Files 70 70
Lines 20503 20504 +1
Branches 1946 1946
==========================================
- Hits 9217 9216 -1
- Misses 10164 10166 +2
Partials 1122 1122 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…INE patch Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude Code ReviewHead SHA: 05d1dc0 Files changed: 9
Summary
Findings1. — and silently removedThe diff removes the declaration of two VLA arrays alongside - integer, dimension(num_procs) :: part_order, part_ord_mpi
- integer, dimension(num_procs) :: proc_bubble_counts
+ integer, allocatable :: proc_bubble_counts(:)
2. — silent success path in pyrometheus patch checkif patched == thermochem_code:
if new_macro in thermochem_code:
pass # pyrometheus already emits the correct form; no patch needed
else:
raise common.MFCException(...)The 3. — magic number 10 repeated in 4 locationsThe 4. / — CI environments disabled before mergePhoenix (GT/NVHPC) and Frontier AMD runners are commented out. The PR description states they are "to be re-enabled before merge", but they are absent from the diff in the Minor / Improvements
Overall the approach is well-justified, each workaround is narrowly scoped, and the error-detection in |
Benchmark jobs were using the extended partition (5:59 walltime, ENG160 account) causing multi-hour queue waits and hitting GHA's 8h wall-clock limit. The actual benchmark runs in ~20 minutes on the node. Switch to batch + 1:59 + --qos=normal (same as the test suite jobs). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude Code ReviewHead SHA: e208275 Summary
Findings1. - integer, dimension(num_procs) :: part_order, part_ord_mpi
- integer, dimension(num_procs) :: proc_bubble_counts
+ integer, allocatable :: proc_bubble_counts(:)
2. The 3. if sol.n_species > 10:
cons.print(f"[bold yellow]Warning:[/bold yellow] ...")When 4. For non-Cray, non-GPU CPU builds, the 5. CI disablement — No tracking issue linked Both Positive observations
|
…rflow, noinline comment
- m_chemistry.fpp: introduce CCE_MAX_SPECIES Fypp constant (= 10) as single
source of truth; replace all 8 hardcoded dimension(10) and 4 PROHIBIT(> 10)
occurrences with ${CCE_MAX_SPECIES}$
- input.py: elevate n_species > CCE_MAX_SPECIES from warning to MFCException
for GPU builds (directive_str is not None); CPU builds still warn
- parallel_macros.fpp: add comment explaining cray_noinline emits nothing on
non-Cray CPU builds (intentional — !DIR$ NOINLINE is Cray-specific)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude Code ReviewHead SHA: Files changed: 10
Summary
Findings[M] Lines [M]
[M] Lines [L] CI coverage gap — Phoenix NVHPC GPU tests disabled
[L]
[I]
Overall: The fixes are surgical, well-motivated, and the numerical/GPU correctness issues are properly addressed. The main asks before final merge are: (1) re-enable Phoenix CI or create a tracking issue, (2) reconsider the CCE_MAX_SPECIES=10 ceiling, and (3) clarify the plain |
## is only valid inside Fypp blocks (#:def, #:if). At file top-level it passes through to the .f90 output, causing gfortran CPP to error with 'invalid preprocessing directive ##'. Switch to #! which Fypp always strips. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude Code ReviewHead SHA: Files changed: 10 Summary
Findings1. Silent removal of The diff removes two VLA declarations that are not mentioned in the PR description: - integer, dimension(num_procs) :: part_order, part_ord_mpi
- integer, dimension(num_procs) :: proc_bubble_counts
+ integer, allocatable :: proc_bubble_counts(:)If 2. Duplicate The constant is defined independently in:
These must be kept in sync manually. If one is updated and the other is not, the Python-side early error check and the Fortran-side 3. Pyrometheus patch in The "#ifdef _CRAYFTN\n#define GPU_ROUTINE(name) !DIR$ INLINEALWAYS name\n"
"#else\n#define GPU_ROUTINE(name) ! routine seq\n#endif"Any whitespace or ordering change in the pyrometheus code generator will silently cause the patch not to apply. The existing guard (raising 4. Phoenix and Frontier AMD CI left disabled with no tracking issue Both # Phoenix (GT) — TEMPORARILY DISABLED (pre-existing SLURM/Case Opt failures)
# Frontier AMD — TEMPORARILY DISABLED (pre-existing failures unrelated to CCE fix)These cover NVHPC GPU builds and AMD OpenMP GPU builds respectively. Given these are being merged to Minor / Non-blocking
|
Same root cause as m_chemistry.fpp fix: ## is not a Fypp comment and passes through to the generated .f90 output. Inside #ifdef _CRAYFTN, gfortran never sees the ## lines (since _CRAYFTN is undefined there), but CCE does and errors with 'Unknown or unsupported compiler directive'. Change to #! which Fypp always strips. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude Code ReviewHead SHA: 8a6398c
Summary
Findings
integer, dimension(num_procs) :: part_order, part_ord_mpi ! still VLAsThese are also
The magic constant
The comment says they "must match" but there is no automated check enforcing this. A future edit to one without the other would silently allow mechanisms that hit the Fortran
The string-match patch at lines ~125–140 is intentionally defensive: it raises
Phoenix (GT) and Frontier AMD coverage is disabled with "TEMPORARILY DISABLED" comments. The PR description mentions these will be re-enabled before merge, but the current state of the PR doesn't reflect that — if merged as-is, there is no CI gate for NVIDIA nvfortran GPU builds (Phoenix) or AMD flang builds. Please either:
Minor
|
m_phase_change triggers a bring_routine_resident SIGSEGV (ftn-2116 INTERNAL) on CCE 19.0.0 CPU-only builds too, not just OpenACC GPU builds. Widen the CMakeLists guard from 'Cray AND MFC_OpenACC' to 'Cray' to fix the CCE CPU simulation build. See master CI run 22627725058 for the failure evidence. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude Code ReviewHead SHA:
Summary
Findings1. [Medium]
|
On Cray+OpenMP, m_thermochem uses !DIR$ INLINEALWAYS (IPA inlining) so disabling IPA for m_phase_change/m_bubbles_EL breaks thermochem on-device calls → Phase Change and Lagrange Bubble tests crash at runtime (gpu-omp). On Cray+OpenACC, the pyrometheus patch emits !\ routine seq instead, so IPA is not needed for thermochem. On Cray CPU, GPU tests are skipped. Condition: Cray AND NOT MFC_OpenMP (covers OpenACC + CPU, excludes OpenMP). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude Code ReviewHead SHA: Files changed: 10 Summary
Findings1. Incomplete VLA fix in integer, dimension(num_procs) :: part_order, part_ord_mpi
integer, dimension(num_procs) :: proc_bubble_countsOnly 2. Duplicated
A comment says these must match, but there is no compile-time or test-time enforcement. If a future contributor updates one and not the other, the runtime 3. #ifdef _CRAYFTN
#if MFC_OpenACC
$:acc_directive ! <-- emits ! routine seq, NOT !DIR$ NOINLINEThe counterintuitive part — that on Cray+OpenACC the 4. Temporarily disabled CI coverage (tracking concern) Minor
Overall this is a well-structured, carefully reasoned set of compiler-bug workarounds. Each fix is narrowly targeted, the diagnostics are preserved (PR description, code comments), and the approach of per-file |
Replace setup-build-cache.sh symlink mechanism with rm -rf build before each test run on Phoenix and Frontier. Benchmark jobs unaffected. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude Code ReviewHead SHA: Files changed: 13
Summary
Findings1. integer, dimension(num_procs) :: part_order, part_ord_mpiOnly 2. Magic constant
A mismatch between these would be caught only at Fortran runtime via 3. Pyrometheus macro patch is brittle 4. CI coverage gap — Phoenix (nvfortran) and Frontier AMD disabled in merged code 5. Cray CPU performance regression not benchmarked Minor Notes
|
When the runner process is killed (exit 137) before the SLURM job completes, sacct is used to verify the job's final state. If the SLURM job completed with exit 0:0, the CI step passes regardless of the monitor's exit code. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude Code ReviewHead SHA: 6e97695 Summary
Findings1. Silent removal of 2. 3. 4. 5. Pyrometheus string patch relies on exact whitespace match — Minor / non-blocking
|
All three submit.sh scripts (phoenix, frontier, frontier_amd symlink) now call a single helper that wraps monitor_slurm_job.sh with sacct fallback: if the monitor is killed before the SLURM job completes, the helper re-checks the job's final state and exits 0 if it succeeded. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude Code ReviewHead SHA: 61924d8
Summary
Findings1.
|
Summary
CCE 19.0.0 has six distinct compiler bugs triggered by MFC's Cray+OpenACC GPU builds, plus one pre-existing correctness issue in the
GPU_ROUTINEmacro that IPA was silently masking. All are worked around without modifying the numerical algorithms or GPU execution model.Bug 1 — InstCombine ICE in `matmul()` (`m_phase_change.fpp`)
CCE 19.0.0's InstCombine pass crashes on `matmul()` inside a GPU kernel.
Fix: Replace `matmul()` with explicit 2×2 scalar arithmetic.
Bug 2 — Uninitialized `FT` in `s_TSat` (`m_phase_change.fpp`)
`huge(FT)` before `FT` was declared caused undefined behavior caught by CCE.
Fix: Use `huge(1.0_wp)` instead.
Bug 3 — IPA `bring_routine_resident` SIGSEGV (`m_phase_change.fpp`)
CCE 19.0.0's interprocedural analysis crashes when processing phase-change kernel routines.
Two sub-approaches combined:
Applies `cray_noinline=True` to 4 routines in `m_phase_change.fpp` and 4 in `m_variables_conversion.fpp`.
Bug 4 — IPA `castIsValid` ICE (`m_bubbles_EL.fpp`)
Complex GPU loops combined with a `dimension(num_procs)` VLA trigger an InstCombine PHI crash during IPA.
Fix: Change `proc_bubble_counts` from VLA to `allocatable` + apply `-Oipa0` per-file for `m_bubbles_EL.fpp` in `CMakeLists.txt` (Cray+OpenACC only).
Bug 5 — Pyrometheus-generated `m_thermochem.f90` missing `!$acc routine seq` on Cray+OpenACC
Pyrometheus emits `!DIR$ INLINEALWAYS name` for Cray but omits `!$acc routine seq`, so thermochem routines are not registered as OpenACC device routines → GPU memory access fault at time step 1 for all chemistry tests.
Fix: Post-process the generated code in `toolchain/mfc/run/input.py` to replace the broken Cray `#ifdef` block with `#define GPU_ROUTINE(name) !$acc routine seq`.
Bug 6 — VLA `dimension(num_species)` ICE in case-optimized `pre_process` builds (`m_chemistry.fpp`)
`dimension(num_species)` local arrays in CPU routines trigger a CCE 19.0.0 InstCombine ICE in case-optimized `pre_process` builds where `num_species` is a runtime variable. Unlike simulation files, `pre_process` does not get `-Oipa0`, so a source guard is needed.
Fix: Guard all 4 VLA locations with `#:if USING_CCE` to use `dimension(10)` instead.
Bug 7 — `cray_inline=True` in `GPU_ROUTINE` was broken on Cray+OpenACC (latent correctness bug)
Before this PR, `cray_inline=True` on Cray+OpenACC emitted only `!DIR$ INLINEALWAYS name` with no `!$acc routine seq`. This means 33 routines across 8 files (`m_bubbles.fpp`, `m_bubbles_EL_kernels.fpp`, `m_compute_cbc.fpp`, `m_sim_helpers.fpp`, `m_qbmm.fpp`, `m_bubbles_EL.fpp`, `m_boundary_common.fpp`, `m_chemistry.fpp`) were not registered as OpenACC device routines on Cray. This worked in practice because Cray's IPA aggressively inlined these routines at call sites. With `-Oipa0` disabled for Bug 4, this inlining path breaks.
Fix: The `cray_inline=True` branch in `GPU_ROUTINE` now correctly emits `!$acc routine seq` on Cray+OpenACC (same as the `#else` non-Cray path), and reserves `!DIR$ INLINEALWAYS` for Cray CPU-only builds. This is the correct behavior per the OpenACC spec.
Files changed
Testing
All 6 previously-failing tests confirmed passing on Frontier with CCE 19.0.0 + OpenACC (SLURM job 4172615):
Performance (CCE 19.0.0 + OpenACC, Frontier)
No measurable regressions from the `-Oipa0` per-file flags and `cray_inline` fix. Benchmark grind times vs master (all differences ≤ 2%, within GPU run-to-run noise of ~5–10%):
All GitHub CI (ubuntu + macOS) passing. Frontier CCE CI fully passing. Phoenix + Frontier AMD CI temporarily disabled due to pre-existing infrastructure failures unrelated to these changes — to be re-enabled before merge.
🤖 Generated with Claude Code