Improved error reporting for failed tolerance checks #988


Open: wants to merge 12 commits into master

Conversation

@DimAdam-01 (Contributor) commented Aug 18, 2025

User description

Description

Please include a summary of the changes and the related issue(s) if they exist.
Please also include relevant motivation and context.

Improves the tolerance validation system by adding comprehensive error diagnostics when tests fail.

Current behavior:

  • Fails fast on the first variable that exceeds tolerance (either absolute OR relative)
  • Reports only the location of the first failure
  • Limited diagnostic information for debugging numerical issues

Enhanced behavior:

  • Still fails fast on first tolerance violation (maintains performance)
  • After failure, scans ALL files to identify variables that fail tolerance checks
  • Reports both the first failure location AND the maximum errors among failing variables
  • Provides targeted diagnostic information including:
    • Where the maximum absolute error occurs among failing variables (with its relative error)
    • Where the maximum relative error occurs among failing variables (with its absolute error)

Benefits:

  • Better debugging of numerical accuracy issues
  • Identifies worst-case errors across entire dataset
  • Maintains fast-fail performance for passing tests
  • Comprehensive error analysis for failed test cases

This enhancement helps developers quickly identify both where tests first fail and where the most significant numerical errors occur in the simulation results.
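The two-phase flow described above can be sketched as follows. This is a simplified stand-in, not the actual tol.py implementation: packs are plain dicts of lists, and the tolerance test is an absolute-only check rather than tol.py's combined `is_close` test.

```python
import math

def compute_error(c, g):
    """Absolute and relative error between candidate and golden values."""
    absolute = abs(c - g)
    relative = absolute / abs(g) if g != 0 else math.nan
    return absolute, relative

def find_worst(candidate, golden):
    """Phase 2: scan every file and variable for the worst-case errors."""
    max_abs = max_rel = None
    for path, gvals in golden.items():
        cvals = candidate.get(path)
        if cvals is None:
            continue
        for i, (g, c) in enumerate(zip(gvals, cvals)):
            if math.isnan(g) or math.isnan(c):
                continue  # NaNs are not comparable; skip them here
            a, r = compute_error(c, g)
            if max_abs is None or a > max_abs[2]:
                max_abs = (path, i, a)
            if not math.isnan(r) and (max_rel is None or r > max_rel[2]):
                max_rel = (path, i, r)
    return max_abs, max_rel

def compare(candidate, golden, tol):
    """Phase 1: fail fast; the full scan runs only after a failure."""
    for path, gvals in golden.items():
        for i, (g, c) in enumerate(zip(gvals, candidate.get(path, []))):
            a, _ = compute_error(c, g)
            if math.isnan(a) or a > tol:
                return (path, i), find_worst(candidate, golden)
    return None  # everything passed; no extra scanning cost
```

On a passing run `find_worst` never executes, which is the "maintains fast-fail performance" claim above; on a failing run the caller gets both the first failure location and the global maxima.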


Fixes #(issue) [optional]

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Something else

Scope

  • This PR comprises a set of related changes with a common goal

If you cannot check the above box, please split your PR into multiple PRs that each have a common goal.

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.
Provide instructions so we can reproduce.
Please also list any relevant details for your test configuration

  • Test A
  • Test B

Test Configuration:

  • What computers and compilers did you use to test this:

Checklist

  • I have added comments for the new code
  • I added Doxygen docstrings to the new code
  • I have made corresponding changes to the documentation (docs/)
  • I have added regression tests to the test suite so that people can verify in the future that the feature is behaving as expected
  • I have added example cases in examples/ that demonstrate my new feature performing as expected.
    They run to completion and demonstrate "interesting physics"
  • I ran ./mfc.sh format before committing my code
  • New and existing tests pass locally with my changes, including with GPU capability enabled (both NVIDIA hardware with NVHPC compilers and AMD hardware with CRAY compilers) and disabled
  • This PR does not introduce any repeated code (it follows the DRY principle)
  • I cannot think of a way to condense this code and reduce any introduced additional line count

If your code changes any code source files (anything in src/simulation)

To make sure the code is performing as expected on GPU devices, I have:

  • Checked that the code compiles using NVHPC compilers
  • Checked that the code compiles using CRAY compilers
  • Ran the code on either V100, A100, or H100 GPUs and ensured the new feature performed as expected (the GPU results match the CPU results)
  • Ran the code on MI200+ GPUs and ensured the new feature performed as expected (the GPU results match the CPU results)
  • Enclosed the new feature via nvtx ranges so that they can be identified in profiles
  • Ran a Nsight Systems profile using ./mfc.sh run XXXX --gpu -t simulation --nsys, and have attached the output file (.nsys-rep) and plain text results to this PR
  • Ran a Rocprof Systems profile using ./mfc.sh run XXXX --gpu -t simulation --rsys --hip-trace, and have attached the output file and plain text results to this PR.
  • Ran my code using various numbers of different GPUs (1, 2, and 8, for example) in parallel and made sure that the results scale similarly to what happens if you run without the new code/feature

PR Type

Enhancement


Description

  • Enhanced tolerance validation with comprehensive error diagnostics

  • Added maximum error scanning across all files after failure

  • Improved error reporting with detailed diagnostic information

  • Maintained fast-fail performance for passing tests


Diagram Walkthrough

flowchart LR
  A["Tolerance Check"] --> B["First Failure Detected"]
  B --> C["Scan ALL Files"]
  C --> D["Find Max Absolute Error"]
  C --> E["Find Max Relative Error"]
  D --> F["Format Diagnostic Report"]
  E --> F
  F --> G["Enhanced Error Message"]

File Walkthrough

Relevant files
Enhancement
tol.py
Enhanced tolerance validation with comprehensive error diagnostics

toolchain/mfc/packer/tol.py

  • Replaced simple error reporting with comprehensive diagnostics
  • Added find_maximum_errors() function to scan all files for worst-case
    errors
  • Added format_diagnostic_message() function for detailed error
    formatting
  • Enhanced error messages to include maximum absolute and relative error
    locations
+76/-7   

@DimAdam-01 DimAdam-01 requested a review from a team as a code owner August 18, 2025 10:50

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

A helper function defined inside the loop calls find_maximum_errors(candidate, golden) on every early-return path, which re-scans all files. This could be expensive, and it also relies on golden.entries.items() ordering; ensure deterministic ordering if messages are compared in tests, and consider computing diagnostics once per failure or memoizing.

            def raise_err_with_max_diagnostics(msg: str):
                # Find maximum errors across ALL files for diagnostics
                max_errors = find_maximum_errors(candidate, golden)
                diagnostic_msg = format_diagnostic_message(max_errors)

                return None, f"""\
Variable n°{valIndex+1} (1-indexed) in {gFilepath} {msg}:
  - Candidate:   {cVal}
  - Golden:      {gVal}
  - Error:       {error}
  - Tolerance:   {tol}{diagnostic_msg}
"""

            if math.isnan(gVal):
                return raise_err_with_max_diagnostics("is NaN in the golden file")
            if math.isnan(cVal):
                return raise_err_with_max_diagnostics("is NaN in the pack file")
            if not is_close(error, tol):
                return raise_err_with_max_diagnostics("is not within tolerance")
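One way to act on the memoization note above is a small closure that caches the expensive scan, so that however many early-return paths request diagnostics, it runs at most once. This is a hedged sketch: `make_diagnostics` and its `scan` parameter are hypothetical names, not part of tol.py.

```python
def make_diagnostics(candidate, golden, scan):
    """Cache the expensive full scan; repeated calls reuse the first result."""
    cache = {}
    def diagnostics():
        if "result" not in cache:
            cache["result"] = scan(candidate, golden)  # runs only on first call
        return cache["result"]
    return diagnostics
```

Equivalently, `functools.cache` on a zero-argument inner function would work, since the packs are captured by the closure rather than passed as (unhashable) arguments.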
Formatting/Consistency

The diagnostic message mixes plain numbers and scientific notation, and the relative error sometimes shows "NaN". Consider consistent formatting for candidate/golden values and guarding against "NaN" strings in the max-relative section to avoid confusing output.

def format_diagnostic_message(max_errors: typing.Tuple[typing.Optional[typing.Tuple[str, int, float, float, float, float]], typing.Optional[typing.Tuple[str, int, float, float, float, float]]]) -> str:
    """Format the diagnostic message showing maximum errors."""
    max_abs_info, max_rel_info = max_errors
    diagnostic_msg = ""

    if max_abs_info:
        filepath, var_idx, golden_val, candidate_val, abs_error, rel_error = max_abs_info
        rel_error_str = f"{rel_error:.2E}" if not math.isnan(rel_error) else "NaN"
        diagnostic_msg += f"\n\nDiagnostics - Maximum absolute error across ALL files:\n" \
                         f" - File: {filepath}\n" \
                         f" - Variable n°{var_idx+1}\n" \
                         f" - Candidate: {candidate_val}\n" \
                         f" - Golden: {golden_val}\n" \
                         f" - Absolute Error: {abs_error:.2E}\n" \
                         f" - Relative Error: {rel_error_str}"

    if max_rel_info:
        filepath, var_idx, golden_val, candidate_val, rel_error, abs_error = max_rel_info
        diagnostic_msg += f"\n\nDiagnostics - Maximum relative error across ALL files:\n" \
                         f" - File: {filepath}\n" \
                         f" - Variable n°{var_idx+1}\n" \
                         f" - Candidate: {candidate_val}\n" \
                         f" - Golden: {golden_val}\n" \
                         f" - Relative Error: {rel_error:.2E}\n" \
                         f" - Absolute Error: {abs_error:.2E}"

    return diagnostic_msg
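A minimal sketch of the consistency fix suggested above: route every value through one formatter so plain and scientific forms never mix, and replace the bare "NaN" token with an explicit note. The name `fmt_val` is hypothetical, not part of tol.py.

```python
import math

def fmt_val(x: float) -> str:
    """Render every number in the same scientific notation; flag NaN explicitly."""
    if math.isnan(x):
        return "NaN (not comparable)"
    return f"{x:.6E}"
```

The candidate/golden lines in the diagnostic block would then use `fmt_val(candidate_val)` and `fmt_val(golden_val)` instead of raw f-string interpolation.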
Edge Cases

find_maximum_errors skips NaNs entirely; if all values are NaN (in either file), both maxima return None and no diagnostics block is produced. Consider explicit messaging when maxima are unavailable, and verify behavior when the candidate is missing entries (currently the scan just continues).

def find_maximum_errors(candidate: Pack, golden: Pack) -> typing.Tuple[typing.Optional[typing.Tuple[str, int, float, float, float, float]], typing.Optional[typing.Tuple[str, int, float, float, float, float]]]:
    """
    Scan all files to find the maximum absolute and relative errors.
    Returns tuple of:
    - max_abs_info: (filepath, var_index, golden_val, candidate_val, absolute_error, relative_error)
    - max_rel_info: (filepath, var_index, golden_val, candidate_val, relative_error, absolute_error)
    """
    max_abs_error = -1.0
    max_abs_info = None

    max_rel_error = -1.0
    max_rel_info = None

    for gFilepath, gEntry in golden.entries.items():
        cEntry = candidate.find(gFilepath)
        if cEntry is None:
            continue

        for valIndex, (gVal, cVal) in enumerate(zip(gEntry.doubles, cEntry.doubles)):
            # Skip NaN values in golden or candidate
            if math.isnan(gVal) or math.isnan(cVal):
                continue

            error = compute_error(cVal, gVal)

            # Track maximum absolute error
            if error.absolute > max_abs_error:
                max_abs_error = error.absolute
                max_abs_info = (gFilepath, valIndex, gVal, cVal, error.absolute, error.relative)

            # Track maximum relative error (only if it's not NaN)
            if not math.isnan(error.relative) and error.relative > max_rel_error:
                max_rel_error = error.relative
                max_rel_info = (gFilepath, valIndex, gVal, cVal, error.relative, error.absolute)

    return max_abs_info, max_rel_info
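The all-NaN edge case flagged above can be demonstrated with a minimal stand-in that mirrors the scan's logic over plain dict packs (the function name and dict representation are illustrative, not tol.py's actual types):

```python
import math

def find_maximum_errors_sketch(candidate, golden):
    """Minimal stand-in for find_maximum_errors over dict-of-lists packs."""
    max_abs = max_rel = None
    best_abs = best_rel = float("-inf")
    for path, gvals in golden.items():
        cvals = candidate.get(path)
        if cvals is None:
            continue
        for i, (g, c) in enumerate(zip(gvals, cvals)):
            if math.isnan(g) or math.isnan(c):
                continue  # skipped entirely, exactly as noted above
            a = abs(c - g)
            r = a / abs(g) if g != 0 else math.nan
            if a > best_abs:
                best_abs, max_abs = a, (path, i)
            if not math.isnan(r) and r > best_rel:
                best_rel, max_rel = r, (path, i)
    return max_abs, max_rel

# When every golden value is NaN, both maxima stay None and the
# diagnostics block silently disappears:
nan = float("nan")
print(find_maximum_errors_sketch({"f": [1.0, 2.0]}, {"f": [nan, nan]}))  # (None, None)
```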


qodo-merge-pro bot commented Aug 18, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
High-level
Potential O(N) rescans on failure

The new failure path calls find_maximum_errors during the first detected
violation, which rescans all files and all variables, potentially making a
single failure cost O(total dataset) work. If compare is used frequently or on
large packs, this could significantly degrade worst-case performance and violate
the “fails fast” intent. Consider precomputing maxima in the main loop
(amortized single pass) or gating the full scan behind a configurable flag to
preserve performance on large datasets.

Examples:

toolchain/mfc/packer/tol.py [48-66]
            def raise_err_with_max_diagnostics(msg: str):
                # Find maximum errors across ALL files for diagnostics
                max_errors = find_maximum_errors(candidate, golden)
                diagnostic_msg = format_diagnostic_message(max_errors)

                return None, f"""\
Variable n°{valIndex+1} (1-indexed) in {gFilepath} {msg}:
  - Candidate:   {cVal}
  - Golden:      {gVal}
  - Error:       {error}

 ... (clipped 9 lines)

Solution Walkthrough:

Before:

def compare(candidate, golden, tol):
    for gFilepath, gEntry in golden.entries.items():
        for valIndex, (gVal, cVal) in enumerate(zip(gEntry.doubles, cEntry.doubles)):
            # ...
            if not is_close(error, tol):
                # This triggers a full rescan of all files and variables
                max_errors = find_maximum_errors(candidate, golden)
                # ... format and return error
    return avg_err.get(), None

After:

def compare(candidate, golden, tol):
    first_failure = None
    max_abs_info, max_rel_info = None, None

    for gFilepath, gEntry in golden.entries.items():
        for valIndex, (gVal, cVal) in enumerate(zip(gEntry.doubles, cEntry.doubles)):
            # ... compute error
            # ... update max_abs_info and max_rel_info
            if first_failure is None and not is_close(error, tol):
                first_failure = (gFilepath, valIndex, ...)

    if first_failure is not None:
        # ... format and return error using first_failure and max error info
        # This avoids a second full scan.

    return avg_err.get(), None
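The "After" pseudocode above can be fleshed out into a runnable sketch, again using plain dicts in place of Pack objects and an absolute-only tolerance check (all names here are illustrative, not the actual tol.py API):

```python
import math

def compare_single_pass(candidate, golden, tol):
    """One pass: track the first failure and the running maxima together,
    avoiding a second full scan on the failure path."""
    first_failure = None
    max_abs = (float("-inf"), None)  # (error value, (filepath, index))
    max_rel = (float("-inf"), None)
    for path, gvals in golden.items():
        for i, (g, c) in enumerate(zip(gvals, candidate.get(path, []))):
            if math.isnan(g) or math.isnan(c):
                continue
            a = abs(c - g)
            r = a / abs(g) if g != 0 else math.nan
            if a > max_abs[0]:
                max_abs = (a, (path, i))
            if not math.isnan(r) and r > max_rel[0]:
                max_rel = (r, (path, i))
            if first_failure is None and a > tol:
                first_failure = (path, i, a)
    if first_failure is not None:
        return None, (first_failure, max_abs, max_rel)  # error report
    return "ok", None
```

The trade-off the suggestion alludes to: the maxima bookkeeping now runs even on passing tests, so "fail fast" becomes "always one full pass". That is why gating the full scan behind a configurable flag is offered as the alternative.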
Suggestion importance[1-10]: 9


Why: This suggestion correctly identifies a significant performance degradation in the failure case, where a full rescan of the data is triggered, contradicting the "fail fast" goal for performance.

High
General
Handle empty diagnostics case

Guard against the case where no comparable values are found, returning a clear
note instead of an empty string. This prevents confusing blank diagnostics when
all values are NaN or files are missing.

toolchain/mfc/packer/tol.py [72-98]

 def format_diagnostic_message(max_errors: typing.Tuple[typing.Optional[typing.Tuple[str, int, float, float, float, float]], typing.Optional[typing.Tuple[str, int, float, float, float, float]]]) -> str:
     """Format the diagnostic message showing maximum errors."""
     max_abs_info, max_rel_info = max_errors
     diagnostic_msg = ""
 
     if max_abs_info:
         filepath, var_idx, golden_val, candidate_val, abs_error, rel_error = max_abs_info
         rel_error_str = f"{rel_error:.2E}" if not math.isnan(rel_error) else "NaN"
         diagnostic_msg += f"\n\nDiagnostics - Maximum absolute error across ALL files:\n" \
                          f" - File: {filepath}\n" \
                          f" - Variable n°{var_idx+1}\n" \
                          f" - Candidate: {candidate_val}\n" \
                          f" - Golden: {golden_val}\n" \
                          f" - Absolute Error: {abs_error:.2E}\n" \
                          f" - Relative Error: {rel_error_str}"
 
     if max_rel_info:
         filepath, var_idx, golden_val, candidate_val, rel_error, abs_error = max_rel_info
         diagnostic_msg += f"\n\nDiagnostics - Maximum relative error across ALL files:\n" \
                          f" - File: {filepath}\n" \
                          f" - Variable n°{var_idx+1}\n" \
                          f" - Candidate: {candidate_val}\n" \
                          f" - Golden: {golden_val}\n" \
                          f" - Relative Error: {rel_error:.2E}\n" \
                          f" - Absolute Error: {abs_error:.2E}"
 
+    if not diagnostic_msg:
+        diagnostic_msg = "\n\nDiagnostics: No comparable values found across files."
+
     return diagnostic_msg
Suggestion importance[1-10]: 6


Why: The suggestion correctly identifies that an empty diagnostic message can be confusing and proposes adding a clear message, which improves the user experience of the error reporting.

Low
Use -inf for initial maxima

Initialize maxima with negative infinity to avoid incorrect comparisons when
actual errors can be negative or when domain assumptions change. This makes the
intent explicit and robust across edge cases.

toolchain/mfc/packer/tol.py [101-122]

 def find_maximum_errors(candidate: Pack, golden: Pack) -> typing.Tuple[typing.Optional[typing.Tuple[str, int, float, float, float, float]], typing.Optional[typing.Tuple[str, int, float, float, float, float]]]:
     ...
-    max_abs_error = -1.0
+    max_abs_error = float("-inf")
     max_abs_info = None
 
-    max_rel_error = -1.0
+    max_rel_error = float("-inf")
     max_rel_info = None
     ...
     for valIndex, (gVal, cVal) in enumerate(zip(gEntry.doubles, cEntry.doubles)):
         # Skip NaN values in golden or candidate
         if math.isnan(gVal) or math.isnan(cVal):
             continue
         ...
Suggestion importance[1-10]: 5


Why: This is a good practice suggestion that improves code robustness by initializing maxima with float("-inf"), which correctly handles all possible float values for errors.

Low

@sbryngelson (Member) commented:

just replace the existing error number with the new one?

@sbryngelson sbryngelson requested a review from Copilot August 18, 2025 12:49
@sbryngelson sbryngelson self-requested a review as a code owner August 18, 2025 12:50
@Copilot Copilot AI (Contributor) left a comment

Pull Request Overview

This PR enhances the tolerance validation system in the MFC test suite by adding comprehensive error diagnostics when tolerance checks fail. The enhancement maintains the existing fast-fail behavior for performance while providing detailed analysis of numerical errors across all test files when failures occur.

Key changes:

  • Enhanced error reporting with maximum error analysis across all files
  • Added comprehensive diagnostic information including worst-case absolute and relative errors
  • Maintained fast-fail performance for passing tests


# Return the average relative error
return avg_err.get(), None


def format_diagnostic_message(max_errors: typing.Tuple[typing.Optional[typing.Tuple[str, int, float, float, float, float]], typing.Optional[typing.Tuple[str, int, float, float, float, float]]]) -> str:
Copilot AI commented Aug 18, 2025

The function signature uses deeply nested tuple types that are difficult to read and maintain. Consider defining named tuple types or dataclasses for the error information to improve code clarity.


Scan all files to find the maximum absolute and relative errors.
Returns tuple of:
- max_abs_info: (filepath, var_index, golden_val, candidate_val, absolute_error, relative_error)
- max_rel_info: (filepath, var_index, golden_val, candidate_val, relative_error, absolute_error)
Copilot AI commented Aug 18, 2025

The return type annotation uses the same deeply nested tuple structure as format_diagnostic_message. This complex type signature makes the code harder to understand and maintain. Consider using named tuples or dataclasses.

Suggested change

-    - max_rel_info: (filepath, var_index, golden_val, candidate_val, relative_error, absolute_error)

def format_diagnostic_message(max_errors: Tuple[Optional[ErrorInfo], Optional[ErrorInfo]]) -> str:
    """Format the diagnostic message showing maximum errors."""
    max_abs_info, max_rel_info = max_errors
    diagnostic_msg = ""

    if max_abs_info:
        rel_error_str = f"{max_abs_info.relative_error:.2E}" if not math.isnan(max_abs_info.relative_error) else "NaN"
        diagnostic_msg += f"\n\nDiagnostics - Maximum absolute error across ALL files:\n" \
                          f" - File: {max_abs_info.filepath}\n" \
                          f" - Variable n°{max_abs_info.var_index+1}\n" \
                          f" - Candidate: {max_abs_info.candidate_val}\n" \
                          f" - Golden: {max_abs_info.golden_val}\n" \
                          f" - Absolute Error: {max_abs_info.absolute_error:.2E}\n" \
                          f" - Relative Error: {rel_error_str}"

    if max_rel_info:
        diagnostic_msg += f"\n\nDiagnostics - Maximum relative error across ALL files:\n" \
                          f" - File: {max_rel_info.filepath}\n" \
                          f" - Variable n°{max_rel_info.var_index+1}\n" \
                          f" - Candidate: {max_rel_info.candidate_val}\n" \
                          f" - Golden: {max_rel_info.golden_val}\n" \
                          f" - Relative Error: {max_rel_info.relative_error:.2E}\n" \
                          f" - Absolute Error: {max_rel_info.absolute_error:.2E}"

    return diagnostic_msg

def find_maximum_errors(candidate: Pack, golden: Pack) -> Tuple[Optional[ErrorInfo], Optional[ErrorInfo]]:
    """
    Scan all files to find the maximum absolute and relative errors.
    Returns tuple of:
    - max_abs_info: ErrorInfo or None
    - max_rel_info: ErrorInfo or None

max_abs_error = -1.0
max_abs_info = None

max_rel_error = -1.0
Copilot AI commented Aug 18, 2025
Using -1.0 as an initial value for maximum error tracking is unclear and could be problematic if all errors are actually negative (though unlikely in this context). Consider using float('-inf') or None with explicit checks for better clarity.

Suggested change

-    max_abs_error = -1.0
+    max_abs_error = float('-inf')
     max_abs_info = None

-    max_rel_error = -1.0
+    max_rel_error = float('-inf')

max_abs_error = -1.0
max_abs_info = None

max_rel_error = -1.0
Copilot AI commented Aug 18, 2025

Same issue as with max_abs_error - using -1.0 as initial value is unclear. Consider using float('-inf') or None with explicit checks for better clarity.

Suggested change

-    max_abs_error = -1.0
+    max_abs_error = float('-inf')
     max_abs_info = None

-    max_rel_error = -1.0
+    max_rel_error = float('-inf')



codecov bot commented Aug 18, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 40.93%. Comparing base (88c0a11) to head (34c618a).

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #988   +/-   ##
=======================================
  Coverage   40.93%   40.93%           
=======================================
  Files          70       70           
  Lines       20288    20288           
  Branches     2517     2517           
=======================================
  Hits         8305     8305           
  Misses      10447    10447           
  Partials     1536     1536           
