
feat: handle masked forces in test #4893


Open · caic99 wants to merge 3 commits into devel

Conversation

@caic99 (Member) commented Aug 15, 2025

Summary by CodeRabbit

  • New Features

    • Added per-atom weighting (masking) for force error metrics in evaluation. When provided, weighted MAE/RMSE are computed and reported, reflecting only the selected atoms. Handles zero-weight cases gracefully (a small sketch of this logic follows this list).
  • Tests

    • Introduced end-to-end validation ensuring masked-atom force metrics align with evaluator outputs, improving confidence in results.
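
A minimal sketch of the weighted-metric logic summarized above; the function name and shapes are illustrative, not the actual code in deepmd/entrypoints/test.py:

```python
import numpy as np

# Illustrative only: assumes forces and atom_pref are already flattened to
# (nframes, natoms * 3), as a loader with repeat=3 would provide.
def masked_force_errors(pred_f, true_f, atom_pref):
    diff = (pred_f - true_f) * atom_pref   # zero-weight atoms drop out of the residuals
    size_f = np.sum(atom_pref)             # effective (weighted) sample size
    if size_f > 0:
        mae_f = np.sum(np.abs(diff)) / size_f
        rmse_f = np.sqrt(np.sum(diff * diff) / size_f)
    else:                                  # every atom masked: report zeros instead of NaN
        mae_f = rmse_f = 0.0
    return mae_f, rmse_f, size_f
```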

@caic99 caic99 requested a review from anyangml August 15, 2025 08:17

codecov bot commented Aug 15, 2025

Codecov Report

❌ Patch coverage is 87.50000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.29%. Comparing base (dc99ba2) to head (13d43e2).
⚠️ Report is 1 commit behind head on devel.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| deepmd/entrypoints/test.py | 87.50% | 2 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##            devel    #4893   +/-   ##
=======================================
  Coverage   84.29%   84.29%           
=======================================
  Files         702      702           
  Lines       68664    68677   +13     
  Branches     3572     3572           
=======================================
+ Hits        57883    57894   +11     
- Misses       9641     9643    +2     
  Partials     1140     1140           

☔ View full report in Codecov by Sentry.

@caic99 caic99 marked this pull request as ready for review August 18, 2025 02:26
@Copilot Copilot AI review requested due to automatic review settings August 18, 2025 02:26
Copilot's earlier review comment was marked as outdated.

@coderabbitai bot (Contributor) commented Aug 18, 2025

📝 Walkthrough

Adds optional per-atom weighting for force error metrics in deepmd/entrypoints/test.py using atom_pref and find_atom_pref. Updates returned MAE/RMSE tuples to reflect weighted sample size. Introduces a new PyTorch test validating masked-atom force metrics end-to-end against dp_test_ener.

Changes

| Cohort / File(s) | Summary of Changes |
| --- | --- |
| Weighted force metrics in dp_test_ener<br>deepmd/entrypoints/test.py | Adds atom_pref data field and find_atom_pref flag. Computes weighted force residuals, MAE, RMSE, and effective sample size (size_f). Updates returned metric tuples to use size_f; falls back to unweighted when disabled. |
| Force-mask test workflow<br>source/tests/pt/test_dp_test.py | Adds TestDPTestForceMask with setup/teardown. Prepares masked system (atom_pref), runs model eval with DeepEval/DeepmdData, compares computed masked MAE/RMSE with dp_test_ener outputs. Saves/loads scripted model for the test. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant Tester as dp_test_ener
  participant Data as test_data
  participant Metrics as MetricCalc

  Tester->>Data: get(force, test_forces, atom_pref, find_atom_pref)
  alt find_atom_pref == 1
    Tester->>Metrics: compute weighted diff_f = (force - test_forces) * atom_pref
    Metrics-->>Tester: mae_f, rmse_f, size_f = sum(atom_pref)
  else
    Tester->>Metrics: compute unweighted diff_f = (force - test_forces)
    Metrics-->>Tester: mae_f, rmse_f, size_f = force.size
  end
  Tester-->>Tester: package results with (value, size_f)
```

```mermaid
sequenceDiagram
  participant Test as TestDPTestForceMask
  participant Prep as _prepare_masked_system
  participant Trainer as Trainer/Script
  participant Eval as DeepEval
  participant Loader as DeepmdData

  Test->>Prep: duplicate data, set atom_pref mask
  Test->>Trainer: train & script model, save/load
  Test->>Loader: load masked test set
  Test->>Eval: run dp_test_ener on test data
  Eval-->>Test: returns mae_f, rmse_f with size_f
  Test-->>Test: independently compute masked MAE/RMSE
  Test-->>Test: assert equality with dp_test_ener
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • iProzd
  • njzjz


@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
deepmd/entrypoints/test.py (1)

421-435: Weighted force MAE/RMSE look correct; add a shape guard for robustness

The weighting logic (sum of abs residuals / sum of weights and sqrt(weighted MSE)) is sound, and size_f=sum(weights) makes the system-level aggregation consistent. To tolerate atom_pref provided as (nframes, natoms) without depending on repeat=3 at load time, add a defensive broadcast/validation.

```diff
-    if find_atom_pref == 1:
-        atom_pref = test_data["atom_pref"][:numb_test]
-        diff_f = diff_f * atom_pref
+    if find_atom_pref == 1:
+        atom_pref = test_data["atom_pref"][:numb_test]
+        # Accept both (nframes, natoms) and (nframes, natoms*3)
+        if atom_pref.shape != diff_f.shape:
+            if atom_pref.shape[1] * 3 == diff_f.shape[1]:
+                atom_pref = np.repeat(atom_pref, 3, axis=1)
+            else:
+                raise ValueError(
+                    f"atom_pref shape {atom_pref.shape} is incompatible with force shape {diff_f.shape}"
+                )
+        diff_f = diff_f * atom_pref
         size_f = np.sum(atom_pref)
         if size_f > 0:
             mae_f = np.sum(np.abs(diff_f)) / size_f
             rmse_f = np.sqrt(np.sum(diff_f * diff_f) / size_f)
         else:
             mae_f = 0.0
             rmse_f = 0.0
```

If desired, we can also factor this into small helpers (weighted_mae/weighted_rmse) to keep this block concise.
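
One possible shape for those helpers, sketched here under the assumption of non-negative weights; for a 0/1 mask they reproduce the block above exactly, while for general weights they use the standard weighted definitions:

```python
import numpy as np

def weighted_mae(diff, weights):
    # Weighted mean absolute error; returns 0.0 when all weights are zero.
    s = np.sum(weights)
    return np.sum(np.abs(diff) * weights) / s if s > 0 else 0.0

def weighted_rmse(diff, weights):
    # Weighted root-mean-square error with the same zero-weight fallback.
    s = np.sum(weights)
    return np.sqrt(np.sum(diff * diff * weights) / s) if s > 0 else 0.0
```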

source/tests/pt/test_dp_test.py (2)

168-181: Mask preparation is correct; tiny robustness note about copytree target

Copying the sample system into a mkdtemp dir and writing atom_pref.npy as (nframes, natoms) is consistent with repeat=3 in the loader. Setting the last atom’s weight to 0 validates masking well. One minor robustness nit: shutil.copytree(..., dirs_exist_ok=True) is fine here, but if src ever contains symlinks or permissions need preserving, consider copy_function/copy2. Not required for this test data.
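
A hedged sketch of the masked-system preparation described here; paths and sizes are placeholders, not the test's actual data:

```python
import shutil
import tempfile
from pathlib import Path

import numpy as np

src = Path("path/to/sample_system")   # placeholder for the copied source system
dst = Path(tempfile.mkdtemp())        # scratch copy, removed in tearDown
shutil.copytree(src, dst, dirs_exist_ok=True)

nframes, natoms = 1, 6                # assumed sizes for the sketch
atom_pref = np.ones((nframes, natoms))
atom_pref[:, -1] = 0.0                # zero weight: the last atom is excluded from force metrics
np.save(dst / "set.000" / "atom_pref.npy", atom_pref)
```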


182-230: Good end-to-end assertions; add tolerances and verify the reported sample size

The comparison path is sound: run dp_test_ener, independently compute the masked MAE/RMSE from dp.eval outputs, and compare. To avoid flaky failures from float32 arithmetic and to verify the new size_f behavior, consider:

  • Use explicit tolerances in assert_allclose.
  • Assert that the returned sizes equal the mask sum.
```diff
-        np.testing.assert_allclose(err["mae_f"][0], mae_expected)
-        np.testing.assert_allclose(err["rmse_f"][0], rmse_expected)
+        np.testing.assert_allclose(err["mae_f"][0], mae_expected, rtol=1e-6, atol=1e-8)
+        np.testing.assert_allclose(err["rmse_f"][0], rmse_expected, rtol=1e-6, atol=1e-8)
+        # Also verify the effective sample size matches the mask sum
+        np.testing.assert_equal(err["mae_f"][1], denom)
+        np.testing.assert_equal(err["rmse_f"][1], denom)
```
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between dc99ba2 and 13d43e2.

📒 Files selected for processing (2)
  • deepmd/entrypoints/test.py (4 hunks)
  • source/tests/pt/test_dp_test.py (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
deepmd/entrypoints/test.py (4)
deepmd/utils/data_system.py (1)
  • add (342-395)
deepmd/utils/data.py (1)
  • add (136-189)
deepmd/pt/utils/stat.py (1)
  • rmse (525-526)
deepmd/dpmodel/output_def.py (1)
  • size (230-231)
🪛 Ruff (0.12.2)
source/tests/pt/test_dp_test.py

187-187: Use a context manager for opening files

(SIM115)

🔇 Additional comments (6)
deepmd/entrypoints/test.py (3)

294-295: Correctly declares atom-level preference mask aligned with force components

Adding atom_pref as an atomic item with repeat=3 matches the flattened (natoms*3) force layout. Keeping must=False ensures backward compatibility (feature enabled only when the file is present).
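
A hedged sketch of what that declaration could look like on the DeepmdData instance used by the tester; the system path is hypothetical and the exact keyword set in the patch may differ:

```python
from deepmd.utils.data import DeepmdData

data = DeepmdData("path/to/system")  # hypothetical system path
data.add(
    "atom_pref",   # per-atom weight, read from atom_pref.npy when present
    1,             # one value per atom ...
    atomic=True,
    must=False,    # optional: weighting only applies when the file exists
    repeat=3,      # ... repeated 3x to match the flattened (natoms * 3) force layout
)
```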


317-317: Presence flag retrieval is consistent with existing pattern

Using test_data.get("find_atom_pref") mirrors the existing find_* flags and safely disables weighting when absent.


470-471: Return sizes reflect the effective weighted sample size

Reporting (value, size_f) for mae_f/rmse_f is the right choice for weighted aggregation across systems. No change needed.

source/tests/pt/test_dp_test.py (3)

18-21: Imports for dp_test_ener/DeepEval/DeepmdData are appropriate

These imports are minimal and scoped to the new test’s needs.

Also applies to: 28-30


150-167: Solid setup for masked-force test system

Bootstrapping a tiny training config and pointing both training/validation to the prepared system keeps the test self-contained. No issues spotted.


231-234: Clean teardown

Delegating to the base teardown and removing the temp system directory prevents test artifacts from leaking. Looks good.

@anyangml anyangml requested a review from iProzd August 18, 2025 03:00
@anyangml (Collaborator) left a comment

LGTM

@caic99 caic99 requested a review from Copilot August 19, 2025 05:42
@Copilot Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds support for masked forces in testing by implementing per-atom weighting for force error metrics. When atom preferences (masks) are provided, the evaluation computes weighted MAE/RMSE that only consider selected atoms, with graceful handling of zero-weight cases.

Key changes:

  • Modified force error calculation to apply atom-level masking when atom_pref data is available
  • Added comprehensive test coverage for masked force evaluation scenarios
  • Updated error metric reporting to reflect the actual number of atoms considered in calculations

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| deepmd/entrypoints/test.py | Implements masked force error calculation with weighted MAE/RMSE computation |
| source/tests/pt/test_dp_test.py | Adds end-to-end test validating masked force metrics against expected calculations |


@njzjz (Member) left a comment

This PR assumes that atom_pref is a mask, but it may not be, since it is not limited to 0 and 1. One may set it to something like 1 1 1 1 5 1 1 1 to give some atoms larger weights.

```python
mae_f = mae(diff_f)
rmse_f = rmse(diff_f)
if find_atom_pref == 1:
    atom_pref = test_data["atom_pref"][:numb_test]
```
@caic99 (Member, Author) replied:

Suggested change:

```diff
-atom_pref = test_data["atom_pref"][:numb_test]
+atom_pref = test_data["atom_pref"][:numb_test].astype(bool)
```

@njzjz Is this what you mean?

@njzjz (Member) replied:

No, I mean that atom_pref was not designed as a mask (when it was proposed in #41). One may use it for other purposes and would not expect to see a masked result.

@njzjz (Member) added:

Perhaps the solution is to provide both the original results and the results multiplied by atom_pref; a sketch of that idea follows.
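
A minimal sketch of that idea; the helper name force_errors and the *_pref keys are hypothetical, not part of this PR:

```python
import numpy as np

def force_errors(diff_f, atom_pref=None):
    # Always report the plain, unweighted metrics.
    out = {
        "mae_f": np.mean(np.abs(diff_f)),
        "rmse_f": np.sqrt(np.mean(diff_f * diff_f)),
    }
    # Additionally report atom_pref-weighted metrics when weights are provided,
    # so non-binary weights still leave an unweighted view available.
    if atom_pref is not None:
        w = np.asarray(atom_pref, dtype=float)
        s = w.sum()
        out["mae_f_pref"] = np.sum(np.abs(diff_f) * w) / s if s > 0 else 0.0
        out["rmse_f_pref"] = np.sqrt(np.sum(diff_f * diff_f * w) / s) if s > 0 else 0.0
    return out
```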

@anyangml (Collaborator) replied:

I don’t see any reason why the current implementation cannot handle arbitrary weights. The only question is whether the weights should be considered when calculating the error; my intuition is that they should not.

@anyangml (Collaborator) added:

And in that case, we can simply convert them to bool type.

@caic99 caic99 requested a review from njzjz August 22, 2025 03:04