feat: handle masked forces in test #4893
base: devel
Conversation
for more information, see https://pre-commit.ci
Codecov Report

❌ Patch coverage is
Additional details and impacted files

```
@@           Coverage Diff            @@
##            devel    #4893    +/-  ##
=======================================
  Coverage   84.29%   84.29%
=======================================
  Files         702      702
  Lines       68664    68677     +13
  Branches     3572     3572
=======================================
+ Hits        57883    57894     +11
- Misses       9641     9643      +2
  Partials     1140     1140
```
📝 Walkthrough

Adds optional per-atom weighting for force error metrics in deepmd/entrypoints/test.py using atom_pref and find_atom_pref. Updates the returned MAE/RMSE tuples to reflect the weighted sample size. Introduces a new PyTorch test validating masked-atom force metrics end-to-end against dp_test_ener.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Tester as dp_test_ener
    participant Data as test_data
    participant Metrics as MetricCalc
    Tester->>Data: get(force, test_forces, atom_pref, find_atom_pref)
    alt find_atom_pref == 1
        Tester->>Metrics: compute weighted diff_f = (force - test_forces) * atom_pref
        Metrics-->>Tester: mae_f, rmse_f, size_f = sum(atom_pref)
    else
        Tester->>Metrics: compute unweighted diff_f = (force - test_forces)
        Metrics-->>Tester: mae_f, rmse_f, size_f = force.size
    end
    Tester-->>Tester: package results with (value, size_f)
```

```mermaid
sequenceDiagram
    participant Test as TestDPTestForceMask
    participant Prep as _prepare_masked_system
    participant Trainer as Trainer/Script
    participant Eval as DeepEval
    participant Loader as DeepmdData
    Test->>Prep: duplicate data, set atom_pref mask
    Test->>Trainer: train & script model, save/load
    Test->>Loader: load masked test set
    Test->>Eval: run dp_test_ener on test data
    Eval-->>Test: returns mae_f, rmse_f with size_f
    Test-->>Test: independently compute masked MAE/RMSE
    Test-->>Test: assert equality with dp_test_ener
```
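The weighted branch of the flow above can be sketched in plain NumPy. This is an illustrative sketch, not the actual deepmd-kit code; `force_errors` and its argument names are hypothetical, chosen to mirror the quantities in the diagram (flattened arrays of shape `(nframes, natoms * 3)` and a matching `atom_pref` weight array):

```python
import numpy as np


def force_errors(force, test_forces, atom_pref=None, find_atom_pref=0):
    """Return (mae_f, rmse_f, size_f), weighting residuals when a mask is found."""
    diff_f = force - test_forces
    if find_atom_pref == 1 and atom_pref is not None:
        diff_f = diff_f * atom_pref  # zero out / reweight masked components
        size_f = np.sum(atom_pref)   # effective sample size is the weight sum
    else:
        size_f = diff_f.size         # unweighted: every component counts
    if size_f > 0:
        mae_f = np.sum(np.abs(diff_f)) / size_f
        rmse_f = np.sqrt(np.sum(diff_f * diff_f) / size_f)
    else:
        # graceful handling of the all-masked case
        mae_f, rmse_f = 0.0, 0.0
    return mae_f, rmse_f, size_f


# one frame, two atoms; the second atom (3 components) is masked out
f = np.array([[1.0, 1.0, 1.0, 5.0, 5.0, 5.0]])
ref = np.zeros((1, 6))
pref = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0]])
res = force_errors(f, ref, pref, find_atom_pref=1)
```

With the mask applied, only the first atom's residuals contribute, so both MAE and RMSE come out as 1.0 with an effective sample size of 3.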
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 0
🧹 Nitpick comments (3)
deepmd/entrypoints/test.py (1)
421-435: Weighted force MAE/RMSE look correct; add a shape guard for robustness

The weighting logic (sum of abs residuals / sum of weights, and sqrt of the weighted MSE) is sound, and size_f = sum(weights) makes the system-level aggregation consistent. To tolerate atom_pref provided as (nframes, natoms) without depending on repeat=3 at load time, add a defensive broadcast/validation.
```diff
-    if find_atom_pref == 1:
-        atom_pref = test_data["atom_pref"][:numb_test]
-        diff_f = diff_f * atom_pref
+    if find_atom_pref == 1:
+        atom_pref = test_data["atom_pref"][:numb_test]
+        # Accept both (nframes, natoms) and (nframes, natoms*3)
+        if atom_pref.shape != diff_f.shape:
+            if atom_pref.shape[1] * 3 == diff_f.shape[1]:
+                atom_pref = np.repeat(atom_pref, 3, axis=1)
+            else:
+                raise ValueError(
+                    f"atom_pref shape {atom_pref.shape} is incompatible with force shape {diff_f.shape}"
+                )
+        diff_f = diff_f * atom_pref
         size_f = np.sum(atom_pref)
         if size_f > 0:
             mae_f = np.sum(np.abs(diff_f)) / size_f
             rmse_f = np.sqrt(np.sum(diff_f * diff_f) / size_f)
         else:
             mae_f = 0.0
             rmse_f = 0.0
```

If desired, we can also factor this into small helpers (weighted_mae/weighted_rmse) to keep this block concise.
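The broadcast the suggestion relies on can be shown in isolation. A minimal sketch, where `align_atom_pref` is a hypothetical helper (not part of deepmd-kit) that expands a per-atom weight array `(nframes, natoms)` to per-component `(nframes, natoms * 3)` so it lines up with flattened force residuals:

```python
import numpy as np


def align_atom_pref(atom_pref, diff_f):
    """Make atom_pref broadcastable against flattened force residuals."""
    if atom_pref.shape == diff_f.shape:
        return atom_pref
    if atom_pref.shape[1] * 3 == diff_f.shape[1]:
        # repeat each atom's weight for its x, y, z force components
        return np.repeat(atom_pref, 3, axis=1)
    raise ValueError(
        f"atom_pref shape {atom_pref.shape} is incompatible "
        f"with force shape {diff_f.shape}"
    )


pref = np.array([[1.0, 0.0]])  # 1 frame, 2 atoms
diff = np.zeros((1, 6))        # 2 atoms * 3 components
out = align_atom_pref(pref, diff)
```

`np.repeat(..., 3, axis=1)` duplicates each column three times in place, which matches the `repeat=3` layout the loader produces.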
source/tests/pt/test_dp_test.py (2)
168-181: Mask preparation is correct; tiny robustness note about the copytree target

Copying the sample system into a mkdtemp dir and writing atom_pref.npy as (nframes, natoms) is consistent with repeat=3 in the loader. Setting the last atom's weight to 0 validates masking well. One minor robustness nit: shutil.copytree(..., dirs_exist_ok=True) is fine here, but if src ever contains symlinks or permissions need preserving, consider copy_function/copy2. Not required for this test data.
182-230: Good end-to-end assertions; add tolerances and verify the reported sample size

The comparison path is sound: run dp_test_ener, independently compute the masked MAE/RMSE from dp.eval outputs, and compare. To avoid flaky failures from float32 arithmetic and to verify the new size_f behavior, consider:
- Use explicit tolerances in assert_allclose.
- Assert that the returned sizes equal the mask sum.
```diff
-        np.testing.assert_allclose(err["mae_f"][0], mae_expected)
-        np.testing.assert_allclose(err["rmse_f"][0], rmse_expected)
+        np.testing.assert_allclose(err["mae_f"][0], mae_expected, rtol=1e-6, atol=1e-8)
+        np.testing.assert_allclose(err["rmse_f"][0], rmse_expected, rtol=1e-6, atol=1e-8)
+        # Also verify the effective sample size matches the mask sum
+        np.testing.assert_equal(err["mae_f"][1], denom)
+        np.testing.assert_equal(err["rmse_f"][1], denom)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)

- deepmd/entrypoints/test.py (4 hunks)
- source/tests/pt/test_dp_test.py (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)

deepmd/entrypoints/test.py (4)

- deepmd/utils/data_system.py (1): add (342-395)
- deepmd/utils/data.py (1): add (136-189)
- deepmd/pt/utils/stat.py (1): rmse (525-526)
- deepmd/dpmodel/output_def.py (1): size (230-231)
🪛 Ruff (0.12.2)

source/tests/pt/test_dp_test.py

- 187-187: Use a context manager for opening files (SIM115)
🔇 Additional comments (6)
deepmd/entrypoints/test.py (3)
294-295: Correctly declares atom-level preference mask aligned with force components

Adding atom_pref as an atomic item with repeat=3 matches the flattened (natoms*3) force layout. Keeping must=False ensures backward compatibility (the feature is enabled only when the file is present).
317-317: Presence flag retrieval is consistent with the existing pattern

Using test_data.get("find_atom_pref") mirrors the existing find_* flags and safely disables weighting when absent.
470-471: Return sizes reflect the effective weighted sample size

Reporting (value, size_f) for mae_f/rmse_f is the right choice for weighted aggregation across systems. No change needed.
source/tests/pt/test_dp_test.py (3)
18-21: Imports for dp_test_ener/DeepEval/DeepmdData are appropriate

These imports are minimal and scoped to the new test's needs.
Also applies to: 28-30
150-167: Solid setup for the masked-force test system

Bootstrapping a tiny training config and pointing both training/validation to the prepared system keeps the test self-contained. No issues spotted.
231-234: Clean teardown

Delegating to the base teardown and removing the temp system directory prevents test artifacts from leaking. Looks good.
LGTM
Pull Request Overview
This PR adds support for masked forces in testing by implementing per-atom weighting for force error metrics. When atom preferences (masks) are provided, the evaluation computes weighted MAE/RMSE that only consider selected atoms, with graceful handling of zero-weight cases.
Key changes:

- Modified force error calculation to apply atom-level masking when atom_pref data is available
- Added comprehensive test coverage for masked force evaluation scenarios
- Updated error metric reporting to reflect the actual number of atoms considered in calculations
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| deepmd/entrypoints/test.py | Implements masked force error calculation with weighted MAE/RMSE computation |
| source/tests/pt/test_dp_test.py | Adds end-to-end test validating masked force metrics against expected calculations |
This PR assumes that atom_pref is the mask, but it may not be, as it is not limited to 0 and 1. One may set it to something like 1 1 1 1 5 1 1 1 to give some atoms larger weights.
```python
        mae_f = mae(diff_f)
        rmse_f = rmse(diff_f)
        if find_atom_pref == 1:
            atom_pref = test_data["atom_pref"][:numb_test]
```
```diff
-            atom_pref = test_data["atom_pref"][:numb_test]
+            atom_pref = test_data["atom_pref"][:numb_test].astype(bool)
```
@njzjz Is this what you mean?
No, I mean atom_pref was not designed as the mask (when it was proposed in #41). One may use it for other purposes and not expect to see a masked result.
Perhaps the solution is providing both the original results and the results multiplied by atom_pref.
I don’t see any reason why the current implementation cannot handle arbitrary weights. The only question is whether the weights should be considered when calculating the error; my intuition is that they should not.
And in that case, we can simply convert them to bool type.
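The difference between treating atom_pref as arbitrary weights versus converting it to bool can be made concrete. A small sketch (`weighted_err` is an illustrative helper, not deepmd-kit code), using non-binary weights like the 1 1 1 5 example above:

```python
import numpy as np


def weighted_err(diff, pref):
    """Weighted mean absolute error: sum(pref * |diff|) / sum(pref)."""
    d = diff * pref
    return np.sum(np.abs(d)) / np.sum(pref)


diff = np.array([1.0, 1.0, 1.0, 2.0])  # per-component residuals
pref = np.array([1.0, 1.0, 1.0, 5.0])  # last atom weighted 5x

w = weighted_err(diff, pref)               # weights scale the residuals
m = weighted_err(diff, pref.astype(bool))  # bool mask: all atoms count equally
```

With the raw weights, the last atom's residual dominates (w = 13/8 = 1.625); after `astype(bool)` every atom contributes equally (m = 5/4 = 1.25). A 0/1 mask is the special case where both agree, which is why converting to bool reproduces pure masking while discarding weight magnitudes.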