
fix(c++): fix NULL type in custom op #4889


Draft: wants to merge 4 commits into base: devel

Conversation

@iProzd (Collaborator) commented Aug 14, 2025

Replaces the lmp_list send/recv arrays with new vectors whose indices are remapped through fwd_map, and synchronizes receive counts via MPI. Tensor construction is updated to use these new vectors, improving correctness and flexibility in distributed communication.
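
For orientation, a minimal sketch of the remapping idea, not the PR's literal code: the flat layout of lmp_list.sendlist is inferred from the blob-based construction quoted in the review below, and the convention that fwd_map marks dropped atoms with -1 is an assumption.

// Hypothetical sketch: rebuild per-swap send counts and a dense send list,
// remapping each legacy index through fwd_map and dropping unmapped ones.
std::vector<int> sendnum_new(nswap, 0);
std::vector<int> sendlist_new;
int pos = 0;  // running offset into the flat legacy send list (assumed layout)
for (int s = 0; s < nswap; ++s) {
  for (int k = 0; k < lmp_list.sendnum[s]; ++k, ++pos) {
    const int old_idx = lmp_list.sendlist[pos];
    if (old_idx >= 0 && old_idx < static_cast<int>(fwd_map.size()) &&
        fwd_map[old_idx] != -1) {  // -1 marking dropped atoms is assumed
      sendlist_new.push_back(fwd_map[old_idx]);
      ++sendnum_new[s];
    }
  }
}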

Summary by CodeRabbit

  • Refactor
    • Reworked MPI-backed message passing for distributed runs, improving scalability, stability, and consistency without changing the public interface.
  • Bug Fixes
    • Prevented errors from invalid or mismatched send indices by remapping/discarding them and correcting receive counts and ordering.
    • Improved behavior when an MPI world/communicator is unavailable to avoid failures during distributed execution.

@iProzd marked this pull request as draft August 14, 2025 13:26
@github-actions bot added the C++ label Aug 14, 2025
@coderabbitai bot (Contributor) commented Aug 14, 2025

📝 Walkthrough

Implements MPI-gated remapped message-passing in DeepPotPT::compute: introduces new send/recv count and list arrays, maps indices via fwd_map, exchanges recv counts via MPI_Sendrecv (TAG_BASE 0x7a31) when world exists, computes prefix sums, rebuilds Torch tensors from new arrays, updates comm_dict, and conditionally includes MPI headers.

Changes

Cohort: MPI remapped communication path
File(s): source/api_cc/src/DeepPotPT.cc
Summary:
- Conditionally includes mpi.h and mpi-ext.h under USE_MPI/MPI_FOUND.
- Adds remapped arrays: sendnum_new, sendlist_new, recvnum_new, firstrecv_new built via fwd_map and compaction.
- Uses MPI_Sendrecv (TAG_BASE=0x7a31) to obtain recvnum_new when lmp_list.world exists; mirrors send counts otherwise.
- Computes firstrecv_new as prefix sum of recvnum_new.
- Rebuilds tensors: firstrecv_tensor, recvnum_tensor, sendnum_tensor, sendlist_tensor from new arrays; uses static_cast<long> for sizes.
- Populates comm_dict with updated tensors: "send_list", "send_proc", "recv_proc", "send_num", "recv_num", "communicator" (a sketch of this step follows the list).
- Comments out prior usage of lmp_list.firstrecv/recvnum/sendnum and blob-based sendlist construction.
- No public API signature changes; minor structural/formatting updates.
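
As an illustration of the comm_dict update listed above, a hedged sketch using the TorchScript c10::Dict API; the tensor variable names are assumed from the surrounding context, and this is a fragment, not the PR's literal code.

// Sketch only: populate the communication dictionary with the remapped tensors.
c10::Dict<std::string, torch::Tensor> comm_dict;
comm_dict.insert("send_list", sendlist_tensor);
comm_dict.insert("send_proc", sendproc_tensor);
comm_dict.insert("recv_proc", recvproc_tensor);
comm_dict.insert("send_num", sendnum_tensor);
comm_dict.insert("recv_num", recvnum_tensor);
comm_dict.insert("communicator", communicator_tensor);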

Sequence Diagram(s)

sequenceDiagram
  participant DP as DeepPotPT::compute
  participant MP as MPI World (lmp_list.world)
  participant Map as fwd_map
  participant T as Torch Tensors

  DP->>Map: Map old send indices -> forwarded indices
  Map-->>DP: sendlist_new, sendnum_new (invalids dropped)

  alt world exists
    DP->>MP: MPI_Sendrecv(sendnum_new) [TAG_BASE=0x7a31]
    MP-->>DP: recvnum_new
  else no world
    DP-->>DP: recvnum_new = sendnum_new
  end

  DP-->>DP: firstrecv_new = prefix_sum(recvnum_new)
  DP->>T: Build tensors (firstrecv, recvnum, sendnum, sendlist)
  DP-->>DP: Update comm_dict with new tensors

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes




@coderabbitai bot (Contributor) left a comment


Actionable comments posted: 3

🧹 Nitpick comments (2)
source/api_cc/src/DeepPotPT.cc (2)

251-257: Remove stale commented-out code

Dead commented code obscures the current data path and makes maintenance harder.

Apply this diff:

-//      torch::Tensor firstrecv_tensor =
-//          torch::from_blob(lmp_list.firstrecv, {nswap}, int32_option);
-//      torch::Tensor recvnum_tensor =
-//          torch::from_blob(lmp_list.recvnum, {nswap}, int32_option);
-//      torch::Tensor sendnum_tensor =
-//          torch::from_blob(lmp_list.sendnum, {nswap}, int32_option);

266-269: Remove redundant commented-out legacy code

Same reasoning; commented legacy path is preserved in git history.

Apply this diff:

-//      int total_send =
-//          std::accumulate(lmp_list.sendnum, lmp_list.sendnum + nswap, 0);
-//      torch::Tensor sendlist_tensor =
-//          torch::from_blob(lmp_list.sendlist, {total_send}, int32_option);
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between accc331 and f99ad4d.

📒 Files selected for processing (1)
  • source/api_cc/src/DeepPotPT.cc (3 hunks)
🔇 Additional comments (2)
source/api_cc/src/DeepPotPT.cc (2)

185-204: Remapping logic LGTM

Correctly rebuilds per-swap send counts and a dense send list using fwd_map with bounds checks and filtering. Reserving capacity via the accumulated legacy counts is a good optimization.
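
The reservation it praises presumably looks something like this sketch; the legacy counts give an upper bound on the filtered list size.

// Sketch: reserve capacity from the legacy (pre-filter) total; needs <numeric>.
const int total_send_legacy =
    std::accumulate(lmp_list.sendnum, lmp_list.sendnum + nswap, 0);
sendlist_new.reserve(total_send_legacy);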


226-232: firstrecv_new is unused and not required — original comment is incorrect

Short: deepmd/pt/model/descriptor/repflows.py builds comm_dict and calls torch.ops.deepmd.border_op with send_list, send_proc, recv_proc, send_num, recv_num, communicator (no first_recv). The computed firstrecv_new/firstrecv_tensor in the PT wrappers is dead code — remove it or document why it is kept.

Files to update:

  • source/api_cc/src/DeepPotPT.cc
    • Remove the firstrecv_new prefix-sum computation (around lines 226–231) and the unused firstrecv_tensor creation (around line 238).
  • source/api_cc/src/DeepSpinPT.cc
    • Same pattern: firstrecv_tensor is created around lines 187–191 but never used/inserted.

Suggested change (remove unused code) — example diff for DeepPotPT.cc:
@@
-    std::vector<int> firstrecv_new(nswap, 0);
-    int acc = 0;
-    for (int s = 0; s < nswap; ++s) {
-      firstrecv_new[s] = acc;
-      acc += recvnum_new[s];
-    }
+    /* firstrecv computation removed — not used by border_op */
@@
-    torch::Tensor firstrecv_tensor =
-        torch::from_blob(firstrecv_new.data(), {nswap}, int32_option).clone();
+    /* firstrecv tensor omitted — border_op expects recv_num, not first_recv */

If you prefer to keep the computation for clarity, add a short comment explaining it's intentionally unused.

Likely an incorrect or invalid review comment.

Comment on lines 13 to 18
#ifdef USE_MPI
#include <mpi.h>
#ifdef OMPI_MPI_H
#include <mpi-ext.h>
#endif
#endif
Contributor

🛠️ Refactor suggestion

Use a single MPI compile guard and include the right headers consistently

You include MPI headers under USE_MPI, but use MPI_FOUND to gate MPI calls later. Pick one macro to avoid configuration skew; MPI_FOUND matches the usage below.

Apply this diff to unify guards:

-#ifdef USE_MPI
+#ifdef MPI_FOUND
 #include <mpi.h>
 #ifdef OMPI_MPI_H
 #include <mpi-ext.h>
 #endif
 #endif

Additionally, since this file uses std::accumulate, ensure <numeric> is included (outside this hunk):

// near the other standard headers
#include <numeric>
🤖 Prompt for AI Agents
In source/api_cc/src/DeepPotPT.cc around lines 13 to 18, the MPI headers are
currently guarded by USE_MPI while the rest of the file uses MPI_FOUND; change
the compile guard to MPI_FOUND so the same macro is used everywhere (wrap
#include <mpi.h> and conditional #include <mpi-ext.h> with #ifdef MPI_FOUND ...
#endif), and make sure to include the standard header <numeric> near the other
standard includes (outside this hunk) so std::accumulate is available.

Comment on lines 205 to 225
std::vector<int> recvnum_new(nswap, 0);
#ifdef MPI_FOUND
if (lmp_list.world) {
  MPI_Comm comm = *static_cast<MPI_Comm*>(lmp_list.world);
  const int TAG_BASE = 0x7a31;
  for (int s = 0; s < nswap; ++s) {
    const int send_to = lmp_list.sendproc[s];
    const int recv_from = lmp_list.recvproc[s];
    int send_cnt = sendnum_new[s];
    int recv_cnt = 0;
    MPI_Sendrecv(&send_cnt, 1, MPI_INT, send_to, TAG_BASE + s,
                 &recv_cnt, 1, MPI_INT, recv_from, TAG_BASE + s,
                 comm, MPI_STATUS_IGNORE);
    recvnum_new[s] = recv_cnt;
  }
} else
#endif
{
  for (int s = 0; s < nswap; ++s) recvnum_new[s] = sendnum_new[s];
}

Contributor

🛠️ Refactor suggestion

Check MPI_Sendrecv return codes and handle MPI_PROC_NULL safely

Currently the MPI calls ignore return codes; if a peer is mismatched you’ll silently compute wrong recv counts. Also, peers can legally be MPI_PROC_NULL; guard for that to avoid unnecessary calls.

Apply this diff to add error checking and skip PROC_NULL peers:

-            for (int s = 0; s < nswap; ++s) {
-              const int send_to   = lmp_list.sendproc[s];
-              const int recv_from = lmp_list.recvproc[s];
-              int send_cnt = sendnum_new[s];
-              int recv_cnt = 0;
-              MPI_Sendrecv(&send_cnt, 1, MPI_INT, send_to,   TAG_BASE + s,
-                           &recv_cnt, 1, MPI_INT, recv_from, TAG_BASE + s,
-                           comm, MPI_STATUS_IGNORE);
-              recvnum_new[s] = recv_cnt;
-            }
+            for (int s = 0; s < nswap; ++s) {
+              const int send_to   = lmp_list.sendproc[s];
+              const int recv_from = lmp_list.recvproc[s];
+              int send_cnt = sendnum_new[s];
+              int recv_cnt = 0;
+              // MPI_PROC_NULL is allowed and acts as a no-op
+              int mpi_err = MPI_Sendrecv(&send_cnt, 1, MPI_INT, send_to,   TAG_BASE + s,
+                                         &recv_cnt, 1, MPI_INT, recv_from, TAG_BASE + s,
+                                         comm, MPI_STATUS_IGNORE);
+              if (mpi_err != MPI_SUCCESS) {
+                char errstr[MPI_MAX_ERROR_STRING];
+                int  len = 0;
+                MPI_Error_string(mpi_err, errstr, &len);
+                throw deepmd::deepmd_exception(
+                    std::string("MPI_Sendrecv(send_cnt/recv_cnt) failed: ")
+                    + std::string(errstr, len));
+              }
+              recvnum_new[s] = recv_cnt;
+            }
🤖 Prompt for AI Agents
In source/api_cc/src/DeepPotPT.cc around lines 205-225, the MPI_Sendrecv calls
neither check return codes nor handle MPI_PROC_NULL peers; update the loop to
first check if send_to==MPI_PROC_NULL or recv_from==MPI_PROC_NULL and in that
case set recvnum_new[s]=0 and skip the MPI call, otherwise call MPI_Sendrecv and
capture its return value, compare against MPI_SUCCESS and on non-success log or
propagate an error and set recvnum_new[s]=0 (or handle per project error policy)
to avoid using uninitialized/incorrect counts; use MPI constants (MPI_PROC_NULL,
MPI_SUCCESS) and ensure the TAG_BASE logic is unchanged.
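
For concreteness, a sketch of the guarded loop the prompt describes, zeroing the count on skip or error per the prompt's stated policy; this is not the committed code, and the variable names follow the hunk above.

for (int s = 0; s < nswap; ++s) {
  const int send_to = lmp_list.sendproc[s];
  const int recv_from = lmp_list.recvproc[s];
  // MPI_PROC_NULL peers are legal no-ops, but skipping avoids the call entirely.
  if (send_to == MPI_PROC_NULL || recv_from == MPI_PROC_NULL) {
    recvnum_new[s] = 0;
    continue;
  }
  int send_cnt = sendnum_new[s];
  int recv_cnt = 0;
  const int err = MPI_Sendrecv(&send_cnt, 1, MPI_INT, send_to, TAG_BASE + s,
                               &recv_cnt, 1, MPI_INT, recv_from, TAG_BASE + s,
                               comm, MPI_STATUS_IGNORE);
  recvnum_new[s] = (err == MPI_SUCCESS) ? recv_cnt : 0;  // project policy may differ
}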

Comment on lines 238 to 249
torch::Tensor firstrecv_tensor =
    torch::from_blob(firstrecv_new.data(), {nswap}, int32_option).clone();
torch::Tensor recvnum_tensor =
    torch::from_blob(recvnum_new.data(), {nswap}, int32_option).clone();
torch::Tensor sendnum_tensor =
    torch::from_blob(sendnum_new.data(), {nswap}, int32_option).clone();

torch::Tensor sendlist_tensor =
    torch::from_blob(sendlist_new.data(),
                     {static_cast<long>(sendlist_new.size())},
                     int32_option).clone();

Contributor

🛠️ Refactor suggestion

Use 64-bit shape type for Torch sizes and ensure consistency

Shapes in from_blob/view are int64_t; casting to long can truncate on LLP64 platforms (Windows).

Apply this diff:

-  torch::Tensor firstrecv_tensor =
-      torch::from_blob(firstrecv_new.data(), {nswap}, int32_option).clone();
-  torch::Tensor recvnum_tensor  =
-      torch::from_blob(recvnum_new.data(),  {nswap}, int32_option).clone();
-  torch::Tensor sendnum_tensor  =
-      torch::from_blob(sendnum_new.data(),  {nswap}, int32_option).clone();
+  torch::Tensor firstrecv_tensor =
+      torch::from_blob(firstrecv_new.data(), {static_cast<std::int64_t>(nswap)}, int32_option).clone();
+  torch::Tensor recvnum_tensor  =
+      torch::from_blob(recvnum_new.data(),  {static_cast<std::int64_t>(nswap)}, int32_option).clone();
+  torch::Tensor sendnum_tensor  =
+      torch::from_blob(sendnum_new.data(),  {static_cast<std::int64_t>(nswap)}, int32_option).clone();
@@
   torch::Tensor sendlist_tensor =
       torch::from_blob(sendlist_new.data(),
-                       { static_cast<long>(sendlist_new.size()) },
+                       { static_cast<std::int64_t>(sendlist_new.size()) },
                        int32_option).clone();

Optional safety (outside this hunk): for consistency with the cloned new arrays, consider cloning sendproc/recvproc tensors as well, since they are built from external buffers:

torch::Tensor sendproc_tensor = torch::from_blob(lmp_list.sendproc, {static_cast<std::int64_t>(nswap)}, int32_option).clone();
torch::Tensor recvproc_tensor = torch::from_blob(lmp_list.recvproc, {static_cast<std::int64_t>(nswap)}, int32_option).clone();
🤖 Prompt for AI Agents
In source/api_cc/src/DeepPotPT.cc around lines 238 to 249, the torch::from_blob
shape arguments use casts to long which can truncate on LLP64 platforms; change
all shape casts to static_cast<std::int64_t> (e.g.,
static_cast<std::int64_t>(nswap) and
static_cast<std::int64_t>(sendlist_new.size())) when constructing tensors from
blobs, so sizes match Torch's int64_t expectation; optionally also construct and
clone sendproc/recvproc tensors from their buffers using the same std::int64_t
casts to ensure they are safe copies of external memory.

@@ -10,6 +10,13 @@
#include "device.h"
#include "errors.h"

#ifdef USE_MPI
Member

This needs to be given in the CMake file. See op

target_compile_definitions(deepmd_op_pt PRIVATE USE_MPI)

Member

MPI also needs to be linked, if used.

Member

target_link_libraries(deepmd_op_pt PRIVATE MPI::MPI_CXX)
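
Combined, the suggested CMake wiring would presumably be as follows; this is a sketch, and only the deepmd_op_pt target name is taken from the comments above.

# Sketch: find MPI, define the guard, and link the MPI C++ target.
find_package(MPI REQUIRED)
target_compile_definitions(deepmd_op_pt PRIVATE USE_MPI)
target_link_libraries(deepmd_op_pt PRIVATE MPI::MPI_CXX)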

#ifdef USE_MPI
#include <mpi.h>
#ifdef OMPI_MPI_H
#include <mpi-ext.h>
Member

Not needed.

@caic99 linked an issue Aug 22, 2025 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

NULL atom type can not be used with deepmd 3.1.0a0 pth
2 participants