Fix cuda overflow #19487

Open

lucylq wants to merge 1 commit into main from security20-cuda

Conversation

@lucylq
Contributor

@lucylq lucylq commented May 11, 2026

Update copy_ method to check:

  • this->sizes() == other.sizes()
  • this->dim() == other.dim()

Use overflow-safe arithmetic.

@pytorch-bot

pytorch-bot Bot commented May 11, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19487

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 Pending, 1 Unclassified Failure

As of commit ce6fc1b with merge base 23a91d5:

UNCLASSIFIED FAILURE - DrCI could not classify the following job because the workflow did not run on the merge base. The failure may be pre-existing on trunk or introduced by this PR:

  • Check Labels / Check labels (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
    RuntimeError: GraphQL query

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@lucylq lucylq changed the title overflow Fix cuda overflow May 11, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@meta-cla meta-cla Bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) May 11, 2026
@lucylq lucylq force-pushed the security20-cuda branch 2 times, most recently from e5dfd0f to df170ce Compare May 11, 2026 23:20
@lucylq lucylq requested a review from Gasoonjia May 11, 2026 23:20
@lucylq
Contributor Author

lucylq commented May 11, 2026

@claude review

@claude

claude Bot commented May 11, 2026

Claude finished @lucylq's task in 1m 6s


Code Review: Fix cuda overflow

  • Gather context and read the changed file
  • Analyze the diff for correctness, security, and style
  • Post review feedback

Overall: This is a solid security hardening PR. The changes add proper dimension/size validation and replace unchecked arithmetic with overflow-safe operations in the non-contiguous copy_ path. The approach is correct and well-structured.

Strengths

  1. Dimension and size checks (lines 443-449): Good defense-in-depth. Checking dim() and sizes() equality before the existing numel() check catches shape mismatches that happen to have the same total element count (e.g., [2,3] vs [3,2]).

  2. Overflow-safe offset computation (lines 474-484): Using c10::mul_overflows and c10::add_overflows with early-exit on overflow is the right pattern. The accumulation loop correctly builds both src and dst offsets simultaneously.

  3. Byte offset overflow check (lines 488-498): Good catch — the element-to-byte conversion (offset * elem_size) is another multiplication that can overflow, and the non-negativity assertion on src_offset/dst_offset prevents signed-to-unsigned conversion issues.

Minor observations

  1. Redundant numel() check: With dim() and sizes() equality now enforced, the numel() check on line 451 is logically redundant (equal sizes implies equal numel). It's fine to keep as a belt-and-suspenders assertion, but worth noting it can never fire if the prior checks pass.

  2. compute_numel in size_util.h is still unchecked (slim/util/size_util.h:23-29): The refresh_numel() path calls compute_numel which does n *= s without overflow protection. If an attacker can control sizes passed to set_sizes_and_strides, the numel could overflow before copy_ is ever called. This is out of scope for this PR but worth a follow-up.

  3. nbytes() on line 101: numel() * itemsize() is also a potential overflow site for the contiguous fast path (storage_->copy_(this->data_ptr(), other.data_ptr(), other.nbytes(), ...)). Again out of scope, but relevant to the broader hardening effort.

Verdict

LGTM — the changes are correct, focused, and improve the security posture of the non-contiguous copy path. The new checks are properly ordered (cheapest first) and the overflow detection uses battle-tested compiler builtins via c10::mul_overflows/c10::add_overflows.


@lucylq lucylq marked this pull request as ready for review May 11, 2026 23:22
Copilot AI review requested due to automatic review settings May 11, 2026 23:22
Contributor

Copilot AI left a comment


Pull request overview

This PR hardens SlimTensor::copy_ in the AOTI slim runtime by enforcing stricter shape compatibility and making strided offset/byte computations overflow-safe (particularly relevant for CUDA pointer arithmetic).

Changes:

  • Added precondition checks that dim() and sizes() match before copying.
  • Reworked strided element-wise copy offset computation to use c10::{mul,add}_overflows and to compute byte offsets explicitly.
  • Updated CPU/CUDA memcpy call sites to use the computed byte offsets.


Comment on lines +490 to +498
    ET_CHECK_MSG(
        src_offset >= 0 && dst_offset >= 0 &&
            !c10::mul_overflows(
                static_cast<size_t>(src_offset),
                elem_size,
                &src_byte_offset) &&
            !c10::mul_overflows(
                static_cast<size_t>(dst_offset), elem_size, &dst_byte_offset),
        "copy_: byte offset overflow");
Comment on lines +443 to +449
    ET_CHECK_MSG(
        this->dim() == other.dim(),
        "copy_: dim of tensors must match (%zu vs %zu)",
        this->dim(),
        other.dim());
    ET_CHECK_MSG(
        this->sizes() == other.sizes(), "copy_: sizes of tensors must match");
Comment thread on backends/aoti/slim/core/slim_tensor.h (Outdated)
    int64_t src_term = 0;
    int64_t dst_term = 0;
    ET_CHECK_MSG(
        !c10::mul_overflows(counter[d], other.stride(d), &src_term) &&
Contributor


I feel like we only need two overflow calculations: one for src = counter[d] * this->stride(d), the other for dst_offset = dst_offset + src. Thoughts?

Contributor Author


yeah - thank you!

 */
SlimTensor& copy_(const SlimTensor& other) {
  ET_CHECK_MSG(
      this->dim() == other.dim(),
Contributor


I'm not a huge fan of adding this new limitation. Two tensors should be copyable as long as their numel is the same. Maybe we can remove this assertion?

Contributor Author


I think we'd also need to update the loop below if we want tensors with different dim/sizes to be copyable.

@Gasoonjia
Contributor

Thanks for adding the overflow check! Please fix the broken CUDA CIs as well.

@lucylq lucylq force-pushed the security20-cuda branch from df170ce to 9a1980d Compare May 12, 2026 16:13
 * @return Reference to this tensor
 */
SlimTensor& copy_(const SlimTensor& other) {
  ET_CHECK_MSG(
Contributor


This check also enforces two tensors having same shape -- can we remove it?

Copilot AI review requested due to automatic review settings May 12, 2026 20:57
@lucylq lucylq force-pushed the security20-cuda branch from 9a1980d to 02e0ad0 Compare May 12, 2026 20:57
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

Comment on lines +485 to +490
            !::c10::mul_overflows(
                static_cast<size_t>(src_offset),
                elem_size,
                &src_byte_offset) &&
            !::c10::mul_overflows(
                static_cast<size_t>(dst_offset), elem_size, &dst_byte_offset),
Comment on lines +483 to +485
    ET_CHECK_MSG(
        src_offset >= 0 && dst_offset >= 0 &&
            !::c10::mul_overflows(
Comment on lines +443 to 447
    ET_CHECK_MSG(
        this->sizes() == other.sizes(), "copy_: sizes of tensors must match");
    ET_CHECK_MSG(
        this->numel() == other.numel(), "copy_: numel of tensors must match");
    ET_CHECK_MSG(this->dtype() == other.dtype(), "copy_: dtype must match");
@lucylq lucylq force-pushed the security20-cuda branch from 02e0ad0 to b78a69d Compare May 12, 2026 22:10
Copilot AI review requested due to automatic review settings May 12, 2026 22:13
@lucylq lucylq force-pushed the security20-cuda branch from b78a69d to bddf2a8 Compare May 12, 2026 22:13
@lucylq lucylq force-pushed the security20-cuda branch from bddf2a8 to ebc9978 Compare May 12, 2026 22:16
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comment on lines 501 to 505
      // Copy elem_size bytes from src to dst
      if (this->device().is_cpu() && other.device().is_cpu()) {
        std::memcpy(
-           dst_data + dst_offset * elem_size,
-           src_data + src_offset * elem_size,
-           elem_size);
+           dst_data + dst_byte_offset, src_data + src_byte_offset, elem_size);
      } else if (this->device().is_cuda() || other.device().is_cuda()) {
Comment on lines 442 to +446
  SlimTensor& copy_(const SlimTensor& other) {
    ET_CHECK_MSG(
        this->dim() == other.dim(),
        "copy_: dim of tensors must match (%zu vs %zu)",
        this->dim(),
@Gasoonjia
Contributor

@claude review the PR

@claude

claude Bot commented May 12, 2026

Claude finished @Gasoonjia's task in 1m 49s


Code Review: Fix cuda overflow

  • Gather context and read the changed file
  • Analyze the diff for correctness and security
  • Review feedback from other reviewers
  • Post review feedback

Overall: Good security hardening. The overflow-safe arithmetic is correct and uses the right primitives. There are a few open design questions and minor issues worth addressing.

Correctness & Security

  1. Overflow checks are sound: c10::mul_overflows/c10::add_overflows wrap compiler builtins (__builtin_mul_overflow etc.) and correctly detect overflow before the result is used. The pattern of computing into temporary terms and then accumulating is correct.

  2. static_cast<size_t>(src_offset) after >= 0 check (slim_tensor.h:492-498) — The non-negativity check ensures the cast doesn't reinterpret a negative value as a large unsigned number. On 64-bit targets (which all CUDA platforms are), size_t == uint64_t, so no truncation occurs. On a hypothetical 32-bit build, values > UINT32_MAX would truncate before the overflow check. This is extremely unlikely in practice for SlimTensor's use cases, but if you want to be rigorous, add a static_assert(sizeof(size_t) >= sizeof(int64_t)) or a range check.

  3. Negative strides (slim_tensor.h:492) — The src_offset >= 0 && dst_offset >= 0 assertion rejects tensors whose strided traversal produces negative offsets (e.g., torch.flip). This is fine for SlimTensor (which likely never produces negative strides), but should be called out explicitly since it's a behavioral constraint that isn't obvious from the function signature.

Design Feedback

  1. dim()/sizes() equality restriction (slim_tensor.h:443-449) — Agreeing with @Gasoonjia's feedback: standard PyTorch copy_ only requires equal numel(), allowing e.g. [6].copy_([2,3]). The current strided loop assumes equal dims (uses this->dim() to iterate both src and dst), so enforcing this is correct for the current implementation. If relaxing this is desired later, the loop would need to iterate dimensions independently per tensor. For now, I'd suggest keeping the check but updating the doc comment to document this restriction explicitly.

  2. Redundant numel() check (slim_tensor.h:450-451) — With equal sizes() enforced, numel() equality is mathematically guaranteed. Fine to keep as defense-in-depth, but if you'd like to slim down, it can be removed.

  3. Stale doc comment (slim_tensor.h:437) — Says "Currently only supports CPU-to-CPU copy (contiguous tensors only)" but the implementation handles CUDA and non-contiguous. Should be updated to reflect reality.

Follow-up Items (out of scope but worth tracking)

  1. compute_numel in size_util.h:23-28 uses unchecked n *= s. If sizes are attacker-controlled, numel can overflow before reaching copy_. This is a broader issue than this PR but worth a follow-up.

  2. nbytes() at slim_tensor.h:101 (numel() * itemsize()) is also unchecked and used in the contiguous fast path at line 461. Same follow-up category.

  3. CI failures: @Gasoonjia mentioned broken CUDA CIs. These should be investigated separately — the overflow logic itself looks correct for both CPU and CUDA paths.

Verdict

The core security fix (overflow-safe offset arithmetic) is correct and well-implemented. The main discussion point is the dim()/sizes() restriction (addressed in the review thread). Updating the stale doc comment and considering the 32-bit cast edge case would be nice cleanups.


Comment thread on backends/aoti/slim/core/slim_tensor.h (Outdated)
Contributor


Update the doc as well to show that the two tensors should have the same sizes.
And can you do me a favor and remove the "Currently only supports CPU-to-CPU copy (contiguous tensors only)." line? It's out of date loll 😛

@lucylq lucylq force-pushed the security20-cuda branch from ebc9978 to ce6fc1b Compare May 12, 2026 22:25
@lucylq
Contributor Author

lucylq commented May 13, 2026

@claude review

@claude

claude Bot commented May 13, 2026

Claude finished @lucylq's task in 1m 47s


Code Review: Fix cuda overflow

  • Gather context and read the current state of the changed file
  • Analyze the diff against merge base
  • Review against prior feedback (Gasoonjia's comments, copilot suggestions)
  • Post review feedback

Overall: The security hardening looks correct and well-structured. The overflow-safe arithmetic uses the right primitives, and the doc comment has been properly updated. The PR addresses the prior review feedback well (stale comment removed, sizes restriction documented). A few remaining items:

Looks Good

  1. Overflow detection pattern (lines 479-484): Using c10::mul_overflows/c10::add_overflows with short-circuit evaluation is correct — if any operation overflows, the check fires before the corrupted value propagates.

  2. Byte offset overflow check (lines 490-498): The non-negativity guard before static_cast<size_t> prevents signed-to-unsigned reinterpretation. On all CUDA platforms (64-bit), sizeof(size_t) == sizeof(int64_t), so no truncation.

  3. Doc comment (lines 433-439): Updated to match the new preconditions. Stale "CPU-to-CPU" reference removed per @Gasoonjia's request.

Remaining Items

  1. data_ptr() has unchecked arithmetic (slim_tensor.h:212):

    return static_cast<char*>(storage_->data()) + storage_offset_ * itemsize();

    This multiplication can overflow for the same reason the strided offsets could. It's called in the contiguous fast path (line 459-460) — a malicious storage_offset_ could produce an OOB pointer before the new overflow checks even run. Worth a follow-up PR.

  2. nbytes() unchecked (slim_tensor.h:100-101): numel() * itemsize() feeds directly into the contiguous storage_->copy_() call at line 460. Same category as above.

  3. compute_numel unchecked (size_util.h:23-28): Uses n *= s without overflow detection. If sizes are attacker-controlled via set_sizes_and_strides, numel can silently wrap before copy_ is called, potentially causing the numel() check (line 450) to pass incorrectly.

  4. CI failures: @Gasoonjia mentioned broken CUDA CIs — these should be investigated separately as the logic here looks correct for both CPU and CUDA paths.

Items 1-3 are pre-existing issues and out of scope for this PR, but worth tracking as follow-ups to complete the hardening effort.

Verdict

LGTM — the core fix is correct and focused. The new checks are ordered cheaply (dim/sizes comparison before the per-element loop) and the overflow detection uses well-tested compiler builtins. The remaining unchecked paths (data_ptr, nbytes, compute_numel) are pre-existing and appropriate for follow-up work.



Labels

ciflow/cuda · CLA Signed (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed)


3 participants