miri: optimize zeroed alloc #136035

Open
wants to merge 3 commits into base: master

@SpecificProtagonist SpecificProtagonist commented Jan 25, 2025

When allocating zero-initialized memory in MIR interpretation, rustc allocates zeroed memory, marks it as initialized and then re-zeroes it. Remove the last step.

I don't expect this to have much effect on performance normally, but in my case, where I create a large allocation via mmap, it gets in the way.
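As a rough illustration of the redundancy described above (a toy sketch using `Vec<u8>`, not the interpreter's actual allocation code; both function names are hypothetical), the two functions below produce identical buffers, but the first makes an extra pass over every byte:

```rust
/// Allocate `len` zeroed bytes, then write zeros over them a second time.
/// The result is the same as `zeroed_once`; only the extra pass differs.
fn zeroed_then_rezeroed(len: usize) -> Vec<u8> {
    let mut bytes = vec![0u8; len]; // already all zeros
    bytes.fill(0); // redundant second pass over the whole buffer
    bytes
}

/// Allocate `len` zeroed bytes and leave them untouched.
fn zeroed_once(len: usize) -> Vec<u8> {
    vec![0u8; len]
}

fn main() {
    let len = 1 << 20;
    let a = zeroed_then_rezeroed(len);
    let b = zeroed_once(len);
    // Both buffers have the same contents; the second write added nothing.
    assert_eq!(a, b);
    assert!(b.iter().all(|&byte| byte == 0));
    println!("both buffers hold {} zero bytes", b.len());
}
```

For small buffers the extra pass is negligible, which matches the expectation that the change rarely matters; it becomes noticeable only when the allocation is very large.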

@rustbot
Collaborator

rustbot commented Jan 25, 2025

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @wesleywiser (or someone else) some time within the next two weeks.

Please see the contribution instructions for more information. Namely, to keep review lag to a minimum, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review or S-waiting-on-author) stays updated, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jan 25, 2025
@rustbot
Collaborator

rustbot commented Jan 25, 2025

Some changes occurred to the CTFE machinery

cc @rust-lang/wg-const-eval

The Miri subtree was changed

cc @rust-lang/miri

Some changes occurred to the CTFE / Miri interpreter

cc @rust-lang/miri, @rust-lang/wg-const-eval

Some changes occurred to the CTFE / Miri interpreter

cc @rust-lang/miri

@jieyouxu
Member

r? miri

@rustbot rustbot assigned oli-obk and unassigned wesleywiser Jan 25, 2025
@SpecificProtagonist
Author

Sorry, I'm not sure how I closed this – misclick?

@Kobzol
Contributor

Kobzol commented Jan 25, 2025

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 25, 2025
@bors
Contributor

bors commented Jan 25, 2025

⌛ Trying commit bd28faf with merge 837b710...

bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 25, 2025
…c, r=<try>

miri: optimize zeroed alloc

When allocating zero-initialized memory in MIR interpretation, rustc allocates zeroed memory, marks it as initialized and then re-zeroes it. Remove the last step.

I don't expect this to have much effect on performance normally, but in my case, where I create a large allocation via mmap, Miri is unusable without this.

There's probably a better way – with less code duplication – to implement this. Maybe adding a zero_init flag to the relevant methods, but then `Allocation::uninit` & co need a new name :)
@bors
Contributor

bors commented Jan 25, 2025

☀️ Try build successful - checks-actions
Build commit: 837b710 (837b710e5dd54b53b888f1ab109a3b93efc9a144)

@oli-obk
Contributor

oli-obk commented Jan 25, 2025

This is not gonna show up in perf. No code path outside miri is changed

@RalfJung
Member

We should definitely explore ways to do this with less code duplication. :)
Adding a flag sounds like a good idea. The name of the method could just be Allocation::new etc?

@rust-timer
Collaborator

Finished benchmarking commit (837b710): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -0.2% | [-0.2%, -0.2%] | 2 |
| All ❌✅ (primary) | - | - | 0 |

Max RSS (memory usage)

Results (primary -2.2%, secondary 2.2%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.2% | [2.2%, 2.2%] | 2 |
| Improvements ✅ (primary) | -2.2% | [-2.2%, -2.2%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -2.2% | [-2.2%, -2.2%] | 1 |

Cycles

Results (primary -1.6%, secondary -0.2%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 1.3% | [1.3%, 1.3%] | 1 |
| Improvements ✅ (primary) | -1.6% | [-1.6%, -1.6%] | 1 |
| Improvements ✅ (secondary) | -1.6% | [-1.6%, -1.6%] | 1 |
| All ❌✅ (primary) | -1.6% | [-1.6%, -1.6%] | 1 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 771.261s -> 771.165s (-0.01%)
Artifact size: 325.82 MiB -> 325.82 MiB (0.00%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 25, 2025
@SpecificProtagonist
Author

> Adding a flag sounds like a good idea. The name of the method could just be Allocation::new etc?

Changed 👍

@@ -289,7 +291,7 @@ impl<'tcx, M: Machine<'tcx>> InterpCx<'tcx, M> {

  // For simplicities' sake, we implement reallocate as "alloc, copy, dealloc".
  // This happens so rarely, the perf advantage is outweighed by the maintenance cost.
- let new_ptr = self.allocate_ptr(new_size, new_align, kind)?;
+ let new_ptr = self.allocate_ptr(new_size, new_align, kind, zero_init)?;
Contributor

Since we'll memcopy right over the alloc, zeroing is not needed for realloc, just use false here

The memcopy only initializes part of the allocation, but mremap needs everything to be initialized.

Contributor

oh I see, thanks!

Member

Please add a comment explaining this.

Member

Also seems worth saying that we assume here that zeroing the entire allocation is more efficient than zeroing just the parts not copied from the old allocation.
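The point of this thread can be sketched with a toy model (an assumption-laden illustration, not rustc's `Allocation` type): `ToyAlloc`, its per-byte `init` flags, and this `reallocate` are all hypothetical. The copy only initializes the first `min(old_size, new_size)` bytes, so a grown allocation is fully initialized only if the new allocation was zero-initialized to begin with.

```rust
/// Toy stand-in for an interpreter allocation: raw bytes plus a
/// per-byte flag recording whether each byte counts as initialized.
struct ToyAlloc {
    bytes: Vec<u8>,
    init: Vec<bool>,
}

impl ToyAlloc {
    /// Create `size` bytes; they count as initialized only if `zero_init` is set.
    fn new(size: usize, zero_init: bool) -> Self {
        ToyAlloc { bytes: vec![0; size], init: vec![zero_init; size] }
    }

    /// Write `data` at `offset` and mark those bytes as initialized.
    fn write(&mut self, offset: usize, data: &[u8]) {
        let end = offset + data.len();
        self.bytes[offset..end].copy_from_slice(data);
        self.init[offset..end].fill(true);
    }

    fn fully_init(&self) -> bool {
        self.init.iter().all(|&b| b)
    }
}

/// Reallocate as "alloc, copy, dealloc". Zeroing the whole new allocation
/// is assumed to be cheaper than zeroing only the tail that is not copied.
fn reallocate(old: ToyAlloc, new_size: usize, zero_init: bool) -> ToyAlloc {
    let mut new = ToyAlloc::new(new_size, zero_init);
    let n = old.bytes.len().min(new_size);
    // The copy covers only the first `n` bytes and their init state.
    new.bytes[..n].copy_from_slice(&old.bytes[..n]);
    new.init[..n].copy_from_slice(&old.init[..n]);
    new // `old` is dropped here, which plays the role of "dealloc"
}

fn main() {
    let mut old = ToyAlloc::new(4, true);
    old.write(0, &[1, 2, 3, 4]);
    let grown = reallocate(old, 8, true);
    assert!(grown.fully_init());
    assert_eq!(grown.bytes, vec![1u8, 2, 3, 4, 0, 0, 0, 0]);

    let mut old = ToyAlloc::new(4, true);
    old.write(0, &[1, 2, 3, 4]);
    let grown = reallocate(old, 8, false);
    // Without zero-init, the tail past the copied prefix stays uninitialized.
    assert!(!grown.fully_init());
}
```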

@@ -315,8 +320,8 @@ impl<Prov: Provenance, Bytes: AllocBytes> Allocation<Prov, (), Bytes> {

  /// Try to create an Allocation of `size` bytes, failing if there is not enough memory
  /// available to the compiler to do so.
- pub fn try_uninit<'tcx>(size: Size, align: Align) -> InterpResult<'tcx, Self> {
- Self::uninit_inner(size, align, || {
+ pub fn try_new<'tcx>(size: Size, align: Align, zero_init: bool) -> InterpResult<'tcx, Self> {
Contributor

Bools are a bit icky, as the call sites don't make it clear what the bool means. Maybe an enum with Uninit and Zeroed variants would be better? What do you think @RalfJung

Author

@SpecificProtagonist SpecificProtagonist Jan 25, 2025

Yeah, I'm not happy with the unlabeled bools. An additional (very simple) possibility could be to use comments – this is used by existing code: mem_copy(ptr, new_ptr.into(), old_size.min(new_size), /*nonoverlapping*/ true).

Member

Yeah a dedicated enum makes sense.
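A minimal sketch of the enum idea discussed above, assuming hypothetical names throughout (`AllocInit`, `ToyAllocation`, and its `new` are illustrations, not the API that ended up in rustc). The variants follow the `Uninit` and `Zeroed` names suggested in the review:

```rust
/// Hypothetical replacement for the `zero_init: bool` parameter.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AllocInit {
    /// Contents are left uninitialized.
    Uninit,
    /// Contents are zeroed and count as initialized.
    Zeroed,
}

/// Toy allocation that only records whether it was created initialized.
struct ToyAllocation {
    bytes: Vec<u8>,
    initialized: bool,
}

impl ToyAllocation {
    fn new(size: usize, init: AllocInit) -> Self {
        ToyAllocation {
            // The toy backing store is always zeroed; only the flag differs.
            bytes: vec![0u8; size],
            initialized: match init {
                AllocInit::Uninit => false,
                AllocInit::Zeroed => true,
            },
        }
    }
}

fn main() {
    // The call site states its intent, unlike `new(16, true)`.
    let scratch = ToyAllocation::new(16, AllocInit::Uninit);
    let mapped = ToyAllocation::new(16, AllocInit::Zeroed);
    assert!(!scratch.initialized);
    assert!(mapped.initialized);
    assert_eq!(mapped.bytes.len(), 16);
}
```

Compared with the `/*nonoverlapping*/ true` comment convention, the enum is checked by the compiler, so a call site cannot silently pass the wrong meaning.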

Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

10 participants