Skip to content

Missed code elimination with std::mem::replace/std::mem::swap #44701

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
batonius opened this issue Sep 19, 2017 · 6 comments
Open

Missed code elimination with std::mem::replace/std::mem::swap #44701

batonius opened this issue Sep 19, 2017 · 6 comments
Labels
C-bug Category: This is a bug. I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@batonius
Copy link

batonius commented Sep 19, 2017

Edit: As pointed out by @oyvindln, the problem appears to be a regression introduced in 1.20.

Demonstration: https://godbolt.org/g/5uuzVL

Version: rustc 1.22.0-nightly (277476c 2017-09-16) , -C opt-level=3 -C target-cpu=native

Code:

use std::rc::Rc;
use std::cell::RefCell;

pub struct Buffer {
    buf: Vec<u8>,
    pool: Rc<RefCell<Vec<Vec<u8>>>>,
}

impl Drop for Buffer {
    fn drop(&mut self) {
        self.pool.borrow_mut()
            .push(std::mem::replace(&mut self.buf, vec![]));        
    }
}

Expected result: An optimal code would move self.buf directly into self.pool and then reset self.buf in-place. An acceptable code would moveself.buf into a temporary on the stack, move the temporary into self.pool and reset self.buf in-place.

Observed result:

  1. Space for two std::Vec<u8>(each 24 bytes) is allocated on the stack, -48(%rbp) (A) and -96(%rbp) (B).
  2. self.buf is copied to A.
  3. self.buf is reset in-place.
  4. A is copied to B.
  5. B is copied to A.
  6. A is inserted into self.pool.

Steps 4 and 5 is a completely unnecessary copying of 48 bytes and could be safely removed. Replacing std::mem::replace with an equivalent std::mem::swap call produces a slightly different code with the same basic problem.

Compiler output:

<example::Buffer as core::ops::drop::Drop>::drop:
        pushq   %rbp
        movq    %rsp, %rbp
        pushq   %rbx
        subq    $88, %rsp
        movq    %rdi, %rax
        movq    24(%rax), %rbx
        cmpq    $0, 16(%rbx)
        jne     .LBB6_6
        leaq    16(%rbx), %rcx
        movq    $-1, 16(%rbx)
        leaq    24(%rbx), %rdi
        movq    %rdi, -64(%rbp)
        movq    %rcx, -56(%rbp)
;2
        movq    16(%rax), %rcx
        movq    %rcx, -32(%rbp)
        vmovups (%rax), %xmm0
        vmovaps %xmm0, -48(%rbp)
;3
        movq    $1, (%rax)
        vxorps  %xmm0, %xmm0, %xmm0
        vmovups %xmm0, 8(%rax)
;4
        movq    -32(%rbp), %rax
        movq    %rax, -80(%rbp)
        vmovaps -48(%rbp), %xmm0
        vmovaps %xmm0, -96(%rbp)
;5
        movq    -80(%rbp), %rax
        movq    %rax, -32(%rbp)
        vmovaps -96(%rbp), %xmm0
        vmovaps %xmm0, -48(%rbp)
;6
        movq    40(%rbx), %rax
        cmpq    32(%rbx), %rax
        jne     .LBB6_4
        callq   <alloc::raw_vec::RawVec<T, A>>::double
        movq    40(%rbx), %rax
.LBB6_4:
        movq    24(%rbx), %rcx
        leaq    (%rax,%rax,2), %rax
        movq    -32(%rbp), %rdx
        movq    %rdx, 16(%rcx,%rax,8)
        vmovaps -48(%rbp), %xmm0
        vmovups %xmm0, (%rcx,%rax,8)
        incq    40(%rbx)
        movq    $0, 16(%rbx)
        addq    $88, %rsp
        popq    %rbx
        popq    %rbp
        retq
.LBB6_6:
        callq   core::result::unwrap_failed
        movq    %rax, %rbx
        leaq    -48(%rbp), %rdi
        callq   core::ptr::drop_in_place
        leaq    -64(%rbp), %rdi
        callq   core::ptr::drop_in_place
        movq    %rbx, %rdi
        callq   _Unwind_Resume@PLT
@aidanhs aidanhs added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Sep 19, 2017
@oyvindln
Copy link
Contributor

Haven't studied the ASM in detail, but with 1.19 and earlier the drop function contains a lot less assembly output. Maybe this is related to #40454 ?

@batonius
Copy link
Author

Indeed, 1.19 doesn't use a temporary, and #40454 looks suspicious, so I've investigated a little: https://godbolt.org/g/Fo2bZ1 . BufferOld uses the old version of std::swap, and BufferNew - the relevant part of the new version.

With 1.20, beta and nightly, BufferNew::drop compiles as described above, but BufferOld::drop uses only a single on-stack temporary and doesn't waste time copying data around, so at first it looks like the problem is the new std::swap. Yet with 1.19, both BufferNew::drop and BufferOld::drop compile to the same, more efficient code that doesn't use temporaries at all.

So it looks like there has been a regression in the optimizer since 1.20, and the new std::swap, while not the problem itself, suffers from it unproportionally.

@batonius
Copy link
Author

@aidanhs The label should probably be changed since the issue appears to be a recent regression.

@aidanhs aidanhs added C-bug Category: This is a bug. and removed C-enhancement Category: An issue proposing an enhancement or a PR with one. labels Sep 21, 2017
@saethlin
Copy link
Member

@scottmcm is this related to #112733?

@saethlin saethlin added I-slow Issue: Problems and improvements with respect to performance of generated code. I-heavy Issue: Problems and improvements with respect to binary size of generated code. labels Jul 16, 2023
@scottmcm
Copy link
Member

Yes, this looks almost identical to the replace_string codegen test in that PR: https://github.com/rust-lang/rust/pull/112733/files#diff-aeacafb3161bf010bd64adaedc5709ad328f4e1f222cbbc3b3def7c6ac4e4385R49

@scottmcm scottmcm linked a pull request Jul 20, 2023 that will close this issue
@scottmcm
Copy link
Member

Actually, if this is about allocas during mem::replace, then I think it's actually already fixed with https://github.com/rust-lang/rust/pull/111010/files#diff-24ed0a5d700ac2b1c9e1ef53e400e7f92ec49f8165fc5bf2647d1efb69e21c80R16

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants