RFC: Add a replace_with method to Option #2490
| 
           It would be ideal to implement a proper optimization instead of introducing this  I have initially attempted to dig into the optimizations, but soon I realized that I don't have enough expertise yet to do that. If someone can mentor me, I would eagerly dive into the topic! UPDATE: I moved my findings into a StackOverflow question. I would love to learn more details about optimizations, so your help is appreciated!  | 
    
  
    
          The proposed implementation does not look panic safe to me, i.e., in case   | 
    
| 
           @diwic Well, initialization with   | 
    
| 
           I would rewrite this as:
               *self = f(mem::replace(self, None));
           which is safe and depends directly on zero unsafe operations.  | 
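To make the panic-safety argument concrete, here is a minimal sketch, assuming a free function named `replace_with` stands in for the proposed method. The helper `slot_after_panic` is hypothetical and exists only for illustration. It shows that when the closure panics, the slot is left holding `None` rather than an invalid value.

```rust
use std::mem;
use std::panic::{self, AssertUnwindSafe};

/// Free-function stand-in for the proposed method (illustrative name).
pub fn replace_with<T, F>(slot: &mut Option<T>, f: F)
where
    F: FnOnce(Option<T>) -> Option<T>,
{
    // `mem::replace` leaves `None` in `slot` while `f` runs, so a panic
    // inside `f` can only ever observe or leave behind a valid `Option`.
    *slot = f(mem::replace(slot, None));
}

/// Hypothetical helper: calls `replace_with` with a panicking closure and
/// returns whatever remains in the slot afterwards.
pub fn slot_after_panic() -> Option<String> {
    let mut slot = Some(String::from("old"));
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        replace_with(&mut slot, |_old| panic!("closure failed"));
    }));
    // The panic was caught; the old value was dropped during unwinding.
    assert!(result.is_err());
    slot
}
```

The trade-off in this sketch is that the old value is lost if the closure panics, which is the price of never leaving the slot in an invalid state.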
    
| 
           @Centril and   | 
    
| 
           @diwic is right. If  I initially started with the following implementation:
        let mut new_value = f(self.take());
        mem::swap(self, &mut new_value);
        // Since self was None after take(), new_value holds None here after swap(),
        // so we can forget about it.
        mem::forget(new_value);
           It is as performant as the proposed implementation but has one extra   | 
    
| 
           @frol So I primarily see the motivation for   | 
    
| 
           @Centril To be honest, I would prefer fixing optimizer to handle   | 
    
| 
           The problem is not in  With
	mov	rax, qword ptr [rdi]
	mov	rcx, rax
	shr	rcx, 32
	xor	edx, edx
	test	eax, eax
	setne	dl
	add	ecx, 1
	mov	dword ptr [rdi], edx
	mov	dword ptr [rdi + 4], ecx
	ret
With a different
	xor	eax, eax
	cmp	dword ptr [rdi], 0
	setne	al
	mov	dword ptr [rdi], eax
	add	dword ptr [rdi + 4], 1
	ret
Which is pretty good, other than the usual rust-lang/rust#49420 (comment). I suspect
start:
  %1 = bitcast { i32, i32 }* %0 to i64*
  %2 = load i64, i64* %1, align 1, !alias.scope !0, !noalias !9
That  TL/DR:  Edit: PR for   | 
    
mem::swap the obvious way for types smaller than the SIMD optimization's block size
LLVM isn't able to remove the alloca for the unaligned block in the post-SIMD tail in some cases, so doing this helps SRoA work in cases where it currently doesn't.  Found in the `replace_with` RFC discussion.
Examples of the improvements:
<details>
 <summary>swapping `[u16; 3]` takes 1/3 fewer instructions and no stackalloc</summary>
```rust
type Demo = [u16; 3];
pub fn swap_demo(x: &mut Demo, y: &mut Demo) {
    std::mem::swap(x, y);
}
```
nightly:
```asm
_ZN4blah9swap_demo17ha1732a9b71393a7eE:
.seh_proc _ZN4blah9swap_demo17ha1732a9b71393a7eE
	sub	rsp, 32
	.seh_stackalloc 32
	.seh_endprologue
	movzx	eax, word ptr [rcx + 4]
	mov	word ptr [rsp + 4], ax
	mov	eax, dword ptr [rcx]
	mov	dword ptr [rsp], eax
	movzx	eax, word ptr [rdx + 4]
	mov	word ptr [rcx + 4], ax
	mov	eax, dword ptr [rdx]
	mov	dword ptr [rcx], eax
	movzx	eax, word ptr [rsp + 4]
	mov	word ptr [rdx + 4], ax
	mov	eax, dword ptr [rsp]
	mov	dword ptr [rdx], eax
	add	rsp, 32
	ret
	.seh_handlerdata
	.section	.text,"xr",one_only,_ZN4blah9swap_demo17ha1732a9b71393a7eE
	.seh_endproc
```
this PR:
```asm
_ZN4blah9swap_demo17ha1732a9b71393a7eE:
	mov	r8d, dword ptr [rcx]
	movzx	r9d, word ptr [rcx + 4]
	movzx	eax, word ptr [rdx + 4]
	mov	word ptr [rcx + 4], ax
	mov	eax, dword ptr [rdx]
	mov	dword ptr [rcx], eax
	mov	word ptr [rdx + 4], r9w
	mov	dword ptr [rdx], r8d
	ret
```
</details>
<details>
 <summary>`replace_with` optimizes down much better</summary>
Inspired by rust-lang/rfcs#2490,
```rust
fn replace_with<T, F>(x: &mut Option<T>, f: F)
    where F: FnOnce(Option<T>) -> Option<T>
{
    *x = f(x.take());
}
pub fn inc_opt(mut x: &mut Option<i32>) {
    replace_with(&mut x, |i| i.map(|j| j + 1));
}
```
Rust 1.26.0:
```asm
_ZN4blah7inc_opt17heb0acb64c51777cfE:
	mov	rax, qword ptr [rcx]
	movabs	r8, 4294967296
	add	r8, rax
	shl	rax, 32
	movabs	rdx, -4294967296
	and	rdx, r8
	xor	r8d, r8d
	test	rax, rax
	cmove	rdx, rax
	setne	r8b
	or	rdx, r8
	mov	qword ptr [rcx], rdx
	ret
```
Nightly (better thanks to ScalarPair, maybe?):
```asm
_ZN4blah7inc_opt17h66df690be0b5899dE:
	mov	r8, qword ptr [rcx]
	mov	rdx, r8
	shr	rdx, 32
	xor	eax, eax
	test	r8d, r8d
	setne	al
	add	edx, 1
	mov	dword ptr [rcx], eax
	mov	dword ptr [rcx + 4], edx
	ret
```
This PR:
```asm
_ZN4blah7inc_opt17h1426dc215ecbdb19E:
	xor	eax, eax
	cmp	dword ptr [rcx], 0
	setne	al
	mov	dword ptr [rcx], eax
	add	dword ptr [rcx + 4], 1
	ret
```
Where that add is beautiful -- using an addressing mode to not even need to explicitly go through a register -- and the remaining imperfection is well-known (rust-lang#49420 (comment)).
</details>
    | 
           @scottmcm FYI, I have tried the latest Rust nightly (6a1c0637c 2018-07-23), which includes the patch from PR rust-lang/rust#52051 and even though I see that your example snippet has been improved with the patch, there is no improvement for my use-case. Given that the proposed optimization was concluded to be irrelevant for the implementation, and the fact that the proposed method basically duplicates the already existing way to do this operation in an obvious way (  | 
    
| 
           UPDATE: There was a relevant RFC about the ideas I describe below, so feel free to ignore my message. I have just had a conversation where this
#[derive(Debug)]
struct Bar;
#[derive(Debug)]
enum Foo {
	A(Bar),
	B(Bar),
}
#[derive(Debug)]
struct Baz {
	foo: Foo,
}
impl Baz {
    fn switch_variant_unsafe(&mut self) {
        let mut foo_temp: Foo = unsafe { ::std::mem::uninitialized() };
        ::std::mem::swap(&mut self.foo, &mut foo_temp);
        self.foo = match foo_temp {
            Foo::A(bar) => Foo::B(bar),
            Foo::B(bar) => Foo::A(bar),
        }
    }
    
    fn switch_variant_safe(&mut self) {
        self.foo = match self.foo {
            Foo::A(bar) => Foo::B(bar),
            Foo::B(bar) => Foo::A(bar),
        };
    }
}
fn main() {
    let mut baz = Baz { foo: Foo::A(Bar) };
    baz.foo = match baz.foo {
        Foo::A(bar) => Foo::B(bar),
        Foo::B(bar) => Foo::A(bar),
    };
    dbg!(&baz);
    baz.switch_variant_unsafe();
    dbg!(&baz);
    baz.switch_variant_safe();
    dbg!(&baz);
}
As is, you get a compilation error: There is also a similar question on SO. Here is the helper that I came up with (based on this RFC):
fn replace_with<T, F>(dest: &mut T, mut f: F) 
where
    F: FnMut(T) -> T,
{
    let mut old_value = unsafe { std::mem::uninitialized() };
    std::mem::swap(dest, &mut old_value);  // dest is "uninitialized" (in fact, it is not touched in release mode)
    let mut new_value = f(old_value);
    std::mem::swap(dest, &mut new_value);   // dest holds new_value, and new_value is "uninitialized"
    std::mem::forget(new_value);  // since it is "uninitialized", we forget about it
}
and then we can implement
    fn switch_variant_safe(&mut self) {
        replace_with(&mut self.foo, |foo| match foo {
            Foo::A(bar) => Foo::B(bar),
            Foo::B(bar) => Foo::A(bar),
        })
    }
The generated assembly for  As to the unsoundness concerns raised in #2490 (comment), in release mode,
example::replace_with:
        mov     al, byte ptr [rdi]
        not     al
        and     al, 1
        mov     byte ptr [rdi], al
        ret
In fact, it gets automatically inlined unless I put
        mov     al, byte ptr [rsp + 7]
        not     al
        and     al, 1
Thus, there is no unsoundness in the release mode. Yet, in debug mode, an explicit uninitialized value indeed gets assigned to the  Another way to implement the enum variant "toggle" is to pass ownership to
    fn switch_variant_owned(mut self) -> Self {
        self.foo = match self.foo {
            Foo::A(bar) => Foo::B(bar),
            Foo::B(bar) => Foo::A(bar),
        };
        self
    }
However, it requires API changes all the way down to the method (i.e. you have to use this API style all the way through your codebase if it is a low-level method). As a side note, while the generated assembly is mostly the same (there are some variable rearrangements), one interesting optimization gets applied when  I believe there is a need for a safe and sound  P.S. This more generic
let mut some_option: Option<i32> = Some(123);
some_option.replace_with(|old_value| consume_option_i32_and_produce_option_i32(old_value));
I would write
let mut some_option: Option<i32> = Some(123);
std::mem::replace(&mut some_option, |old_value| consume_option_i32_and_produce_option_i32(old_value)); | 
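For comparison, the enum variant "toggle" can also be written without `unsafe` by leaving a cheap placeholder in the field while the old value is moved out. The sketch below reuses the `Bar`, `Foo`, and `Baz` types from the comment above (with `PartialEq` added so the result can be checked); the method name `switch_variant_placeholder` is hypothetical. It assumes a throwaway value of the type is cheap to construct, which is not true for every type. Note also that `std::mem::uninitialized` has since been deprecated in favour of `MaybeUninit`.

```rust
use std::mem;

#[derive(Debug, PartialEq)]
pub struct Bar;

#[derive(Debug, PartialEq)]
pub enum Foo {
    A(Bar),
    B(Bar),
}

#[derive(Debug, PartialEq)]
pub struct Baz {
    pub foo: Foo,
}

impl Baz {
    /// Hypothetical safe variant of the toggle: no uninitialized memory involved.
    pub fn switch_variant_placeholder(&mut self) {
        // Move the current value out and leave a valid placeholder behind,
        // so `self.foo` is always initialized, even if the match arms panicked.
        let old = mem::replace(&mut self.foo, Foo::A(Bar));
        self.foo = match old {
            Foo::A(bar) => Foo::B(bar),
            Foo::B(bar) => Foo::A(bar),
        };
    }
}
```

The cost of this approach is the construction and drop of the placeholder, which the optimizer may or may not remove; for a zero-sized payload like `Bar` it is likely negligible.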
    
| 
           @frol Since it's unclear whether or not you're already aware of this: I believe that specific use case was a major part of the discussion around #1736 For those who aren't familiar with that discussion: The big sticking point that led to its closure was that apparently the only way to make a   | 
    
| 
           @Ixrec I was not aware of it. Thank you for pointing me in the right direction!  | 
    
Add the method `Option::replace_with` to the core library.
This RFC proposes the addition of `Option::replace_with` to complement the `Option::replace` (RFC #2296) and `Option::take` methods. It replaces the actual value in the option with the value returned from a closure given as a parameter, while the old value is passed into the closure.
Rendered
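As an illustration of the semantics described in the summary, the method can be approximated today with an extension trait. This is only a sketch: the trait name `OptionReplaceWith` is an assumption made for this example, and the body uses the safe `take`-based formulation discussed in the conversation rather than the implementation proposed in the RFC text.

```rust
/// Hypothetical extension trait approximating the proposed method.
pub trait OptionReplaceWith<T> {
    /// Passes the old value to `f` and stores the value `f` returns.
    fn replace_with<F>(&mut self, f: F)
    where
        F: FnOnce(Option<T>) -> Option<T>;
}

impl<T> OptionReplaceWith<T> for Option<T> {
    fn replace_with<F>(&mut self, f: F)
    where
        F: FnOnce(Option<T>) -> Option<T>,
    {
        // `take` moves the old value out and leaves `None` in place
        // until the closure's result is written back.
        *self = f(self.take());
    }
}
```

With the trait in scope, a call such as `some_option.replace_with(|old| old.map(|n| n + 1))` reads the same as the method form sketched in the discussion.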