Manually initialize GcBox contents post-allocation to reduce memory copying #2

adrian17 · 2021-03-10T20:45:08Z

Ideally, when calling

let data = GcCell::allocate(gc_context, data); // assume  struct Data { x: [u8; 1000] }

You'd assume data to be constructed in-place or moved into newly allocated memory. (the first one isn't really common as stable Rust lacks placement-new-like features). And with the struct being relatively big, you'd expect the compiler to generate a memcpy call to simply move the structure's bytes into place.

The issue currently is that due to either rustc not being smart enough or the gc-arena code not being optimizer friendly (or both), the compiler can memcpy your Data object several times before actually moving it into its final place.

For example here:

        let gc_box = GcBox {
            flags: GcFlags::new(),
            next: Cell::new(self.all.get()),
            value: UnsafeCell::new(t),
        };
        gc_box.flags.set_needs_trace(T::needs_trace());
        let ptr = NonNull::new_unchecked(Box::into_raw(Box::new(gc_box)));

The generated code will firstly do memcpy to move t into the gc_box object on stack, then allocate memory, and then do the second memcpy to move the gc_box object onto heap memory. For some reason, on wasm target the compiler is even worse at optimizing this; at the worst case, I've seen four memcpy calls for a single GC allocation. This can obviously cause unnecessary overhead.

My patch helps the compiler by simplifying the initialization - first we allocate the uninitialized memory, then we manually build the GcBox by moving its fields into place. This way the object t is moved straight into its final place without being moved into intermediate stack variable gc_box.

I was trying to show a comparison on godbolt, but as soon as I drop some layers of abstractions, rustc catches on and generates better code. This is my best attempt: https://godbolt.org/z/aaK75W . You can see that in old() there is one memcpy before allocation and one after, but in new() there is only one memcpy.

Here's a comparison on "production" code, with a decompiled wasm build of https://github.com/ruffle-rs/ruffle/ . In practice, I've seen this cause up to 15-20% speedups in some edge cases.

Before, 4x memcpy:

function gc_arena_context_MutationContext_allocate_h107c2f641d37f0af(a:int, b:int):int {
  var d:int;
  var e:int;
  var c:int = g_a - 368;
  g_a = c;
  memcpy(c + 16, b, 108);
  a[8]:int = (b = a[8]:int + 120);
  if (a[100]:ubyte == 3) {
    if (b <= a[10]:int) goto B_a;
    a[100]:byte = 0;
  }
  a[3]:double = a[3]:double + 120.0 + 120.0 / a[1]:double;
  label B_a:
  c[136]:byte = 0;
  c[16]:long = a[11]:long@4;
  memcpy(c + 140, c + 16, 108);
  gc_arena_types_GcFlags_set_needs_trace_h2ec31421f45c37eb(c + 136, 1); // No clue why it wasn't inlined
  memcpy(c + 248, c + 128, 120);
  b = rust_alloc(120, 4);
  if (b) {
    swf_string_SwfStr_as_core_convert_From_str_from_h2a696e0da16f901c( // <- this is actually `static_gc_box()`, compiler deduplicated functions. No clue why it wasn't inlined.
      c + 8,
      d = memcpy(b, c + 248, 120),
      1231000);
    b = c[2]:int;
    (a + 48)[0]:int = (e = c[3]:int);
    a[11]:int = b;
    if (a[100]:ubyte != 2) goto B_d;
    if (a[15]:int) goto B_d;
    a[16]:int = e;
    a[15]:int = b;
    label B_d:
    g_a = c + 368;
    return d;
  }
  alloc_alloc_handle_alloc_error_h9b35ff53f78b3b72(120, 4);
  return unreachable;
}

After, just two:

function gc_arena_context_MutationContext_allocate_h107c2f641d37f0af(a:int, b:long_ptr@4):int {
  var d:int;
  var c:long_ptr@4 = g_a - 240;
  g_a = c;
  memcpy(c + 8, b, 108);
  a[8]:int = (d = a[8]:int + 120);
  var e:int = 3;
  b = a[100]:ubyte;
  if (b == 3) {
    if (d <= a[10]:int) goto B_a;
    a[100]:byte = 0;
    b = 0;
  }
  a[3]:double = a[3]:double + 120.0 + 120.0 / a[1]:double;
  e = b;
  label B_a:
  b = rust_alloc(120, 4);
  if (b) {
    b[0] = c[30];
    d = b + 8;
    d[0]:int = (c + 128)[0]:int;
    d[0]:byte = 4;
    b[0] = a[11]:long@4;
    memcpy(b + 12, c + 8, 108);
    (a + 48)[0]:int = 1210328;
    a[11]:int = b;
    if (e != 2) goto B_d;
    if (a[15]:int) goto B_d;
    a[16]:int = 1210328;
    a[15]:int = b;
    label B_d:
    g_a = c + 240;
    return b;
  }
  alloc_alloc_handle_alloc_error_h9b35ff53f78b3b72(120, 4);
  return unreachable;
}

And when rust-lang/rust#82806 gets merged into Rustc , with my patch it'll become just one, how it's supposed to work :)

function gc_arena_context_MutationContext_allocate_h107c2f641d37f0af(a:int, b:int):int {
  var e:int;
  var c:int;
  a[8]:int = (e = a[8]:int + 120);
  var d:int = 3;
  c = a[100]:ubyte;
  if (c == 3) {
    if (e <= a[10]:int) goto B_a;
    a[100]:byte = 0;
    c = 0;
  }
  a[3]:double = a[3]:double + 120.0 + 120.0 / a[1]:double;
  d = c;
  label B_a:
  c = rust_alloc(120, 4);
  if (c) {
    c[8]:byte = 4;
    c[0]:long@4 = a[11]:long@4;
    memcpy(c + 12, b, 108);
    (a + 48)[0]:int = 1210328;
    a[11]:int = c;
    if (d != 2) goto B_d;
    if (a[15]:int) goto B_d;
    a[16]:int = 1210328;
    a[15]:int = c;
    label B_d:
    return c;
  }
  alloc_alloc_handle_alloc_error_h9b35ff53f78b3b72(120, 4);
  return unreachable;
}

I made sure the patch passes tests with miri.

adrian17 · 2021-03-10T20:46:35Z

Damn, wanted to post this to the upstream repo :/

adrian17 added 2 commits March 10, 2021 18:49

Mark cheap functions as inline.

12d74f7

Manually initialize GcBox contents post-allocation

e2f9bb0

adrian17 closed this Mar 10, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Manually initialize GcBox contents post-allocation to reduce memory copying #2

Manually initialize GcBox contents post-allocation to reduce memory copying #2

Uh oh!

adrian17 commented Mar 10, 2021

Uh oh!

adrian17 commented Mar 10, 2021

Uh oh!

Uh oh!

Manually initialize GcBox contents post-allocation to reduce memory copying #2

Manually initialize GcBox contents post-allocation to reduce memory copying #2

Uh oh!

Conversation

adrian17 commented Mar 10, 2021

Uh oh!

adrian17 commented Mar 10, 2021

Uh oh!

Uh oh!