`som-gc`: Custom Mark-and-Sweep Garbage Collector #33

Hirevo · 2023-02-03T16:17:22Z

This PR introduces a new crate within the som-rs workspace, called som-gc.
It is an implementation of a mark-and-sweep garbage collector as a Rust library.

Automatic memory management and reclamation is fairly new territory for me: I have been reading about it and researching it in my free time for a while now, but I have never written anything like it before.
Therefore, the initial implementation in this PR is rather simple compared to what's eventually possible.
I plan to improve the implementation as time goes on and as my knowledge of better GC techniques grows.

This PR also changes som-interpreter-bc to integrate this new GC, as a replacement for the reference-counting method that it previously used.
The integration is already complete: all instances of reference counting are now gone, and the SOM primitives that previously could not be implemented (System>>#fullGC and System>>#gcStats) are now available and passing the tests.

From my initial measurements, however, the performance of this GC is a notable regression compared to reference counting.
Since the implementation is rather simple right now, this was to be expected, and I hope to improve it progressively over time.

The inclusion of a tracing garbage collector also finally makes it possible to address the memory leaks that could happen when reference cycles occurred (when two SOM values reference each other, directly or indirectly).
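To illustrate the leak that reference counting is prone to, here is a minimal sketch using the standard library's `Rc` and `Weak`. The `Node` type and the function name are hypothetical and are not taken from som-rs; they only show two values keeping each other alive after every outside owner is gone.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

/// Hypothetical node that may point at one other node.
struct Node {
    other: RefCell<Option<Rc<Node>>>,
}

/// Builds a two-node cycle, drops both outside handles, and reports whether
/// the first node is still alive (it is: each node still owns the other).
fn cycle_outlives_its_owners() -> bool {
    let a = Rc::new(Node { other: RefCell::new(None) });
    let b = Rc::new(Node { other: RefCell::new(Some(Rc::clone(&a))) });
    *a.other.borrow_mut() = Some(Rc::clone(&b));

    // A weak reference lets us observe `a` without keeping it alive.
    let probe: Weak<Node> = Rc::downgrade(&a);
    drop(a);
    drop(b);

    // Neither strong count reached zero, so the memory is never reclaimed.
    probe.upgrade().is_some()
}
```

A tracing collector does not have this problem, because it frees whatever is unreachable from the roots regardless of how the unreachable objects reference each other.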

Hirevo · 2023-02-03T16:19:57Z

Something else to note is that the library, as implemented right now, is unsound.

This is because simply storing a Gc<T> directly on the stack does not make it rooted on its own.
So, a developer using som-gc right now must be careful to either:

- not call GcHeap::collect_garbage or GcHeap::maybe_collect_garbage while that Gc<T> is held but not rooted, or
- make sure it is reachable from the roots declared in the garbage collection calls.

Fixing this rooting problem is non-trivial and explained in detail in this blog post by manishearth, the author of the gc crate on crates.io.

I'll revise the library's API surface over time to try to mitigate this issue as much as I can.

smarr · 2023-02-03T16:58:47Z

Oh, nice.

So, how does this work? You've got Gc<T> as a wrapper around any struct/object that is heap-allocated, and whenever you have a heap reference anywhere, it needs to be a Gc<T>, e.g., when stored in a SOM object's fields?

The problem you describe with missing roots is indeed pretty common.
It's often hard to make sure that C/C++ methods don't have anything on the C stack that is a root.

Some systems deal with that by having enough headroom, and when they are in a code section that may have room, they prevent GC, i.e., delay it until it is safe.

Hirevo · 2023-02-03T18:07:31Z

> So, how does this work? You've got Gc<T> as a wrapper around any struct/object that is heap-allocated, and whenever you have a heap reference anywhere, it needs to be a Gc<T>, e.g., when stored in a SOM object's fields?

Yeah, every type that is GC-allocated is accessed through a Gc<T>, and the only way to construct one of these is using the GcHeap::allocate method, which internally allocates an instance of GcBox<T> for it.
The GcBox<T> struct (never actually seen by the user) essentially constitutes a linked-list node: it stores an instance of T, a boolean that serves as the mark bit, and a pointer to another GcBox.
The GcHeap keeps a reference to the head of that linked list and adds a new node to the front at each allocation.
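As a rough illustration of the layout described above, here is a minimal sketch in safe Rust. It is not som-gc's actual code: `ListHeap`, `value_sizes`, `Small`, and `Big` are hypothetical names, the `Trace` signature is assumed, and `Box` stands in for whatever pointer type the real `GcBox<T>` uses. Typing the `next` pointer as a trait object follows the description given later in this thread.

```rust
use std::cell::Cell;

/// Stand-in for the `Trace` trait described in the thread (signature assumed).
trait Trace {
    fn trace(&self);
}

/// One heap node: mark bit, link to the next node, and the stored value.
/// `value` comes last so that `GcBox<T>` can be coerced to `GcBox<dyn Trace>`.
struct GcBox<T: ?Sized> {
    marked: Cell<bool>,
    next: Option<Box<GcBox<dyn Trace>>>,
    value: T,
}

/// Hypothetical heap that only keeps the head of the list.
#[derive(Default)]
struct ListHeap {
    head: Option<Box<GcBox<dyn Trace>>>,
}

impl ListHeap {
    /// Allocates `value` in a new node and pushes it at the front of the list.
    fn allocate<T: Trace + 'static>(&mut self, value: T) {
        let node = Box::new(GcBox {
            marked: Cell::new(false),
            next: self.head.take(),
            value,
        });
        // Unsized coercion: the concrete node type is erased behind a fat pointer.
        let node: Box<GcBox<dyn Trace>> = node;
        self.head = Some(node);
    }

    /// Walks the list from the newest node and reports each value's size in bytes.
    fn value_sizes(&self) -> Vec<usize> {
        let mut sizes = Vec::new();
        let mut cursor = self.head.as_deref();
        while let Some(node) = cursor {
            sizes.push(std::mem::size_of_val(&node.value));
            cursor = node.next.as_deref();
        }
        sizes
    }
}

/// Two hypothetical payload types of different sizes, with nothing to trace.
struct Small(u8);
struct Big([u64; 4]);

impl Trace for Small {
    fn trace(&self) {}
}

impl Trace for Big {
    fn trace(&self) {}
}
```

Allocating a `Small` and then a `Big` leaves the `Big` node at the head, so a sweep that walks the list visits the most recent allocations first.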

For tracing, all types that want to be allocated in the GC heap must implement the Trace trait, whose purpose is to call trace on all of the type's members that may contain another Gc<T>.
Doing so allows the heap to mark all objects reachable from just a few roots.
The user decides which objects are root by passing a closure to GcHeap::collect_garbage that calls trace on them, like so:

let mut heap = GcHeap::new();

// Consider `MyType` to be a struct that implements the `Trace` trait.
let a: Gc<MyType> = heap.allocate(MyType::new());
let b: Gc<MyType> = heap.allocate(MyType::new());

heap.collect_garbage(|| {
    a.trace();
});

// Only objects reachable from `a` will be kept.
// All other objects get deallocated.

This example also shows why this library is unsound.
Here, b is still reachable (directly stored on the stack) but will still be deallocated, which is problematic because accessing it is now undefined behaviour.
This is because Rust doesn't have a mechanism to determine if an object is directly stored on the stack (in which case it should be considered rooted) or only accessible through some other object (in which case we definitely don't want to have it rooted).
So, I made the initial choice to never consider anything rooted, except for what the user specifies using the closure.
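The "nothing is rooted unless the caller says so" policy can be sketched with a toy heap that stays in safe Rust. This is not how som-gc is implemented: handles here are plain indices into a slot table (`Handle`, `Object`, and `SlotHeap` are hypothetical names), and roots are passed as a slice instead of a closure, so that using a swept handle yields `None` rather than undefined behaviour.

```rust
/// Toy handle: an index into the heap's slot table.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Handle(usize);

/// Toy object: a name plus outgoing references to other objects.
struct Object {
    name: &'static str,
    refs: Vec<Handle>,
    marked: bool,
}

#[derive(Default)]
struct SlotHeap {
    slots: Vec<Option<Object>>,
}

impl SlotHeap {
    fn allocate(&mut self, name: &'static str, refs: Vec<Handle>) -> Handle {
        self.slots.push(Some(Object { name, refs, marked: false }));
        Handle(self.slots.len() - 1)
    }

    /// Returns `None` if the object behind `handle` has been swept.
    fn get(&self, handle: Handle) -> Option<&Object> {
        self.slots.get(handle.0)?.as_ref()
    }

    /// Mark phase: follows references from `root` using an explicit worklist.
    fn mark(&mut self, root: Handle) {
        let mut worklist = vec![root];
        while let Some(Handle(index)) = worklist.pop() {
            if let Some(object) = self.slots[index].as_mut() {
                if !object.marked {
                    object.marked = true;
                    worklist.extend(object.refs.iter().copied());
                }
            }
        }
    }

    /// Frees everything not reachable from `roots`; returns how many objects were freed.
    fn collect_garbage(&mut self, roots: &[Handle]) -> usize {
        for &root in roots {
            self.mark(root);
        }
        let mut freed = 0;
        for slot in self.slots.iter_mut() {
            // Survivors get their mark bit cleared for the next cycle.
            let keep = match slot.as_mut() {
                Some(object) => std::mem::replace(&mut object.marked, false),
                None => continue,
            };
            if !keep {
                *slot = None;
                freed += 1;
            }
        }
        freed
    }
}
```

With objects `a` (referencing `c`), `b`, and `c`, collecting with only `a` as a root frees `b` even though the caller still holds its handle, which mirrors the situation in the example above.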

The gc crate, as described in the blog post I linked, tries to solve this issue by re-adding some amount of reference counting, modifying the Trace trait a bit to include the ability to root/unroot everything in an object, and having all mutations go through a special GcCell type to apply this unrooting when mutations happen.

But I am not sure which route I personally want to take yet.

> Some systems deal with that by having enough headroom, and when they are in a code section that may have room, they prevent GC, i.e., delay it until it is safe.

Do you mean having code sections where the GC is guaranteed to not run?
If so, this is kind of what I am doing right now (making sure to never trigger a collection while directly interacting with Gc<T> objects).
Right now, the only points where a collection can happen are after a message send (or super-send) and at the end of execution (when no stack frames remain).

smarr · 2023-02-03T22:58:32Z

> where the GC is guaranteed to not run?

Yes, exactly.

I have seen systems that use things like:

 /* ... */
 disable_gc();
 
 /* mess with stuff */

 reenable_gc();
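In Rust, the same disable/re-enable pattern could be expressed with a guard value, so that the collector is re-enabled even on early returns. The sketch below is only an assumption about how such a mechanism might look; `GcState`, `NoGcGuard`, and `maybe_collect` are hypothetical names and are not part of som-gc.

```rust
use std::cell::Cell;

/// Hypothetical collector state: nesting depth of "no-GC" sections,
/// whether a collection was requested meanwhile, and a collection counter.
#[derive(Default)]
struct GcState {
    inhibit_depth: Cell<u32>,
    pending: Cell<bool>,
    collections: Cell<u32>,
}

/// While a guard is alive, collection requests are deferred.
struct NoGcGuard<'a> {
    state: &'a GcState,
}

impl GcState {
    /// Equivalent of `disable_gc()`: the returned guard re-enables GC when dropped.
    fn disable_gc(&self) -> NoGcGuard<'_> {
        self.inhibit_depth.set(self.inhibit_depth.get() + 1);
        NoGcGuard { state: self }
    }

    /// Collects now if allowed, otherwise remembers the request for later.
    fn maybe_collect(&self) {
        if self.inhibit_depth.get() > 0 {
            self.pending.set(true);
        } else {
            self.collect();
        }
    }

    /// Placeholder for the actual mark-and-sweep work.
    fn collect(&self) {
        self.pending.set(false);
        self.collections.set(self.collections.get() + 1);
    }
}

impl Drop for NoGcGuard<'_> {
    /// Equivalent of `reenable_gc()`: runs a deferred collection once the
    /// outermost guard goes away.
    fn drop(&mut self) {
        let depth = self.state.inhibit_depth.get() - 1;
        self.state.inhibit_depth.set(depth);
        if depth == 0 && self.state.pending.get() {
            self.state.collect();
        }
    }
}
```

A collection requested while a guard is held is not lost, only delayed until the last guard is dropped.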

> constitutes a linked-list

Hm, any specific reason to go with a linked list?
More usual would be an array that represents the heap.
Some GCs also use dynamic pages, regions, segments, or whatever you want to call them, to avoid having a fixed heap size and to be able to expand the heap when needed.

Hirevo · 2023-02-05T15:45:23Z

> Hm, any specific reason to go with a linked list? More usual would be an array that represents the heap. Some GCs also use dynamic pages, regions, segments, or whatever you want to call them, to avoid having a fixed heap size and to be able to expand the heap when needed.

The reason it is a linked list is that the allocated types can be different, which means their sizes may not be the same, so I can't use a Vec<_>.
In the linked list, each GcBox<T> node can store a different type T because the next pointer's target is typed as GcBox<dyn Trace>, which makes it a fat pointer (a pointer to a trait object in Rust) to the next node.
It may be possible to store some additional information about what is allocated (like its size) and write a sort of custom vector type that can make sense of it, to iterate faster.
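One way to picture the "custom vector with size information" idea is a contiguous buffer where every record is prefixed with its length, so that iteration can hop from one record to the next without following pointers. The sketch below is a deliberate simplification and an assumption on my part: `SizedArena` is a hypothetical name, payloads are raw bytes rather than typed objects, and alignment, mark bits, and deallocation are all ignored.

```rust
use std::convert::{TryFrom, TryInto};

/// Hypothetical arena: each record is a 4-byte little-endian length header
/// followed by that many payload bytes.
#[derive(Default)]
struct SizedArena {
    bytes: Vec<u8>,
}

impl SizedArena {
    /// Appends one record (header + payload) at the end of the buffer.
    fn push(&mut self, payload: &[u8]) {
        let len = u32::try_from(payload.len()).expect("payload too large");
        self.bytes.extend_from_slice(&len.to_le_bytes());
        self.bytes.extend_from_slice(payload);
    }

    /// Iterates over payloads in allocation order by reading each header
    /// to find where the next record starts.
    fn iter(&self) -> impl Iterator<Item = &[u8]> + '_ {
        let mut offset = 0;
        std::iter::from_fn(move || {
            let header = self.bytes.get(offset..offset + 4)?;
            let len = u32::from_le_bytes(header.try_into().ok()?) as usize;
            let start = offset + 4;
            offset = start + len;
            self.bytes.get(start..start + len)
        })
    }
}
```

Compared to the linked list, records of different sizes sit next to each other in memory, at the cost of having to handle typing and alignment manually in a real implementation.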

smarr · 2023-02-05T21:35:11Z

> The reason it is a linked list is that the allocated types can be different, which means their sizes may not be the same, so I can't use a Vec<_>.

Hm, I see. Yeah makes sense.

OctaveLarose · 2024-01-22T10:28:35Z

Hey Nicolas, I was taking another look at som-rs to see how its performance could be improved. Were there good performance gains from using this GC? Have you measured anything? Though I'm not sure it's finished, since you mention something about it stagnating in #37.

Hirevo · 2024-02-07T22:00:16Z

I did some measurements, and currently the GC is a considerable performance hit.

On my machine, here are the speedup ratios compared to the current master branch, for both the system allocator (the default malloc implementation), and using jemalloc:

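+------------------+---------------------------+---------------------------+
| Allocator        | GC (PR #33)               | GC + NaN Boxing (PR #37)  |
+------------------+---------------------------+---------------------------+
| System Allocator | 0.61x ± 0.01 (0.49..0.82) | 0.89x ± 0.01 (0.68..1.26) |
| jemalloc         | 0.60x ± 0.01 (0.51..0.83) | 0.87x ± 0.01 (0.62..1.28) |
+------------------+---------------------------+---------------------------+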

Each of these numbers is the average speedup across the ReBench benchmarks.
The OS is EndeavourOS, based on Arch Linux, but I don't know the name of its default memory allocator.

So yeah, the GC is quite a bit slower than using Rc<T>, but I think this is just due to my poor implementation right now.
However, the NaN boxing trick, which I don't think is really doable with reference counting, helps recoup some of the lost performance, although it does not quite bring it to parity.
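For readers unfamiliar with NaN boxing, here is a minimal sketch of the general idea: non-double values are hidden inside the payload bits of a quiet NaN, so every value fits in a plain 8-byte `Copy` word. The tag layout, the constants, and the `Value` type below are assumptions made for illustration and are not necessarily the encoding used in PR #37.

```rust
/// Bit pattern of a quiet NaN (exponent all ones, top mantissa bit set).
const QNAN: u64 = 0x7ff8_0000_0000_0000;
/// Hypothetical tag bit marking "this NaN actually carries an i32".
const TAG_INT: u64 = 0x0001_0000_0000_0000;
const INT_MASK: u64 = QNAN | TAG_INT;

/// A value is just 64 bits: either a real double or a tagged payload.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Value(u64);

impl Value {
    fn from_f64(x: f64) -> Self {
        // Canonicalize NaNs so a genuine NaN can never look like a tagged value.
        let bits = if x.is_nan() { f64::NAN.to_bits() } else { x.to_bits() };
        Value(bits)
    }

    fn from_i32(x: i32) -> Self {
        // Store the integer's 32 bits in the low half of the NaN payload.
        Value(INT_MASK | u64::from(x as u32))
    }

    fn is_int(self) -> bool {
        self.0 & INT_MASK == INT_MASK
    }

    fn as_i32(self) -> Option<i32> {
        if self.is_int() { Some(self.0 as u32 as i32) } else { None }
    }

    fn as_f64(self) -> Option<f64> {
        if self.is_int() { None } else { Some(f64::from_bits(self.0)) }
    }
}
```

Because such a word is copied bit-for-bit with no clone or drop logic, a pointer stored in it cannot maintain a reference count on its own, which is presumably why the trick pairs more naturally with a tracing GC.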

I mentioned that it was currently stagnating mainly because of my limited technical knowledge of memory allocators (I considered maybe writing my own malloc-equivalent for this GC at one point, but writing a good one can be very tricky).

I have some other easier ideas I'd like to implement to improve the GC, but I could not dedicate much time to this with my current situation, so it hasn't been done yet.

Hirevo · 2024-02-07T22:30:14Z

I've also noticed that the people over at the Software Development Team from King's College London recently forked this project and are apparently pursuing integrating Alloy (their own GC solution, using a custom Rust compiler toolchain) into SOM-RS.

I think it is a really cool prospect and I am quite excited to see how it turns out, and what differences there will be with yksom (their own SOM interpreter written in Rust).

Their GC solution, being integrated into the compiler and being able to influence the codegen, is likely to always be faster than whatever I'll be able to come up with on my own, using stable regular Rust.

But I don't think it will make me stop my efforts towards improving this current GC, even if it is just for my own learning.
And maybe there can still be some value in a simpler-to-compile SOM interpreter using just regular Rust (for instance, so that it can be used as a library by other regular Rust programs).

som-rs-benchmarker · 2024-02-14T18:37:11Z

Here are the benchmark results for feature/custom-gc (commit: d3ec93d):

AST interpreter

+-----------------+----------------------------------------+---------------------------+
| Benchmark       | master (base)                          | feature/custom-gc (head)  |
+-----------------+----------------------------------------+---------------------------+
| Bounce          | 215.47 ms ± 20.62 (188.62..257.40)     | 1.05x ± 0.12 (0.92..1.14) |
| BubbleSort      | 306.33 ms ± 28.86 (275.77..346.03)     | 1.05x ± 0.12 (0.95..1.16) |
| DeltaBlue       | 169.15 ms ± 5.46 (159.32..178.14)      | 0.98x ± 0.06 (0.89..1.06) |
| Dispatch        | 205.83 ms ± 18.90 (183.61..244.01)     | 0.95x ± 0.12 (0.81..1.06) |
| Fannkuch        | 128.80 ms ± 7.17 (118.23..138.35)      | 1.02x ± 0.07 (0.97..1.07) |
| Fibonacci       | 394.99 ms ± 15.68 (369.45..421.96)     | 1.00x ± 0.06 (0.94..1.07) |
| FieldLoop       | 346.19 ms ± 13.38 (319.07..366.12)     | 0.97x ± 0.08 (0.86..1.07) |
| GraphSearch     | 108.48 ms ± 25.33 (85.29..154.03)      | 1.07x ± 0.29 (0.88..1.31) |
| IntegerLoop     | 373.44 ms ± 40.07 (325.97..446.79)     | 1.09x ± 0.13 (1.03..1.16) |
| JsonSmall       | 234.62 ms ± 23.27 (203.39..282.31)     | 0.98x ± 0.14 (0.84..1.18) |
| List            | 284.15 ms ± 21.41 (259.63..315.36)     | 1.06x ± 0.11 (0.93..1.20) |
| Loop            | 445.86 ms ± 30.04 (409.08..502.10)     | 0.93x ± 0.09 (0.83..1.06) |
| Mandelbrot      | 294.27 ms ± 22.18 (269.49..326.25)     | 1.00x ± 0.09 (0.91..1.06) |
| NBody           | 242.65 ms ± 28.84 (216.14..303.50)     | 1.10x ± 0.16 (0.90..1.18) |
| PageRank        | 309.45 ms ± 20.14 (286.80..348.96)     | 0.94x ± 0.12 (0.77..1.05) |
| Permute         | 315.06 ms ± 15.61 (292.36..345.72)     | 0.95x ± 0.09 (0.80..1.07) |
| Queens          | 274.42 ms ± 15.12 (251.92..297.82)     | 1.00x ± 0.12 (0.81..1.12) |
| QuickSort       | 82.41 ms ± 8.46 (73.36..99.99)         | 0.91x ± 0.17 (0.67..1.10) |
| Recurse         | 325.45 ms ± 33.22 (285.47..381.78)     | 1.15x ± 0.13 (1.07..1.21) |
| Richards        | 4211.18 ms ± 109.68 (3988.81..4359.05) | 0.99x ± 0.05 (0.93..1.07) |
| Sieve           | 480.67 ms ± 46.64 (438.69..576.55)     | 1.07x ± 0.12 (0.96..1.13) |
| Storage         | 94.10 ms ± 7.33 (84.40..109.75)        | 1.07x ± 0.12 (0.93..1.18) |
| Sum             | 202.30 ms ± 20.49 (177.50..231.97)     | 1.16x ± 0.13 (1.09..1.26) |
| Towers          | 357.01 ms ± 18.71 (335.36..386.98)     | 1.04x ± 0.09 (0.93..1.12) |
| TreeSort        | 180.79 ms ± 18.82 (158.20..218.76)     | 1.09x ± 0.12 (1.01..1.16) |
| WhileLoop       | 417.12 ms ± 65.73 (352.37..538.07)     | 1.08x ± 0.17 (1.01..1.14) |
|                 |                                        |                           |
| Average Speedup |               (baseline)               | 1.03x ± 0.02 (0.91..1.16) |
+-----------------+----------------------------------------+---------------------------+

The raw ReBench data files are available for download here: baseline and head

Bytecode interpreter

+-----------------+----------------------------------------+---------------------------+
| Benchmark       | master (base)                          | feature/custom-gc (head)  |
+-----------------+----------------------------------------+---------------------------+
| Bounce          | 96.67 ms ± 7.01 (88.45..112.06)        | 0.68x ± 0.13 (0.54..0.85) |
| BubbleSort      | 128.93 ms ± 4.72 (124.10..138.74)      | 0.70x ± 0.11 (0.59..0.89) |
| DeltaBlue       | 75.94 ms ± 10.38 (66.34..99.26)        | 0.82x ± 0.12 (0.75..0.88) |
| Dispatch        | 100.28 ms ± 6.18 (92.42..114.67)       | 0.78x ± 0.16 (0.53..1.00) |
| Fannkuch        | 62.01 ms ± 7.60 (53.36..76.68)         | 0.81x ± 0.17 (0.66..0.95) |
| Fibonacci       | 167.52 ms ± 5.94 (160.53..178.34)      | 0.78x ± 0.10 (0.66..0.91) |
| FieldLoop       | 246.66 ms ± 16.67 (215.35..269.30)     | 1.11x ± 0.17 (0.90..1.28) |
| GraphSearch     | 39.44 ms ± 1.39 (37.45..41.64)         | 0.72x ± 0.11 (0.60..0.87) |
| IntegerLoop     | 171.58 ms ± 10.53 (161.39..197.53)     | 0.73x ± 0.07 (0.65..0.81) |
| JsonSmall       | 109.43 ms ± 7.03 (102.24..123.64)      | 0.82x ± 0.08 (0.71..0.92) |
| List            | 134.41 ms ± 25.84 (116.54..198.67)     | 0.79x ± 0.18 (0.61..0.88) |
| Loop            | 228.59 ms ± 26.21 (208.00..298.72)     | 0.82x ± 0.13 (0.68..0.96) |
| Mandelbrot      | 132.83 ms ± 7.60 (122.99..148.90)      | 0.80x ± 0.09 (0.68..0.91) |
| NBody           | 96.90 ms ± 2.80 (93.50..101.11)        | 0.79x ± 0.08 (0.69..0.88) |
| PageRank        | 151.99 ms ± 5.07 (145.91..161.12)      | 0.90x ± 0.06 (0.82..0.99) |
| Permute         | 139.67 ms ± 6.77 (131.93..152.55)      | 0.79x ± 0.13 (0.60..0.92) |
| Queens          | 107.07 ms ± 6.02 (100.51..122.27)      | 0.88x ± 0.07 (0.80..0.96) |
| QuickSort       | 35.01 ms ± 2.44 (32.87..39.70)         | 0.86x ± 0.12 (0.66..1.01) |
| Recurse         | 144.22 ms ± 12.32 (131.37..166.53)     | 0.80x ± 0.09 (0.71..0.94) |
| Richards        | 1851.11 ms ± 104.07 (1761.21..2098.84) | 0.78x ± 0.05 (0.75..0.81) |
| Sieve           | 205.83 ms ± 20.51 (190.06..256.81)     | 0.80x ± 0.14 (0.62..0.94) |
| Storage         | 41.07 ms ± 2.60 (36.07..44.78)         | 0.76x ± 0.09 (0.66..0.86) |
| Sum             | 84.08 ms ± 5.98 (80.12..100.11)        | 0.79x ± 0.15 (0.58..1.03) |
| Towers          | 146.58 ms ± 7.84 (137.59..159.35)      | 0.70x ± 0.08 (0.58..0.82) |
| TreeSort        | 64.57 ms ± 10.87 (54.64..92.21)        | 0.77x ± 0.15 (0.63..0.88) |
| WhileLoop       | 214.85 ms ± 10.72 (201.23..238.67)     | 0.80x ± 0.11 (0.61..0.94) |
|                 |                                        |                           |
| Average Speedup |               (baseline)               | 0.80x ± 0.02 (0.68..1.11) |
+-----------------+----------------------------------------+---------------------------+

The raw ReBench data files are available for download here: baseline and head

The benchmarks were run using ReBench v1.2.0
The statistical analysis was done using rebench-tabler v0.1.0

The source code of this benchmark runner is available as a GitHub Gist for more details about the setup

som-rs-benchmarker · 2024-05-08T13:29:53Z

Here are the benchmark results for feature/custom-gc (commit: 4962f59):

AST interpreter

+-----------------+---------------------------------------+---------------------------+
| Benchmark       | master (base)                         | feature/custom-gc (head)  |
+-----------------+---------------------------------------+---------------------------+
| Bounce          | 214.29 ms ± 27.85 (181.60..267.68)    | 1.21x ± 0.16 (1.17..1.25) |
| BubbleSort      | 252.78 ms ± 12.46 (234.16..275.45)    | 0.98x ± 0.08 (0.90..1.06) |
| DeltaBlue       | 152.17 ms ± 13.05 (141.08..171.51)    | 1.02x ± 0.09 (0.98..1.06) |
| Dispatch        | 185.82 ms ± 14.32 (168.81..219.15)    | 1.06x ± 0.09 (0.99..1.10) |
| Fannkuch        | 117.86 ms ± 4.15 (112.29..126.81)     | 1.00x ± 0.05 (0.93..1.05) |
| Fibonacci       | 371.12 ms ± 21.15 (348.37..414.11)    | 1.02x ± 0.08 (0.93..1.10) |
| FieldLoop       | 329.77 ms ± 33.06 (295.38..387.05)    | 1.03x ± 0.12 (0.95..1.12) |
| GraphSearch     | 90.67 ms ± 17.56 (76.20..126.77)      | 1.10x ± 0.28 (0.78..1.24) |
| IntegerLoop     | 328.81 ms ± 37.93 (302.35..413.66)    | 1.07x ± 0.15 (0.88..1.16) |
| JsonSmall       | 211.82 ms ± 29.41 (183.34..263.59)    | 1.08x ± 0.20 (0.84..1.26) |
| List            | 225.45 ms ± 11.91 (216.27..255.59)    | 1.00x ± 0.07 (0.89..1.05) |
| Loop            | 413.33 ms ± 30.81 (382.78..478.32)    | 1.05x ± 0.09 (0.98..1.10) |
| Mandelbrot      | 251.55 ms ± 17.70 (237.90..294.94)    | 1.03x ± 0.08 (0.97..1.07) |
| NBody           | 205.25 ms ± 19.10 (186.71..245.28)    | 1.00x ± 0.11 (0.90..1.09) |
| PageRank        | 297.45 ms ± 26.12 (268.42..351.94)    | 1.05x ± 0.10 (0.96..1.11) |
| Permute         | 294.97 ms ± 18.65 (264.34..318.19)    | 1.05x ± 0.07 (0.98..1.07) |
| Queens          | 251.17 ms ± 44.70 (218.03..350.60)    | 1.13x ± 0.21 (1.03..1.18) |
| QuickSort       | 72.40 ms ± 5.28 (66.92..85.69)        | 0.99x ± 0.14 (0.76..1.08) |
| Recurse         | 276.79 ms ± 25.46 (251.21..336.65)    | 1.02x ± 0.12 (0.92..1.12) |
| Richards        | 3845.18 ms ± 86.41 (3734.67..4001.21) | 1.02x ± 0.03 (0.96..1.04) |
| Sieve           | 410.35 ms ± 35.75 (378.64..476.60)    | 1.05x ± 0.10 (0.99..1.09) |
| Storage         | 83.24 ms ± 7.10 (77.14..98.67)        | 0.94x ± 0.20 (0.68..1.11) |
| Sum             | 152.96 ms ± 5.63 (149.72..168.65)     | 0.98x ± 0.08 (0.87..1.04) |
| Towers          | 336.93 ms ± 30.87 (297.18..388.01)    | 1.15x ± 0.11 (1.12..1.20) |
| TreeSort        | 173.18 ms ± 31.82 (150.78..250.80)    | 1.14x ± 0.21 (1.09..1.19) |
| WhileLoop       | 364.80 ms ± 28.12 (332.74..410.44)    | 1.00x ± 0.11 (0.91..1.11) |
|                 |                                       |                           |
| Average Speedup |              (baseline)               | 1.05x ± 0.03 (0.94..1.21) |
+-----------------+---------------------------------------+---------------------------+

The raw ReBench data files are available for download here: baseline and head

Bytecode interpreter

+-----------------+---------------------------------------+---------------------------+
| Benchmark       | master (base)                         | feature/custom-gc (head)  |
+-----------------+---------------------------------------+---------------------------+
| Bounce          | 75.13 ms ± 9.06 (67.94..98.75)        | 0.69x ± 0.16 (0.48..0.87) |
| BubbleSort      | 99.70 ms ± 5.71 (96.34..115.58)       | 0.74x ± 0.10 (0.59..0.85) |
| DeltaBlue       | 57.00 ms ± 3.11 (55.00..65.30)        | 0.68x ± 0.11 (0.53..0.84) |
| Dispatch        | 83.28 ms ± 10.05 (76.09..103.87)      | 0.87x ± 0.14 (0.71..0.99) |
| Fannkuch        | 46.84 ms ± 1.95 (43.56..48.68)        | 0.67x ± 0.20 (0.47..0.92) |
| Fibonacci       | 128.30 ms ± 4.59 (123.42..139.37)     | 0.71x ± 0.12 (0.52..0.81) |
| FieldLoop       | 162.02 ms ± 23.80 (145.56..225.49)    | 1.07x ± 0.18 (0.94..1.19) |
| GraphSearch     | 31.80 ms ± 0.64 (30.38..32.58)        | 0.77x ± 0.20 (0.47..0.99) |
| IntegerLoop     | 140.27 ms ± 12.06 (132.92..171.54)    | 0.84x ± 0.09 (0.73..0.92) |
| JsonSmall       | 83.78 ms ± 6.44 (74.56..91.26)        | 0.78x ± 0.09 (0.69..0.89) |
| List            | 94.52 ms ± 6.21 (90.90..111.85)       | 0.75x ± 0.09 (0.62..0.81) |
| Loop            | 168.70 ms ± 5.24 (163.09..180.44)     | 0.83x ± 0.04 (0.76..0.87) |
| Mandelbrot      | 107.72 ms ± 5.04 (103.72..119.99)     | 0.78x ± 0.11 (0.64..0.89) |
| NBody           | 75.55 ms ± 2.22 (72.97..78.97)        | 0.78x ± 0.09 (0.63..0.87) |
| PageRank        | 112.84 ms ± 4.50 (109.17..123.91)     | 0.89x ± 0.06 (0.79..0.95) |
| Permute         | 109.68 ms ± 11.57 (100.28..130.90)    | 0.84x ± 0.14 (0.62..0.92) |
| Queens          | 100.43 ms ± 16.59 (82.56..136.36)     | 1.01x ± 0.18 (0.94..1.09) |
| QuickSort       | 27.11 ms ± 1.19 (25.97..30.03)        | 0.73x ± 0.20 (0.48..0.98) |
| Recurse         | 109.63 ms ± 9.91 (101.08..131.26)     | 0.81x ± 0.09 (0.72..0.87) |
| Richards        | 1386.07 ms ± 30.00 (1344.50..1424.16) | 0.73x ± 0.03 (0.69..0.79) |
| Sieve           | 155.83 ms ± 9.25 (146.23..179.90)     | 0.83x ± 0.10 (0.65..0.88) |
| Storage         | 32.34 ms ± 2.76 (30.07..38.03)        | 0.74x ± 0.14 (0.57..0.91) |
| Sum             | 70.57 ms ± 10.65 (64.40..99.53)       | 0.85x ± 0.21 (0.58..1.05) |
| Towers          | 119.81 ms ± 10.70 (110.79..147.37)    | 0.84x ± 0.15 (0.61..0.94) |
| TreeSort        | 50.33 ms ± 9.87 (45.26..77.32)        | 0.83x ± 0.18 (0.71..0.96) |
| WhileLoop       | 161.67 ms ± 4.61 (155.47..171.22)     | 0.90x ± 0.06 (0.81..0.98) |
|                 |                                       |                           |
| Average Speedup |              (baseline)               | 0.81x ± 0.03 (0.67..1.07) |
+-----------------+---------------------------------------+---------------------------+

The raw ReBench data files are available for download here: baseline and head

The benchmarks were run using ReBench v1.2.0
The statistical analysis was done using rebench-tabler v0.1.0

The source code of this benchmark runner is available as a GitHub Gist for more details about the setup

som-rs-benchmarker · 2024-05-08T13:37:18Z

Here are the benchmark results for feature/custom-gc (commit: 823b85a):

AST interpreter

+-----------------+----------------------------------------+---------------------------+
| Benchmark       | master (base)                          | feature/custom-gc (head)  |
+-----------------+----------------------------------------+---------------------------+
| Bounce          | 192.76 ms ± 10.12 (180.01..208.12)     | 1.04x ± 0.07 (0.96..1.09) |
| BubbleSort      | 256.40 ms ± 17.28 (235.15..298.05)     | 1.00x ± 0.09 (0.90..1.06) |
| DeltaBlue       | 152.07 ms ± 8.36 (141.31..167.91)      | 1.02x ± 0.06 (0.98..1.07) |
| Dispatch        | 172.69 ms ± 3.52 (168.23..179.17)      | 0.95x ± 0.08 (0.82..1.04) |
| Fannkuch        | 118.12 ms ± 5.74 (110.80..131.37)      | 0.91x ± 0.14 (0.72..1.04) |
| Fibonacci       | 363.83 ms ± 18.67 (338.53..395.18)     | 0.99x ± 0.07 (0.92..1.06) |
| FieldLoop       | 322.41 ms ± 35.85 (298.44..419.33)     | 1.04x ± 0.12 (1.02..1.10) |
| GraphSearch     | 79.24 ms ± 7.28 (74.54..99.63)         | 1.02x ± 0.11 (0.91..1.08) |
| IntegerLoop     | 340.80 ms ± 25.48 (308.46..385.77)     | 1.13x ± 0.10 (1.03..1.19) |
| JsonSmall       | 182.20 ms ± 11.27 (172.64..211.83)     | 0.95x ± 0.09 (0.86..1.04) |
| List            | 219.12 ms ± 13.34 (207.99..251.51)     | 0.93x ± 0.10 (0.82..1.04) |
| Loop            | 404.57 ms ± 18.68 (384.02..438.81)     | 1.02x ± 0.06 (0.95..1.07) |
| Mandelbrot      | 263.54 ms ± 20.87 (239.16..302.42)     | 1.05x ± 0.10 (0.98..1.12) |
| NBody           | 214.84 ms ± 13.78 (195.18..236.07)     | 1.01x ± 0.12 (0.86..1.10) |
| PageRank        | 293.02 ms ± 11.27 (282.00..313.20)     | 1.03x ± 0.06 (0.96..1.08) |
| Permute         | 316.38 ms ± 26.28 (287.95..373.73)     | 1.09x ± 0.12 (0.96..1.16) |
| Queens          | 234.57 ms ± 21.14 (217.46..289.94)     | 1.00x ± 0.10 (0.94..1.07) |
| QuickSort       | 81.13 ms ± 11.60 (68.27..106.47)       | 1.08x ± 0.22 (0.80..1.22) |
| Recurse         | 265.35 ms ± 17.71 (250.21..312.30)     | 1.03x ± 0.08 (0.96..1.06) |
| Richards        | 3945.47 ms ± 125.82 (3792.18..4148.18) | 1.02x ± 0.05 (0.95..1.08) |
| Sieve           | 410.70 ms ± 29.56 (377.38..453.74)     | 1.03x ± 0.10 (0.91..1.11) |
| Storage         | 82.96 ms ± 8.27 (75.59..101.91)        | 1.07x ± 0.11 (1.01..1.10) |
| Sum             | 162.83 ms ± 23.10 (147.62..222.44)     | 1.03x ± 0.20 (0.77..1.12) |
| Towers          | 315.17 ms ± 24.86 (285.88..358.88)     | 1.04x ± 0.09 (0.97..1.11) |
| TreeSort        | 154.31 ms ± 10.37 (143.92..177.76)     | 0.95x ± 0.13 (0.74..1.04) |
| WhileLoop       | 341.59 ms ± 13.33 (327.41..365.00)     | 1.00x ± 0.05 (0.94..1.04) |
|                 |                                        |                           |
| Average Speedup |               (baseline)               | 1.02x ± 0.02 (0.91..1.13) |
+-----------------+----------------------------------------+---------------------------+

The raw ReBench data files are available for download here: baseline and head

Bytecode interpreter

+-----------------+---------------------------------------+---------------------------+
| Benchmark       | master (base)                         | feature/custom-gc (head)  |
+-----------------+---------------------------------------+---------------------------+
| Bounce          | 69.61 ms ± 1.45 (66.82..71.78)        | 0.75x ± 0.08 (0.63..0.85) |
| BubbleSort      | 113.44 ms ± 13.75 (98.79..138.26)     | 0.90x ± 0.14 (0.71..1.01) |
| DeltaBlue       | 59.18 ms ± 1.56 (57.13..62.00)        | 0.70x ± 0.17 (0.46..0.83) |
| Dispatch        | 79.76 ms ± 6.10 (72.75..91.85)        | 0.78x ± 0.11 (0.60..0.87) |
| Fannkuch        | 45.13 ms ± 0.98 (44.13..47.30)        | 0.80x ± 0.07 (0.70..0.87) |
| Fibonacci       | 145.73 ms ± 17.09 (129.91..177.52)    | 0.89x ± 0.14 (0.77..0.99) |
| FieldLoop       | 152.83 ms ± 8.71 (144.30..175.06)     | 1.05x ± 0.09 (0.94..1.15) |
| GraphSearch     | 32.61 ms ± 2.84 (30.45..39.91)        | 0.85x ± 0.11 (0.70..0.99) |
| IntegerLoop     | 146.87 ms ± 12.17 (135.55..168.42)    | 0.94x ± 0.11 (0.80..1.05) |
| JsonSmall       | 82.91 ms ± 11.73 (75.92..113.73)      | 0.83x ± 0.13 (0.75..0.94) |
| List            | 104.68 ms ± 19.10 (91.94..152.73)     | 0.85x ± 0.17 (0.76..0.94) |
| Loop            | 184.68 ms ± 9.08 (170.80..200.56)     | 0.93x ± 0.05 (0.89..0.97) |
| Mandelbrot      | 114.56 ms ± 6.36 (107.64..127.29)     | 0.89x ± 0.15 (0.64..1.01) |
| NBody           | 76.77 ms ± 4.09 (73.93..86.84)        | 0.76x ± 0.09 (0.61..0.86) |
| PageRank        | 120.17 ms ± 10.67 (111.31..144.67)    | 0.83x ± 0.12 (0.66..0.92) |
| Permute         | 115.43 ms ± 8.80 (100.97..132.70)     | 0.85x ± 0.10 (0.74..0.96) |
| Queens          | 92.32 ms ± 9.92 (83.77..117.62)       | 0.91x ± 0.11 (0.86..1.03) |
| QuickSort       | 28.11 ms ± 1.61 (26.50..32.05)        | 0.81x ± 0.10 (0.68..0.97) |
| Recurse         | 109.22 ms ± 3.33 (103.51..112.85)     | 0.73x ± 0.09 (0.57..0.82) |
| Richards        | 1378.30 ms ± 26.78 (1339.09..1417.80) | 0.74x ± 0.02 (0.72..0.77) |
| Sieve           | 166.97 ms ± 12.05 (154.66..191.07)    | 0.88x ± 0.09 (0.78..0.98) |
| Storage         | 30.60 ms ± 0.85 (29.08..32.10)        | 0.77x ± 0.14 (0.55..0.92) |
| Sum             | 69.06 ms ± 6.04 (63.15..82.26)        | 0.82x ± 0.12 (0.65..0.94) |
| Towers          | 125.77 ms ± 10.85 (110.74..146.76)    | 0.93x ± 0.09 (0.84..0.98) |
| TreeSort        | 50.96 ms ± 10.12 (44.10..78.11)       | 0.91x ± 0.21 (0.69..1.04) |
| WhileLoop       | 173.20 ms ± 16.36 (153.06..197.25)    | 0.90x ± 0.13 (0.71..1.02) |
|                 |                                       |                           |
| Average Speedup |              (baseline)               | 0.85x ± 0.02 (0.70..1.05) |
+-----------------+---------------------------------------+---------------------------+

The raw ReBench data files are available for download here: baseline and head

The benchmarks were run using ReBench v1.2.0
The statistical analysis was done using rebench-tabler v0.1.0

The source code of this benchmark runner is available as a GitHub Gist for more details about the setup

Hirevo added the C-enhancement (Category: Enhancements), M-interpreter (Module: Interpreter), and P-medium (Priority: Medium) labels Feb 3, 2023

Hirevo self-assigned this Feb 3, 2023

Hirevo changed the title from "**som-gc**: Custom Mark-and-Sweep Garbage Collector" to "som-gc: Custom Mark-and-Sweep Garbage Collector" Feb 3, 2023

Hirevo force-pushed the feature/custom-gc branch from e1063d1 to 7ee4580 Compare November 10, 2023 17:10

Hirevo mentioned this pull request Nov 10, 2023

NaN-boxing of SOM values #37

Open

Hirevo added 13 commits May 8, 2024 14:23

feat: initial custom GC implementation

f8117ed

feat: GC heap can now allocate different types

b738e23

feat: GC configuration parameters

f4a2e99

feat: more trait impls for `Gc<T>`

feat: bytecode interpreter GC integration

1011d0f

fix: fixed wrong comment location

1b71437

feat: GC mark bits now cleared after sweeping

285e360

feat: adjusted when the GC is allowed to run

c3580ad

chore: adjusted README for the GC's introduction

ed80187

fix: fixed minor oversight

2847330

fix: removed superfluous PhantomData

90fe1cb

fix: fixed compiler errors

5fe855a

fix: fixed inline caches pointer checks

3bce35c

feat: GC scans now use a Vec

4962f59

Hirevo force-pushed the feature/custom-gc branch from d3ec93d to 4962f59 Compare May 8, 2024 13:29

fix: fixed compilation error in tests

823b85a

Hirevo mentioned this pull request May 8, 2024

Inline storage for instance fields and frame locals #44

Open

Hirevo mentioned this pull request May 15, 2024

GC Parallel Sweeping #45

Open
