som-gc: Custom Mark-and-Sweep Garbage Collector #33

Open
wants to merge 14 commits into
base: master
from


@Hirevo
Owner

Hirevo commented Feb 3, 2023

This PR introduces a new crate within the som-rs workspace, called som-gc.
It is an implementation of a mark-and-sweep garbage collector as a Rust library.

Automatic memory management and reclamation is fairly new territory for me: I have been reading about it and researching it in my free time for a while now, but I have never written anything like it before.
Therefore, the initial implementation in this PR is rather simple compared to what's eventually possible.
I plan to improve the implementation as time goes on and as my knowledge of better GC techniques grows.

This PR also changes som-interpreter-bc to integrate this new GC, as a replacement for the reference-counting method that it previously used.
The integration is already complete: all instances of reference counting are now gone, and the SOM primitives that previously could not be implemented (System>>#fullGC and System>>#gcStats) are now available and pass the tests.

According to my initial measurements, however, the performance of this GC is a notable regression from reference counting.
Since the implementation is rather simple right now, this was to be expected and I hope to make it better progressively over time.

The inclusion of a tracing garbage collector also finally makes it possible to address the memory leaks that could happen when reference cycles occurred (when two SOM values reference each other, directly or indirectly).
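To illustrate the cycle problem described above, here is a minimal standalone sketch using only the Rust standard library. `Node` and `strong_count_after_drop` are hypothetical names chosen for this illustration and are not taken from som-rs; the sketch only assumes that SOM values were previously held through `Rc`.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical node type, standing in for a SOM value that can reference another one.
struct Node {
    other: RefCell<Option<Rc<Node>>>,
}

// Builds a two-node cycle, drops both local handles, and reports how many
// strong references still keep the first node alive afterwards.
fn strong_count_after_drop() -> usize {
    let a = Rc::new(Node { other: RefCell::new(None) });
    let b = Rc::new(Node { other: RefCell::new(Some(Rc::clone(&a))) });
    *a.other.borrow_mut() = Some(Rc::clone(&b));

    // A weak handle lets us observe `a` without keeping it alive ourselves.
    let probe: Weak<Node> = Rc::downgrade(&a);
    drop(a);
    drop(b);
    probe.strong_count()
}

fn main() {
    // The count stays above zero: `b` holds `a`, `a` holds `b`,
    // and nothing else can reach either of them anymore.
    println!("strong count after drop: {}", strong_count_after_drop());
}
```

With plain reference counting, neither node is ever freed here, whereas a tracing collector would find both unreachable from the roots and reclaim them.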

@Hirevo added the C-enhancement (Category: Enhancements), M-interpreter (Module: Interpreter), and P-medium (Priority: Medium) labels on Feb 3, 2023
@Hirevo self-assigned this on Feb 3, 2023
@Hirevo
Owner Author

Hirevo commented Feb 3, 2023

Something else to note is that the library, as implemented right now, is unsound.

This is because simply storing a Gc<T> directly on the stack does not make it rooted on its own.
So, a developer using som-gc right now must be careful to either:

  • not call GcHeap::collect_garbage or GcHeap::maybe_collect_garbage while that Gc<T> is held but not rooted.
  • make sure it is reachable from the roots declared in the garbage collection calls.

Fixing this rooting problem is non-trivial and is explained in detail in this blog post by manishearth, the author of the gc crate on crates.io.

Over time, I'll revise the library's API surface to try to mitigate this issue as much as I can.

@Hirevo changed the title from "**som-gc**: Custom Mark-and-Sweep Garbage Collector" to "som-gc: Custom Mark-and-Sweep Garbage Collector" on Feb 3, 2023
@smarr
Contributor

smarr commented Feb 3, 2023

Oh, nice.

So, how does this work? You've got Gc<T> as a wrapper around any struct/object that is heap allocated, and whenever you have a heap reference anywhere, it needs to be a Gc<T>, for instance when stored in a SOM object's fields?

The problem you describe with missing roots is indeed pretty common.
It's often hard to make sure that C/C++ methods don't have anything on the C stack that is a root.

Some systems deal with that by having enough headroom, and when they are in a code section that may have room, they prevent GC, i.e., delay it until it is safe.

@Hirevo
Owner Author

Hirevo commented Feb 3, 2023

> So, how does this work? You've got Gc<T> as a wrapper around any struct/object that is heap allocated, and whenever you have a heap reference anywhere, it needs to be a Gc<T>, for instance when stored in a SOM object's fields?

Yeah, every type that is GC-allocated is accessed through a Gc<T>, and the only way to construct one of these is using the GcHeap::allocate method, which internally allocates an instance of GcBox<T> for it.
The GcBox<T> type (never actually seen by the user) essentially forms a linked-list node: it stores an instance of T, a boolean that serves as the mark bit, and a pointer to another GcBox.
The GcHeap keeps a reference to the head of that linked list and adds a new node to the front at each allocation.
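The layout described above can be sketched roughly as follows. This is a simplified, safe approximation and not the actual som-gc code: it uses owning `Box` links instead of raw pointers, `allocate` returns nothing instead of a `Gc<T>`, and the `len` helper and the `Trace` impls for `i64` and `String` are hypothetical additions for the demo.

```rust
// Hypothetical stand-in for som-gc's `Trace` trait.
trait Trace {
    fn trace(&self);
}

impl Trace for i64 {
    fn trace(&self) {}
}

impl Trace for String {
    fn trace(&self) {}
}

// One node of the heap's linked list. The possibly-unsized `value` field comes
// last so that a `GcBox<T>` can be coerced to a `GcBox<dyn Trace>`.
#[allow(dead_code)]
struct GcBox<T: Trace + ?Sized> {
    marked: bool,
    next: Option<Box<GcBox<dyn Trace>>>,
    value: T,
}

struct GcHeap {
    head: Option<Box<GcBox<dyn Trace>>>,
}

impl GcHeap {
    fn new() -> Self {
        GcHeap { head: None }
    }

    // Pushes a new node to the front of the list, whatever its concrete type is.
    fn allocate<T: Trace + 'static>(&mut self, value: T) {
        let sized: Box<GcBox<T>> = Box::new(GcBox {
            marked: false,
            next: self.head.take(),
            value,
        });
        // Unsizing coercion: the concrete type is erased behind a fat pointer.
        let node: Box<GcBox<dyn Trace>> = sized;
        self.head = Some(node);
    }

    // Walks the list from the head, the same way a sweep phase would.
    fn len(&self) -> usize {
        let mut count = 0;
        let mut cursor = self.head.as_deref();
        while let Some(node) = cursor {
            count += 1;
            cursor = node.next.as_deref();
        }
        count
    }
}

fn main() {
    let mut heap = GcHeap::new();
    heap.allocate(42_i64);
    heap.allocate(String::from("hello"));
    println!("objects in heap: {}", heap.len());
}
```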

For tracing, all types that want to be allocated in the GC heap must implement the Trace trait, whose purpose is to call trace on all of its members that may contain another Gc<T>.
Doing so allows the heap to mark all objects reachable from just a few roots.
The user decides which objects are roots by passing a closure to GcHeap::collect_garbage that calls trace on them, like so:

let mut heap = GcHeap::new();

// Consider `MyType` to be a struct that implements the `Trace` trait.
let a: Gc<MyType> = heap.allocate(MyType::new());
let b: Gc<MyType> = heap.allocate(MyType::new());

heap.collect_garbage(|| {
    a.trace();
});

// Only objects reachable from `a` will be kept.
// All other objects get deallocated.

This example also shows why this library is unsound.
Here, b is still reachable (directly stored on the stack) but will still be deallocated, which is problematic because accessing it is now undefined behaviour.
This is because Rust doesn't have a mechanism to determine if an object is directly stored on the stack (in which case it should be considered rooted) or only accessible through some other object (in which case we definitely don't want to have it rooted).
So, I made the initial choice to never consider anything rooted, except for what the user specifies using the closure.
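The effect of "only what the user declares as a root is rooted" can be reproduced in a small, fully safe toy model. The sketch below is not som-gc: `ToyHeap`, `Object`, and `is_live` are hypothetical, and objects are referred to by slot index so that a swept object shows up as a dead slot rather than as undefined behaviour.

```rust
// Hypothetical toy object: a mark bit plus the handles of the objects it references.
struct Object {
    marked: bool,
    fields: Vec<usize>,
}

struct ToyHeap {
    slots: Vec<Option<Object>>,
}

impl ToyHeap {
    fn new() -> Self {
        ToyHeap { slots: Vec::new() }
    }

    // Returns the slot index, which plays the role of a `Gc<T>` handle.
    fn allocate(&mut self, fields: Vec<usize>) -> usize {
        self.slots.push(Some(Object { marked: false, fields }));
        self.slots.len() - 1
    }

    fn is_live(&self, handle: usize) -> bool {
        matches!(self.slots.get(handle), Some(Some(_)))
    }

    fn collect_garbage(&mut self, roots: &[usize]) {
        // Mark phase: set the mark bit on everything reachable from the roots.
        let mut worklist: Vec<usize> = roots.to_vec();
        while let Some(handle) = worklist.pop() {
            if let Some(Some(object)) = self.slots.get_mut(handle) {
                if !object.marked {
                    object.marked = true;
                    worklist.extend(object.fields.iter().copied());
                }
            }
        }
        // Sweep phase: free unmarked objects, reset the mark bit on survivors.
        for slot in self.slots.iter_mut() {
            let survives = slot.as_ref().map_or(false, |object| object.marked);
            if survives {
                if let Some(object) = slot.as_mut() {
                    object.marked = false;
                }
            } else {
                *slot = None;
            }
        }
    }
}

fn main() {
    let mut heap = ToyHeap::new();
    let c = heap.allocate(vec![]);
    let a = heap.allocate(vec![c]); // `a` references `c`
    let b = heap.allocate(vec![]); // `b` is only held in a local variable

    heap.collect_garbage(&[a]);

    println!("a live: {}", heap.is_live(a));
    println!("c live: {}", heap.is_live(c));
    println!("b live: {}", heap.is_live(b));
}
```

In this model `b` is swept even though a local variable still holds its handle, which mirrors the situation described above; with real pointers instead of indices, using that handle afterwards would be a use-after-free.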

The gc crate, as described in the blog post I linked, tries to solve this issue by re-adding some amount of reference counting, modifying the Trace trait a bit to include the ability to root/unroot everything in an object, and having all mutations go through a special GcCell type to apply this unrooting when mutations happen.

But I am not sure which route I personally want to take yet.

> Some systems deal with that by having enough headroom, and when they are in a code section that may have room, they prevent GC, i.e., delay it until it is safe.

Do you mean having code sections where the GC is guaranteed not to run?
If so, this is more or less what I am doing right now (making sure never to trigger a collection while directly interacting with Gc<T> objects).
Right now, the only points where a collection can happen are after a message send (or super-send) and at the end of execution (when no stack frames remain).

@smarr
Contributor

smarr commented Feb 3, 2023

> where the GC is guaranteed to not run?

Yes, exactly.

I have seen systems that use things like:

 /* ... */
 disable_gc();
 
 /* mess with stuff */

 reenable_gc();
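In Rust, the disable/re-enable pattern above could be expressed with a scope guard, so that re-enabling cannot be forgotten. The following is only a sketch under assumed names (`Heap`, `NoGcGuard`, `request_collection`, and `demo` are all hypothetical and not part of som-gc), and the "collection" is reduced to a counter.

```rust
use std::cell::Cell;

// Hypothetical heap that only tracks whether collecting is currently allowed.
struct Heap {
    no_gc_depth: Cell<u32>,
    collection_pending: Cell<bool>,
    collections_run: Cell<u32>,
}

impl Heap {
    fn new() -> Self {
        Heap {
            no_gc_depth: Cell::new(0),
            collection_pending: Cell::new(false),
            collections_run: Cell::new(0),
        }
    }

    // Runs a collection now, or remembers the request if GC is disabled.
    fn request_collection(&self) {
        if self.no_gc_depth.get() > 0 {
            self.collection_pending.set(true);
        } else {
            self.collection_pending.set(false);
            self.collections_run.set(self.collections_run.get() + 1);
        }
    }

    // Disables GC until the returned guard is dropped; guards may nest.
    fn disable_gc(&self) -> NoGcGuard<'_> {
        self.no_gc_depth.set(self.no_gc_depth.get() + 1);
        NoGcGuard { heap: self }
    }
}

struct NoGcGuard<'a> {
    heap: &'a Heap,
}

impl Drop for NoGcGuard<'_> {
    fn drop(&mut self) {
        let depth = self.heap.no_gc_depth.get() - 1;
        self.heap.no_gc_depth.set(depth);
        // Leaving the outermost no-GC section: run the deferred collection, if any.
        if depth == 0 && self.heap.collection_pending.get() {
            self.heap.request_collection();
        }
    }
}

// Returns the number of collections run (inside the no-GC section, after leaving it).
fn demo() -> (u32, u32) {
    let heap = Heap::new();
    let during;
    {
        let _guard = heap.disable_gc();
        heap.request_collection(); // deferred, because GC is disabled here
        during = heap.collections_run.get();
    }
    (during, heap.collections_run.get())
}

fn main() {
    let (during, after) = demo();
    println!("collections during no-GC section: {during}, after: {after}");
}
```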

> constitutes a linked-list

Hm, any specific reason to go with a linked list?
More usual would be an array that represents the heap.
Some GCs also use dynamic pages, regions, segments, or whatever you want to call them, to avoid having a fixed heap size while still being able to expand it when needed.

@Hirevo
Owner Author

Hirevo commented Feb 5, 2023

> Hm, any specific reason to go with a linked list? More usual would be an array that represents the heap. Some GCs also use dynamic pages, regions, segments, or whatever you want to call them, to avoid having a fixed heap size while still being able to expand it when needed.

It is a linked list because the allocated types can be different, which means their sizes may not be the same, so I can't use a Vec<_>.
In the linked list, each GcBox<T> node can store a different type T because the next pointer's target is typed as GcBox<dyn Trace>, which makes it a fat pointer (called a trait object in Rust) to the next node.
It may be possible to store some additional information about what is allocated (like its size) and write a sort of custom vector type that can make sense of it to iterate faster.
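As a small illustration of the fat-pointer point, the sketch below stores objects of different sizes behind `Box<dyn Trace>` in an ordinary `Vec`. It is an assumption-laden toy, not som-gc code: `Small`, `Large`, `object_sizes`, and the `size` method on this stand-in `Trace` trait are hypothetical. Each object is still a separate allocation, so this does not give the contiguous heap array discussed above; it only shows that the list of objects itself can be a vector.

```rust
use std::mem::size_of;

// Hypothetical stand-in for som-gc's `Trace` trait; `size` exists only for this demo.
trait Trace {
    fn size(&self) -> usize;
}

#[allow(dead_code)]
struct Small(u8);

#[allow(dead_code)]
struct Large([u64; 8]);

impl Trace for Small {
    fn size(&self) -> usize {
        size_of::<Self>()
    }
}

impl Trace for Large {
    fn size(&self) -> usize {
        size_of::<Self>()
    }
}

// Stores differently-sized objects in one vector and reports each object's size.
fn object_sizes() -> Vec<usize> {
    // Every element is a fat pointer (data pointer + vtable pointer), so the
    // elements all have the same size even though the objects behind them do not.
    let objects: Vec<Box<dyn Trace>> = vec![Box::new(Small(1)), Box::new(Large([0; 8]))];
    objects.iter().map(|object| object.size()).collect()
}

fn main() {
    println!("object sizes: {:?}", object_sizes());
    println!("size of one vector element: {}", size_of::<Box<dyn Trace>>());
}
```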

@smarr
Contributor

smarr commented Feb 5, 2023

> The reason it is a linked list is because the allocated types can be different, which means their size may not be the same, so I can't use a Vec<_>.

Hm, I see. Yeah makes sense.

@OctaveLarose
Contributor

Hey Nicolas, I was taking another look at som-rs to see how its performance could be improved. Were there good performance gains from using this GC? Have you measured anything? I'm not sure it's finished, though, since you mention something about it stagnating in #37.

@Hirevo
Owner Author

Hirevo commented Feb 7, 2024

I did some measurements, and currently the GC is a considerable performance hit.

On my machine, here are the speedup ratios compared to the current master branch, for both the system allocator (the default malloc implementation), and using jemalloc:

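+------------------+---------------------------+---------------------------+
|                  | GC (PR #33)               | GC + NaN Boxing (PR #37)  |
+------------------+---------------------------+---------------------------+
| System Allocator | 0.61x ± 0.01 (0.49..0.82) | 0.89x ± 0.01 (0.68..1.26) |
| jemalloc         | 0.60x ± 0.01 (0.51..0.83) | 0.87x ± 0.01 (0.62..1.28) |
+------------------+---------------------------+---------------------------+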

Each of these numbers is the average speedup across the ReBench benchmarks.
The OS is EndeavourOS, based on Arch Linux, but I don't know the name of its default memory allocator.

So yeah, the GC is quite a bit slower than using Rc<T>, but I think this is just due to my bad implementation right now.
However, the NaN boxing trick, which I don't think is really doable with reference counting, helps recoup some of the lost performance, although it does not quite bring it back to parity.
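For readers unfamiliar with NaN boxing, here is a minimal sketch of the general idea: every value fits in one 64-bit word, where ordinary doubles are stored as-is and other values are hidden in the payload bits of a quiet NaN. The bit layout, the `Value` type, and the constants below are assumptions made for this illustration and are not the encoding used in PR #37.

```rust
// Quiet-NaN prefix: exponent bits all set, plus the top mantissa bit.
const QNAN: u64 = 0x7ff8_0000_0000_0000;
// Hypothetical tag bit marking a boxed 32-bit integer.
const TAG_INT: u64 = 0x0001_0000_0000_0000;
const INT_PREFIX: u64 = QNAN | TAG_INT;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Value(u64);

impl Value {
    fn from_double(d: f64) -> Self {
        // Canonicalise NaNs so that a real NaN never collides with a tagged payload.
        if d.is_nan() {
            Value(QNAN)
        } else {
            Value(d.to_bits())
        }
    }

    fn from_int(i: i32) -> Self {
        // The integer's 32 bits live in the low half of the NaN payload.
        Value(INT_PREFIX | u64::from(i as u32))
    }

    fn is_int(self) -> bool {
        (self.0 & INT_PREFIX) == INT_PREFIX
    }

    fn as_int(self) -> Option<i32> {
        if self.is_int() {
            Some((self.0 & 0xffff_ffff) as u32 as i32)
        } else {
            None
        }
    }

    fn as_double(self) -> Option<f64> {
        if self.is_int() {
            None
        } else {
            Some(f64::from_bits(self.0))
        }
    }
}

fn main() {
    let int = Value::from_int(-7);
    let double = Value::from_double(1.5);
    println!("{:?} {:?}", int.as_int(), int.as_double());
    println!("{:?} {:?}", double.as_int(), double.as_double());
    println!("size of Value: {} bytes", std::mem::size_of::<Value>());
}
```

A full implementation would reserve further tag bits for pointers (which typically fit in 48 bits on current 64-bit platforms), booleans, and nil; those are left out here to keep the sketch short.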

I mentioned that it was currently stagnating mainly because of my limited technical knowledge of memory allocators (I considered writing my own malloc equivalent for this GC at one point, but writing a good one can be very tricky).

I have some other easier ideas I'd like to implement to improve the GC, but I could not dedicate much time to this with my current situation, so it hasn't been done yet.

@Hirevo
Owner Author

Hirevo commented Feb 7, 2024

I've also noticed that the people over at the Software Development Team from King's College London recently forked this project and are apparently pursuing integrating Alloy (their own GC solution, using a custom Rust compiler toolchain) into SOM-RS.

I think it is a really cool prospect and I am quite excited to see how it turns out, and what differences there will be with yksom (their own SOM interpreter written in Rust).

Their GC solution, being integrated into the compiler and being able to influence the codegen, is likely to always be faster than whatever I'll be able to come up with on my own, using stable regular Rust.

But I don't think it will make me stop my efforts towards improving this current GC, even if it is just for my own learning.
And maybe there can still be some value in a simpler-to-compile SOM interpreter using just regular Rust (for instance, so it can be used as a library by other regular Rust programs).

@som-rs-benchmarker

som-rs-benchmarker bot commented Feb 14, 2024

Here are the benchmark results for feature/custom-gc (commit: d3ec93d):

AST interpreter
+-----------------+----------------------------------------+---------------------------+
| Benchmark       | master (base)                          | feature/custom-gc (head)  |
+-----------------+----------------------------------------+---------------------------+
| Bounce          | 215.47 ms ± 20.62 (188.62..257.40)     | 1.05x ± 0.12 (0.92..1.14) |
| BubbleSort      | 306.33 ms ± 28.86 (275.77..346.03)     | 1.05x ± 0.12 (0.95..1.16) |
| DeltaBlue       | 169.15 ms ± 5.46 (159.32..178.14)      | 0.98x ± 0.06 (0.89..1.06) |
| Dispatch        | 205.83 ms ± 18.90 (183.61..244.01)     | 0.95x ± 0.12 (0.81..1.06) |
| Fannkuch        | 128.80 ms ± 7.17 (118.23..138.35)      | 1.02x ± 0.07 (0.97..1.07) |
| Fibonacci       | 394.99 ms ± 15.68 (369.45..421.96)     | 1.00x ± 0.06 (0.94..1.07) |
| FieldLoop       | 346.19 ms ± 13.38 (319.07..366.12)     | 0.97x ± 0.08 (0.86..1.07) |
| GraphSearch     | 108.48 ms ± 25.33 (85.29..154.03)      | 1.07x ± 0.29 (0.88..1.31) |
| IntegerLoop     | 373.44 ms ± 40.07 (325.97..446.79)     | 1.09x ± 0.13 (1.03..1.16) |
| JsonSmall       | 234.62 ms ± 23.27 (203.39..282.31)     | 0.98x ± 0.14 (0.84..1.18) |
| List            | 284.15 ms ± 21.41 (259.63..315.36)     | 1.06x ± 0.11 (0.93..1.20) |
| Loop            | 445.86 ms ± 30.04 (409.08..502.10)     | 0.93x ± 0.09 (0.83..1.06) |
| Mandelbrot      | 294.27 ms ± 22.18 (269.49..326.25)     | 1.00x ± 0.09 (0.91..1.06) |
| NBody           | 242.65 ms ± 28.84 (216.14..303.50)     | 1.10x ± 0.16 (0.90..1.18) |
| PageRank        | 309.45 ms ± 20.14 (286.80..348.96)     | 0.94x ± 0.12 (0.77..1.05) |
| Permute         | 315.06 ms ± 15.61 (292.36..345.72)     | 0.95x ± 0.09 (0.80..1.07) |
| Queens          | 274.42 ms ± 15.12 (251.92..297.82)     | 1.00x ± 0.12 (0.81..1.12) |
| QuickSort       | 82.41 ms ± 8.46 (73.36..99.99)         | 0.91x ± 0.17 (0.67..1.10) |
| Recurse         | 325.45 ms ± 33.22 (285.47..381.78)     | 1.15x ± 0.13 (1.07..1.21) |
| Richards        | 4211.18 ms ± 109.68 (3988.81..4359.05) | 0.99x ± 0.05 (0.93..1.07) |
| Sieve           | 480.67 ms ± 46.64 (438.69..576.55)     | 1.07x ± 0.12 (0.96..1.13) |
| Storage         | 94.10 ms ± 7.33 (84.40..109.75)        | 1.07x ± 0.12 (0.93..1.18) |
| Sum             | 202.30 ms ± 20.49 (177.50..231.97)     | 1.16x ± 0.13 (1.09..1.26) |
| Towers          | 357.01 ms ± 18.71 (335.36..386.98)     | 1.04x ± 0.09 (0.93..1.12) |
| TreeSort        | 180.79 ms ± 18.82 (158.20..218.76)     | 1.09x ± 0.12 (1.01..1.16) |
| WhileLoop       | 417.12 ms ± 65.73 (352.37..538.07)     | 1.08x ± 0.17 (1.01..1.14) |
|                 |                                        |                           |
| Average Speedup |               (baseline)               | 1.03x ± 0.02 (0.91..1.16) |
+-----------------+----------------------------------------+---------------------------+

The raw ReBench data files are available for download here: baseline and head

Bytecode interpreter
+-----------------+----------------------------------------+---------------------------+
| Benchmark       | master (base)                          | feature/custom-gc (head)  |
+-----------------+----------------------------------------+---------------------------+
| Bounce          | 96.67 ms ± 7.01 (88.45..112.06)        | 0.68x ± 0.13 (0.54..0.85) |
| BubbleSort      | 128.93 ms ± 4.72 (124.10..138.74)      | 0.70x ± 0.11 (0.59..0.89) |
| DeltaBlue       | 75.94 ms ± 10.38 (66.34..99.26)        | 0.82x ± 0.12 (0.75..0.88) |
| Dispatch        | 100.28 ms ± 6.18 (92.42..114.67)       | 0.78x ± 0.16 (0.53..1.00) |
| Fannkuch        | 62.01 ms ± 7.60 (53.36..76.68)         | 0.81x ± 0.17 (0.66..0.95) |
| Fibonacci       | 167.52 ms ± 5.94 (160.53..178.34)      | 0.78x ± 0.10 (0.66..0.91) |
| FieldLoop       | 246.66 ms ± 16.67 (215.35..269.30)     | 1.11x ± 0.17 (0.90..1.28) |
| GraphSearch     | 39.44 ms ± 1.39 (37.45..41.64)         | 0.72x ± 0.11 (0.60..0.87) |
| IntegerLoop     | 171.58 ms ± 10.53 (161.39..197.53)     | 0.73x ± 0.07 (0.65..0.81) |
| JsonSmall       | 109.43 ms ± 7.03 (102.24..123.64)      | 0.82x ± 0.08 (0.71..0.92) |
| List            | 134.41 ms ± 25.84 (116.54..198.67)     | 0.79x ± 0.18 (0.61..0.88) |
| Loop            | 228.59 ms ± 26.21 (208.00..298.72)     | 0.82x ± 0.13 (0.68..0.96) |
| Mandelbrot      | 132.83 ms ± 7.60 (122.99..148.90)      | 0.80x ± 0.09 (0.68..0.91) |
| NBody           | 96.90 ms ± 2.80 (93.50..101.11)        | 0.79x ± 0.08 (0.69..0.88) |
| PageRank        | 151.99 ms ± 5.07 (145.91..161.12)      | 0.90x ± 0.06 (0.82..0.99) |
| Permute         | 139.67 ms ± 6.77 (131.93..152.55)      | 0.79x ± 0.13 (0.60..0.92) |
| Queens          | 107.07 ms ± 6.02 (100.51..122.27)      | 0.88x ± 0.07 (0.80..0.96) |
| QuickSort       | 35.01 ms ± 2.44 (32.87..39.70)         | 0.86x ± 0.12 (0.66..1.01) |
| Recurse         | 144.22 ms ± 12.32 (131.37..166.53)     | 0.80x ± 0.09 (0.71..0.94) |
| Richards        | 1851.11 ms ± 104.07 (1761.21..2098.84) | 0.78x ± 0.05 (0.75..0.81) |
| Sieve           | 205.83 ms ± 20.51 (190.06..256.81)     | 0.80x ± 0.14 (0.62..0.94) |
| Storage         | 41.07 ms ± 2.60 (36.07..44.78)         | 0.76x ± 0.09 (0.66..0.86) |
| Sum             | 84.08 ms ± 5.98 (80.12..100.11)        | 0.79x ± 0.15 (0.58..1.03) |
| Towers          | 146.58 ms ± 7.84 (137.59..159.35)      | 0.70x ± 0.08 (0.58..0.82) |
| TreeSort        | 64.57 ms ± 10.87 (54.64..92.21)        | 0.77x ± 0.15 (0.63..0.88) |
| WhileLoop       | 214.85 ms ± 10.72 (201.23..238.67)     | 0.80x ± 0.11 (0.61..0.94) |
|                 |                                        |                           |
| Average Speedup |               (baseline)               | 0.80x ± 0.02 (0.68..1.11) |
+-----------------+----------------------------------------+---------------------------+

The raw ReBench data files are available for download here: baseline and head

The benchmarks were run using ReBench v1.2.0
The statistical analysis was done using rebench-tabler v0.1.0

The source code of this benchmark runner is available as a GitHub Gist for more details about the setup

@Hirevo force-pushed the feature/custom-gc branch from d3ec93d to 4962f59 on May 8, 2024 13:29
@som-rs-benchmarker

som-rs-benchmarker bot commented May 8, 2024

Here are the benchmark results for feature/custom-gc (commit: 4962f59):

AST interpreter
+-----------------+---------------------------------------+---------------------------+
| Benchmark       | master (base)                         | feature/custom-gc (head)  |
+-----------------+---------------------------------------+---------------------------+
| Bounce          | 214.29 ms ± 27.85 (181.60..267.68)    | 1.21x ± 0.16 (1.17..1.25) |
| BubbleSort      | 252.78 ms ± 12.46 (234.16..275.45)    | 0.98x ± 0.08 (0.90..1.06) |
| DeltaBlue       | 152.17 ms ± 13.05 (141.08..171.51)    | 1.02x ± 0.09 (0.98..1.06) |
| Dispatch        | 185.82 ms ± 14.32 (168.81..219.15)    | 1.06x ± 0.09 (0.99..1.10) |
| Fannkuch        | 117.86 ms ± 4.15 (112.29..126.81)     | 1.00x ± 0.05 (0.93..1.05) |
| Fibonacci       | 371.12 ms ± 21.15 (348.37..414.11)    | 1.02x ± 0.08 (0.93..1.10) |
| FieldLoop       | 329.77 ms ± 33.06 (295.38..387.05)    | 1.03x ± 0.12 (0.95..1.12) |
| GraphSearch     | 90.67 ms ± 17.56 (76.20..126.77)      | 1.10x ± 0.28 (0.78..1.24) |
| IntegerLoop     | 328.81 ms ± 37.93 (302.35..413.66)    | 1.07x ± 0.15 (0.88..1.16) |
| JsonSmall       | 211.82 ms ± 29.41 (183.34..263.59)    | 1.08x ± 0.20 (0.84..1.26) |
| List            | 225.45 ms ± 11.91 (216.27..255.59)    | 1.00x ± 0.07 (0.89..1.05) |
| Loop            | 413.33 ms ± 30.81 (382.78..478.32)    | 1.05x ± 0.09 (0.98..1.10) |
| Mandelbrot      | 251.55 ms ± 17.70 (237.90..294.94)    | 1.03x ± 0.08 (0.97..1.07) |
| NBody           | 205.25 ms ± 19.10 (186.71..245.28)    | 1.00x ± 0.11 (0.90..1.09) |
| PageRank        | 297.45 ms ± 26.12 (268.42..351.94)    | 1.05x ± 0.10 (0.96..1.11) |
| Permute         | 294.97 ms ± 18.65 (264.34..318.19)    | 1.05x ± 0.07 (0.98..1.07) |
| Queens          | 251.17 ms ± 44.70 (218.03..350.60)    | 1.13x ± 0.21 (1.03..1.18) |
| QuickSort       | 72.40 ms ± 5.28 (66.92..85.69)        | 0.99x ± 0.14 (0.76..1.08) |
| Recurse         | 276.79 ms ± 25.46 (251.21..336.65)    | 1.02x ± 0.12 (0.92..1.12) |
| Richards        | 3845.18 ms ± 86.41 (3734.67..4001.21) | 1.02x ± 0.03 (0.96..1.04) |
| Sieve           | 410.35 ms ± 35.75 (378.64..476.60)    | 1.05x ± 0.10 (0.99..1.09) |
| Storage         | 83.24 ms ± 7.10 (77.14..98.67)        | 0.94x ± 0.20 (0.68..1.11) |
| Sum             | 152.96 ms ± 5.63 (149.72..168.65)     | 0.98x ± 0.08 (0.87..1.04) |
| Towers          | 336.93 ms ± 30.87 (297.18..388.01)    | 1.15x ± 0.11 (1.12..1.20) |
| TreeSort        | 173.18 ms ± 31.82 (150.78..250.80)    | 1.14x ± 0.21 (1.09..1.19) |
| WhileLoop       | 364.80 ms ± 28.12 (332.74..410.44)    | 1.00x ± 0.11 (0.91..1.11) |
|                 |                                       |                           |
| Average Speedup |              (baseline)               | 1.05x ± 0.03 (0.94..1.21) |
+-----------------+---------------------------------------+---------------------------+

The raw ReBench data files are available for download here: baseline and head

Bytecode interpreter
+-----------------+---------------------------------------+---------------------------+
| Benchmark       | master (base)                         | feature/custom-gc (head)  |
+-----------------+---------------------------------------+---------------------------+
| Bounce          | 75.13 ms ± 9.06 (67.94..98.75)        | 0.69x ± 0.16 (0.48..0.87) |
| BubbleSort      | 99.70 ms ± 5.71 (96.34..115.58)       | 0.74x ± 0.10 (0.59..0.85) |
| DeltaBlue       | 57.00 ms ± 3.11 (55.00..65.30)        | 0.68x ± 0.11 (0.53..0.84) |
| Dispatch        | 83.28 ms ± 10.05 (76.09..103.87)      | 0.87x ± 0.14 (0.71..0.99) |
| Fannkuch        | 46.84 ms ± 1.95 (43.56..48.68)        | 0.67x ± 0.20 (0.47..0.92) |
| Fibonacci       | 128.30 ms ± 4.59 (123.42..139.37)     | 0.71x ± 0.12 (0.52..0.81) |
| FieldLoop       | 162.02 ms ± 23.80 (145.56..225.49)    | 1.07x ± 0.18 (0.94..1.19) |
| GraphSearch     | 31.80 ms ± 0.64 (30.38..32.58)        | 0.77x ± 0.20 (0.47..0.99) |
| IntegerLoop     | 140.27 ms ± 12.06 (132.92..171.54)    | 0.84x ± 0.09 (0.73..0.92) |
| JsonSmall       | 83.78 ms ± 6.44 (74.56..91.26)        | 0.78x ± 0.09 (0.69..0.89) |
| List            | 94.52 ms ± 6.21 (90.90..111.85)       | 0.75x ± 0.09 (0.62..0.81) |
| Loop            | 168.70 ms ± 5.24 (163.09..180.44)     | 0.83x ± 0.04 (0.76..0.87) |
| Mandelbrot      | 107.72 ms ± 5.04 (103.72..119.99)     | 0.78x ± 0.11 (0.64..0.89) |
| NBody           | 75.55 ms ± 2.22 (72.97..78.97)        | 0.78x ± 0.09 (0.63..0.87) |
| PageRank        | 112.84 ms ± 4.50 (109.17..123.91)     | 0.89x ± 0.06 (0.79..0.95) |
| Permute         | 109.68 ms ± 11.57 (100.28..130.90)    | 0.84x ± 0.14 (0.62..0.92) |
| Queens          | 100.43 ms ± 16.59 (82.56..136.36)     | 1.01x ± 0.18 (0.94..1.09) |
| QuickSort       | 27.11 ms ± 1.19 (25.97..30.03)        | 0.73x ± 0.20 (0.48..0.98) |
| Recurse         | 109.63 ms ± 9.91 (101.08..131.26)     | 0.81x ± 0.09 (0.72..0.87) |
| Richards        | 1386.07 ms ± 30.00 (1344.50..1424.16) | 0.73x ± 0.03 (0.69..0.79) |
| Sieve           | 155.83 ms ± 9.25 (146.23..179.90)     | 0.83x ± 0.10 (0.65..0.88) |
| Storage         | 32.34 ms ± 2.76 (30.07..38.03)        | 0.74x ± 0.14 (0.57..0.91) |
| Sum             | 70.57 ms ± 10.65 (64.40..99.53)       | 0.85x ± 0.21 (0.58..1.05) |
| Towers          | 119.81 ms ± 10.70 (110.79..147.37)    | 0.84x ± 0.15 (0.61..0.94) |
| TreeSort        | 50.33 ms ± 9.87 (45.26..77.32)        | 0.83x ± 0.18 (0.71..0.96) |
| WhileLoop       | 161.67 ms ± 4.61 (155.47..171.22)     | 0.90x ± 0.06 (0.81..0.98) |
|                 |                                       |                           |
| Average Speedup |              (baseline)               | 0.81x ± 0.03 (0.67..1.07) |
+-----------------+---------------------------------------+---------------------------+

The raw ReBench data files are available for download here: baseline and head

The benchmarks were run using ReBench v1.2.0
The statistical analysis was done using rebench-tabler v0.1.0

The source code of this benchmark runner is available as a GitHub Gist for more details about the setup

@som-rs-benchmarker

som-rs-benchmarker bot commented May 8, 2024

Here are the benchmark results for feature/custom-gc (commit: 823b85a):

AST interpreter
+-----------------+----------------------------------------+---------------------------+
| Benchmark       | master (base)                          | feature/custom-gc (head)  |
+-----------------+----------------------------------------+---------------------------+
| Bounce          | 192.76 ms ± 10.12 (180.01..208.12)     | 1.04x ± 0.07 (0.96..1.09) |
| BubbleSort      | 256.40 ms ± 17.28 (235.15..298.05)     | 1.00x ± 0.09 (0.90..1.06) |
| DeltaBlue       | 152.07 ms ± 8.36 (141.31..167.91)      | 1.02x ± 0.06 (0.98..1.07) |
| Dispatch        | 172.69 ms ± 3.52 (168.23..179.17)      | 0.95x ± 0.08 (0.82..1.04) |
| Fannkuch        | 118.12 ms ± 5.74 (110.80..131.37)      | 0.91x ± 0.14 (0.72..1.04) |
| Fibonacci       | 363.83 ms ± 18.67 (338.53..395.18)     | 0.99x ± 0.07 (0.92..1.06) |
| FieldLoop       | 322.41 ms ± 35.85 (298.44..419.33)     | 1.04x ± 0.12 (1.02..1.10) |
| GraphSearch     | 79.24 ms ± 7.28 (74.54..99.63)         | 1.02x ± 0.11 (0.91..1.08) |
| IntegerLoop     | 340.80 ms ± 25.48 (308.46..385.77)     | 1.13x ± 0.10 (1.03..1.19) |
| JsonSmall       | 182.20 ms ± 11.27 (172.64..211.83)     | 0.95x ± 0.09 (0.86..1.04) |
| List            | 219.12 ms ± 13.34 (207.99..251.51)     | 0.93x ± 0.10 (0.82..1.04) |
| Loop            | 404.57 ms ± 18.68 (384.02..438.81)     | 1.02x ± 0.06 (0.95..1.07) |
| Mandelbrot      | 263.54 ms ± 20.87 (239.16..302.42)     | 1.05x ± 0.10 (0.98..1.12) |
| NBody           | 214.84 ms ± 13.78 (195.18..236.07)     | 1.01x ± 0.12 (0.86..1.10) |
| PageRank        | 293.02 ms ± 11.27 (282.00..313.20)     | 1.03x ± 0.06 (0.96..1.08) |
| Permute         | 316.38 ms ± 26.28 (287.95..373.73)     | 1.09x ± 0.12 (0.96..1.16) |
| Queens          | 234.57 ms ± 21.14 (217.46..289.94)     | 1.00x ± 0.10 (0.94..1.07) |
| QuickSort       | 81.13 ms ± 11.60 (68.27..106.47)       | 1.08x ± 0.22 (0.80..1.22) |
| Recurse         | 265.35 ms ± 17.71 (250.21..312.30)     | 1.03x ± 0.08 (0.96..1.06) |
| Richards        | 3945.47 ms ± 125.82 (3792.18..4148.18) | 1.02x ± 0.05 (0.95..1.08) |
| Sieve           | 410.70 ms ± 29.56 (377.38..453.74)     | 1.03x ± 0.10 (0.91..1.11) |
| Storage         | 82.96 ms ± 8.27 (75.59..101.91)        | 1.07x ± 0.11 (1.01..1.10) |
| Sum             | 162.83 ms ± 23.10 (147.62..222.44)     | 1.03x ± 0.20 (0.77..1.12) |
| Towers          | 315.17 ms ± 24.86 (285.88..358.88)     | 1.04x ± 0.09 (0.97..1.11) |
| TreeSort        | 154.31 ms ± 10.37 (143.92..177.76)     | 0.95x ± 0.13 (0.74..1.04) |
| WhileLoop       | 341.59 ms ± 13.33 (327.41..365.00)     | 1.00x ± 0.05 (0.94..1.04) |
|                 |                                        |                           |
| Average Speedup |               (baseline)               | 1.02x ± 0.02 (0.91..1.13) |
+-----------------+----------------------------------------+---------------------------+

The raw ReBench data files are available for download here: baseline and head

Bytecode interpreter
+-----------------+---------------------------------------+---------------------------+
| Benchmark       | master (base)                         | feature/custom-gc (head)  |
+-----------------+---------------------------------------+---------------------------+
| Bounce          | 69.61 ms ± 1.45 (66.82..71.78)        | 0.75x ± 0.08 (0.63..0.85) |
| BubbleSort      | 113.44 ms ± 13.75 (98.79..138.26)     | 0.90x ± 0.14 (0.71..1.01) |
| DeltaBlue       | 59.18 ms ± 1.56 (57.13..62.00)        | 0.70x ± 0.17 (0.46..0.83) |
| Dispatch        | 79.76 ms ± 6.10 (72.75..91.85)        | 0.78x ± 0.11 (0.60..0.87) |
| Fannkuch        | 45.13 ms ± 0.98 (44.13..47.30)        | 0.80x ± 0.07 (0.70..0.87) |
| Fibonacci       | 145.73 ms ± 17.09 (129.91..177.52)    | 0.89x ± 0.14 (0.77..0.99) |
| FieldLoop       | 152.83 ms ± 8.71 (144.30..175.06)     | 1.05x ± 0.09 (0.94..1.15) |
| GraphSearch     | 32.61 ms ± 2.84 (30.45..39.91)        | 0.85x ± 0.11 (0.70..0.99) |
| IntegerLoop     | 146.87 ms ± 12.17 (135.55..168.42)    | 0.94x ± 0.11 (0.80..1.05) |
| JsonSmall       | 82.91 ms ± 11.73 (75.92..113.73)      | 0.83x ± 0.13 (0.75..0.94) |
| List            | 104.68 ms ± 19.10 (91.94..152.73)     | 0.85x ± 0.17 (0.76..0.94) |
| Loop            | 184.68 ms ± 9.08 (170.80..200.56)     | 0.93x ± 0.05 (0.89..0.97) |
| Mandelbrot      | 114.56 ms ± 6.36 (107.64..127.29)     | 0.89x ± 0.15 (0.64..1.01) |
| NBody           | 76.77 ms ± 4.09 (73.93..86.84)        | 0.76x ± 0.09 (0.61..0.86) |
| PageRank        | 120.17 ms ± 10.67 (111.31..144.67)    | 0.83x ± 0.12 (0.66..0.92) |
| Permute         | 115.43 ms ± 8.80 (100.97..132.70)     | 0.85x ± 0.10 (0.74..0.96) |
| Queens          | 92.32 ms ± 9.92 (83.77..117.62)       | 0.91x ± 0.11 (0.86..1.03) |
| QuickSort       | 28.11 ms ± 1.61 (26.50..32.05)        | 0.81x ± 0.10 (0.68..0.97) |
| Recurse         | 109.22 ms ± 3.33 (103.51..112.85)     | 0.73x ± 0.09 (0.57..0.82) |
| Richards        | 1378.30 ms ± 26.78 (1339.09..1417.80) | 0.74x ± 0.02 (0.72..0.77) |
| Sieve           | 166.97 ms ± 12.05 (154.66..191.07)    | 0.88x ± 0.09 (0.78..0.98) |
| Storage         | 30.60 ms ± 0.85 (29.08..32.10)        | 0.77x ± 0.14 (0.55..0.92) |
| Sum             | 69.06 ms ± 6.04 (63.15..82.26)        | 0.82x ± 0.12 (0.65..0.94) |
| Towers          | 125.77 ms ± 10.85 (110.74..146.76)    | 0.93x ± 0.09 (0.84..0.98) |
| TreeSort        | 50.96 ms ± 10.12 (44.10..78.11)       | 0.91x ± 0.21 (0.69..1.04) |
| WhileLoop       | 173.20 ms ± 16.36 (153.06..197.25)    | 0.90x ± 0.13 (0.71..1.02) |
|                 |                                       |                           |
| Average Speedup |              (baseline)               | 0.85x ± 0.02 (0.70..1.05) |
+-----------------+---------------------------------------+---------------------------+

The raw ReBench data files are available for download here: baseline and head

The benchmarks were run using ReBench v1.2.0
The statistical analysis was done using rebench-tabler v0.1.0

The source code of this benchmark runner is available as a GitHub Gist for more details about the setup

Labels
C-enhancement Category: Enhancements M-interpreter Module: Interpreter P-medium Priority: Medium

3 participants