
Commit 832e7f2

Add .NET 10 Preview 2 release notes -- Libraries and Runtime (#9770)
* Start runtime and library release notes
* Add JIT notes
* Update benchmarks
* Fix array de-abstraction intro
* Update release-notes/10.0/preview/preview2/runtime.md

---------

Co-authored-by: Aman Khalid <[email protected]>
1 parent 2469bf7 commit 832e7f2


1 file changed: +161 -2 lines changed

release-notes/10.0/preview/preview2/runtime.md

Lines changed: 161 additions & 2 deletions
@@ -8,6 +8,165 @@
- [What's new in .NET 10](https://learn.microsoft.com/dotnet/core/whats-new/dotnet-10/overview) documentation

11-
## Feature
11+
## Array Enumeration De-Abstraction

13-
This is about the feature
13+
Preview 1 brought enhancements to the JIT compiler's devirtualization abilities for array interface methods; this was our first step in reducing the abstraction overhead of array iteration via enumerators. Preview 2 continues this effort with improvements to many other optimizations. Consider the following benchmarks:
```csharp
public class ArrayDeAbstraction
{
    static readonly int[] array = new int[512];

    [Benchmark(Baseline = true)]
    public int foreach_static_readonly_array()
    {
        int sum = 0;
        foreach (int i in array) sum += i;
        return sum;
    }

    [Benchmark]
    public int foreach_static_readonly_array_via_interface()
    {
        IEnumerable<int> o = array;
        int sum = 0;
        foreach (int i in o) sum += i;
        return sum;
    }
}
```

In `foreach_static_readonly_array`, the type of `array` is transparent, so it is easy for the JIT to generate efficient code. In `foreach_static_readonly_array_via_interface`, the type of `array` is hidden behind an `IEnumerable<int>`, introducing an object allocation and virtual calls for advancing and dereferencing the iterator. In .NET 9, this overhead impacts performance profoundly:
```
| Method                                               | Mean     | Ratio | Allocated |
|----------------------------------------------------- |---------:|------:|----------:|
| foreach_static_readonly_array (.NET 9)               | 150.8 ns |  1.00 |         - |
| foreach_static_readonly_array_via_interface (.NET 9) | 851.8 ns |  5.65 |      32 B |
```
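To make the cost in the table concrete, here is roughly what the interface-based `foreach` expands to. This is a hand-written sketch of the usual `foreach` lowering, not JIT output, and the names `ForeachLowering` and `SumViaInterface` are made up for illustration.

```csharp
using System.Collections.Generic;

public static class ForeachLowering
{
    // Hand-expanded approximation of: foreach (int i in o) sum += i;
    public static int SumViaInterface(IEnumerable<int> o)
    {
        int sum = 0;

        // For a non-empty int[] hidden behind the interface, this call
        // typically returns a heap-allocated enumerator object.
        IEnumerator<int> e = o.GetEnumerator();
        try
        {
            while (e.MoveNext())   // interface call to advance the iterator
            {
                sum += e.Current;  // interface call to read the element
            }
        }
        finally
        {
            e.Dispose();           // interface call to release the enumerator
        }

        return sum;
    }
}
```

Each loop iteration pays for two interface calls unless the JIT can see through them, which is the overhead the de-abstraction work targets.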

In .NET 10, thanks to improvements to the JIT's inlining, stack allocation, and loop cloning abilities (all of which are detailed in [dotnet/runtime #108913](https://github.com/dotnet/runtime/issues/108913)), the object allocation is gone, and runtime impact has been reduced substantially:
```
| Method                                                | Mean     | Ratio | Allocated |
|------------------------------------------------------ |---------:|------:|----------:|
| foreach_static_readonly_array (.NET 9)                | 150.8 ns |  1.00 |         - |
| foreach_static_readonly_array_via_interface (.NET 10) | 280.0 ns |  1.86 |         - |
```

We plan to close the gap entirely by ensuring the loop optimizations introduced in .NET 9 can kick in for these enumeration patterns. Now, let's consider a more challenging example:
```csharp
[MethodImpl(MethodImplOptions.NoInlining)]
IEnumerable<int> get_opaque_array() => s_ro_array;

[Benchmark]
public int foreach_opaque_array_via_interface()
{
    IEnumerable<int> o = get_opaque_array();
    int sum = 0;
    foreach (int i in o) sum += i;
    return sum;
}
```

When compiling `foreach_opaque_array_via_interface`, the JIT does not know the underlying collection type. Fortunately, PGO data can tell the JIT what the likely type of the collection is, and via guarded devirtualization, the JIT can create a fast path under a test for this type. The benefits of PGO are significant, but PGO alone isn't enough to reach performance parity with the baseline:
```
| (.NET 9) Method                             | Mean       | Ratio | Allocated |
|-------------------------------------------- |-----------:|------:|----------:|
| foreach_static_readonly_array               |   153.4 ns |  1.00 |         - |
| foreach_opaque_array_via_interface          |   843.2 ns |  5.50 |      32 B |
| foreach_opaque_array_via_interface (no PGO) | 2,076.4 ns | 13.54 |      32 B |
```

Notice how `foreach_opaque_array_via_interface` allocates memory on the heap, suggesting the JIT failed to stack-allocate and promote the enumerator to registers. This is because the JIT relies on a technique called escape analysis to enable stack allocation. Escape analysis determines if an object's lifetime can exceed that of its creation context; if the JIT can guarantee an object will not outlive the current method, it can safely allocate it on the stack. In the above example, calling an interface method on the enumerator to control iteration causes it to escape, as the call takes a reference to the enumerator object. On the fast path of the type test, the JIT can try to devirtualize and inline these interface calls to keep the enumerator from escaping. However, escape analysis typically considers the whole method context, so the slow path's reliance on interface calls prevents the JIT from stack-allocating the enumerator at all.
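As a rough illustration of the escape property described above, the sketch below contrasts an object that stays inside its method with one that is published to a static field. The types and method names (`Counter`, `EscapeExamples`, `NoEscape`, `Escapes`) are hypothetical, and whether a given runtime version actually stack-allocates the first object is not something this example verifies.

```csharp
public sealed class Counter
{
    public int Value;
}

public static class EscapeExamples
{
    // A static field is one way for an object to outlive the method that created it.
    static Counter s_last = new Counter();

    // The Counter is only touched through the local 'c' and never leaves
    // the method, so it is a candidate for stack allocation.
    public static int NoEscape(int n)
    {
        var c = new Counter();
        for (int i = 0; i < n; i++)
        {
            c.Value += i;
        }
        return c.Value;
    }

    // Storing the reference in a static field makes the object reachable
    // after the method returns, so it escapes and must live on the heap.
    public static int Escapes(int n)
    {
        var c = new Counter();
        c.Value = n;
        s_last = c;
        return s_last.Value;
    }
}
```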

[dotnet/runtime #111473](https://github.com/dotnet/runtime/pull/111473) introduces conditional escape analysis -- a flow-sensitive form of the technique -- to the JIT. Conditional escape analysis can determine if an object will escape only on certain paths through the method, and prompt the JIT to create a fast path where the object never escapes. For array enumeration scenarios, conditional escape analysis reveals the enumerator will escape only when type tests for the collection fail, enabling the JIT to create a copy of the iteration code where the enumerator is stack-allocated and promoted. Once again, this reduces the abstraction cost considerably:
```
| Method                                       | Mean     | Ratio | Allocated |
|--------------------------------------------- |---------:|------:|----------:|
| foreach_static_readonly_array (.NET 9)       | 150.8 ns |  1.00 |         - |
| foreach_opaque_array_via_interface (.NET 9)  | 874.7 ns |  5.80 |      32 B |
| foreach_opaque_array_via_interface (.NET 10) | 277.9 ns |  1.84 |      32 B |
```
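A source-level analogy may help picture the code shape that guarded devirtualization and conditional escape analysis aim for. The sketch below is hand-written C#, not what the JIT emits: the JIT works on its own intermediate representation, and the guard type is chosen from PGO data rather than hard-coded. `GuardedEnumeration` and `Sum` are hypothetical names, and `int[]` is picked only to match the benchmark.

```csharp
using System.Collections.Generic;

public static class GuardedEnumeration
{
    public static int Sum(IEnumerable<int> o)
    {
        int sum = 0;

        if (o.GetType() == typeof(int[]))
        {
            // Fast path: the collection type is known under the guard, so the
            // enumerator calls can be devirtualized and inlined. The enumerator
            // state then boils down to an index that never escapes.
            int[] a = (int[])o;
            for (int i = 0; i < a.Length; i++)
            {
                sum += a[i];
            }
        }
        else
        {
            // Slow path: unknown collection type, so iteration still goes
            // through interface calls on a heap-allocated enumerator.
            foreach (int i in o)
            {
                sum += i;
            }
        }

        return sum;
    }
}
```

The key point of the analogy is that the enumerator only needs to be a heap object on the `else` branch, which is what a flow-sensitive analysis can discover.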

## Inlining of Late Devirtualized Methods

The JIT compiler can replace virtual method calls with non-virtual equivalents when it can determine the exact type of the `this` object. However, this type information may not be available to the JIT unless a specific method call is inlined. Consider the following example:

```cs
IC obj = GetObject();
obj.M();

IC GetObject() => new C();

interface IC
{
    void M();
}
class C : IC
{
    public void M() => Console.WriteLine(42);
}
```

If the call to `GetObject` is not inlined, the JIT cannot determine that `obj` is actually of type `C` rather than `IC`, meaning the subsequent call `M()` on `obj` will not be devirtualized. **Late devirtualization** occurs when a call becomes eligible for devirtualization due to previous inlining. Devirtualizing a call can create new inlining opportunities, but previously, such opportunities were abandoned. With [dotnet/runtime #110827](https://github.com/dotnet/runtime/pull/110827) (credit: [@hez2010](https://github.com/hez2010)), the JIT can now inline these late devirtualized calls. Inlining a late devirtualized call can reveal more devirtualization opportunities, yielding even more inlining candidates and increasing optimization potential.
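The chain of transformations can be pictured as a series of equivalent methods, each one a hand-written approximation of the code after one more JIT step. This is only a sketch under assumed names (`IValue`, `FortyTwo`, `LateDevirtStages`); it mirrors the `IC`/`C` example but returns a value so the stages can be compared.

```csharp
public interface IValue
{
    int M();
}

public sealed class FortyTwo : IValue
{
    public int M() => 42;
}

public static class LateDevirtStages
{
    static IValue GetObject() => new FortyTwo();

    // Stage 0: as written. The static type of obj is IValue, so M is an interface call.
    public static int AsWritten()
    {
        IValue obj = GetObject();
        return obj.M();
    }

    // Stage 1: after inlining GetObject, the exact type of obj is visible.
    public static int AfterInliningGetObject()
    {
        IValue obj = new FortyTwo();
        return obj.M();
    }

    // Stage 2: late devirtualization turns the interface call into a direct call.
    public static int AfterDevirtualization()
    {
        FortyTwo obj = new FortyTwo();
        return obj.M();
    }

    // Stage 3: inlining the now-direct call, the step the change above enables.
    public static int AfterInliningM() => 42;
}
```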

## Devirtualization Based on Inlining Observations

During inlining, a temporary variable may be created to hold the return value of the callee. With [dotnet/runtime #111948](https://github.com/dotnet/runtime/pull/111948) (credit: [@hez2010](https://github.com/hez2010)), the JIT now analyzes and updates the type of this temporary variable accordingly. If all return sites in a callee yield the same exact type, this precise type information is leveraged to devirtualize subsequent calls.
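The "same exact type at every return site" condition can be illustrated with two small factories. This is a sketch with hypothetical types (`IAnimal`, `Dog`, `Cat`); it shows the source pattern only and does not confirm what the JIT decides for any particular build.

```csharp
public interface IAnimal
{
    string Speak();
}

public sealed class Dog : IAnimal
{
    public string Speak() => "woof";
}

public sealed class Cat : IAnimal
{
    public string Speak() => "meow";
}

public static class ReturnTypeExamples
{
    // Both return sites yield exactly Dog, so once this method is inlined the
    // temporary holding its result can be given the exact type Dog.
    static IAnimal MakeDog(bool first)
    {
        if (first) return new Dog();
        return new Dog();
    }

    // The return sites disagree (Dog vs. Cat), so no single exact type
    // can be attached to the result.
    static IAnimal MakeAny(bool dog)
    {
        if (dog) return new Dog();
        return new Cat();
    }

    // Candidate for devirtualizing Speak after MakeDog is inlined.
    public static string CallSameType(bool first) => MakeDog(first).Speak();

    // Speak remains an interface call unless other information is available.
    public static string CallMixedTypes(bool dog) => MakeAny(dog).Speak();
}
```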

With the above two improvements, along with recent efforts to de-abstract array enumeration, the JIT can now devirtualize, inline, stack-allocate, and then perform struct promotion on arbitrary enumerators. This means the abstraction overhead can be entirely eliminated, even without PGO data. Consider the following example:

```cs
var r = GetRangeEnumerable(0, 10);
foreach (var i in r)
{
    Console.WriteLine(i);
}

static IEnumerable<int> GetRangeEnumerable(int start, int count) => new RangeEnumerable(start, count);

class RangeEnumerable(int start, int count) : IEnumerable<int>
{
    public class RangeEnumerator(int start, int count) : IEnumerator<int>
    {
        private int _value = start - 1;
        public int Current => _value;
        object IEnumerator.Current => Current;
        public void Dispose() { }
        public bool MoveNext()
        {
            _value++;
            return count-- != 0;
        }
        public void Reset() => _value = start - 1;
    }

    public IEnumerator<int> GetEnumerator() => new RangeEnumerator(start, count);
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

The JIT now produces fully optimized code where all virtual calls are devirtualized and inlined. Additionally, thanks to escape analysis and struct promotion, the enumerator is stack-allocated and promoted to registers, resulting in zero heap allocations:

```asm
       ...
G_M27646_IG02:
       mov      ebx, 10
       mov      r15d, -1
       jmp      SHORT G_M27646_IG04
G_M27646_IG03:
       mov      edi, r15d
       call     [System.Console:WriteLine(int)]
       mov      ebx, r14d
G_M27646_IG04:
       inc      r15d
       lea      edi, [rbx-0x01]
       mov      r14d, edi
       test     ebx, ebx
       jne      SHORT G_M27646_IG03
       ...
```

Check out the full codegen comparison between .NET 9 and .NET 10 [here](https://godbolt.org/z/9svq156Gj).
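For readers less used to assembly, the loop in the listing corresponds roughly to the hand-written C# below, where the enumerator's two fields have become plain locals. This is a reading of the listing, not decompiler output; `RangeLoopEquivalent` and `Run` are made-up names, and a list stands in for `Console.WriteLine` so the result can be inspected.

```csharp
using System.Collections.Generic;

public static class RangeLoopEquivalent
{
    public static List<int> Run(int start, int count)
    {
        var seen = new List<int>();
        int value = start - 1;        // r15d: the enumerator's _value field

        while (true)
        {
            value++;                  // inc r15d
            if (count-- == 0)         // test ebx, ebx, with count - 1 kept in r14d
            {
                break;
            }
            seen.Add(value);          // stand-in for the Console.WriteLine call
        }

        return seen;
    }
}
```

Calling `RangeLoopEquivalent.Run(0, 10)` yields the values 0 through 9, the same sequence the original `foreach` prints.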

## Support for Casting and Negation in NativeAOT's Type Preinitializer

NativeAOT includes a type preinitializer that can execute type initializers -- in other words, static constructors -- without side effects at compile time using an IL interpreter. The results are then embedded directly into the binary, allowing the initializers to be omitted. With [dotnet/runtime #112073](https://github.com/dotnet/runtime/pull/112073) (credit: [@hez2010](https://github.com/hez2010)), support has been extended to cover all variants of the `conv.*` and `neg` opcodes, enabling preinitialization of methods that include casting or negation operations.
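As a sketch of the kind of initializer this targets, the static constructor below negates and narrows non-constant values, which the C# compiler would normally express with a `neg` and a `conv.i1` instruction. The type `NegatedOffsets` is hypothetical, and whether the preinitializer actually handles this exact type (including its loop and array stores) is an assumption that was not checked here.

```csharp
public static class NegatedOffsets
{
    public static readonly sbyte[] Values = new sbyte[4];

    // Side-effect-free static constructor: it only writes to the type's own data.
    static NegatedOffsets()
    {
        for (int i = 0; i < Values.Length; i++)
        {
            // The operand is not a compile-time constant, so the negation and
            // the narrowing cast both remain as instructions in the IL.
            Values[i] = (sbyte)(-i);
        }
    }
}
```

If the initializer is interpreted at compile time, the array contents can be embedded in the binary and the static constructor need not run at startup.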
