Copilot AI commented Jan 7, 2026

Description

Ports the switch-based alternation optimization from RegexGenerator.Emitter.cs to RegexCompiler.cs. The source generator emits a C# switch statement for alternations where every branch begins with unique characters, relying on Roslyn to lower it to an IL switch when beneficial. This change adds the same optimization directly to the compiler using the Roslyn heuristic:

  • count >= 3 AND density >= 0.5 (where density = count / range)
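The heuristic above can be sketched as a small predicate. This is only an illustration, not the PR's code: the name `ShouldEmitSwitch` and its parameters are hypothetical, and it assumes "range" means the number of jump-table slots, `max - min + 1`.

```csharp
using System;

// Hypothetical helper (not from the PR): returns true when the distinct starting
// characters are numerous and dense enough for a jump table to pay off.
// Assumption: range = max - min + 1, i.e. the number of slots the table would need.
static bool ShouldEmitSwitch(char[] startingChars, int minCount = 3, double minDensity = 0.5)
{
    if (startingChars.Length < minCount)
    {
        return false;
    }

    char min = startingChars[0], max = startingChars[0];
    foreach (char c in startingChars)
    {
        if (c < min) min = c;
        if (c > max) max = c;
    }

    int range = max - min + 1;
    double density = (double)startingChars.Length / range;
    return density >= minDensity;
}

Console.WriteLine(ShouldEmitSwitch(new[] { 'a', 'b', 'c', 'd' })); // True: 4 values over 4 slots
Console.WriteLine(ShouldEmitSwitch(new[] { 'a', 'm', 'z' }));      // False: 3 values over 26 slots
```

A sparse set such as `a`, `m`, `z` would need a 26-entry table for three targets, which is why a density floor is applied alongside the count.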

Implementation:

  • TryEmitAlternationAsSwitch: Checks eligibility (the alternation is atomic or no branch can backtrack, every branch starts with a unique character, and RightToLeft is not set) and applies the Roslyn heuristic
  • EmitSwitchedBranches: Emits the IL switch instruction and handles Multi/Set/Concatenate nodes by slicing off the first matched character

The optimization provides O(1) branch selection instead of sequential checking when the heuristic is satisfied.
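To make the O(1) selection concrete, here is a minimal, self-contained sketch of an IL `switch` built with `System.Reflection.Emit`. It is not the code in RegexCompiler.cs: the branch characters, the `SelectBranch` method name, and the return-an-index shape are all assumptions chosen for illustration. The input character is rebased by subtracting the minimum, and the IL `switch` instruction then jumps through a table; any value outside the table (including negative values, which the instruction treats as unsigned) falls through to the no-match path.

```csharp
using System;
using System.Reflection.Emit;

// Hypothetical branch starts: 'c' is a gap inside the 'a'..'e' range.
char[] starts = { 'a', 'b', 'd', 'e' };
char min = 'a', max = 'e';

var method = new DynamicMethod("SelectBranch", typeof(int), new[] { typeof(char) });
ILGenerator il = method.GetILGenerator();

// One jump-table slot per value in [min, max]; gaps point at the no-match label.
Label noMatch = il.DefineLabel();
Label[] branchLabels = new Label[starts.Length];
Label[] table = new Label[max - min + 1];
Array.Fill(table, noMatch);
for (int i = 0; i < starts.Length; i++)
{
    branchLabels[i] = il.DefineLabel();
    table[starts[i] - min] = branchLabels[i];
}

// switch ((uint)(c - min)) { ... } with fall-through when out of range.
il.Emit(OpCodes.Ldarg_0);
il.Emit(OpCodes.Ldc_I4, (int)min);
il.Emit(OpCodes.Sub);
il.Emit(OpCodes.Switch, table);

// Fall-through and gap slots: return -1.
il.MarkLabel(noMatch);
il.Emit(OpCodes.Ldc_I4_M1);
il.Emit(OpCodes.Ret);

// Each branch just returns its index; a real emitter would match the rest of the branch here.
for (int i = 0; i < branchLabels.Length; i++)
{
    il.MarkLabel(branchLabels[i]);
    il.Emit(OpCodes.Ldc_I4, i);
    il.Emit(OpCodes.Ret);
}

var selectBranch = (Func<char, int>)method.CreateDelegate(typeof(Func<char, int>));

Console.WriteLine(selectBranch('d')); // 2
Console.WriteLine(selectBranch('c')); // -1 (gap in the table)
Console.WriteLine(selectBranch('z')); // -1 (past the end of the table)
```

The gap slot for `c` shows why density matters: every unused value between the minimum and maximum still costs a table entry.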

Synchronization with Source Generator:

  • Ported the TryEmitAlternationAsSwitch refactoring back to the source generator to keep both implementations synchronized
  • Both implementations now use the same structure with early returns instead of local boolean flags

Customer Impact

Performance improvement for compiled regexes with alternations meeting the criteria. No functional change.

Regression

No, this is a new optimization bringing parity with the source generator.

Testing

All 30,496 functional tests pass. Added new test cases for alternation switch optimization covering 8-branch atomic alternations with unique starting characters, testing match, no-match, and partial input scenarios.

Risk

Low. The optimization only triggers under strict conditions matching the source generator's behavior, and falls back to the existing code path otherwise.

Package authoring no longer needed in .NET 9

IMPORTANT: Starting with .NET 9, you no longer need to edit a NuGet package's csproj to enable building and bump the version.
Keep in mind that we still need package authoring in .NET 8 and older versions.

Original prompt

The regex source generator and the regex compiler are mostly in sync, but there are a few places where they've diverged. One in particular is

// Note: This optimization does not exist with RegexOptions.Compiled. Here we rely on the
// C# compiler to lower the C# switch statement with appropriate optimizations. In some
// cases there are enough branches that the compiler will emit a jump table. In others
// it'll optimize the order of checks in order to minimize the total number in the worst
// case. In any case, we get easier to read and reason about C#.
EmitSwitchedBranches();
, where the source generator has a special optimization for alternations where every branch provably begins with a different character, in which case it can emit a C# switch statement. That doesn't exist in the RegexCompiler because the optimization relies on the C# compiler's lowering of a C# switch to an IL switch but only when it's a perf win. Please port this optimization to the regex compiler, using the same heuristic the C# compiler (Roslyn) uses, which is if count_of_values / (max_value - min_value) >= .5 and if count_of_values >= 7, it'll emit a switch... in this case, use the same heuristic to determine whether to do the optimization at all. No additional tests are needed, but all existing functional tests must pass.



Copilot AI changed the title from "[WIP] Port optimization for regex alternations to regex compiler" to "Port alternation switch optimization from source generator to RegexCompiler" Jan 7, 2026
Copilot AI requested a review from stephentoub January 7, 2026 04:09
@dotnet-policy-service
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

@stephentoub stephentoub marked this pull request as ready for review January 8, 2026 05:05
Copilot AI review requested due to automatic review settings January 8, 2026 05:05
Copilot AI left a comment

Pull request overview

This PR ports the switch-based alternation optimization from RegexGenerator.Emitter.cs to RegexCompiler.cs. The source generator has been emitting C# switch statements for alternations where every branch begins with unique characters, relying on Roslyn to lower it to an IL switch when beneficial. This change adds the same optimization directly to the compiler, using Roslyn's heuristic: emit an IL switch if count >= 7 and density >= 0.5 (where density = count / range).

Key changes:

  • Adds TryEmitAlternationAsSwitch method to check eligibility and apply the Roslyn heuristic
  • Adds EmitSwitchedBranches method to emit the IL switch instruction with proper bounds checking
  • Handles Multi/Set/Concatenate nodes by correctly slicing off the first matched character
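As a rough picture of the shape being ported, the following hand-written C# mirrors what a switch on the first character followed by matching the remainder looks like. It is a sketch under stated assumptions, not generator or compiler output: the pattern `(?>apple|banana|cherry|date)` and the `MatchAlternation` name are hypothetical, and the compiler emits the equivalent logic as IL rather than C#.

```csharp
using System;

// Hypothetical equivalent of matching (?>apple|banana|cherry|date) at the start of 'slice'.
// Returns the matched length, or -1 when no branch matches.
static int MatchAlternation(ReadOnlySpan<char> slice)
{
    if (slice.IsEmpty)
    {
        return -1;
    }

    // Branch selection keys off the first character only; each case then matches
    // the rest of its branch against the input with that first character sliced off.
    switch (slice[0])
    {
        case 'a':
            return slice.Slice(1).StartsWith("pple", StringComparison.Ordinal) ? 5 : -1;
        case 'b':
            return slice.Slice(1).StartsWith("anana", StringComparison.Ordinal) ? 6 : -1;
        case 'c':
            return slice.Slice(1).StartsWith("herry", StringComparison.Ordinal) ? 6 : -1;
        case 'd':
            return slice.Slice(1).StartsWith("ate", StringComparison.Ordinal) ? 4 : -1;
        default:
            return -1;
    }
}

Console.WriteLine(MatchAlternation("date!"));  // 4
Console.WriteLine(MatchAlternation("dog"));    // -1
Console.WriteLine(MatchAlternation("banana")); // 6
```

Because the first characters are unique, at most one branch can ever apply, so no branch needs to be retried after another fails.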

Copilot AI and others added 2 commits January 23, 2026 00:00
…mpiler

Add switch-based optimization for alternations in RegexCompiler that matches
the source generator's behavior. The optimization applies when:
1. The alternation is atomic or no branch can backtrack
2. Not right-to-left matching
3. Every branch begins with unique character(s)

If count >= 3 AND density >= 0.5, this uses an IL switch instruction for efficient branch selection based on the first character of each alternation branch. Otherwise, it uses a cascading if/else based on the first character.

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
@stephentoub stephentoub force-pushed the copilot/port-regex-optimization-to-compiler branch from 406a7d9 to 03e7c36 Compare January 23, 2026 05:01
@stephentoub
@EgorBot -amd -intel -arm

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

BenchmarkSwitcher.FromAssembly(typeof(Benchmarks).Assembly).Run(args);

[MemoryDiagnoser(false)]
public partial class Benchmarks
{
    private Regex _regex;
    private string _haystack;

    [GlobalSetup]
    public async Task Setup()
    {
        using HttpClient client = new();
        _regex = new Regex(await client.GetStringAsync("https://raw.githubusercontent.com/BurntSushi/rebar/refs/heads/master/benchmarks/regexes/wild/date.txt"), RegexOptions.Compiled);
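        // Fetch the raw file; the github.com/.../blob/... URL returns an HTML page, not the haystack text.
        _haystack = await client.GetStringAsync("https://raw.githubusercontent.com/BurntSushi/rebar/refs/heads/master/benchmarks/haystacks/rust-src-tools-3b0d4813.txt");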
    }

    [Benchmark]
    public int Count() => _regex.Count(_haystack);
}
