fix: prevent quadratic complexity in emStrongLDelim regex #3906

Open
sammiee5311 wants to merge 1 commit into markedjs:master from sammiee5311:fix/emstrong-redos
@sammiee5311 sammiee5311 commented Mar 5, 2026

Hello,

I noticed that the emStrongLDelim regex has quadratic complexity similar to the issue fixed in #3902. I looked for an existing report but only found an exponential case, not a quadratic one. Since #3902 was also handled as a regular PR, I'm submitting this the same way.

Thanks.

Marked version: 17.0.4
Markdown flavor: n/a

Description

  • Fixes #### (if fixing a known issue; otherwise, describe issue using the following format)

The emStrongLDelim regex suffers from O(n) backtracking per .exec() call on long runs of _ or * followed by whitespace. Combined with the per-character inline tokenizer loop, this produces O(n²) total processing time.

Make suffix groups optional to eliminate O(n) backtracking per exec(). Skip entire delimiter run as text when no valid suffix is found to avoid the O(n²) amplification loop.

| Input | Time (before) | Time (after) |
| --- | --- | --- |
| '_'.repeat(10000) + ' a' | ~790 ms | <2 ms |
| '_'.repeat(50000) + ' a' | ~20,500 ms | <2 ms |
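
The mechanism described above can be sketched with a toy model. The two regexes and both scan functions below are hypothetical simplifications, not marked's actual emStrongLDelim rule or tokenizer loop. The sketch counts `exec()` calls rather than measuring time: with a required suffix, every call fails only after backtracking through the remaining run, and the loop retries one character later; with an optional suffix, one call matches the whole run and the run can be skipped at once.

```javascript
// Hypothetical, simplified delimiter rules; NOT the actual marked regexes.
const strictDelim = /^_+([^\s_])/;    // suffix required: fails only after backtracking the run
const optionalDelim = /^_+([^\s_])?/; // suffix optional: matches the full run in one pass

// Naive scan: on failure, advance one character and try again.
function scanNaive(src) {
  let execCalls = 0;
  let pos = 0;
  while (pos < src.length && src[pos] === '_') {
    execCalls++;
    if (strictDelim.exec(src.slice(pos))) break; // valid left delimiter found
    pos++;                                       // retry one character later
  }
  return execCalls;
}

// Patched scan: when the suffix group did not participate, skip the whole run.
function scanPatched(src) {
  let execCalls = 0;
  let pos = 0;
  while (pos < src.length && src[pos] === '_') {
    execCalls++;
    const match = optionalDelim.exec(src.slice(pos));
    if (match[1] !== undefined) break; // valid left delimiter found
    pos += match[0].length;            // no valid suffix: skip the run
  }
  return execCalls;
}

const input = '_'.repeat(2000) + ' a';
const naiveCalls = scanNaive(input);
const patchedCalls = scanPatched(input);
console.log(naiveCalls, patchedCalls); // 2000 1
```

In the naive version each of the 2000 calls also does work proportional to the remaining run length, which is where the quadratic total comes from.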

PoC

const { marked } = require('marked');

const start = performance.now();
marked('_'.repeat(50000) + ' a');
console.log(performance.now() - start);

Contributor

  • Test(s) exist to ensure functionality and minimize regression (if no tests added, list tests covering this PR); or,
  • no tests required for this PR.
  • If submitting new feature, it has been documented in the appropriate places.

Committer

In most cases, this should be a different person than the contributor.

vercel bot commented Mar 5, 2026

@sammiee5311 is attempting to deploy a commit to the MarkedJS Team on Vercel.

A member of the Team first needs to authorize it.

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a performance degradation in the marked library's tokenizer, specifically targeting the emStrongLDelim regular expression. By refining the regex and adjusting the tokenizer's handling of delimiter runs, it eliminates a quadratic complexity issue that could lead to significant processing delays for certain markdown inputs, effectively mitigating a potential ReDoS vulnerability.

Highlights

  • Performance Fix for emStrongLDelim Regex: Addressed a quadratic complexity issue (O(n²) processing time) in the emStrongLDelim regex, similar to a previously fixed ReDoS vulnerability. This significantly improves performance for long runs of _ or * followed by whitespace.
  • Regex Modification: Modified the emStrongLDelimCore regex in src/rules.ts to make suffix groups optional, which eliminates O(n) backtracking per exec() call.
  • Tokenizer Logic Update: Updated the emStrong method in src/Tokenizer.ts to handle cases where no valid suffix is found, returning the entire delimiter run as a Tokens.Text type to prevent O(n²) amplification.
  • New Regression Test: Introduced a new test case in test/specs/redos/quadratic_emstrong_delim.cjs to ensure the fix for quadratic complexity is maintained and to prevent regressions.

Changelog

  • src/Tokenizer.ts
    • Updated the return type of emStrong to include Tokens.Text.
    • Added logic to return a Tokens.Text token if the delimiter run has no valid suffix, skipping the entire run.
    • Adjusted array indexing for match to correctly access nextChar and prevChar conditions.
  • src/rules.ts
    • Modified emStrongLDelimCore regex to make the suffix groups optional using ?.
  • test/specs/redos/quadratic_emstrong_delim.cjs
    • Added a new test file with a long string of underscores followed by a space to test the quadratic complexity fix.

Activity

  • The author, sammiee5311, identified and addressed a quadratic complexity issue in the emStrongLDelim regex, similar to a previously fixed ReDoS vulnerability.
  • A proof-of-concept was provided in the PR description demonstrating the significant performance improvement.
  • The author confirmed the existence of tests to ensure functionality and minimize regression.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request successfully addresses a quadratic complexity (ReDoS) vulnerability in the emStrongLDelim regular expression by updating the regex to make suffix groups optional and ensuring the tokenizer correctly handles delimiter runs without valid suffixes. A new test case has been added to prevent future regressions. However, the implementation in src/rules.ts introduces a critical syntax error due to an extra closing parenthesis in the regular expression, which will cause the library to crash upon initialization. Additionally, there is a high-severity issue in how the nextChar is determined, potentially leading to incorrect parsing of emphasis/strong markdown in certain scenarios. The PR also includes improvements to the intra-word underscore parsing logic in src/Tokenizer.ts to better align with CommonMark specifications.

src/Tokenizer.ts Outdated
if (!match) return;
if (!match[1] && !match[2] && !match[3] && !match[4]) {
// Delimiter run has no valid suffix — skip entire run as text
return {
Member

We should just return here and let the text tokenizer create the text token

Author

@sammiee5311 sammiee5311 Mar 6, 2026

Hello,

Thanks for the review!

I tried changing it to simply return, but it still shows O(n²) behavior [1].

If that’s acceptable, and letting the text tokenizer create the text token is the expected behavior, I will update the implementation to just return.

Thanks!

[1]

| Input | Time (Unpatched) | Time (Return only) | Time (Current implementation) |
| --- | --- | --- | --- |
| 10k | ~790 ms | ~267 ms | <2 ms |
| 20k | ~3,150 ms | ~1,032 ms | <2 ms |
| 50k | ~20,500 ms | ~6,269 ms | <2 ms |
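
As a rough check that these figures are consistent with quadratic growth, the exponent can be estimated from two data points. This is only a sketch: it assumes that 10k, 20k and 50k denote the number of underscores in the input and that time scales as c·n^k; the helper name `growthExponent` is made up for illustration.

```javascript
// Timings (ms) copied from the table above; sizes are assumed to be run lengths.
const sizes = [10000, 20000, 50000];
const unpatched = [790, 3150, 20500];
const returnOnly = [267, 1032, 6269];

// If time ~ c * n^k, then k = log(t2 / t1) / log(n2 / n1).
function growthExponent(n1, t1, n2, t2) {
  return Math.log(t2 / t1) / Math.log(n2 / n1);
}

// Compare the smallest and largest inputs for each column.
const kUnpatched = growthExponent(sizes[0], unpatched[0], sizes[2], unpatched[2]);
const kReturnOnly = growthExponent(sizes[0], returnOnly[0], sizes[2], returnOnly[2]);
console.log(kUnpatched.toFixed(2), kReturnOnly.toFixed(2));
```

Both estimates come out close to 2, which matches the observation that the return-only variant is faster by a constant factor but still quadratic.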

Member

Ya since the user can change the text tokenizer we want to allow them to return the text token that they want and not have to edit the emStrong tokenizer when they just want to change the text tokenizer
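
The design point here can be illustrated with a toy lexer. Everything in this sketch is hypothetical (`makeLexer`, `defaultText`, `upperText`, and the simplified regexes are not marked's internals): tokenizers are tried in order, and when the emStrong-like tokenizer returns undefined, the text tokenizer, including a user-supplied replacement, gets to decide what token is produced.

```javascript
// Hypothetical mini inline lexer; names are illustrative, not marked's internals.
function makeLexer(tokenizers) {
  return function lex(src) {
    const tokens = [];
    while (src.length > 0) {
      let token;
      // Try each tokenizer in order; the first one that returns a token wins.
      for (const tokenize of tokenizers) {
        token = tokenize(src);
        if (token) break;
      }
      if (!token) throw new Error('no tokenizer matched');
      tokens.push(token);
      src = src.slice(token.raw.length);
    }
    return tokens;
  };
}

// emStrong-like tokenizer: declines (returns undefined) when the run has no valid suffix.
function emStrong(src) {
  const match = /^_+([^\s_])?/.exec(src);
  if (!match || match[1] === undefined) return undefined;
  return { type: 'em', raw: match[0], text: match[0] };
}

// Default text tokenizer: consumes a delimiter run or a run of other characters.
function defaultText(src) {
  const raw = /^(?:_+|[^_]+)/.exec(src)[0];
  return { type: 'text', raw, text: raw };
}

// A user override that only customizes the text tokenizer.
function upperText(src) {
  const token = defaultText(src);
  return { ...token, text: token.text.toUpperCase() };
}

const lex = makeLexer([emStrong, upperText]);
const tokens = lex('__ a');
console.log(tokens.map((t) => `${t.type}:${t.text}`));
```

Because `emStrong` never builds a text token itself, swapping `defaultText` for `upperText` changes the output without touching `emStrong`.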

Author

Ah, yeah, I didn't think of that.

I've updated the code to just return.

Thanks!

Make suffix groups optional to eliminate O(n) backtracking per exec().
Return early when no valid suffix is found to avoid O(n^2)
amplification loop.

vercel bot commented Mar 7, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| marked-website | Ready | Preview, Comment | Mar 7, 2026 2:26pm |
