fix: prevent quadratic complexity in emStrongLDelim regex #3906
sammiee5311 wants to merge 1 commit into markedjs:master
Code Review (Gemini Code Assist)
The pull request successfully addresses a quadratic complexity (ReDoS) vulnerability in the emStrongLDelim regular expression by updating the regex to make suffix groups optional and ensuring the tokenizer correctly handles delimiter runs without valid suffixes. A new test case has been added to prevent future regressions. However, the implementation in src/rules.ts introduces a critical syntax error due to an extra closing parenthesis in the regular expression, which will cause the library to crash upon initialization. Additionally, there is a high-severity issue in how the nextChar is determined, potentially leading to incorrect parsing of emphasis/strong markdown in certain scenarios. The PR also includes improvements to the intra-word underscore parsing logic in src/Tokenizer.ts to better align with CommonMark specifications.
src/Tokenizer.ts
Outdated
    if (!match) return;
    if (!match[1] && !match[2] && !match[3] && !match[4]) {
      // Delimiter run has no valid suffix - skip entire run as text
      return {
We should just return here and let the text tokenizer create the text token
Hello,
Thanks for the review!
I tried changing it to simply return, but it still shows O(n²) behavior [1].
If that’s acceptable, and letting the text tokenizer create the text token is the expected behavior, I will update the implementation to just return.
Thanks!
[1]
| Input | Time (Unpatched) | Time (Return only) | Time (Current implementation) |
|---|---|---|---|
| 10k | ~790 ms | ~267 ms | <2 ms |
| 20k | ~3,150 ms | ~1,032 ms | <2 ms |
| 50k | ~20,500 ms | ~6,269 ms | <2 ms |
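As a rough sanity check on the table above, the growth rate can be estimated from consecutive rows. The sketch below uses only the timings quoted in the table; the names `Timing`, `scalingExponents`, `unpatched`, and `returnOnly` are made up for illustration and are not part of marked.

```typescript
// Timings copied from the table above (approximate, in milliseconds).
interface Timing {
  n: number;  // input size from the "Input" column
  ms: number; // measured time
}

const unpatched: Timing[] = [
  { n: 10000, ms: 790 },
  { n: 20000, ms: 3150 },
  { n: 50000, ms: 20500 },
];

const returnOnly: Timing[] = [
  { n: 10000, ms: 267 },
  { n: 20000, ms: 1032 },
  { n: 50000, ms: 6269 },
];

// Estimate k in "time grows like n^k" from each pair of consecutive rows:
// k = log(t2 / t1) / log(n2 / n1).
function scalingExponents(points: Timing[]): number[] {
  const result: number[] = [];
  for (let i = 1; i < points.length; i++) {
    const prev = points[i - 1];
    const cur = points[i];
    result.push(Math.log(cur.ms / prev.ms) / Math.log(cur.n / prev.n));
  }
  return result;
}

console.log(scalingExponents(unpatched));
console.log(scalingExponents(returnOnly));
```

Both columns give exponents close to 2, which is consistent with the claim that the return-only variant is faster by a constant factor but still quadratic.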
Yeah, since users can replace the text tokenizer, we want them to be able to return whatever text token they want, without having to edit the emStrong tokenizer when they only want to change the text tokenizer.
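The design point here can be illustrated with a toy tokenizer chain. This is a minimal sketch, not marked's real tokenizer API: `Token`, `Tokenizer`, `tokenize`, `emStrong`, `defaultText`, and `customText` are all hypothetical stand-ins, and the regexes are deliberately simplified.

```typescript
type Token = { type: string; raw: string };
type Tokenizer = (src: string) => Token | undefined;

// Toy emphasis tokenizer: returns undefined when it cannot match,
// leaving the input for the next tokenizer in the chain.
const emStrong: Tokenizer = (src) => {
  const match = /^_([^\s_]+)_/.exec(src);
  return match ? { type: 'em', raw: match[0] } : undefined;
};

// Toy default text tokenizer: consumes exactly one character.
const defaultText: Tokenizer = (src) =>
  src.length > 0 ? { type: 'text', raw: src.charAt(0) } : undefined;

// A user-supplied replacement that groups runs of underscores
// (or runs of anything else) into a single text token.
const customText: Tokenizer = (src) => {
  const match = /^(?:_+|[^_]+)/.exec(src);
  return match ? { type: 'text', raw: match[0] } : undefined;
};

// Try each tokenizer in order; the first one that returns a token wins.
function tokenize(src: string, tokenizers: Tokenizer[]): Token[] {
  const tokens: Token[] = [];
  let rest = src;
  while (rest.length > 0) {
    let token: Token | undefined;
    for (const tokenizer of tokenizers) {
      token = tokenizer(rest);
      if (token) break;
    }
    if (!token || token.raw.length === 0) {
      throw new Error('no tokenizer consumed any input');
    }
    tokens.push(token);
    rest = rest.slice(token.raw.length);
  }
  return tokens;
}

console.log(tokenize('__ a', [emStrong, defaultText]));
console.log(tokenize('__ a', [emStrong, customText]));
```

Because `emStrong` only declines and never builds a text token itself, swapping `defaultText` for `customText` changes how the unmatched run is emitted without touching the emphasis tokenizer.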
Ah, yeah, I didn't think of that.
I've updated the code to just return.
Thanks!
Make suffix groups optional to eliminate O(n) backtracking per exec(). Return early when no valid suffix is found to avoid O(n^2) amplification loop.
d4e788d to b53ba2e
Hello,
I noticed the emStrongLDelim regex has quadratic complexity similar to the issue fixed in #3902. I looked for an existing report but only found an exponential case, not a quadratic one. Since #3902 was also handled as a regular PR, I'm submitting this the same way.
Thanks.
Marked version: 17.0.4
Markdown flavor: n/a
Description
The `emStrongLDelim` regex suffers from O(n) backtracking per `.exec()` call on long runs of `_` or `*` followed by whitespace. Combined with the per-character inline tokenizer loop, this produces O(n²) total processing time.

Make suffix groups optional to eliminate O(n) backtracking per `exec()`. Skip the entire delimiter run as text when no valid suffix is found to avoid the O(n²) amplification loop.
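The arithmetic behind the O(n²) claim can be sketched with a simplified cost model. This is not marked's tokenizer or its actual regex: `scanCost` is a hypothetical function that only counts how many characters get inspected when a greedy scan of a `_` run is retried from every position, compared with handling the whole run in one step.

```typescript
// Simplified cost model, assuming a greedy scan of the `_` run at each
// position followed by a one-character suffix check.
function scanCost(src: string, skipWholeRun: boolean): number {
  let inspected = 0;
  let pos = 0;
  while (pos < src.length) {
    // Greedily walk the delimiter run starting at pos, as `_+` would.
    let end = pos;
    while (end < src.length && src.charAt(end) === '_') {
      end++;
      inspected++;
    }
    if (end === pos) {
      // Not a delimiter: consume one ordinary character.
      inspected++;
      pos++;
      continue;
    }
    // A suffix is "valid" in this model if a non-whitespace character follows.
    const hasValidSuffix = end < src.length && !/\s/.test(src.charAt(end));
    if (hasValidSuffix || skipWholeRun) {
      pos = end; // the whole run is handled in one step
    } else {
      pos++; // retry one character later, rescanning the rest of the run
    }
  }
  return inspected;
}

const input = '_'.repeat(1000) + ' a';
console.log(scanCost(input, false)); // 500502 = 1000 * 1001 / 2 + 2
console.log(scanCost(input, true));  // 1002 = 1000 + 2
```

In this model, retrying from each position costs n(n+1)/2 inspections for a run of length n, while consuming the run once costs n.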
PoC

`'_'.repeat(10000) + ' a'`
`'_'.repeat(50000) + ' a'`
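One way to time inputs of this shape is a small harness like the sketch below. `buildPocInput` and `timeParse` are hypothetical helpers, and `parse` is a placeholder for whatever parser entry point is under test (for marked, presumably something like `marked.parse`); the stand-in parser in the usage lines does no real work.

```typescript
// Build an input of the same shape as the PoC: n underscores, then " a".
function buildPocInput(n: number): string {
  return '_'.repeat(n) + ' a';
}

// Time a single call of a caller-supplied parse function, in milliseconds.
// Date.now() has millisecond resolution, so very fast runs may report 0.
function timeParse(parse: (src: string) => unknown, n: number): number {
  const input = buildPocInput(n);
  const start = Date.now();
  parse(input);
  return Date.now() - start;
}

// Usage with a stand-in parser; swap in the real parser to reproduce timings.
const standInParse = (src: string): number => src.length;
for (const n of [10000, 20000, 50000]) {
  console.log(n, timeParse(standInParse, n), 'ms');
}
```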
Contributor
Committer
In most cases, this should be a different person than the contributor.