Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Java: Improve performance of XSS regex. #18552

Merged
merged 1 commit into from
Jan 22, 2025

Conversation

aschackmull
Copy link
Contributor

@jbj highlighted a case where the regex matching in isXssVulnerableContentType was a noticeable bottleneck - it's slow and it's evaluated multiple times.

Timing it locally, I measured one of the cases to 2.5s.
Merging the 4 regexes to 1 brought that down to about 670ms.
And testing the resulting regex on a single-column projection brought that further down to about 90ms.

This makes sense, since checking a single regex is much faster than checking 4. And projecting CompileTimeConstantExpr.getStringValue() to the string column easily reduces the number of regex operations by an order of magnitude when many string literals have the same value.

@aschackmull aschackmull added the no-change-note-required This PR does not need a change note label Jan 21, 2025
@Copilot Copilot bot review requested due to automatic review settings January 21, 2025 13:50
@aschackmull aschackmull requested a review from a team as a code owner January 21, 2025 13:50

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot wasn't able to review any files in this pull request.

Files not reviewed (3)
  • java/ql/lib/semmle/code/java/frameworks/JaxWS.qll: Language not supported
  • java/ql/lib/semmle/code/java/frameworks/spring/SpringHttp.qll: Language not supported
  • java/ql/lib/semmle/code/java/security/XSS.qll: Language not supported

Tip: If you use Visual Studio Code, you can request a review from Copilot before you push from the "Source Control" tab. Learn more

@github-actions github-actions bot added the Java label Jan 21, 2025
@aschackmull
Copy link
Contributor Author

I could btw. cut the 670ms in half by switching the column order, since that makes the string pool cache more efficient, but it'll still evaluate the regex for each row, so it's not really competitive with the regex-on-projection solution. I also tried instrumenting the evaluator to exploit the sorted strings case (i.e. the switched column order) by adding a very lightweight single-tuple cache to skip regex evals on identical tuples, but this took about 280ms, so it still didn't compete with the regex-on-projection even though it ought to have computed the same number of regex and string pool operations.

@aschackmull
Copy link
Contributor Author

Wow, this actually shows as a more significant improvement on dca than I expected 🎉

@aschackmull aschackmull merged commit 5bfd22e into github:main Jan 22, 2025
15 checks passed
@aschackmull aschackmull deleted the java/xss-regex-perf branch January 22, 2025 10:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Java no-change-note-required This PR does not need a change note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants