fix Incomplete multi-character sanitization leads Improper Input Validation (RXSS) #4748

odaysec · 2025-04-11T20:33:14Z

Line 624 in 591aedb

    
           return this.replace(/<\w+(\s+("[^"]*"|'[^']*'|[^>'"])+)?\s*("[^">]*|'[^'>])?(\/)?>|<\/\w+>/gi, '');

fix the problem ensure that all instances of the targeted patterns are removed, even if they appear consecutively or in a nested manner. One effective way to achieve this is to apply the regular expression replacement repeatedly until no more replacements can be performed. This approach ensures that all instances of the targeted patterns are removed, effectively sanitizing the input. modify the stripTags function to repeatedly apply the regular expression replacement until the input no longer changes. This will ensure that all HTML tags are removed, even if they are nested or malformed.

Sanitizing untrusted input is a common technique for preventing injection attacks and other security vulnerabilities. Regular expressions are often used to perform this sanitization. However, when the regular expression matches multiple consecutive characters, replacing it just once can result in the unsafe text reappearing in the sanitized input. Attackers can exploit this issue by crafting inputs that, when sanitized with an ineffective regular expression, still contain malicious code or content. This can lead to code execution, data exposure, or other vulnerabilities.

POC

Consider the following JavaScript code that aims to remove all HTML comment start and end tags:

str.replace(/<!--|--!?>/g, "");

Given the input string "<!>", the output will be "", which still contains an HTML comment. One possible fix for this issue is to apply the regular expression replacement repeatedly until no more replacements can be performed. This ensures that the unsafe text does not re-appear in the sanitized input, effectively removing all instances of the targeted pattern:

function removeHtmlComments(input) {  
  let previous;  
  do {  
    previous = input;  
    input = input.replace(/<!--|--!?>/g, "");  
  } while (input !== previous);  
  return input;  
}

Another vulnerable is the following regular expression intended to remove script tags:

str.replace(/<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>/g, "");

If the input string is <scrip<script>is removed</script>t>alert(123)</script>, the output will be <script>alert(123)</script>, which still contains a script tag. A fix for this issue is to rewrite the regular expression to match single characters ("<" and ">") instead of the entire unsafe text. This simplifies the sanitization process and ensures that all potentially unsafe characters are removed:

function removeAllHtmlTags(input) {  
  return input.replace(/<|>/g, "");  
}

Another potential fix is to use the popular sanitize-html npm library. It keeps most of the safe HTML tags while removing all unsafe tags and attributes.

const sanitizeHtml = require("sanitize-html");
function removeAllHtmlTags(input) {  
  return sanitizeHtml(input);  
}

The regular expression attempts to strip out all occurences of /../ from str. This will not work as expected: for the string /./.././, it will remove the single occurrence of /../ in the middle, but the remainder of the string then becomes /../, which is another instance of the substring we were trying to remove. A possible fix for this issue is to use the "sanitize-filename" npm library for path sanitization. This library is specifically designed to handle path sanitization, and should handle all corner cases and ensure effective sanitization:

const sanitize = require("sanitize-filename");  
  
function sanitizePath(input) {  
  return sanitize(input);  
}

References

A1 Injection
Removing all script tags from HTML with JS regular expression
CWE-20
CWE-80
CWE-116

joshua-bn · 2025-04-11T20:41:07Z

This should be done on the backend.

kiatng · 2025-04-15T02:23:37Z

Can you provide test cases as in http://prototypejs.org/doc/latest/language/String/prototype/stripTags/

sonarqubecloud · 2025-04-15T02:23:53Z

Quality Gate failed

Failed conditions
1 Security Hotspot
D Maintainability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

Update prototype.js

3c7b2a1

github-actions bot added the JavaScript Relates to js/* label Apr 11, 2025

Merge branch 'main' into patch-1

a5bd55a

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

fix Incomplete multi-character sanitization leads Improper Input Validation (RXSS) #4748

fix Incomplete multi-character sanitization leads Improper Input Validation (RXSS) #4748

Uh oh!

odaysec commented Apr 11, 2025

Uh oh!

joshua-bn commented Apr 11, 2025

Uh oh!

kiatng commented Apr 15, 2025

Uh oh!

sonarqubecloud bot commented Apr 15, 2025

Uh oh!

Uh oh!

Uh oh!

fix Incomplete multi-character sanitization leads Improper Input Validation (RXSS) #4748

Are you sure you want to change the base?

fix Incomplete multi-character sanitization leads Improper Input Validation (RXSS) #4748

Uh oh!

Conversation

odaysec commented Apr 11, 2025

POC

References

Uh oh!

joshua-bn commented Apr 11, 2025

Uh oh!

kiatng commented Apr 15, 2025

Uh oh!

sonarqubecloud bot commented Apr 15, 2025

Quality Gate failed

Uh oh!

Uh oh!