Skip to content

tel URIs: Incorrect escaping due to missing RFC 3966 "tel:" URI syntax support/parsing #263

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
jmiserez opened this issue Apr 25, 2022 · 0 comments

Comments

@jmiserez
Copy link

jmiserez commented Apr 25, 2022

RFC 3966 "The tel URI for Telephone Numbers" defines the URI scheme "tel". The OWASP sanitizer currently does not handle these correctly: both escaping certain characters when it "MUST NOT", or not escaping characters when it "MUST". Testcases at the end.

This is not due to a bug in the OWASP sanitizer, but due to missing support for "tel" URI syntax described in Section 3 of RFC 3966. While the "tel" URI syntax is based on the generic syntax outlined in RFC 2396 "Uniform Resource Identifiers (URI): Generic Syntax", there are important differences.

The most important difference is described in these two sentences of Section 3:

If the reserved characters "+", ";", "=", and "?" are used as delimiters between components of
the "tel" URI, they MUST NOT be percent encoded. 
These characters MUST be percent encoded if they appear in tel URI parameter values.

Unfortunately, determining whether one of these characters is used as a delimiter or as a parameter value is not straightforward, and probably requires parsing the full syntax (also described in Section 3 in ABNF form).

Basic example:
href="tel:+1234567890" -> href="tel:+1234567890": The leading plus is a separator and "MUST NOT" be percent encoded according to RFC 3966
But it depends on where the character is in the URI, e.g.:
href="tel:tel:890;phone-context=;+123-4-567" -> href="tel:tel:890;phone-context=+123-4-567": The equals is a separator here and "MUST NOT" be escaped but the plus in the parameter is not a separator and thus "MUST" be percent encoded according to RFC 3966

This does matter in the real world: While browsers (tested Chrome & Firefox) will e.g. still recognize some misencoded telephone numbers, E-Mail clients (Outlook, Apple Mail, Gmail) or telephony software (MS Teams) are more restrictive and no longer recognize the numbers. In any case I would argue that a HTML sanitizer library should probably aim for correctness here, even if it still works in some browsers.

Test cases demonstrating the issue, with comments:

Simply add the testcases to your SanitizersTest and run.

I already alluded to this before in a comment last year (#21 (comment)), but did not follow up then with proper testcases. I hope this report is slightly more helpful in understanding the issue.

@jmiserez jmiserez changed the title tel URIs: Incorrect escaping due to missing RFC 3966 "tel:" URI support & parsing tel URIs: Incorrect escaping due to missing RFC 3966 "tel:" URI syntax support/parsing Apr 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant