
Commit c849288

Merge pull request #2705 from eddyb/mangling-underscore-escaping
Amend RFC2603 to allow mangled identifiers to start with a digit.
2 parents dcd2b8f + 5a4d154 commit c849288


1 file changed, +19 -23 lines changed

text/2603-rust-symbol-name-mangling-v0.md

@@ -506,7 +506,7 @@ mod gödel {
 would be mangled as:
 
 ```
-_RNvNtNtC7mycrateu8gdel_Fqa6escher4bach
+_RNvNtNtC7mycrateu8gdel_5qa6escher4bach
 <-------->
 Unicode component
 ```
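You can read the identifier components of the example symbol back with a small parser for the amended `<undisambiguated-identifier>` production. This Rust sketch is illustrative only. `parse_ident` and `Ident` are hypothetical names. It assumes the input already starts at an identifier (the `_RNvNtNtC` path tags are skipped by hand). It does not check the form of `<decimal-number>` or decode Punycode.

```rust
/// A parsed `<undisambiguated-identifier>` (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
struct Ident<'a> {
    /// `true` if the identifier had the "u" (Punycode) prefix.
    punycode: bool,
    /// The raw `<bytes>`, still Punycode-encoded if `punycode` is set.
    bytes: &'a str,
}

/// Parses `["u"] <decimal-number> ["_"] <bytes>` from the front of `s`,
/// returning the identifier and the unparsed remainder, or `None` if malformed.
fn parse_ident(s: &str) -> Option<(Ident<'_>, &str)> {
    let (punycode, s) = match s.strip_prefix('u') {
        Some(rest) => (true, rest),
        None => (false, s),
    };
    // <decimal-number>: the length of <bytes>, in bytes.
    let digits = s.bytes().take_while(|b| b.is_ascii_digit()).count();
    if digits == 0 {
        return None;
    }
    let len: usize = s[..digits].parse().ok()?;
    let s = &s[digits..];
    // An optional "_" separates the length from <bytes>; it is not counted in `len`.
    let s = s.strip_prefix('_').unwrap_or(s);
    // `is_char_boundary` returns false when `len` is past the end of `s`.
    if !s.is_char_boundary(len) {
        return None;
    }
    let (bytes, rest) = s.split_at(len);
    Some((Ident { punycode, bytes }, rest))
}

fn main() {
    // Everything after the `_RNvNtNtC` path tags in the example symbol.
    let mut rest = "7mycrateu8gdel_5qa6escher4bach";
    while !rest.is_empty() {
        let (ident, tail) = parse_ident(rest).expect("malformed identifier");
        println!("{:?}", ident);
        rest = tail;
    }
}
```

On the example, the parser yields `mycrate`, `gdel_5qa` (flagged as Punycode), `escher` and `bach`. The parser always consumes a single leading `_` as the separator. That is why the encoder must emit it whenever `<bytes>` itself starts with `_`.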
@@ -613,10 +613,10 @@ compiler generates mangled names.
 
 The syntax of mangled names is given in extended Backus-Naur form:
 
-- Non-terminals are within angle brackets (as in `<name-prefix>`)
+- Non-terminals are within angle brackets (as in `<path>`)
 - Terminals are within quotes (as in `"_R"`),
-- Optional parts are in brackets (as in `[<decimal>]`),
-- Repetition (zero or more times) is signified by curly braces (as in `{<name-prefix>}`)
+- Optional parts are in brackets (as in `[<disambiguator>]`),
+- Repetition (zero or more times) is signified by curly braces (as in `{<type>}`)
 - Comments are marked with `//`.
 
 Mangled names conform to the following grammar:
@@ -641,11 +641,13 @@ Mangled names conform to the following grammar:
 <impl-path> = [<disambiguator>] <path>
 
 // The <decimal-number> is the length of the identifier in bytes.
-// <bytes> is the identifier itself and must not start with a decimal digit.
+// <bytes> is the identifier itself, and it's optionally preceded by "_",
+// to separate it from its length - this "_" is mandatory if the <bytes>
+// starts with a decimal digit, or "_", in order to keep it unambiguous.
 // If the "u" is present then <bytes> is Punycode-encoded.
 <identifier> = [<disambiguator>] <undisambiguated-identifier>
 <disambiguator> = "s" <base-62-number>
-<undisambiguated-identifier> = ["u"] <decimal-number> <bytes>
+<undisambiguated-identifier> = ["u"] <decimal-number> ["_"] <bytes>
 
 // Namespace of the identifier in a (nested) path.
 // It's an a-zA-Z character, with a-z reserved for implementation-internal
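On the encoding side, the amended `<undisambiguated-identifier>` rule adds one check. This minimal Rust sketch assumes the caller already has the final `<bytes>`. When `punycode` is set, those bytes are already Punycode-encoded with `-` replaced by `_`. `mangle_ident` is a hypothetical helper, not the compiler's implementation. It emits the `_` only when the grammar makes it mandatory. The grammar would also accept it unconditionally.

```rust
/// Mangles `bytes` as `["u"] <decimal-number> ["_"] <bytes>` (hypothetical helper).
/// `punycode` marks `bytes` as already Punycode-encoded.
fn mangle_ident(bytes: &str, punycode: bool) -> String {
    let mut out = String::with_capacity(bytes.len() + 8);
    if punycode {
        out.push('u');
    }
    // <decimal-number>: the length of <bytes>, not counting the "_" separator.
    out.push_str(&bytes.len().to_string());
    // The "_" is mandatory when <bytes> starts with a decimal digit (it would merge
    // into the length) or with "_" (a decoder would consume it as the separator).
    if bytes.starts_with(|c: char| c.is_ascii_digit() || c == '_') {
        out.push('_');
    }
    out.push_str(bytes);
    out
}

fn main() {
    let samples = [("mycrate", false), ("_private", false), ("gdel_5qa", true), ("2xaedc", true)];
    for (bytes, punycode) in samples {
        println!("{:<10} -> {}", bytes, mangle_ident(bytes, punycode));
    }
}
```

For example, `2xaedc` (the Punycode form of `ρυστ`) becomes `u6_2xaedc`, and `_private` becomes `8__private`. In each case the length covers only the identifier's own bytes.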
@@ -775,29 +777,22 @@ and, for now, only define a mangling for integer values.
 ### Punycode Identifiers
 
 Punycode generates strings of the form `([[:ascii:]]+-)?[[:alnum:]]+`.
-This is problematic for two reasons:
+This is problematic because of the `-` character, which is not in the
+supported character set; Punycode uses it to separate the ASCII part
+(if it exists), from the base-36 encoding of the non-ASCII characters.
 
-- Generated strings can contain a `-` character; which is not in the
-supported character set.
-- Generated strings can start with a digit; which makes them clash
-with the byte-count prefix of the `<identifier>` production.
-
-For these reasons, vanilla Punycode string are further encoded during mangling:
-
-- The `-` character is simply replaced by a `_` character.
-- The part of the Punycode string that encodes the non-ASCII characters
-is a base-36 number, using `[a-z0-9]` as its "digits". We want to get
-rid of the decimal digits in there, so we simply remap `0-9` to `A-J`.
+For this reasons, we deviate from vanilla Punycode, by replacing
+the `-` character with a `_` character.
 
 Here are some examples:
 
 | Original | Punycode | Punycode + Encoding |
 |-----------------|-----------------|---------------------|
-| føø | f-5gaa | f_Fgaa |
-| α_ω | _-ylb7e | __ylbHe |
-| 铁锈 | n84amf | nIEamf |
-| 🤦 | fq9h | fqJh |
-| ρυστ | 2xaedc | Cxaedc |
+| føø | f-5gaa | f_5gaa |
+| α_ω | _-ylb7e | __ylb7e |
+| 铁锈 | n84amf | n84amf |
+| 🤦 | fq9h | fq9h |
+| ρυστ | 2xaedc | 2xaedc |
 
 With this post-processing in place the Punycode strings can be treated
 like regular identifiers and need no further special handling.
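You can reproduce the old and new "Punycode + Encoding" columns from the vanilla Punycode column. This Rust sketch takes the Punycode strings as input and does not compute Punycode itself, because the Rust standard library has no Punycode encoder. `old_encoding` models the removed `0-9` to `A-J` remapping, and `new_encoding` models the amended rule. Both names are hypothetical.

```rust
/// Pre-amendment post-processing: `-` -> `_`, and `0-9` -> `A-J` in the base-36 part.
fn old_encoding(punycode: &str) -> String {
    // The base-36 part follows the last `-`; with no `-`, the whole string is base-36.
    let split = punycode.rfind('-').map_or(0, |i| i + 1);
    let (ascii_part, base36) = punycode.split_at(split);
    let mut out = ascii_part.replace('-', "_");
    out.extend(base36.chars().map(|c| match c {
        '0'..='9' => (b'A' + (c as u8 - b'0')) as char,
        _ => c,
    }));
    out
}

/// Amended post-processing: only `-` -> `_`; decimal digits are kept as-is.
fn new_encoding(punycode: &str) -> String {
    punycode.replace('-', "_")
}

fn main() {
    // (original, vanilla Punycode) pairs from the example table.
    let rows = [
        ("føø", "f-5gaa"),
        ("α_ω", "_-ylb7e"),
        ("铁锈", "n84amf"),
        ("🤦", "fq9h"),
        ("ρυστ", "2xaedc"),
    ];
    for (original, puny) in rows {
        println!("{original}\t{puny}\told: {}\tnew: {}", old_encoding(puny), new_encoding(puny));
    }
}
```

The digit remapping existed only so that an encoded string could never start with a digit. Under the amended grammar, a string such as `2xaedc` is kept verbatim. The mandatory `_` after its length (`u6_2xaedc`) keeps it unambiguous.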
@@ -1154,3 +1149,4 @@ pub static QUUX: u32 = {
 - Resolve question of complex constant data.
 - Add a recommended resolution for open question around Punycode identifiers.
 - Add a recommended resolution for open question around encoding function parameter types.
+- Allow identifiers to start with a digit.
