@@ -506,7 +506,7 @@ mod gödel {
  would be mangled as:

  ```
- _RNvNtNtC7mycrateu8gdel_Fqa6escher4bach
+ _RNvNtNtC7mycrateu8gdel_5qa6escher4bach
                   <-------->
                   Unicode component
  ```
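The nesting in this symbol can be reproduced with a minimal Rust sketch. It makes three assumptions. First, each identifier is handed over already in its final encoded form (for `gödel`, the string `gdel_5qa` shown above). Second, it handles only a crate root plus nested `N<namespace><path><identifier>` components, without disambiguators. Third, the namespace tags `t` and `v` are copied from this example. `Component`, `encode_ident`, and `mangle_path` are hypothetical names, not rustc's API.

```rust
/// Hypothetical path component: a namespace tag plus the identifier's
/// already-encoded bytes.
struct Component<'a> {
    namespace: char,
    ident: &'a str,
    punycode: bool,
}

/// ["u"] <decimal-number> <bytes>. This sketch omits the optional "_"
/// separator, which is only mandatory for identifiers starting with a
/// digit or "_" (there are none in this example).
fn encode_ident(ident: &str, punycode: bool) -> String {
    let u = if punycode { "u" } else { "" };
    format!("{}{}{}", u, ident.len(), ident)
}

/// "_R" followed by a <path>: the crate root is "C" <identifier>, and each
/// further component wraps the path built so far as
/// "N" <namespace> <path> <identifier>.
fn mangle_path(crate_name: &str, components: &[Component<'_>]) -> String {
    let mut path = format!("C{}", encode_ident(crate_name, false));
    for c in components {
        path = format!("N{}{}{}", c.namespace, path, encode_ident(c.ident, c.punycode));
    }
    format!("_R{}", path)
}

fn main() {
    // mycrate::gödel::escher::bach, with "gödel" already encoded as "gdel_5qa".
    let components = [
        Component { namespace: 't', ident: "gdel_5qa", punycode: true },
        Component { namespace: 't', ident: "escher", punycode: false },
        Component { namespace: 'v', ident: "bach", punycode: false },
    ];
    println!("{}", mangle_path("mycrate", &components));
}
```

Because each component wraps the path built so far, the outermost item (`bach`, in the value namespace) contributes the leading `Nv`, and the crate root `C7mycrate` ends up innermost.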
@@ -613,10 +613,10 @@ compiler generates mangled names.

  The syntax of mangled names is given in extended Backus-Naur form:

- - Non-terminals are within angle brackets (as in `<name-prefix>`)
+ - Non-terminals are within angle brackets (as in `<path>`)
  - Terminals are within quotes (as in `"_R"`),
- - Optional parts are in brackets (as in `[<decimal>]`),
- - Repetition (zero or more times) is signified by curly braces (as in `{<name-prefix>}`)
+ - Optional parts are in brackets (as in `[<disambiguator>]`),
+ - Repetition (zero or more times) is signified by curly braces (as in `{<type>}`)
  - Comments are marked with `//`.

  Mangled names conform to the following grammar:
@@ -641,11 +641,13 @@ Mangled names conform to the following grammar:
  <impl-path> = [<disambiguator>] <path>

  // The <decimal-number> is the length of the identifier in bytes.
- // <bytes> is the identifier itself and must not start with a decimal digit.
+ // <bytes> is the identifier itself, optionally preceded by "_" to separate
+ // it from its length. This "_" is mandatory if <bytes> starts with a decimal
+ // digit or "_", so that the encoding stays unambiguous.
  // If the "u" is present then <bytes> is Punycode-encoded.
  <identifier> = [<disambiguator>] <undisambiguated-identifier>
  <disambiguator> = "s" <base-62-number>
- <undisambiguated-identifier> = ["u"] <decimal-number> <bytes>
+ <undisambiguated-identifier> = ["u"] <decimal-number> ["_"] <bytes>

  // Namespace of the identifier in a (nested) path.
  // It's an a-zA-Z character, with a-z reserved for implementation-internal
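The optional `_` separator can be illustrated with a minimal Rust sketch that encodes and decodes a single `<undisambiguated-identifier>`. The function names `encode_identifier` and `decode_identifier` are hypothetical. The sketch assumes Punycode input is already encoded, omits the `<disambiguator>`, does only basic error handling, and emits the `_` only where the grammar makes it mandatory.

```rust
/// Hypothetical encoder for ["u"] <decimal-number> ["_"] <bytes>.
/// `bytes` must already be ASCII (Punycode-encoded when `punycode` is true).
fn encode_identifier(bytes: &str, punycode: bool) -> String {
    let mut out = String::new();
    if punycode {
        out.push('u');
    }
    out.push_str(&bytes.len().to_string());
    // The "_" separator is mandatory when <bytes> starts with a decimal
    // digit or "_"; otherwise this sketch leaves the optional "_" out.
    if bytes.starts_with(|c: char| c.is_ascii_digit() || c == '_') {
        out.push('_');
    }
    out.push_str(bytes);
    out
}

/// Hypothetical decoder: parses one <undisambiguated-identifier> from the
/// front of `input` and returns (is_punycode, bytes, remaining input).
fn decode_identifier(input: &str) -> Option<(bool, &str, &str)> {
    let (punycode, s) = match input.strip_prefix('u') {
        Some(rest) => (true, rest),
        None => (false, input),
    };
    // Read the decimal byte length.
    let digits = s.bytes().take_while(|b| b.is_ascii_digit()).count();
    if digits == 0 {
        return None;
    }
    let len: usize = s[..digits].parse().ok()?;
    let mut s = &s[digits..];
    // A single "_" directly after the length is always the separator: an
    // identifier that itself starts with "_" must have the separator too.
    if let Some(rest) = s.strip_prefix('_') {
        s = rest;
    }
    let bytes = s.get(..len)?;
    let rest = s.get(len..)?;
    Some((punycode, bytes, rest))
}

fn main() {
    println!("{}", encode_identifier("2xaedc", true));
    println!("{:?}", decode_identifier("u8gdel_5qa6escher"));
}
```

Always consuming a single `_` after the length is what keeps decoding unambiguous. Because the separator is mandatory before a leading `_`, `4__foo` can only mean the four bytes `_foo`.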
@@ -775,29 +777,22 @@ and, for now, only define a mangling for integer values.
  ### Punycode Identifiers

  Punycode generates strings of the form `([[:ascii:]]+-)?[[:alnum:]]+`.
- This is problematic for two reasons:
+ This is problematic because of the `-` character, which is not in the
+ supported character set; Punycode uses it to separate the ASCII part
+ (if it exists) from the base-36 encoding of the non-ASCII characters.

- - Generated strings can contain a `-` character; which is not in the
-   supported character set.
- - Generated strings can start with a digit; which makes them clash
-   with the byte-count prefix of the `<identifier>` production.
-
- For these reasons, vanilla Punycode string are further encoded during mangling:
-
- - The `-` character is simply replaced by a `_` character.
- - The part of the Punycode string that encodes the non-ASCII characters
-   is a base-36 number, using `[a-z0-9]` as its "digits". We want to get
-   rid of the decimal digits in there, so we simply remap `0-9` to `A-J`.
+ For this reason, we deviate from vanilla Punycode by replacing
+ the `-` character with a `_` character.

  Here are some examples:

  | Original | Punycode | Punycode + Encoding |
  | -----------------| -----------------| ---------------------|
- | føø | f-5gaa | f_Fgaa |
- | α_ω | _-ylb7e | __ylbHe |
- | 铁锈 | n84amf | nIEamf |
- | 🤦 | fq9h | fqJh |
- | ρυστ | 2xaedc | Cxaedc |
+ | føø | f-5gaa | f_5gaa |
+ | α_ω | _-ylb7e | __ylb7e |
+ | 铁锈 | n84amf | n84amf |
+ | 🤦 | fq9h | fq9h |
+ | ρυστ | 2xaedc | 2xaedc |

  With this post-processing in place the Punycode strings can be treated
  like regular identifiers and need no further special handling.
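The Punycode step and the `-` to `_` deviation can be sketched in Rust. `punycode_encode` is a straightforward transcription of the encoding procedure from RFC 3492. It has no overflow checks, so it is only suitable for short identifiers. `mangle_punycode` is a hypothetical helper that applies the replacement. It replaces every `-`, which assumes, as holds for Rust identifiers, that the ASCII part never contains a `-` of its own.

```rust
// RFC 3492 Punycode parameters.
const BASE: u32 = 36;
const TMIN: u32 = 1;
const TMAX: u32 = 26;
const SKEW: u32 = 38;
const DAMP: u32 = 700;
const INITIAL_BIAS: u32 = 72;
const INITIAL_N: u32 = 128;

/// Bias adaptation (RFC 3492, section 6.1).
fn adapt(mut delta: u32, num_points: u32, first_time: bool) -> u32 {
    delta /= if first_time { DAMP } else { 2 };
    delta += delta / num_points;
    let mut k = 0;
    while delta > ((BASE - TMIN) * TMAX) / 2 {
        delta /= BASE - TMIN;
        k += BASE;
    }
    k + ((BASE - TMIN + 1) * delta) / (delta + SKEW)
}

/// Digit values 0..=25 map to 'a'..='z', 26..=35 map to '0'..='9'.
fn encode_digit(d: u32) -> char {
    if d < 26 {
        (b'a' + d as u8) as char
    } else {
        (b'0' + (d - 26) as u8) as char
    }
}

/// Vanilla Punycode encoding (RFC 3492, section 6.3); no overflow checks.
fn punycode_encode(input: &str) -> String {
    let code_points: Vec<u32> = input.chars().map(|c| c as u32).collect();
    // Copy the basic (ASCII) code points, then add the "-" delimiter if any.
    let mut output: String = input.chars().filter(|c| c.is_ascii()).collect();
    let basic_len = output.len() as u32;
    if basic_len > 0 {
        output.push('-');
    }
    let (mut n, mut delta, mut bias) = (INITIAL_N, 0u32, INITIAL_BIAS);
    let mut handled = basic_len;
    while (handled as usize) < code_points.len() {
        // Next code point to insert: the smallest one not yet handled.
        let m = *code_points.iter().filter(|&&c| c >= n).min().unwrap();
        delta += (m - n) * (handled + 1);
        n = m;
        for &c in &code_points {
            if c < n {
                delta += 1;
            }
            if c == n {
                // Write delta as a generalized variable-length integer.
                let mut q = delta;
                let mut k = BASE;
                loop {
                    // Threshold t is k - bias, clamped to [TMIN, TMAX].
                    let t = k.saturating_sub(bias).clamp(TMIN, TMAX);
                    if q < t {
                        break;
                    }
                    output.push(encode_digit(t + (q - t) % (BASE - t)));
                    q = (q - t) / (BASE - t);
                    k += BASE;
                }
                output.push(encode_digit(q));
                bias = adapt(delta, handled + 1, handled == basic_len);
                delta = 0;
                handled += 1;
            }
        }
        delta += 1;
        n += 1;
    }
    output
}

/// Hypothetical helper: the mangling variant of Punycode, with "-"
/// replaced by "_". Rust identifiers contain no "-", so the only "-"
/// in the output is the delimiter.
fn mangle_punycode(ident: &str) -> String {
    punycode_encode(ident).replace('-', "_")
}

fn main() {
    for word in ["føø", "α_ω", "铁锈", "🤦", "ρυστ"] {
        println!("{} -> {}", word, mangle_punycode(word));
    }
}
```

A result such as `2xaedc` (for `ρυστ`) still starts with a digit, so the `<undisambiguated-identifier>` rule requires the `_` separator after its length, giving `u6_2xaedc`.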
@@ -1154,3 +1149,4 @@ pub static QUUX: u32 = {
  - Resolve question of complex constant data.
  - Add a recommended resolution for open question around Punycode identifiers.
  - Add a recommended resolution for open question around encoding function parameter types.
+ - Allow identifiers to start with a digit.