
Utilities for generating PHP code.

-
## Normalizers

- The normalizers generate readable PHP labels (class names, namespaces, property names, etc) from valid UTF-8 strings,
+ The normalizers generate readable PHP labels (class names, namespaces, property names, etc) from valid UTF-8 strings,
[transliterating] them to ASCII and spelling out any invalid characters.

- ### Usage:
+ ### Usage

The following code (forgive the Japanese - a certain translation tool tells me it means "Pet Store"):
+
``` php
<?php

@@ -24,11 +24,13 @@ echo $namespace;
```

outputs:
- ```
+
+ ``` text
Petto\Shoppu
```

and:
+
``` php
<?php

@@ -40,47 +42,48 @@ echo $property;
```

outputs:
- ```
+
+ ``` text
twoDollarBill
```

See the [tests] for more examples.

### Why?

- You must **never** run code generated from untrusted user input. But there are a few cases where you do want to
+ You must **never** run code generated from untrusted user input. But there are a few cases where you do want to
_output_ code generated from (mostly) trusted input.

In my case, I need to generate classes and properties from an OpenAPI specification. There are no hard-and-fast rules
- on the characters present, just a vague "it is RECOMMENDED to follow common programming naming conventions". Whatever
- they are.
+ on the characters present, just a vague "it is RECOMMENDED to follow common programming naming conventions". Whatever
+ they are.

### How?

- Each normalizer uses `ext-intl`'s [Transliterator] to turn the UTF-8 string into Latin-ASCII. Where a character has no
- equivalent in ASCII (the "€" symbol is a good example), it uses the [Unicode name] of the character to spell it out (to
- `Euro`, after some minor clean-up). For ASCII characters that are not valid in a PHP label, it provides its own spell
+ Each normalizer uses `ext-intl`'s [Transliterator] to turn the UTF-8 string into Latin-ASCII. Where a character has no
+ equivalent in ASCII (the "€" symbol is a good example), it uses the [Unicode name] of the character to spell it out (to
+ `Euro`, after some minor clean-up). For ASCII characters that are not valid in a PHP label, it provides its own spell
outs. For instance, a backtick "`" becomes `Backtick`.

- Initial digits are also spelt out: "123foo" becomes `OneTwoThreeFoo`. Finally reserved words are suffixed with a
- user-supplied string so they don't mess things up. In the first usage example above, if we normalized "class" it would
+ Initial digits are also spelt out: "123foo" becomes `OneTwoThreeFoo`. Finally reserved words are suffixed with a
+ user-supplied string so they don't mess things up. In the first usage example above, if we normalized "class" it would
become `ClassController`.
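
As a rough illustration of the last two steps described above (spelling out initial digits and suffixing reserved words), here is a minimal sketch in plain PHP. It is not the library's implementation: `sketchClassLabel()` is a hypothetical helper, the reserved-word list is a deliberately tiny subset, and the transliteration and Unicode-name steps are left out entirely.

```php
<?php

declare(strict_types=1);

// Hypothetical helper for illustration only; not part of the library.
function sketchClassLabel(string $label, string $reservedSuffix): string
{
    $names = ['Zero', 'One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight', 'Nine'];

    // Spell out any run of leading ASCII digits and capitalise what follows.
    $label = preg_replace_callback(
        '/^(\d+)(.*)$/s',
        static function (array $match) use ($names): string {
            $spelt = '';
            foreach (str_split($match[1]) as $digit) {
                $spelt .= $names[(int) $digit];
            }
            return $spelt . ucfirst($match[2]);
        },
        $label
    ) ?? $label;

    $label = ucfirst($label);

    // Assumption: a small sample of reserved words, not PHP's full list.
    $reserved = ['class', 'function', 'list', 'new', 'return'];
    if (in_array(strtolower($label), $reserved, true)) {
        $label .= $reservedSuffix;
    }

    return $label;
}

echo sketchClassLabel('123foo', 'Controller'), "\n"; // OneTwoThreeFoo
echo sketchClassLabel('class', 'Controller'), "\n";  // ClassController
```

The suffix is passed in by the caller because, as noted above, it is user-supplied; "Controller" here simply mirrors the example in the text.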

- The results may not be pretty. If for some mad reason your input contains ` ͖` - put your glasses on! - the label will
- contain `CombiningRightArrowheadAndUpArrowheadBelow`. But it _is_ valid PHP, and stands a chance of being as unique as
+ The results may not be pretty. If for some mad reason your input contains ` ͖` - put your glasses on! - the label will
+ contain `CombiningRightArrowheadAndUpArrowheadBelow`. But it _is_ valid PHP, and stands a chance of being as unique as
the original. Which brings me to...

-
## Unique labelers

- The normalization process reduces around a million Unicode code points down to just 162 ASCII characters. Then it
- mangles the label further by stripping separators, reducing whitespace and turning it into camelCase, snake_case or
+ The normalization process reduces around a million Unicode code points down to just 162 ASCII characters. Then it
+ mangles the label further by stripping separators, reducing whitespace and turning it into camelCase, snake_case or
whatever your programming preference. It's gonna be lossy - nothing we can do about that.

The unique labelers' job is to add back lost uniqueness, using a `UniqueStrategyInterface` to decorate any non-unique
class names in the list it is given.

To guarantee uniqueness within a set of class name labels, use the `UniqueClassLabeler`:
+
``` php
<?php

@@ -96,7 +99,8 @@ var_dump($unique);
```

outputs:
- ```
+
+ ``` text
array(3) {
'Déjà vu' =>
string(7) "DejaVu1"
@@ -107,10 +111,11 @@ array(3) {
}
```
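
To make the idea of decorating non-unique labels concrete, here is a minimal sketch of a number-suffix approach in plain PHP. It is an assumption about the general technique, not the library's `NumberSuffix` code: `sketchNumberSuffix()` is a hypothetical function, and every input other than 'Déjà vu' is invented for the example.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for a number-suffix strategy.
 *
 * @param array<string, string> $labels original string => normalized label
 * @return array<string, string> same keys, with duplicated labels numbered
 */
function sketchNumberSuffix(array $labels): array
{
    // Count how often each normalized label occurs.
    $totals = array_count_values($labels);
    $seen   = [];
    $unique = [];

    foreach ($labels as $original => $label) {
        if ($totals[$label] === 1) {
            // Already unique: leave it alone.
            $unique[$original] = $label;
            continue;
        }
        // Duplicate: append a running number, starting at 1.
        $seen[$label]      = ($seen[$label] ?? 0) + 1;
        $unique[$original] = $label . $seen[$label];
    }

    return $unique;
}

// Assumed inputs: two strings that normalize to the same label, one that does not.
var_dump(sketchNumberSuffix([
    'Déjà vu' => 'DejaVu',
    'foo'     => 'Foo',
    'déjà vu' => 'DejaVu',
]));
```

With these assumed inputs the two colliding labels become `DejaVu1` and `DejaVu2`, while `Foo` is left untouched. The sketch does not guard against a suffixed label colliding with one that already ends in a digit; a real strategy would need to handle that case.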

- There are labelers for each of the normalizers: `UniqueClassLabeler`, `UniqueConstantLabeler`, `UniquePropertyLabeler`
- and `UniqueVariableLabeler`. Along with the `NumberSuffix` implementation of `UniqueStrategyInterface`, we provide a
+ There are labelers for each of the normalizers: `UniqueClassLabeler`, `UniqueConstantLabeler`, `UniquePropertyLabeler`
+ and `UniqueVariableLabeler`. Along with the `NumberSuffix` implementation of `UniqueStrategyInterface`, we provide a
`SpellOutOrdinalPrefix` strategy. Using that instead of `NumberSuffix` above would output:
- ```
+
+ ``` text
array(3) {
'Déjà vu' =>
string(11) "FirstDejaVu"
@@ -123,8 +128,7 @@ array(3) {

Kinda cute, but a bit verbose for my taste.

-
[transliterating]: https://unicode-org.github.io/icu/userguide/transforms/general/#script-transliteration
[tests]: ./test/AbstractNormalizerTest.php
[Transliterator]: https://www.php.net/manual/en/class.transliterator.php
- [Unicode name]: https://unicode.org/charts/charindex.html
+ [Unicode name]: https://unicode.org/charts/charindex.html