This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Commit def43e0

Author: Alexis Hunt, committed Jan 19, 2019
Expand docs on Macros By Example.
The primary motivation here was to increase clarity and fully address the scoping and naming details. The inclusion of RFC 550's formal specification is to move it to the reference where it can be updated.

I made several changes, motivated by accommodating `?` and new fragment specifiers, but there are some other things which need highlighting so that they can be double-checked for correctness.

* Permit the empty string to follow on in the first invariant; this is a technical oversight in the definition I believe.
* Added a requirement that repetitions obey the follow rules; this was an oversight in the original RFC and currently planned for fix.
* Rewrote the definition of FIRST for complex NTs to be more clear.
* Added a case to LAST for `?` repetitions.
* Removed the last example of LAST, because it is wrong.
* Rearranged the definition of FOLLOW to be more clear.
* Added Shl to FOLLOW(ty) and FOLLOW(path), as documented in the Reference already.
* Added missing follow sets for newer fragment specifiers.

The scoping text is probably not completely accurate, but it's certainly much better than what was there before (i.e. basically nothing).
1 parent dafc0ba commit def43e0

File tree

5 files changed

+779
-123
lines changed

 

‎src/SUMMARY.md

Lines changed: 5 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -111,8 +111,8 @@
111111

112112
- [Constant Evaluation](const_eval.md)
113113

114-
[Appendix: Influences](influences.md)
115-
116-
[Appendix: As-yet-undocumented Features](undocumented.md)
117-
118-
[Appendix: Glossary](glossary.md)
114+
- [Appendices](appendices.md)
115+
- [Macro Follow-Set Ambiguity Formal Specification](macro-ambiguity.md)
116+
- [Influences](influences.md)
117+
- [As-Yet-Undocumented Features](undocumented.md)
118+
- [Glossary](glossary.md)

‎src/appendices.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
# Appendices

‎src/attributes.md

Lines changed: 0 additions & 2 deletions
@@ -175,8 +175,6 @@ which can be used to control type layout.
175175
macros named. The `extern crate` must appear at the crate root, not inside
176176
`mod`, which ensures proper function of the `$crate` macro variable.
177177

178-
- `macro_reexport` on an `extern crate` — re-export the named macros.
179-
180178
- `macro_export` - export a `macro_rules` macro for cross-crate usage.
181179

182180
- `no_link` on an `extern crate` — even if we load this crate for macros, don't

‎src/macro-ambiguity.md

Lines changed: 378 additions & 0 deletions
@@ -0,0 +1,378 @@
1+
# Appendix: Macro Follow-Set Ambiguity Formal Specification
2+
3+
This page documents the formal specification of the follow rules for [Macros
4+
By Example]. They were originally specified in [RFC 550], from which the bulk
5+
of this text is copied, and expanded upon in subsequent RFCs.
6+
7+
## Definitions & Conventions
8+
9+
- `macro`: anything invokable as `foo!(...)` in source code.
10+
- `MBE`: macro-by-example, a macro defined by `macro_rules`.
11+
- `matcher`: the left-hand-side of a rule in a `macro_rules` invocation, or a
12+
subportion thereof.
13+
- `macro parser`: the bit of code in the Rust parser that will parse the
14+
input using a grammar derived from all of the matchers.
15+
- `fragment`: The class of Rust syntax that a given matcher will accept (or
16+
"match").
17+
- `repetition`: a fragment that follows a regular repeating pattern.
18+
- `NT`: non-terminal, the various "meta-variables" or repetition matchers
19+
that can appear in a matcher, specified in MBE syntax with a leading `$`
20+
character.
21+
- `simple NT`: a "meta-variable" non-terminal (further discussion below).
22+
- `complex NT`: a repetition matching non-terminal, specified via repetition
23+
operators (`*`, `+`, `?`).
24+
- `token`: an atomic element of a matcher; i.e. identifiers, operators,
25+
open/close delimiters, *and* simple NT's.
26+
- `token tree`: a tree structure formed from tokens (the leaves), complex
27+
NT's, and finite sequences of token trees.
28+
- `delimiter token`: a token that is meant to divide the end of one fragment
29+
and the start of the next fragment.
30+
- `separator token`: an optional delimiter token in a complex NT that
31+
separates each pair of elements in the matched repetition.
32+
- `separated complex NT`: a complex NT that has its own separator token.
33+
- `delimited sequence`: a sequence of token trees with appropriate open- and
34+
close-delimiters at the start and end of the sequence.
35+
- `empty fragment`: The class of invisible Rust syntax that separates tokens,
36+
i.e. whitespace, or (in some lexical contexts), the empty token sequence.
37+
- `fragment specifier`: The identifier in a simple NT that specifies which
38+
fragment the NT accepts.
39+
- `language`: a context-free language.
40+
41+
Example:
42+
43+
```rust,compile_fail
44+
macro_rules! i_am_an_mbe {
45+
    (start $foo:expr $($i:ident),* end) => ($foo)
46+
}
47+
```
48+
49+
`(start $foo:expr $($i:ident),* end)` is a matcher. The whole matcher is a
50+
delimited sequence (with open- and close-delimiters `(` and `)`), and `$foo`
51+
and `$i` are simple NT's with `expr` and `ident` as their respective fragment
52+
specifiers.
53+
54+
`$($i:ident),*` is *also* an NT; it is a complex NT that matches a
55+
comma-separated repetition of identifiers. The `,` is the separator token for
56+
the complex NT; it occurs in between each pair of elements (if any) of the
57+
matched fragment.
58+
59+
Another example of a complex NT is `$(hi $e:expr ;)+`, which matches any
60+
fragment of the form `hi <expr>; hi <expr>; ...` where `hi <expr>;` occurs at
61+
least once. Note that this complex NT does not have a dedicated separator
62+
token.
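As a rough illustration of the two complex NTs just described, the sketch below defines one macro per matcher. The macro names `count_idents` and `sum_hi` are made up for this example and are not part of RFC 550 or the Reference.

```rust
// Hypothetical macros, for illustration only.

// `$($i:ident),*` is a separated complex NT: zero or more identifiers,
// with the separator token `,` between each pair of elements.
macro_rules! count_idents {
    ($($i:ident),*) => { <[&str]>::len(&[$(stringify!($i)),*]) };
}

// `$(hi $e:expr ;)+` is an unseparated complex NT: `hi <expr>;` must occur
// at least once, and the `;` is part of the repeated contents rather than
// a dedicated separator token.
macro_rules! sum_hi {
    ($(hi $e:expr ;)+) => { 0 $(+ $e)+ };
}
```

With these definitions, `count_idents!(a, b, c)` evaluates to `3`, `count_idents!()` to `0`, and `sum_hi!(hi 1; hi 2 * 3;)` to `7`.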
63+
64+
(Note that Rust's parser ensures that delimited sequences always occur with
65+
proper nesting of token tree structure and correct matching of open- and
66+
close-delimiters.)
67+
68+
We will tend to use the variable "M" to stand for a matcher, variables "t" and
69+
"u" for arbitrary individual tokens, and the variables "tt" and "uu" for
70+
arbitrary token trees. (The use of "tt" does present potential ambiguity with
71+
its additional role as a fragment specifier; but it will be clear from context
72+
which interpretation is meant.)
73+
74+
"SEP" will range over separator tokens, "OP" over the repetition operators
75+
`*`, `+`, and `?`, "OPEN"/"CLOSE" over matching token pairs surrounding a
76+
delimited sequence (e.g. `[` and `]`).
77+
78+
We also use Greek letters "α" "β" "γ" "δ" to stand for potentially empty
79+
token-tree sequences. (However, the Greek letter "ε" (epsilon) has a special
80+
role in the presentation and does not stand for a token-tree sequence.)
81+
82+
* This Greek letter convention is usually just employed when the presence of
83+
a sequence is a technical detail; in particular, when I wish to *emphasize*
84+
that we are operating on a sequence of token-trees, I will use the notation
85+
"tt ..." for the sequence, not a Greek letter.
86+
87+
Note that a matcher is merely a token tree. A "simple NT", as mentioned above,
88+
is a meta-variable NT; thus it is a non-repetition. For example, `$foo:ty` is
89+
a simple NT but `$($foo:ty)+` is a complex NT.
90+
91+
Note also that in the context of this formalism, the term "token" generally
92+
*includes* simple NTs.
93+
94+
Finally, it is useful for the reader to keep in mind that according to the
95+
definitions of this formalism, no simple NT matches the empty fragment, and
96+
likewise no token matches the empty fragment of Rust syntax. (Thus, the *only*
97+
NT that can match the empty fragment is a complex NT.) This is not actually
98+
true, because the `vis` matcher can match an empty fragment. Thus, for the
99+
purposes of the formalism, we will treat `$v:vis` as actually being
100+
`$($v:vis)?`, with a requirement that the matcher match an empty fragment.
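The special case for `vis` can be seen in a small sketch. The macro name `make_fn` and the generated functions `one` and `two` are hypothetical; the point is only that the same `$v:vis` metavariable accepts both `pub` and the empty fragment.

```rust
// Hypothetical macro, for illustration only: `$v:vis` may match `pub`,
// `pub(crate)`, and so on, or nothing at all.
macro_rules! make_fn {
    ($v:vis fn $name:ident() -> $t:ty { $e:expr }) => {
        $v fn $name() -> $t { $e }
    };
}

// `$v` matches the visibility qualifier `pub`.
make_fn!(pub fn one() -> u32 { 1 });

// `$v` matches the empty fragment.
make_fn!(fn two() -> u32 { 2 });
```

Both invocations expand to ordinary function items, so `one()` returns `1` and `two()` returns `2`.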
101+
102+
### The Matcher Invariants
103+
104+
In order to be valid, a matcher must meet the following three invariants. The
105+
definitions of FIRST and FOLLOW are described later.
106+
107+
1. For any two successive token tree sequences in a matcher `M` (i.e. `M = ...
108+
tt uu ...`) with `uu ...` nonempty, we must have FOLLOW(`... tt`) ∪ {ε} ⊇
109+
FIRST(`uu ...`).
110+
1. For any separated complex NT in a matcher, `M = ... $(tt ...) SEP OP ...`,
111+
we must have `SEP` ∈ FOLLOW(`tt ...`).
112+
1. For an unseparated complex NT in a matcher, `M = ... $(tt ...) OP ...`, if
113+
OP = `*` or `+`, we must have FOLLOW(`tt ...`) ⊇ FIRST(`tt ...`).
114+
115+
The first invariant says that whatever actual token comes after a matcher,
116+
if any, must be somewhere in the predetermined follow set. This ensures that a
117+
legal macro definition will continue to assign the same determination as to
118+
where `... tt` ends and `uu ...` begins, even as new syntactic forms are added
119+
to the language.
120+
121+
The second invariant says that a separated complex NT must use a separator token
122+
that is part of the predetermined follow set for the internal contents of the
123+
NT. This ensures that a legal macro definition will continue to parse an input
124+
fragment into the same delimited sequence of `tt ...`'s, even as new syntactic
125+
forms are added to the language.
126+
127+
The third invariant says that when we have a complex NT that can match two or
128+
more copies of the same thing with no separation in between, it must be
129+
permissible for them to be placed next to each other as per the first invariant.
130+
This invariant also requires they be nonempty, which eliminates a possible
131+
ambiguity.
132+
133+
**NOTE: The third invariant is currently unenforced due to historical oversight
134+
and significant reliance on the behaviour. It is currently undecided what to do
135+
about this going forward. Macros that do not respect the behaviour may become
136+
invalid in a future edition of Rust. See the [tracking issue].**
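The first two invariants can be sketched with a pair of small macros. The names `pair`, `total_size`, and `bad_sum` are invented for this example; the follow sets cited in the comments are the ones listed under FOLLOW(M) later in this appendix.

```rust
// Hypothetical macros, for illustration only.

// First invariant: `,` is in FOLLOW(expr), so an `expr` NT may be followed
// by a literal comma in the matcher.
macro_rules! pair {
    ($a:expr, $b:expr) => { ($a, $b) };
}

// Second invariant: `;` is in FOLLOW(ty), so `;` is an acceptable separator
// for a repetition whose contents end in a `ty` NT.
macro_rules! total_size {
    ($($t:ty);*) => { 0usize $(+ std::mem::size_of::<$t>())* };
}

// Violates the first invariant, so the definition is rejected if it is
// uncommented: `+` is not in FOLLOW(expr).
// macro_rules! bad_sum {
//     ($a:expr + $b:expr) => { $a + $b };
// }
```

Here `pair!(1 + 1, "x")` evaluates to `(2, "x")` and `total_size!(u8; u32)` to `5`.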
137+
138+
### FIRST and FOLLOW, informally
139+
140+
A given matcher M maps to three sets: FIRST(M), LAST(M) and FOLLOW(M).
141+
142+
Each of the three sets is made up of tokens. FIRST(M) and LAST(M) may also
143+
contain a distinguished non-token element ε ("epsilon"), which indicates that M
144+
can match the empty fragment. (But FOLLOW(M) is always just a set of tokens.)
145+
146+
Informally:
147+
148+
* FIRST(M): collects the tokens potentially used first when matching a
149+
fragment to M.
150+
151+
* LAST(M): collects the tokens potentially used last when matching a fragment
152+
to M.
153+
154+
* FOLLOW(M): the set of tokens allowed to follow immediately after some
155+
fragment matched by M.
156+
157+
In other words: t ∈ FOLLOW(M) if and only if there exist (potentially
158+
empty) token sequences α, β, γ, δ where:
159+
160+
* M matches β,
161+
162+
* t matches γ, and
163+
164+
* The concatenation α β γ δ is a parseable Rust program.
165+
166+
We use the shorthand ANYTOKEN to denote the set of all tokens (including simple
167+
NTs). For example, if any token is legal after a matcher M, then FOLLOW(M) =
168+
ANYTOKEN.
169+
170+
(To review one's understanding of the above informal descriptions, the reader
171+
at this point may want to jump ahead to the [examples of
172+
FIRST/LAST][examples-of-first-and-last] before reading their formal
173+
definitions.)
174+
175+
### FIRST, LAST
176+
177+
Below are formal inductive definitions for FIRST and LAST.
178+
179+
"A ∪ B" denotes set union, "A ∩ B" denotes set intersection, and "A \ B"
180+
denotes set difference (i.e. all elements of A that are not present in B).
181+
182+
#### FIRST
183+
184+
FIRST(M) is defined by case analysis on the sequence M and the structure of its
185+
first token-tree (if any):
186+
187+
* if M is the empty sequence, then FIRST(M) = { ε },
188+
189+
* if M starts with a token t, then FIRST(M) = { t },
190+
191+
(Note: this covers the case where M starts with a delimited token-tree
192+
sequence, `M = OPEN tt ... CLOSE ...`, in which case `t = OPEN` and thus
193+
FIRST(M) = { `OPEN` }.)
194+
195+
(Note: this critically relies on the property that no simple NT matches the
196+
empty fragment.)
197+
198+
* Otherwise, M is a token-tree sequence starting with a complex NT: `M = $( tt
199+
... ) OP α`, or `M = $( tt ... ) SEP OP α`, (where `α` is the (potentially
200+
empty) sequence of token trees for the rest of the matcher).
201+
202+
* Let SEP\_SET(M) = { SEP } if SEP is present and ε ∈ FIRST(`tt ...`);
203+
otherwise SEP\_SET(M) = {}.
204+
205+
* Let ALPHA\_SET(M) = FIRST(`α`) if OP = `*` or `?` and ALPHA\_SET(M) = {} if
206+
OP = `+`.
207+
* FIRST(M) = (FIRST(`tt ...`) \\ {ε}) ∪ SEP\_SET(M) ∪ ALPHA\_SET(M).
208+
209+
The definition for complex NTs deserves some justification. SEP\_SET(M) defines
210+
the possibility that the separator could be a valid first token for M, which
211+
happens when there is a separator defined and the repeated fragment could be
212+
empty. ALPHA\_SET(M) defines the possibility that the complex NT could be empty,
213+
meaning that M's valid first tokens are those of the following token-tree
214+
sequences `α`. This occurs when either `*` or `?` is used, in which case there
215+
could be zero repetitions. In theory, this could also occur if `+` was used with
216+
a potentially-empty repeating fragment, but this is forbidden by the third
217+
invariant.
218+
219+
From there, clearly FIRST(M) can include any token from SEP\_SET(M) or
220+
ALPHA\_SET(M), and if the complex NT match is nonempty, then any token starting
221+
FIRST(`tt ...`) could work too. The last piece to consider is ε. SEP\_SET(M) and
222+
FIRST(`tt ...`) \ {ε} cannot contain ε, but ALPHA\_SET(M) could. Hence, this
223+
definition allows M to accept ε if and only if ε ∈ ALPHA\_SET(M). This is
224+
correct because in order for M to accept ε in the complex NT case, both the
225+
complex NT and α must accept it. If OP = `+`, meaning that the complex NT
226+
cannot be empty, then by definition ε ∉ ALPHA\_SET(M). Otherwise, the complex NT
227+
can accept zero repetitions, and then ALPHA\_SET(M) = FIRST(`α`). So this
228+
definition is correct with respect to ε as well.
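The case analysis for FIRST can be transcribed almost directly into code. The following is a minimal sketch, not the compiler's implementation: the `Tt` and `Op` types, the `first` function, and the use of the string `"ε"` to stand for epsilon are all assumptions made for this illustration.

```rust
use std::collections::BTreeSet;

/// Repetition operators `*`, `+`, and `?`.
#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq)]
enum Op { Star, Plus, Question }

/// A simplified token tree (hypothetical type, for illustration only).
#[allow(dead_code)]
enum Tt {
    /// A token; in this formalism that includes simple NTs like `$e:expr`.
    Tok(&'static str),
    /// A delimited sequence: OPEN, contents, CLOSE.
    Delim(&'static str, Vec<Tt>, &'static str),
    /// A complex NT: contents, optional separator, repetition operator.
    Rep(Vec<Tt>, Option<&'static str>, Op),
}

/// Stand-in for the distinguished non-token element ε.
const EPS: &str = "ε";

fn first(m: &[Tt]) -> BTreeSet<&'static str> {
    let mut out = BTreeSet::new();
    match m.split_first() {
        // M is the empty sequence.
        None => { out.insert(EPS); }
        // M starts with a token t (or with OPEN of a delimited sequence).
        Some((Tt::Tok(t), _)) => { out.insert(*t); }
        Some((Tt::Delim(open, _, _), _)) => { out.insert(*open); }
        // M starts with a complex NT; `alpha` is the rest of the matcher.
        Some((Tt::Rep(inner, sep, op), alpha)) => {
            let inner_first = first(inner);
            // SEP_SET(M): the separator, if present and the contents may be empty.
            if let Some(sep) = sep {
                if inner_first.contains(EPS) { out.insert(*sep); }
            }
            // ALPHA_SET(M): FIRST(α) when zero repetitions are possible.
            if *op != Op::Plus { out.extend(first(alpha)); }
            // FIRST(tt ...) \ {ε}.
            out.extend(inner_first.into_iter().filter(|t| *t != EPS));
        }
    }
    out
}

/// `$( $d:ident $e:expr );* $( $( h )* );* $( f ; )+ g`
fn example_matcher() -> Vec<Tt> {
    use Tt::{Rep, Tok};
    vec![
        Rep(vec![Tok("$d:ident"), Tok("$e:expr")], Some(";"), Op::Star),
        Rep(vec![Rep(vec![Tok("h")], None, Op::Star)], Some(";"), Op::Star),
        Rep(vec![Tok("f"), Tok(";")], None, Op::Plus),
        Tok("g"),
    ]
}
```

Calling `first(&example_matcher())` yields the set { `$d:ident`, `h`, `;`, `f` }, which agrees with the worked example in the examples section of this appendix.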
229+
230+
#### LAST
231+
232+
LAST(M) is defined by case analysis on M itself (a sequence of token-trees):
233+
234+
* if M is the empty sequence, then LAST(M) = { ε }
235+
236+
* if M is a singleton token t, then LAST(M) = { t }
237+
238+
* if M is the singleton complex NT repeating zero or more times, `M = $( tt
239+
... ) *`, or `M = $( tt ... ) SEP *`
240+
241+
* Let sep_set = { SEP } if SEP present; otherwise sep_set = {}.
242+
243+
* if ε ∈ LAST(`tt ...`) then LAST(M) = LAST(`tt ...`) ∪ sep_set
244+
245+
* otherwise, the sequence `tt ...` must be non-empty; LAST(M) = LAST(`tt
246+
...`) ∪ {ε}.
247+
248+
* if M is the singleton complex NT repeating one or more times, `M = $( tt ...
249+
) +`, or `M = $( tt ... ) SEP +`
250+
251+
* Let sep_set = { SEP } if SEP present; otherwise sep_set = {}.
252+
253+
* if ε ∈ LAST(`tt ...`) then LAST(M) = LAST(`tt ...`) ∪ sep_set
254+
255+
* otherwise, the sequence `tt ...` must be non-empty; LAST(M) = LAST(`tt
256+
...`)
257+
258+
* if M is the singleton complex NT repeating zero or one time, `M = $( tt ...)
259+
?`, then LAST(M) = LAST(`tt ...`) ∪ {ε}.
260+
261+
* if M is a delimited token-tree sequence `OPEN tt ... CLOSE`, then LAST(M) =
262+
{ `CLOSE` }.
263+
264+
* if M is a non-empty sequence of token-trees `tt uu ...`,
265+
266+
* If ε ∈ LAST(`uu ...`), then LAST(M) = LAST(`tt`) ∪ (LAST(`uu ...`) \ { ε }).
267+
268+
* Otherwise, the sequence `uu ...` must be non-empty; then LAST(M) =
269+
LAST(`uu ...`).
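The definition of LAST can be sketched in the same way. As before, the `Tt` and `Op` types, the `last` function, and the `"ε"` string standing in for epsilon are hypothetical names chosen for this illustration, not anything taken from the compiler.

```rust
use std::collections::BTreeSet;

#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq)]
enum Op { Star, Plus, Question }

/// A simplified token tree (hypothetical type, for illustration only).
#[allow(dead_code)]
enum Tt {
    Tok(&'static str),
    Delim(&'static str, Vec<Tt>, &'static str),
    Rep(Vec<Tt>, Option<&'static str>, Op),
}

/// Stand-in for the distinguished non-token element ε.
const EPS: &str = "ε";

fn last(m: &[Tt]) -> BTreeSet<&'static str> {
    let mut out = BTreeSet::new();
    let (head, rest) = match m.split_first() {
        // M is the empty sequence.
        None => { out.insert(EPS); return out; }
        Some(pair) => pair,
    };
    if !rest.is_empty() {
        // M = tt uu ...: LAST(uu ...) decides whether tt can contribute.
        let tail = last(rest);
        if !tail.contains(EPS) { return tail; }
        out.extend(last(std::slice::from_ref(head)));
        out.extend(tail.into_iter().filter(|t| *t != EPS));
        return out;
    }
    match head {
        // Singleton token, or delimited sequence ending in CLOSE.
        Tt::Tok(t) => { out.insert(*t); }
        Tt::Delim(_, _, close) => { out.insert(*close); }
        // Singleton complex NT.
        Tt::Rep(inner, sep, op) => {
            out = last(inner);
            let inner_eps = out.contains(EPS);
            match op {
                Op::Question => { out.insert(EPS); }
                // `*` or `+` with possibly-empty contents: add sep_set.
                _ if inner_eps => { if let Some(s) = sep { out.insert(*s); } }
                // `*` with non-empty contents: zero repetitions give ε.
                Op::Star => { out.insert(EPS); }
                // `+` with non-empty contents: nothing to add.
                Op::Plus => {}
            }
        }
    }
    out
}

/// `$( $d:ident $e:expr );* $(h)* $( f ;)+ g`
fn example_matcher() -> Vec<Tt> {
    use Tt::{Rep, Tok};
    vec![
        Rep(vec![Tok("$d:ident"), Tok("$e:expr")], Some(";"), Op::Star),
        Rep(vec![Tok("h")], None, Op::Star),
        Rep(vec![Tok("f"), Tok(";")], None, Op::Plus),
        Tok("g"),
    ]
}
```

Applying `last` to successively longer prefixes of `example_matcher()` gives { `$e:expr`, ε }, then { `$e:expr`, ε, `h` }, then { `;` }, and finally { `g` }.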
270+
271+
### Examples of FIRST and LAST
272+
[examples-of-first-and-last]: #examples-of-first-and-last
273+
274+
Below are some examples of FIRST and LAST.
275+
(Note in particular how the special ε element is introduced and
276+
eliminated based on the interaction between the pieces of the input.)
277+
278+
Our first example is presented in a tree structure to elaborate on how
279+
the analysis of the matcher composes. (Some of the simpler subtrees
280+
have been elided.)
281+
282+
```text
283+
INPUT:  $(  $d:ident   $e:expr   );*    $( $( h )* );*    $( f ; )+   g
284+
            ~~~~~~~~   ~~~~~~~                ~
285+
                |         |                   |
286+
FIRST:   { $d:ident }  { $e:expr }          { h }
287+
288+
289+
INPUT:  $(  $d:ident   $e:expr   );*    $( $( h )* );*    $( f ; )+
290+
            ~~~~~~~~~~~~~~~~~~             ~~~~~~~           ~~~
291+
                     |                        |               |
292+
FIRST:         { $d:ident }               { h, ε }          { f }
293+
294+
INPUT:  $(  $d:ident   $e:expr   );*    $( $( h )* );*    $( f ; )+   g
295+
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~    ~~~~~~~~~~~~~~    ~~~~~~~~~   ~
296+
                     |                        |               |       |
297+
FIRST:        { $d:ident, ε }            { h, ε, ; }        { f }   { g }
298+
299+
300+
INPUT:  $(  $d:ident   $e:expr   );*    $( $( h )* );*    $( f ; )+   g
301+
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
302+
                                       |
303+
FIRST:                       { $d:ident, h, ;, f }
304+
```
305+
306+
Thus:
307+
308+
* FIRST(`$($d:ident $e:expr );* $( $(h)* );* $( f ;)+ g`) = { `$d:ident`, `h`, `;`, `f` }
309+
310+
Note however that:
311+
312+
* FIRST(`$($d:ident $e:expr );* $( $(h)* );* $($( f ;)+ g)*`) = { `$d:ident`, `h`, `;`, `f`, ε }
313+
314+
Here are similar examples but now for LAST.
315+
316+
* LAST(`$d:ident $e:expr`) = { `$e:expr` }
317+
* LAST(`$( $d:ident $e:expr );*`) = { `$e:expr`, ε }
318+
* LAST(`$( $d:ident $e:expr );* $(h)*`) = { `$e:expr`, ε, `h` }
319+
* LAST(`$( $d:ident $e:expr );* $(h)* $( f ;)+`) = { `;` }
320+
* LAST(`$( $d:ident $e:expr );* $(h)* $( f ;)+ g`) = { `g` }
321+
322+
### FOLLOW(M)
323+
324+
Finally, the definition for FOLLOW(M) is built up as follows. pat, expr, etc.
325+
represent simple nonterminals with the given fragment specifier.
326+
327+
* FOLLOW(pat) = {`=>`, `,`, `=`, `|`, `if`, `in`}.
328+
329+
* FOLLOW(expr) = FOLLOW(stmt) = {`=>`, `,`, `;`}.
330+
331+
* FOLLOW(ty) = FOLLOW(path) = {`{`, `[`, `,`, `=>`, `:`, `=`, `>`, `>>`, `;`,
332+
`|`, `as`, `where`, block nonterminals}.
333+
334+
* FOLLOW(vis) = {`,`; any keyword or identifier except a non-raw `priv`; any
335+
token that can begin a type; ident, ty, and path nonterminals}.
336+
337+
* FOLLOW(t) = ANYTOKEN for any other simple token, including block, ident,
338+
tt, item, lifetime, literal and meta simple nonterminals, and all terminals.
339+
340+
* FOLLOW(M), for any other M, is defined as the intersection, as t ranges over
341+
(LAST(M) \ {ε}), of FOLLOW(t).
342+
343+
The tokens that can begin a type are, as of this writing, {`(`, `[`, `!`, `*`,
344+
`&`, `&&`, `?`, lifetimes, `>`, `>>`, `::`, any non-keyword identifier, `super`,
345+
`self`, `Self`, `extern`, `crate`, `$crate`, `_`, `for`, `impl`, `fn`, `unsafe`,
346+
`typeof`, `dyn`}, although this list may not be complete because people won't
347+
always remember to update the appendix when new ones are added.
348+
349+
Examples of FOLLOW for complex M:
350+
351+
* FOLLOW(`$( $d:ident $e:expr )*`) = FOLLOW(`$e:expr`)
352+
* FOLLOW(`$( $d:ident $e:expr )* $(;)*`) = FOLLOW(`$e:expr`) ∩ ANYTOKEN = FOLLOW(`$e:expr`)
353+
* FOLLOW(`$( $d:ident $e:expr )* $(;)* $( f |)+`) = ANYTOKEN
354+
355+
### Examples of valid and invalid matchers
356+
357+
With the above specification in hand, we can present arguments for
358+
why particular matchers are legal and others are not.
359+
360+
* `($ty:ty < foo ,)` : illegal, because FIRST(`< foo ,`) = { `<` } ⊈ FOLLOW(`ty`)
361+
362+
* `($ty:ty , foo <)` : legal, because FIRST(`, foo <`) = { `,` } is ⊆ FOLLOW(`ty`).
363+
364+
* `($pa:pat $pb:pat $ty:ty ,)` : illegal, because FIRST(`$pb:pat $ty:ty ,`) = { `$pb:pat` } ⊈ FOLLOW(`pat`), and also FIRST(`$ty:ty ,`) = { `$ty:ty` } ⊈ FOLLOW(`pat`).
365+
366+
* `( $($a:tt $b:tt)* ; )` : legal, because FIRST(`$b:tt`) = { `$b:tt` } is ⊆ FOLLOW(`tt`) = ANYTOKEN, as is FIRST(`;`) = { `;` }.
367+
368+
* `( $($t:tt),* , $(t:tt),* )` : legal, (though any attempt to actually use this macro will signal a local ambiguity error during expansion).
369+
370+
* `($ty:ty $(; not sep)* -)` : illegal, because FIRST(`$(; not sep)* -`) = { `;`, `-` } ⊈ FOLLOW(`ty`), since `-` is not in FOLLOW(`ty`).
371+
372+
* `($($ty:ty)-+)` : illegal, because separator `-` is not in FOLLOW(`ty`).
373+
374+
* `($($e:expr)*)` : illegal, because expr NTs are not in FOLLOW(expr NT).
375+
376+
[Macros by Example]: macros-by-example.html
377+
[RFC 550]: https://github.com/rust-lang/rfcs/blob/master/text/0550-macro-future-proofing.md
378+
[tracking issue]: https://github.com/rust-lang/rust/issues/56575

‎src/macros-by-example.md

Lines changed: 395 additions & 116 deletions
@@ -24,149 +24,428 @@
2424
> &nbsp;&nbsp; &nbsp;&nbsp; [_Token_]<sub>_except $ and delimiters_</sub>\
2525
> &nbsp;&nbsp; | _MacroMatcher_\
2626
> &nbsp;&nbsp; | `$` [IDENTIFIER] `:` _MacroFragSpec_\
27-
> &nbsp;&nbsp; | `$` `(` _MacroMatch_<sup>+</sup> `)` _MacroRepSep_<sup>?</sup> _MacroKleeneOp_
27+
> &nbsp;&nbsp; | `$` `(` _MacroMatch_<sup>+</sup> `)` _MacroRepSep_<sup>?</sup> _MacroRepOp_
2828
>
2929
> _MacroFragSpec_ :\
3030
> &nbsp;&nbsp; &nbsp;&nbsp; `block` | `expr` | `ident` | `item` | `lifetime` | `literal`\
3131
> &nbsp;&nbsp; | `meta` | `pat` | `path` | `stmt` | `tt` | `ty` | `vis`
3232
>
3333
> _MacroRepSep_ :\
34-
> &nbsp;&nbsp; [_Token_]<sub>_except delimiters and kleene operators_</sub>
34+
> &nbsp;&nbsp; [_Token_]<sub>_except delimiters and repetition operators_</sub>
3535
>
36-
> _MacroKleeneOp_<sub>2015</sub> :\
37-
> &nbsp;&nbsp; `*` | `+`
38-
>
39-
> _MacroKleeneOp_<sub>2018+</sub> :\
40-
> &nbsp;&nbsp; `*` | `+` | `?`
36+
> _MacroRepOp_ :\
37+
> &nbsp;&nbsp; `*` | `+` | `?`<sub>2018+</sub>
4138
>
4239
> _MacroTranscriber_ :\
4340
> &nbsp;&nbsp; [_DelimTokenTree_]
4441
4542
`macro_rules` allows users to define syntax extension in a declarative way. We
4643
call such extensions "macros by example" or simply "macros".
4744

48-
Macros can expand to expressions, statements, items, types, or patterns.
49-
50-
The macro expander looks up macro invocations by name, and tries each macro
51-
rule in turn. It transcribes the first successful match. Matching and
52-
transcription are closely related to each other, and we will describe them
53-
together.
54-
55-
The macro expander matches and transcribes every token that does not begin with
56-
a `$` literally, including delimiters. For parsing reasons, delimiters must be
57-
balanced, but they are otherwise not special.
58-
59-
In the matcher, `$` _name_ `:` _designator_ matches the nonterminal in the Rust
60-
syntax named by _designator_. Valid designators are:
61-
62-
* `item`: an [_Item_]
63-
* `block`: a [_BlockExpression_]
64-
* `stmt`: a [_Statement_] without the trailing semicolon
65-
* `pat`: a [_Pattern_]
66-
* `expr`: an [_Expression_]
67-
* `ty`: a [_Type_]
68-
* `ident`: an [IDENTIFIER_OR_KEYWORD]
69-
* `path`: a [_TypePath_] style path
70-
* `tt`: a [_TokenTree_]&nbsp;(a single [token] or tokens in matching delimiters `()`, `[]`, or `{}`)
71-
* `meta`: a [_MetaItem_], the contents of an attribute
72-
* `lifetime`: a [LIFETIME_TOKEN]
73-
* `vis`: a [_Visibility_] qualifier
74-
* `literal`: matches `-`<sup>?</sup>[_LiteralExpression_]
45+
Each macro by example has a name, and one or more _rules_. Each rule has two
46+
parts: a _matcher_, describing the syntax that it matches, and a _transcriber_,
47+
describing the syntax that will replace a successfully matched invocation. Both
48+
the matcher and the transcriber must be surrounded by delimiters. Macros can
49+
expand to expressions, statements, items (including traits, impls, and foreign
50+
items), types, or patterns.
51+
52+
When a macro is invoked, the macro expander looks up macro invocations by name,
53+
and tries each macro rule in turn. It transcribes the first successful match; if
54+
this results in an error, then future matches are not tried. When matching, no
55+
lookahead is performed; if the compiler cannot unambiguously determine how to
56+
parse the macro invocation one token at a time, then it is an error. In the
57+
following example, the compiler does not look ahead past the identifier to see
58+
if the following token is a `)`, even though that would allow it to parse the
59+
invocation unambiguously:
60+
61+
```rust,compile_fail
62+
macro_rules! ambiguity {
63+
    ($($i:ident)* $j:ident) => { ($($i)-*) * $j };
64+
}
65+
66+
ambiguity!(error); // Error: local ambiguity
67+
```
68+
69+
In both the matcher and the transcriber, the `$` token is used to invoke special
70+
behaviours from the macro engine. Tokens that aren't part of such an invocation
71+
are matched and transcribed literally, with one exception. The exception is that
72+
the outer delimiters for the matcher will match any pair of delimiters. Thus,
73+
for instance, the matcher `(())` will match `{()}` but not `{{}}`. The character
74+
`$` cannot be matched or transcribed literally.
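The exception for outer delimiters can be checked with a one-rule macro. The name `unit_in_unit` is made up for this sketch.

```rust
// Hypothetical macro, for illustration only. The matcher is `(())`: the
// outer pair may be any delimiters at the invocation site, but the inner
// `()` is matched literally.
macro_rules! unit_in_unit {
    (()) => { "matched" };
}

// These invocations all match the single rule:
//     unit_in_unit!(())
//     unit_in_unit![()]
//     unit_in_unit!{()}
// This one does not, because the inner `{}` is not `()`:
//     unit_in_unit!{{}}
```

Each of the three matching forms evaluates to the string `"matched"`.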
75+
76+
## Metavariables
77+
78+
In the matcher, `$` _name_ `:` _fragment-specifier_ matches a Rust syntax
79+
fragment of the kind specified and binds it to the metavariable `$`_name_. Valid
80+
fragment specifiers are:
81+
82+
* `item`: an [_Item_]
83+
* `block`: a [_BlockExpression_]
84+
* `stmt`: a [_Statement_] without the trailing semicolon (except for item
85+
statements that require semicolons)
86+
* `pat`: a [_Pattern_]
87+
* `expr`: an [_Expression_]
88+
* `ty`: a [_Type_]
89+
* `ident`: an [IDENTIFIER_OR_KEYWORD]
90+
* `path`: a [_TypePath_] style path
91+
* `tt`: a [_TokenTree_]&nbsp;(a single [token] or tokens in matching delimiters `()`, `[]`, or `{}`)
92+
* `meta`: a [_MetaItem_], the contents of an attribute
93+
* `lifetime`: a [LIFETIME_TOKEN]
94+
* `vis`: a possibly empty [_Visibility_] qualifier
95+
* `literal`: matches `-`<sup>?</sup>[_LiteralExpression_]
96+
97+
In the transcriber, metavariables are referred to simply by `$`_name_, since
98+
the fragment kind is specified in the matcher. Metavariables are replaced with
99+
the syntax element that matched them. The keyword metavariable `$crate` can be
100+
used to refer to the current crate; see [Hygiene] below. Metavariables can be
101+
transcribed more than once or not at all.
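A short sketch of the last point, using a made-up macro named `default_pair`: the `$t` metavariable is transcribed twice and `$unused` is not transcribed at all.

```rust
// Hypothetical macro, for illustration only.
macro_rules! default_pair {
    // `$t` is bound to a type and `$unused` to an identifier.
    ($t:ty, $unused:ident) => {
        // `$t` is transcribed twice; `$unused` never appears here.
        (<$t>::default(), <$t>::default())
    };
}
```

For example, `default_pair!(u8, ignored)` expands to `(<u8>::default(), <u8>::default())`, which evaluates to `(0, 0)`.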
102+
103+
## Repetitions
104+
105+
In both the matcher and transcriber, repetitions are indicated by placing the
106+
tokens to be repeated inside `$( ... )`, followed by a repetition operator,
107+
optionally with a separator token between. The separator token can be any token
108+
other than a delimiter or one of the repetition operators, but `;` and `,` are
109+
the most common. For instance, `$( $i:ident ),*` represents any number of
110+
identifiers separated by commas. Nested repetitions are permitted.
111+
112+
The repetition operators are `*`, which indicates any number of repetitions,
113+
`+`, which indicates any number but at least one, and `?` which indicates an
114+
optional fragment with zero or one occurrences. Since `?` represents at most one
115+
occurrence, it cannot be used with a separator.
116+
117+
The repeated fragment both matches and transcribes to the specified number of
118+
the fragment, separated by the separator token. Metavariables are matched to
119+
every repetition of their corresponding fragment, so for instance, the `$(
120+
$i:ident ),*` example above matches `$i` to all of the identifiers in the list.
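The three repetition operators can be compared side by side in a small sketch. The macro names `names`, `sum_at_least_one`, and `add_optional` are invented for this example, and the `?` form assumes a compiler that accepts `?` as a repetition operator.

```rust
// Hypothetical macros, for illustration only.

// `*`: any number of comma-separated identifiers, including none.
macro_rules! names {
    ($($i:ident),*) => { vec![$(stringify!($i)),*] };
}

// `+`: at least one comma-separated expression.
macro_rules! sum_at_least_one {
    ($($e:expr),+) => { 0 $(+ $e)+ };
}

// `?`: an optional trailing fragment; no separator is allowed here.
macro_rules! add_optional {
    ($a:expr $(, $b:expr)?) => { $a $(+ $b)? };
}
```

With these, `names!(a, b, c)` produces `vec!["a", "b", "c"]`, `sum_at_least_one!(1, 2, 3)` produces `6`, and `add_optional!(1)` and `add_optional!(1, 2)` produce `1` and `3` respectively.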
121+
122+
During transcription, additional restrictions apply to repetitions so that the
123+
compiler knows how to expand them properly. First, a metavariable must appear in
124+
exactly the same number, kind, and nesting order of repetitions in the
125+
transcriber as it did in the matcher. So for the matcher `$( $i:ident ),*`, the
126+
transcribers `=> $i`, `=> $( $( $i)* )*`, and `=> $( $i )+` are all illegal, but
127+
`=> $( $i );*` is correct and replaces a comma-separated list of identifiers
128+
with a semicolon-separated list.
129+
130+
Second, each repetition in the transcriber must contain at least one
131+
metavariable in order to decide how many times to expand it. If multiple
132+
metavariables appear in the same repetition, they must be bound to the same
133+
number of fragments. For instance, `( $( $i:ident ),* ; $( $j:ident ),* ) => (
134+
$( ($i,$j) ),* )` must bind the same number of `$i` fragments as `$j` fragments.
135+
This means that invoking the macro with `(a, b, c; d, e, f)` is legal and
136+
expands to `((a,d), (b,e), (c,f))`, but `(a, b, c; d, e)` is illegal because it
137+
does not have the same number. This requirement applies to every layer of nested
138+
repetitions.
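The pairing rule can be tried out with a sketch along these lines. The macro name `zip_names` is hypothetical, and the identifiers are turned into strings with `stringify!` only so that the result is an ordinary value.

```rust
// Hypothetical macro, for illustration only: `$i` and `$j` appear in the
// same repetition in the transcriber, so they must be bound to the same
// number of fragments.
macro_rules! zip_names {
    ( $( $i:ident ),* ; $( $j:ident ),* ) => {
        [ $( (stringify!($i), stringify!($j)) ),* ]
    };
}

// Rejected if uncommented, because `$i` repeats three times and `$j` only
// twice:
// const BAD: [(&str, &str); 3] = zip_names!(a, b, c; d, e);
```

Invoking `zip_names!(a, b, c; d, e, f)` yields `[("a", "d"), ("b", "e"), ("c", "f")]`.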
139+
140+
> **Edition Differences**: The `?` repetition operator did not exist before the
141+
> 2018 edition. Prior to the 2018 Edition, `?` was an allowed
142+
> separator token, rather than a repetition operator.
143+
144+
## Scoping, Exporting, and Importing
145+
146+
For historical reasons, the scoping of macros by example does not work entirely like
147+
items. Macros have two forms of scope: textual scope, and path-based scope.
148+
Textual scope is based on the order that things appear in source files, or even
149+
across multiple files, and is the default scoping. It's explained further below.
150+
Path-based scope works exactly the same way that item scoping does. The scoping,
151+
exporting, and importing of macros is controlled largely by attributes.
152+
153+
When a macro is invoked by an unqualified identifier (not part of a multi-part
154+
path), it's first looked up in textual scoping. If this does not yield any
155+
results, then it is looked up in path-based scoping. If the macro's name is
156+
qualified with a path, then it is only looked up in path-based scoping.
157+
158+
```rust,ignore
159+
use lazy_static::lazy_static; // Path-based import.
160+
161+
macro_rules! lazy_static { // Textual definition.
162+
    (lazy) => {};
163+
}
164+
165+
lazy_static!{lazy} // Textual lookup finds our macro first.
166+
self::lazy_static!{} // Path-based lookup ignores our macro, finds imported one.
167+
```
168+
169+
### Textual Scope
170+
171+
Textual scope is based largely on the order that things appear in source files,
172+
and works similarly to the scope of local variables declared with `let` except
173+
it also applies at the module level. When `macro_rules!` is used to define a
174+
macro, the macro enters scope after the definition (note that it can still be
175+
used recursively, since names are looked up from the invocation site), up until
176+
its surrounding scope, typically a module, is closed. This can enter child
177+
modules and even span across multiple files:
178+
179+
```rust,ignore
180+
//// src/lib.rs
181+
mod has_macro {
182+
    // m!{} // Error: m is not in scope.
183+
184+
    macro_rules! m {
185+
        () => {};
186+
    }
187+
    m!{} // OK: appears after declaration of m.
188+
189+
    mod uses_macro;
190+
}
191+
192+
// m!{} // Error: m is not in scope.
193+
194+
//// src/has_macro/uses_macro.rs
195+
196+
m!{} // OK: appears after declaration of m in src/lib.rs
197+
```
198+
199+
It is not an error to define a macro multiple times; the most recent declaration
200+
will shadow the previous one unless it has gone out of scope.
201+
202+
```rust
203+
macro_rules! m {
204+
    (1) => {};
205+
}
206+
207+
m!(1);
208+
209+
mod inner {
210+
    m!(1);
211+
212+
    macro_rules! m {
213+
        (2) => {};
214+
    }
215+
    // m!(1); // Error: no rule matches '1'
216+
    m!(2);
217+
218+
    macro_rules! m {
219+
        (3) => {};
220+
    }
221+
    m!(3);
222+
}
223+
224+
m!(1);
225+
```
226+
227+
Macros can be declared and used locally inside functions as well, and work
228+
similarly:
229+
230+
```rust
231+
fn foo() {
232+
    // m!(); // Error: m is not in scope.
233+
    macro_rules! m {
234+
        () => {};
235+
    }
236+
    m!();
237+
}
238+
239+
240+
// m!(); // Error: m is not in scope.
241+
```
242+
243+
The `#[macro_use]` attribute has two purposes. First, it can be used to make a
244+
module's macro scope not end when the module is closed, by applying it to a
245+
module:
246+
247+
```rust
248+
#[macro_use]
249+
mod inner {
250+
macro_rules! m {
251+
() => {};
252+
}
253+
}
254+
255+
m!();
256+
```
257+
258+
Second, it can be used to import macros from another crate, by attaching it to
259+
an `extern crate` declaration appearing in the crate's root module. Macros
260+
imported this way are imported into the prelude of the crate, not textually,
261+
which means that they can be shadowed by any other name. While macros
262+
imported by `#[macro_use]` can be used before the import statement, in case of a
263+
conflict, the last macro imported wins. Optionally, a list of macros to import
264+
can be specified; this is not supported when `#[macro_use]` is applied to a
265+
module.
266+
267+
```rust,ignore
268+
#[macro_use(lazy_static)] // Or #[macro_use] to import all macros.
269+
extern crate lazy_static;
270+
271+
lazy_static!{};
272+
// self::lazy_static!{} // Error: lazy_static is not defined in `self`
273+
```
274+
275+
Macros to be imported with `#[macro_use]` must be exported with
276+
`#[macro_export]`, which is described below.
277+
278+
### Path-Based Scope
279+
280+
By default, a macro has no path-based scope. However, if it has the
281+
`#[macro_export]` attribute, then it is declared in the crate root scope and can
282+
be referred to normally as such:
283+
284+
```rust
285+
self::m!();
286+
m!(); // OK: Path-based lookup finds m in the current module.
287+
288+
mod inner {
289+
super::m!();
290+
crate::m!();
291+
}
292+
293+
mod mac {
294+
#[macro_export]
295+
macro_rules! m {
296+
() => {};
297+
}
298+
}
299+
```
300+
301+
Macros labeled with `#[macro_export]` are always `pub` and can be referred to
302+
by other crates, either by path or by `#[macro_use]` as described above.
303+
304+
## Hygiene
305+
306+
By default, all identifiers referred to in a macro are expanded as-is, and are
307+
looked up at the macro's invocation site. This can lead to issues if a macro
308+
refers to an item or macro which isn't in scope at the invocation site. In order
309+
to alleviate this, the `$crate` metavariable can be used at the start of a path
310+
in order to force lookup to occur inside the crate defining the macro.
311+
312+
```rust,ignore
313+
//// Definitions in the `helper_macro` crate.
314+
#[macro_export]
315+
macro_rules! helped {
316+
// () => { helper!() } // This might lead to an error due to 'helper' not being in scope.
317+
() => { $crate::helper!() }
318+
}
319+
320+
#[macro_export]
321+
macro_rules! helper {
322+
() => { () }
323+
}
324+
325+
//// Usage in another crate.
326+
// Note that `helper_macro::helper` is not imported!
327+
use helper_macro::helped;
328+
329+
fn unit() {
330+
helped!();
331+
}
332+
```
333+
334+
Note that, because `$crate` refers to the current crate, it must be used with a
335+
fully qualified module path when referring to non-macro items:
336+
337+
```rust
338+
pub mod inner {
339+
#[macro_export]
340+
macro_rules! call_foo {
341+
() => { $crate::inner::foo() };
342+
}
343+
344+
pub fn foo() {}
345+
}
346+
```
347+
348+
Additionally, even though `$crate` allows a macro to refer to items within its
349+
own crate when expanding, its use has no effect on visibility. An item or macro
350+
referred to must still be visible from the invocation site. In the following
351+
example, any attempt to invoke `call_foo!()` from outside its crate will fail
352+
because `foo()` is not public.
353+
354+
```rust
355+
#[macro_export]
356+
macro_rules! call_foo {
357+
() => { $crate::foo() };
358+
}
359+
360+
fn foo() {}
361+
```
362+
363+
> **Version & Edition Differences**: Prior to Rust 1.30, `$crate` and
364+
> `local_inner_macros` (below) were unsupported. They were added alongside
365+
> path-based imports of macros (described above), in order to ensure that helper
366+
> macros did not need to be manually imported by users of a macro-exporting
367+
> crate. Crates written for earlier versions of Rust that use helper macros need
368+
> to be modified to use `$crate` or `local_inner_macros` in order to work well
369+
> with path-based imports.
370+
371+
When a macro is exported, the `#[macro_export]` attribute can have the
372+
`local_inner_macros` keyword added in order to automatically prefix all
373+
contained macro invocations with `$crate::`. This is intended primarily as a
374+
tool to migrate code written before `$crate` was added to the language to work
375+
with Rust 2018's path-based imports of macros. Its use is discouraged in new
376+
code.
377+
378+
```rust
379+
#[macro_export(local_inner_macros)]
380+
macro_rules! helped {
381+
() => { helper!() } // Automatically converted to $crate::helper!().
382+
}
383+
384+
#[macro_export]
385+
macro_rules! helper {
386+
() => { () }
387+
}
388+
```
389+
390+
## Follow-set Ambiguity Restrictions
391+
392+
The parser used by the macro system is reasonably powerful, but it is limited in
393+
order to prevent ambiguity in current or future versions of the language. In
394+
particular, in addition to the rule about ambiguous expansions, a nonterminal
395+
matched by a metavariable must be followed by a token which has been designated
396+
as safe to use after that kind of match.
397+
398+
As an example, a macro matcher like `$i:expr [ , ]` could in theory be accepted
399+
in Rust today, since `[,]` cannot be part of a legal expression and therefore
400+
the parse would always be unambiguous. However, because `[` can start trailing
401+
expressions, `[` is not a character which can safely be ruled out as coming
402+
after an expression. If `[,]` were accepted in a later version of Rust, this
403+
matcher would become ambiguous or would misparse, breaking working code.
404+
Matchers like `$i:expr,` or `$i:expr;` would be legal, however, because `,` and
405+
`;` are legal expression separators. The specific rules are:
406+
407+
* `expr` and `stmt` may only be followed by one of: `=>`, `,`, or `;`.
408+
* `pat` may only be followed by one of: `=>`, `,`, `=`, `|`, `if`, or `in`.
409+
* `path` and `ty` may only be followed by one of: `=>`, `,`, `=`, `|`, `;`,
410+
`:`, `>`, `>>`, `[`, `{`, `as`, `where`, or a macro variable of `block`
411+
fragment specifier.
412+
* `vis` may only be followed by one of: `,`, an identifier other than a
413+
  non-raw `priv`, any token that can begin a type, or a metavariable with an
414+
`ident`, `ty`, or `path` fragment specifier.
415+
* All other fragment specifiers have no restrictions.
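
As a rough sketch of how these rules look in practice, the following matchers place only permitted tokens after `expr` and `ty` fragments. The macro names `add_pair` and `typed_default` and the helper `follow_demo` are hypothetical, introduced only for illustration. The commented-out definition is the matcher discussed above, which is expected to be rejected when the macro is defined.

```rust
// `$a:expr` is followed by `;`, which is in the follow set of `expr`.
macro_rules! add_pair {
    ($a:expr ; $b:expr) => { $a + $b };
}

// `$t:ty` is followed by `=`, which is in the follow set of `ty`.
macro_rules! typed_default {
    ($t:ty = $v:expr) => {{
        let value: $t = $v;
        value
    }};
}

// Expected to be rejected at definition time: `[` may not follow `expr`.
// macro_rules! bad {
//     ($i:expr [ , ]) => {};
// }

// Hypothetical helper exercising both macros.
fn follow_demo() -> (i32, u8) {
    (add_pair!(1 + 2; 3 * 4), typed_default!(u8 = 7))
}
```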
416+
417+
When repetitions are involved, then the rules apply to every possible number of
418+
expansions, taking separators into account. This means:
419+
420+
* If the repetition includes a separator, that separator must be able to
421+
    follow the contents of the repetition.
421+
  * If the repetition can repeat multiple times (`*` or `+`), then the contents
423+
must be able to follow themselves.
424+
* The contents of the repetition must be able to follow whatever comes
425+
before, and whatever comes after must be able to follow the contents of the
426+
    repetition.
426+
  * If the repetition can match zero times (`*` or `?`), then whatever comes
428+
after must be able to follow whatever comes before.
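
The repetition rules can be sketched with a common pattern. The macro name `sum` and the helper `sum_demo` are hypothetical. Here the separator `,` can follow the `expr` contents, and the optional trailing `$(,)?` begins with a token that may follow an `expr`. The commented-out definition has no separator, so an `expr` would have to follow itself, which is expected to be rejected.

```rust
// Comma-separated expressions with an optional trailing comma.
// `,` may follow `expr`, so both the separator and the trailing `$(,)?`
// satisfy the follow-set rules.
macro_rules! sum {
    ($($x:expr),* $(,)?) => { 0 $(+ $x)* };
}

// Expected to be rejected at definition time: without a separator, one
// `$x:expr` would be followed directly by another `$x:expr`.
// macro_rules! bad_sum {
//     ($($x:expr)*) => { 0 $(+ $x)* };
// }

// Hypothetical helper exercising the macro.
fn sum_demo() -> i32 {
    sum!(1, 2, 3)
}
```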
429+
430+
431+
For more detail, see the [formal specification].
75432

76433
[IDENTIFIER]: identifiers.html
77434
[IDENTIFIER_OR_KEYWORD]: identifiers.html
78435
[LIFETIME_TOKEN]: tokens.html#lifetimes-and-loop-labels
436+
[formal specification]: macro-ambiguity.html
79437
[_BlockExpression_]: expressions/block-expr.html
438+
[_DelimTokenTree_]: macros.html
80439
[_Expression_]: expressions.html
81440
[_Item_]: items.html
82441
[_LiteralExpression_]: expressions/literal-expr.html
83442
[_MetaItem_]: attributes.html
84443
[_Pattern_]: patterns.html
85444
[_Statement_]: statements.html
86445
[_TokenTree_]: macros.html#macro-invocation
446+
[_Token_]: tokens.html
87447
[_TypePath_]: paths.html#paths-in-types
88448
[_Type_]: types.html#type-expressions
89449
[_Visibility_]: visibility-and-privacy.html
90450
[token]: tokens.html
91-
92-
In the transcriber, the
93-
designator is already known, and so only the name of a matched nonterminal comes
94-
after the dollar sign.
95-
96-
In both the matcher and transcriber, the Kleene star-like operator indicates
97-
repetition. The Kleene star operator consists of `$` and parentheses,
98-
optionally followed by a separator token, followed by `*`, `+`, or `?`. `*`
99-
means zero or more repetitions; `+` means _at least_ one repetition; `?` means
100-
at most one repetition. The parentheses are not matched or transcribed. On the
101-
matcher side, a name is bound to _all_ of the names it matches, in a structure
102-
that mimics the structure of the repetition encountered on a successful match.
103-
The job of the transcriber is to sort that structure out. Also, `?`, unlike `*`
104-
and `+`, does _not_ allow a separator, since one could never match against it
105-
anyway.
106-
107-
> **Edition Differences**: The `?` Kleene operator did not exist before the
108-
> 2018 edition.
109-
110-
> **Edition Differences**: Prior to the 2018 Edition, `?` was an allowed
111-
> separator token, rather than a Kleene operator. It is no longer allowed as a
112-
> separator as of the 2018 edition. This avoids ambiguity with the `?` Kleene
113-
> operator.
114-
115-
The rules for transcription of these repetitions are called "Macro By Example".
116-
Essentially, one "layer" of repetition is discharged at a time, and all of them
117-
must be discharged by the time a name is transcribed. Therefore, `( $( $i:ident
118-
),* ) => ( $i )` is an invalid macro, but `( $( $i:ident ),* ) => ( $( $i:ident
119-
),* )` is acceptable (if trivial).
120-
121-
When Macro By Example encounters a repetition, it examines all of the `$`
122-
_name_ s that occur in its body. At the "current layer", they all must repeat
123-
the same number of times, so ` ( $( $i:ident ),* ; $( $j:ident ),* ) => ( $(
124-
($i,$j) ),* )` is valid if given the argument `(a,b,c ; d,e,f)`, but not
125-
`(a,b,c ; d,e)`. The repetition walks through the choices at that layer in
126-
lockstep, so the former input transcribes to `(a,d), (b,e), (c,f)`.
127-
128-
Nested repetitions are allowed.
129-
130-
### Parsing limitations
131-
132-
The parser used by the macro system is reasonably powerful, but the parsing of
133-
Rust syntax is restricted in two ways:
134-
135-
1. Macro definitions are required to include suitable separators after parsing
136-
expressions and other bits of the Rust grammar. This implies that
137-
a macro definition like `$i:expr [ , ]` is not legal, because `[` could be part
138-
of an expression. A macro definition like `$i:expr,` or `$i:expr;` would be legal,
139-
however, because `,` and `;` are legal separators. See [RFC 550] for more information.
140-
Specifically:
141-
142-
* `expr` and `stmt` may only be followed by one of `=>`, `,`, or `;`.
143-
* `pat` may only be followed by one of `=>`, `,`, `=`, `|`, `if`, or `in`.
144-
* `path` and `ty` may only be followed by one of `=>`, `,`, `=`, `|`, `;`,
145-
`:`, `>`, `>>`, `[`, `{`, `as`, `where`, or a macro variable of `block`
146-
fragment type.
147-
* `vis` may only be followed by one of `,`, `priv`, a raw identifier, any
148-
token that can begin a type, or a macro variable of `ident`, `ty`, or
149-
`path` fragment type.
150-
* All other fragment types have no restrictions.
151-
152-
2. The parser must have eliminated all ambiguity by the time it reaches a `$`
153-
_name_ `:` _designator_. This requirement most often affects name-designator
154-
pairs when they occur at the beginning of, or immediately after, a `$(...)*`;
155-
requiring a distinctive token in front can solve the problem. For example:
156-
157-
```rust
158-
// The matcher `$($i:ident)* $e:expr` would be ambiguous because the parser
159-
// would be forced to choose between an identifier or an expression. Use some
160-
// token to distinguish them.
161-
macro_rules! example {
162-
($(I $i:ident)* E $e:expr) => { ($($i)-*) * $e };
163-
}
164-
let foo = 2;
165-
let bar = 3;
166-
// The following expands to `(foo - bar) * 5`
167-
example!(I foo I bar E 5);
168-
```
169-
170-
[RFC 550]: https://github.com/rust-lang/rfcs/blob/master/text/0550-macro-future-proofing.md
171-
[_DelimTokenTree_]: macros.html
172-
[_Token_]: tokens.html
451+
[Hygiene]: #hygiene
