Commit 858dfdf

Updated macros to address Niko's comments
1 parent ba3dd18 commit 858dfdf

1 file changed

+130
-42
lines changed

src/macro-expansion.md

Lines changed: 130 additions & 42 deletions
@@ -2,56 +2,136 @@
22

33
Macro expansion happens during parsing. `rustc` has two parsers, in fact: the
44
normal Rust parser, and the macro parser. During the parsing phase, the normal
5-
Rust parser will call into the macro parser when it encounters a macro. The
6-
macro parser, in turn, may call back out to the Rust parser when it needs to
7-
bind a metavariable (e.g. `$my_expr`). There are a few aspects of this system to
8-
be explained. The code for macro expansion is in `src/libsyntax/ext/tt/`.
5+
Rust parser will call into the macro parser when it encounters a macro
6+
definition or macro invocation (TODO: verify). The macro parser, in turn, may
7+
call back out to the Rust parser when it needs to bind a metavariable (e.g.
8+
`$my_expr`) while parsing the contents of a macro invocation. The code for macro
9+
expansion is in [`src/libsyntax/ext/tt/`][code_dir]. This chapter aims to
10+
explain how macro expansion works.
11+
12+
### Example
13+
14+
It's helpful to have an example to refer to. For the remainder of this chapter,
15+
whenever we refer to the "example _definition_", we mean the following:
16+
17+
```rust
18+
macro_rules! printer {
19+
(print $mvar:ident) => {
20+
println!("{}", $mvar);
21+
};
22+
(print twice $mvar:ident) => {
23+
println!("{}", $mvar);
24+
println!("{}", $mvar);
25+
}
26+
}
27+
```
28+
29+
`$mvar` is called a _metavariable_. Unlike normal variables, rather than binding
30+
to a value in a computation, a metavariable binds _at compile time_ to a tree of
31+
_tokens_. A _token_ is zero or more symbols that together have some meaning. For
32+
example, in our example definition, `print`, `$mvar`, `=>`, `{` are all tokens
33+
(though that's not an exhaustive list). There are also other special tokens,
34+
such as `EOF`, which indicates that there are no more tokens. The process of
35+
producing a stream of tokens from the raw bytes of the source file is called
36+
_lexing_. For more information about _lexing_, see the [Parsing
37+
chapter][parsing] of this book.
38+
39+
Whenever we refer to the "example _invocation_", we mean the following snippet:
40+
41+
```rust
42+
printer!(print foo); // Assume `foo` is a variable defined somewhere else...
43+
```
44+
45+
The process of expanding the macro invocation into the syntax tree
46+
`println!("{}", foo)` and then expanding that into a call to `Display::fmt` is
47+
called _macro expansion_, and it is the topic of this chapter.
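
As a minimal sketch of what expansion produces, the program below pairs a macro shaped like the example definition with invocations shaped like the example invocation. The names `render!`, `render_once` and `render_twice` are hypothetical and exist only for illustration: `render!` has the same matchers as `printer!`, but its arms build a `String` with `format!` instead of printing, so the result can be inspected. Note that the arms of a `macro_rules!` definition are separated by `;`.

```rust
// Hypothetical variant of the example definition. The matchers are the same
// as in `printer!`; only the bodies differ, so that the result is a value.
macro_rules! render {
    (print $mvar:ident) => {
        format!("{}", $mvar)
    };
    (print twice $mvar:ident) => {
        format!("{}\n{}", $mvar, $mvar)
    };
}

// `$mvar` binds to the identifier token `foo` at compile time.
fn render_once() -> String {
    let foo = 42;
    render!(print foo)
}

// The first arm does not match `print twice foo`, so the second arm is used.
fn render_twice() -> String {
    let foo = 42;
    render!(print twice foo)
}

fn main() {
    println!("{}", render_once());
    println!("{}", render_twice());
}
```

The metavariable never holds the value `42`; it holds the token `foo`, which is why the invocation only compiles where a variable named `foo` is in scope.
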
948

1049
### The macro parser
1150

51+
There are two parts to macro expansion: parsing the definition and parsing the
52+
invocations. Interestingly, both are done by the macro parser.
53+
1254
Basically, the macro parser is like an NFA-based regex parser. It uses an
1355
algorithm similar in spirit to the [Earley parsing
1456
algorithm](https://en.wikipedia.org/wiki/Earley_parser). The macro parser is
15-
defined in `src/libsyntax/ext/tt/macro_parser.rs`.
16-
17-
In a traditional NFA-based parser, one common approach is to have some pattern
18-
which we are trying to match an input against. Moreover, we may try to capture
19-
some portion of the input and bind it to variable in the pattern. For example:
20-
suppose we have a pattern (borrowing Rust macro syntax) such as `a $b:ident a`
21-
-- that is, an `a` token followed by an `ident` token followed by another `a`
22-
token. Given an input `a foo a`, the _metavariable_ `$b` would bind to the
23-
`ident` `foo`. On the other hand, an input `a foo b` would be rejected as a
24-
parse failure because the pattern `a <ident> a` cannot match `a foo b` (or as
25-
the compiler would put it, "no rules expected token `b`").
26-
27-
The macro parser does pretty much exactly that with one exception: in order to
28-
parse different types of metavariables, such as `ident`, `block`, `expr`, etc.,
29-
the macro parser must sometimes call back to the normal Rust parser.
30-
31-
Interestingly, both definitions and invokations of macros are parsed using the
32-
macro parser. This is extremely non-intuitive and self-referential. The code to
33-
parse macro _definitions_ is in `src/libsyntax/ext/tt/macro_rules.rs`. It
34-
defines the pattern for matching for a macro definition as `$( $lhs:tt =>
35-
$rhs:tt );+`. In other words, a `macro_rules` defintion should have in its body
36-
at least one occurence of a token tree followed by `=>` followed by another
37-
token tree. When the compiler comes to a `macro_rules` definition, it uses this
38-
pattern to match the two token trees per rule in the definition of the macro
39-
_using the macro parser itself_.
40-
41-
When the compiler comes to a macro invokation, it needs to parse that
42-
invokation. This is also known as _macro expansion_. The same NFA-based macro
43-
parser is used that is described above. Notably, the "pattern" (or _matcher_)
44-
used is the first token tree extracted from the rules of the macro _definition_.
45-
In other words, given some pattern described by the _definition_ of the macro,
46-
we want to match the contents of the _invokation_ of the macro.
47-
48-
The algorithm is exactly the same, but when the macro parser comes to a place in
49-
the current matcher where it needs to match a _non-terminal_ (i.e. a
50-
metavariable), it calls back to the normal Rust parser to get the contents of
51-
that non-terminal. Then, the macro parser proceeds in parsing as normal.
57+
defined in [`src/libsyntax/ext/tt/macro_parser.rs`][code_mp].
58+
59+
The interface of the macro parser is as follows (this is slightly simplified):
60+
61+
```rust
62+
fn parse(
63+
sess: ParserSession,
64+
tts: TokenStream,
65+
ms: &[TokenTree]
66+
) -> NamedParseResult
67+
```
68+
69+
In this interface:
70+
71+
- `sess` is a "parsing session", which keeps track of some metadata. Most
72+
notably, this is used to keep track of errors that are generated so they can
73+
be reported to the user.
74+
- `tts` is a stream of tokens. The macro parser's job is to consume the raw
75+
stream of tokens and output a binding of metavariables to corresponding token
76+
trees.
77+
- `ms` is a _matcher_. This is a sequence of token trees that we want to match
78+
`tts` against.
79+
80+
In the analogy of a regex parser, `tts` is the input and we are matching it
81+
against the pattern `ms`. Using our examples, `tts` could be the stream of
82+
tokens containing the inside of the example invocation `print foo`, while `ms`
83+
might be the sequence of token (trees) `print $mvar:ident`.
84+
85+
The output of the parser is a `NamedParseResult`, which indicates which of
86+
three cases has occurred:
87+
88+
- Success: `tts` matches the given matcher `ms`, and we have produced a binding
89+
from metavariables to the corresponding token trees.
90+
- Failure: `tts` does not match `ms`. This results in an error message such as
91+
"No rule expected token _blah_".
92+
- Error: some fatal error has occurred _in the parser_. For example, this happens
93+
if more than one pattern matches, since that indicates the macro is
94+
ambiguous.
95+
96+
The full interface is defined [here][code_parse_int].
97+
98+
The macro parser does pretty much exactly the same as a normal regex parser with
99+
one exception: in order to parse different types of metavariables, such as
100+
`ident`, `block`, `expr`, etc., the macro parser must sometimes call back to the
101+
normal Rust parser.
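
The matching step can be sketched with a toy model. Everything in it is an assumption made for illustration: tokens are plain strings, the names `Matcher`, `ToyResult`, `toy_parse` and `is_ident` are hypothetical, the matcher is walked linearly with no repetition or nesting, and `is_ident` merely stands in for the call back into the normal Rust parser. It is not the algorithm used in `macro_parser.rs`.

```rust
use std::collections::HashMap;

/// One element of a toy matcher (hypothetical type).
#[derive(Debug)]
enum Matcher {
    /// A token that must appear literally, such as `print`.
    Literal(&'static str),
    /// A metavariable of kind `ident`, such as `$mvar:ident`.
    Ident(&'static str),
}

/// Toy counterpart of the success and failure cases described above.
#[derive(Debug)]
enum ToyResult {
    Success(HashMap<String, String>),
    Failure(String),
}

/// Stand-in for asking the normal Rust parser for an identifier.
fn is_ident(tok: &str) -> bool {
    let mut chars = tok.chars();
    match chars.next() {
        Some(c) if c.is_alphabetic() || c == '_' => {
            chars.all(|c| c.is_alphanumeric() || c == '_')
        }
        _ => false,
    }
}

/// Match the input tokens `tts` against the matcher `ms`.
fn toy_parse(tts: &[&str], ms: &[Matcher]) -> ToyResult {
    let mut bindings = HashMap::new();
    let mut input = tts.iter();
    for m in ms {
        match (m, input.next()) {
            // A literal matches only the identical token.
            (Matcher::Literal(expected), Some(tok)) if tok == expected => {}
            // A metavariable binds whatever identifier it finds.
            (Matcher::Ident(name), Some(tok)) if is_ident(tok) => {
                bindings.insert(name.to_string(), tok.to_string());
            }
            (_, Some(tok)) => {
                return ToyResult::Failure(format!("no rules expected token `{}`", tok));
            }
            (_, None) => {
                return ToyResult::Failure("unexpected end of input".to_string());
            }
        }
    }
    // Leftover input means the matcher did not cover the whole stream.
    match input.next() {
        Some(tok) => ToyResult::Failure(format!("no rules expected token `{}`", tok)),
        None => ToyResult::Success(bindings),
    }
}

fn main() {
    let ms = [Matcher::Literal("print"), Matcher::Ident("mvar")];
    println!("{:?}", toy_parse(&["print", "foo"], &ms));
    println!("{:?}", toy_parse(&["print", "twice", "foo"], &ms));
}
```

With the matcher `print $mvar:ident`, the input `print foo` succeeds and binds `mvar` to `foo`, while `print twice foo` fails because `foo` is left over after `twice` has been bound.
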
102+
103+
As mentioned above, both definitions and invocations of macros are parsed using
104+
the macro parser. This is extremely non-intuitive and self-referential. The code
105+
to parse macro _definitions_ is in
106+
[`src/libsyntax/ext/tt/macro_rules.rs`][code_mr]. It defines the pattern for
107+
matching for a macro definition as `$( $lhs:tt => $rhs:tt );+`. In other words,
108+
a `macro_rules` definition should have in its body at least one occurrence of a
109+
token tree followed by `=>` followed by another token tree. When the compiler
110+
comes to a `macro_rules` definition, it uses this pattern to match the two token
111+
trees per rule in the definition of the macro _using the macro parser itself_.
112+
In our example definition, the metavariable `$lhs` would match the patterns of
113+
both arms: `(print $mvar:ident)` and `(print twice $mvar:ident)`. And `$rhs`
114+
would match the bodies of both arms: `{ println!("{}", $mvar); }` and `{
115+
println!("{}", $mvar); println!("{}", $mvar); }`. The parser would keep this
116+
knowledge around for when it needs to expand a macro invocation.
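
The same pattern can be tried out in an ordinary macro. In the sketch below, `split_arms!` and `example_arms` are hypothetical names: the macro uses the matcher `$( $lhs:tt => $rhs:tt );+` to split the arms of the example definition and records each half as a string with `stringify!`. It only mimics what `macro_rules.rs` does and is not the compiler's code.

```rust
// Hypothetical macro that splits `lhs => rhs` arms the way the pattern for
// macro definitions does. Each `tt` matches one whole delimited token tree.
macro_rules! split_arms {
    ( $( $lhs:tt => $rhs:tt );+ ) => {
        vec![ $( (stringify!($lhs), stringify!($rhs)) ),+ ]
    };
}

// Feed the two arms of the example definition through the pattern. The arms
// are only tokens here; nothing inside them is expanded or executed.
fn example_arms() -> Vec<(&'static str, &'static str)> {
    split_arms!(
        (print $mvar:ident) => { println!("{}", $mvar); };
        (print twice $mvar:ident) => { println!("{}", $mvar); println!("{}", $mvar); }
    )
}

fn main() {
    for (lhs, rhs) in example_arms() {
        println!("{} => {}", lhs, rhs);
    }
}
```

Because `;` is a separator in this pattern, the sketch does not accept a trailing `;` after the last arm. The exact spacing of the strings produced by `stringify!` is not guaranteed, so they are best treated as a debugging aid.
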
117+
118+
When the compiler comes to a macro invocation, it parses that invocation using
119+
the same NFA-based macro parser that is described above. However, the matcher
120+
used is the first token tree (`$lhs`) extracted from the arms of the macro
121+
_definition_. Using our example, we would try to match the token stream `print
122+
foo` from the invocation against the matchers `print $mvar:ident` and `print
123+
twice $mvar:ident` that we previously extracted from the definition. The
124+
algorithm is exactly the same, but when the macro parser comes to a place in the
125+
current matcher where it needs to match a _non-terminal_ (e.g. `$mvar:ident`),
126+
it calls back to the normal Rust parser to get the contents of that
127+
non-terminal. In this case, the Rust parser would look for an `ident` token,
128+
which it finds (`foo`) and returns to the macro parser. Then, the macro parser
129+
proceeds in parsing as normal. Also, note that exactly one of the matchers from
130+
the various arms should match the invocation (otherwise, the macro is
131+
ambiguous).
52132

53133
For more information about the macro parser's implementation, see the comments
54-
in `src/libsyntax/ext/tt/macro_parser.rs`.
134+
in [`src/libsyntax/ext/tt/macro_parser.rs`][code_mp].
55135

56136
### Hygiene
57137

@@ -64,3 +144,11 @@ TODO
64144
### Custom Derive
65145

66146
TODO
147+
148+
149+
150+
[code_dir]: https://github.com/rust-lang/rust/tree/master/src/libsyntax/ext/tt
151+
[code_mp]: https://github.com/rust-lang/rust/tree/master/src/libsyntax/ext/tt/macro_parser.rs
152+
[code_mr]: https://github.com/rust-lang/rust/tree/master/src/libsyntax/ext/tt/macro_rules.rs
153+
[code_parse_int]: https://github.com/rust-lang/rust/blob/a97cd17f5d71fb4ec362f4fbd79373a6e7ed7b82/src/libsyntax/ext/tt/macro_parser.rs#L421
154+
[parsing]: ./the-parser.md
