|
2 | 2 |
|
3 | 3 | Macro expansion happens during parsing. `rustc` has two parsers, in fact: the
|
4 | 4 | normal Rust parser, and the macro parser. During the parsing phase, the normal
|
5 |
| -Rust parser will call into the macro parser when it encounters a macro. The |
6 |
| -macro parser, in turn, may call back out to the Rust parser when it needs to |
7 |
| -bind a metavariable (e.g. `$my_expr`). There are a few aspects of this system to |
8 |
| -be explained. The code for macro expansion is in `src/libsyntax/ext/tt/`. |
| 5 | +Rust parser will call into the macro parser when it encounters a macro |
| 6 | +definition or macro invocation (TODO: verify). The macro parser, in turn, may |
| 7 | +call back out to the Rust parser when it needs to bind a metavariable (e.g. |
| 8 | +`$my_expr`) while parsing the contents of a macro invocation. The code for macro |
| 9 | +expansion is in [`src/libsyntax/ext/tt/`][code_dir]. This chapter aims to |
| 10 | +explain how macro expansion works. |
| 11 | + |
| 12 | +### Example |
| 13 | + |
| 14 | +It's helpful to have an example to refer to. For the remainder of this chapter, |
| 15 | +whenever we refer to the "example _definition_", we mean the following: |
| 16 | + |
| 17 | +```rust |
| 18 | +macro_rules! printer { |
| 19 | +    (print $mvar:ident) => { |
| 20 | +        println!("{}", $mvar); |
| 21 | +    }; |
| 22 | +    (print twice $mvar:ident) => { |
| 23 | +        println!("{}", $mvar); |
| 24 | +        println!("{}", $mvar); |
| 25 | +    } |
| 26 | +} |
| 27 | +``` |
| 28 | + |
| 29 | +`$mvar` is called a _metavariable_. Unlike normal variables, rather than binding |
| 30 | +to a value in a computation, a metavariable binds _at compile time_ to a tree of |
| 31 | +_tokens_. A _token_ is zero or more symbols that together have some meaning. For |
| 32 | +example, in our example definition, `print`, `$mvar`, `=>`, `{` are all tokens |
| 33 | +(though that's not an exhaustive list). There are also other special tokens, |
| 34 | +such as `EOF`, which indicates that there are no more tokens. The process of |
| 35 | +producing a stream of tokens from the raw bytes of the source file is called |
| 36 | +_lexing_. For more information about _lexing_, see the [Parsing |
| 37 | +chapter][parsing] of this book. |
| 38 | + |
| 39 | +Whenever we refer to the "example _invocation_", we mean the following snippet: |
| 40 | + |
| 41 | +```rust |
| 42 | +printer!(print foo); // Assume `foo` is a variable defined somewhere else... |
| 43 | +``` |
| 44 | + |
| 45 | +The process of expanding the macro invocation into the syntax tree |
| 46 | +`println!("{}", foo)` and then expanding that into a call to `Display::fmt` is |
| 47 | +called _macro expansion_, and it is the topic of this chapter. |
9 | 48 |
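As a minimal, runnable sketch of the example definition and invocation (an illustration only, not code from the compiler): the variant below separates the two arms with `;` and builds a `String` with `format!` instead of printing, so the result of the expansion can be checked. The names `render`, `once`, and `twice_out` are hypothetical and do not appear in the chapter.

```rust
// Illustrative variant of the chapter's example macro. It returns a `String`
// via `format!` rather than printing, so the expansion can be inspected.
// The macro name `render` is hypothetical.
macro_rules! render {
    (print $mvar:ident) => {
        format!("{}", $mvar)
    };
    (print twice $mvar:ident) => {
        format!("{}\n{}", $mvar, $mvar)
    };
}

// Invokes the first arm: `$mvar` binds to the identifier `foo`.
fn once() -> String {
    let foo = 42;
    render!(print foo)
}

// Invokes the second arm: the literal token `twice` selects it.
fn twice_out() -> String {
    let foo = 42;
    render!(print twice foo)
}

fn main() {
    println!("{}", once());
    println!("{}", twice_out());
}
```

In both functions the metavariable binds to the token `foo` at compile time; the value `42` only comes into play when the expanded code runs.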
|
10 | 49 | ### The macro parser
|
11 | 50 |
|
| 51 | +There are two parts to macro expansion: parsing the definition and parsing the |
| 52 | +invocations. Interestingly, both are done by the macro parser. |
| 53 | + |
12 | 54 | Basically, the macro parser is like an NFA-based regex parser. It uses an
|
13 | 55 | algorithm similar in spirit to the [Earley parsing
|
14 | 56 | algorithm](https://en.wikipedia.org/wiki/Earley_parser). The macro parser is
|
15 |
| -defined in `src/libsyntax/ext/tt/macro_parser.rs`. |
16 |
| - |
17 |
| -In a traditional NFA-based parser, one common approach is to have some pattern |
18 |
| -which we are trying to match an input against. Moreover, we may try to capture |
19 |
| -some portion of the input and bind it to variable in the pattern. For example: |
20 |
| -suppose we have a pattern (borrowing Rust macro syntax) such as `a $b:ident a` |
21 |
| --- that is, an `a` token followed by an `ident` token followed by another `a` |
22 |
| -token. Given an input `a foo a`, the _metavariable_ `$b` would bind to the |
23 |
| -`ident` `foo`. On the other hand, an input `a foo b` would be rejected as a |
24 |
| -parse failure because the pattern `a <ident> a` cannot match `a foo b` (or as |
25 |
| -the compiler would put it, "no rules expected token `b`"). |
26 |
| - |
27 |
| -The macro parser does pretty much exactly that with one exception: in order to |
28 |
| -parse different types of metavariables, such as `ident`, `block`, `expr`, etc., |
29 |
| -the macro parser must sometimes call back to the normal Rust parser. |
30 |
| - |
31 |
| -Interestingly, both definitions and invokations of macros are parsed using the |
32 |
| -macro parser. This is extremely non-intuitive and self-referential. The code to |
33 |
| -parse macro _definitions_ is in `src/libsyntax/ext/tt/macro_rules.rs`. It |
34 |
| -defines the pattern for matching for a macro definition as `$( $lhs:tt => |
35 |
| -$rhs:tt );+`. In other words, a `macro_rules` defintion should have in its body |
36 |
| -at least one occurence of a token tree followed by `=>` followed by another |
37 |
| -token tree. When the compiler comes to a `macro_rules` definition, it uses this |
38 |
| -pattern to match the two token trees per rule in the definition of the macro |
39 |
| -_using the macro parser itself_. |
40 |
| - |
41 |
| -When the compiler comes to a macro invokation, it needs to parse that |
42 |
| -invokation. This is also known as _macro expansion_. The same NFA-based macro |
43 |
| -parser is used that is described above. Notably, the "pattern" (or _matcher_) |
44 |
| -used is the first token tree extracted from the rules of the macro _definition_. |
45 |
| -In other words, given some pattern described by the _definition_ of the macro, |
46 |
| -we want to match the contents of the _invokation_ of the macro. |
47 |
| - |
48 |
| -The algorithm is exactly the same, but when the macro parser comes to a place in |
49 |
| -the current matcher where it needs to match a _non-terminal_ (i.e. a |
50 |
| -metavariable), it calls back to the normal Rust parser to get the contents of |
51 |
| -that non-terminal. Then, the macro parser proceeds in parsing as normal. |
| 57 | +defined in [`src/libsyntax/ext/tt/macro_parser.rs`][code_mp]. |
| 58 | + |
| 59 | +The interface of the macro parser is as follows (this is slightly simplified): |
| 60 | + |
| 61 | +```rust |
| 62 | +fn parse( |
| 63 | + sess: ParserSession, |
| 64 | + tts: TokenStream, |
| 65 | + ms: &[TokenTree] |
| 66 | +) -> NamedParseResult |
| 67 | +``` |
| 68 | + |
| 69 | +In this interface: |
| 70 | + |
| 71 | +- `sess` is a "parsing session", which keeps track of some metadata. Most |
| 72 | + notably, this is used to keep track of errors that are generated so they can |
| 73 | + be reported to the user. |
| 74 | +- `tts` is a stream of tokens. The macro parser's job is to consume the raw |
| 75 | + stream of tokens and output a binding of metavariables to corresponding token |
| 76 | + trees. |
| 77 | +- `ms` is a _matcher_. This is a sequence of token trees that we want to match |
| 78 | + `tts` against. |
| 79 | + |
| 80 | +In the analogy of a regex parser, `tts` is the input and we are matching it |
| 81 | +against the pattern `ms`. Using our examples, `tts` could be the stream of |
| 82 | +tokens containing the inside of the example invocation `print foo`, while `ms` |
| 83 | +might be the sequence of token (trees) `print $mvar:ident`. |
| 84 | + |
| 85 | +The output of the parser is a `NamedParseResult`, which indicates which of |
| 86 | +three cases has occurred: |
| 87 | + |
| 88 | +- Success: `tts` matches the given matcher `ms`, and we have produced a binding |
| 89 | + from metavariables to the corresponding token trees. |
| 90 | +- Failure: `tts` does not match `ms`. This results in an error message such as |
| 91 | + "no rules expected token _blah_". |
| 92 | +- Error: some fatal error has occurred _in the parser_. For example, this happens |
| 93 | + if more than one pattern matches, since that indicates the macro is |
| 94 | + ambiguous. |
| 95 | + |
| 96 | +The full interface is defined [here][code_parse_int]. |
| 97 | + |
| 98 | +The macro parser does pretty much exactly the same as a normal regex parser with |
| 99 | +one exception: in order to parse different types of metavariables, such as |
| 100 | +`ident`, `block`, `expr`, etc., the macro parser must sometimes call back to the |
| 101 | +normal Rust parser. |
| 102 | + |
| 103 | +As mentioned above, both definitions and invocations of macros are parsed using |
| 104 | +the macro parser. This is extremely non-intuitive and self-referential. The code |
| 105 | +to parse macro _definitions_ is in |
| 106 | +[`src/libsyntax/ext/tt/macro_rules.rs`][code_mr]. It defines the pattern for |
| 107 | +matching for a macro definition as `$( $lhs:tt => $rhs:tt );+`. In other words, |
| 108 | +a `macro_rules` definition should have in its body at least one occurrence of a |
| 109 | +token tree followed by `=>` followed by another token tree. When the compiler |
| 110 | +comes to a `macro_rules` definition, it uses this pattern to match the two token |
| 111 | +trees per rule in the definition of the macro _using the macro parser itself_. |
| 112 | +In our example definition, the metavariable `$lhs` would match the patterns of |
| 113 | +both arms: `(print $mvar:ident)` and `(print twice $mvar:ident)`. And `$rhs` |
| 114 | +would match the bodies of both arms: `{ println!("{}", $mvar); }` and `{ |
| 115 | +println!("{}", $mvar); println!("{}", $mvar); }`. The parser would keep this |
| 116 | +knowledge around for when it needs to expand a macro invocation. |
| 117 | + |
| 118 | +When the compiler comes to a macro invocation, it parses that invocation using |
| 119 | +the same NFA-based macro parser that is described above. However, the matcher |
| 120 | +used is the first token tree (`$lhs`) extracted from the arms of the macro |
| 121 | +_definition_. Using our example, we would try to match the token stream `print |
| 122 | +foo` from the invocation against the matchers `print $mvar:ident` and `print |
| 123 | +twice $mvar:ident` that we previously extracted from the definition. The |
| 124 | +algorithm is exactly the same, but when the macro parser comes to a place in the |
| 125 | +current matcher where it needs to match a _non-terminal_ (e.g. `$mvar:ident`), |
| 126 | +it calls back to the normal Rust parser to get the contents of that |
| 127 | +non-terminal. In this case, the Rust parser would look for an `ident` token, |
| 128 | +which it finds (`foo`) and returns to the macro parser. Then, the macro parser |
| 129 | +proceeds in parsing as normal. Also, note that exactly one of the matchers from |
| 130 | +the various arms should match the invocation (otherwise, the macro is |
| 131 | +ambiguous). |
52 | 132 |
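The matching step described above can be sketched with a toy model. This is a deliberately simplified illustration and not the algorithm in `macro_parser.rs`: tokens are whitespace-separated strings, the only fragment kind is `ident`, there are no repetitions or nested token trees, and the identifier check stands in for the call back to the Rust parser. It reports the first arm whose matcher fits and does not model the ambiguity error. All names (`is_ident`, `match_one`, `first_match`) are hypothetical.

```rust
use std::collections::HashMap;

// Stand-in for "ask the Rust parser for an `ident`": a rough identifier check.
fn is_ident(tok: &str) -> bool {
    let mut chars = tok.chars();
    match chars.next() {
        Some(c) if c.is_alphabetic() || c == '_' => {
            chars.all(|c| c.is_alphanumeric() || c == '_')
        }
        _ => false,
    }
}

// Match one matcher (e.g. "print $mvar:ident") against the invocation tokens.
// Returns the metavariable bindings on success, `None` on failure.
fn match_one(matcher: &str, input: &str) -> Option<HashMap<String, String>> {
    let ms: Vec<&str> = matcher.split_whitespace().collect();
    let tts: Vec<&str> = input.split_whitespace().collect();
    if ms.len() != tts.len() {
        return None;
    }
    let mut bindings = HashMap::new();
    for (m, t) in ms.iter().zip(tts.iter()) {
        // A piece like `$name:ident` is a non-terminal; anything else is literal.
        if let Some(name) = m.strip_prefix('$').and_then(|s| s.strip_suffix(":ident")) {
            if !is_ident(t) {
                return None;
            }
            bindings.insert(name.to_string(), t.to_string());
        } else if m != t {
            return None;
        }
    }
    Some(bindings)
}

// Try the matcher of each arm in order; report the index of the first that fits.
fn first_match(arms: &[&str], input: &str) -> Option<(usize, HashMap<String, String>)> {
    arms.iter()
        .enumerate()
        .find_map(|(i, m)| match_one(m, input).map(|b| (i, b)))
}

fn main() {
    let arms = ["print $mvar:ident", "print twice $mvar:ident"];
    match first_match(&arms, "print foo") {
        Some((i, b)) => println!("arm {} matched with bindings {:?}", i, b),
        None => println!("no arm matched"),
    }
}
```

With the example definition's two matchers, `print foo` fits the first arm and binds `mvar` to `foo`, while `print twice bar` fails the first arm (wrong length) and fits the second.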
|
53 | 133 | For more information about the macro parser's implementation, see the comments
|
54 |
| -in `src/libsyntax/ext/tt/macro_parser.rs`. |
| 134 | +in [`src/libsyntax/ext/tt/macro_parser.rs`][code_mp]. |
55 | 135 |
|
56 | 136 | ### Hygiene
|
57 | 137 |
|
|
64 | 144 | ### Custom Derive
|
65 | 145 |
|
66 | 146 | TODO
|
| 147 | + |
| 148 | + |
| 149 | + |
| 150 | +[code_dir]: https://github.com/rust-lang/rust/tree/master/src/libsyntax/ext/tt |
| 151 | +[code_mp]: https://github.com/rust-lang/rust/tree/master/src/libsyntax/ext/tt/macro_parser.rs |
| 152 | +[code_mr]: https://github.com/rust-lang/rust/tree/master/src/libsyntax/ext/tt/macro_rules.rs |
| 153 | +[code_parse_int]: https://github.com/rust-lang/rust/blob/a97cd17f5d71fb4ec362f4fbd79373a6e7ed7b82/src/libsyntax/ext/tt/macro_parser.rs#L421 |
| 154 | +[parsing]: ./the-parser.md |