|
| 1 | +# Grammar |
| 2 | + |
| 3 | +The Reference grammar is written in markdown code blocks using a modified BNF-like syntax that borrows from regular expressions and adds a few ad hoc extensions. The `mdbook-spec` extension parses these rules and converts them to a renderable format, including railroad diagrams. |
| 4 | + |
| 5 | +The code block should have a lang string with the word "grammar", a comma, and the category of the grammar, like this: |
| 6 | + |
| 7 | +~~~ |
| 8 | +```grammar,items |
| 9 | +ProductionName -> SomeExpression |
| 10 | +``` |
| 11 | +~~~ |
| 12 | + |
| 13 | +The category is used to group similar productions on the grammar summary page in the appendix. |
| 14 | + |
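As a rough illustration of the lang string convention above, here is a minimal Python sketch that pulls the category out of a fence's lang string. The function name `parse_grammar_info` is hypothetical and this is not how `mdbook-spec` is implemented; it only assumes the `grammar,<category>` shape described in this chapter.

```python
def parse_grammar_info(info):
    """Return the category from a lang string like "grammar,items".

    Hypothetical helper for illustration only. Returns None when the
    string is not exactly the word "grammar", a comma, and a category.
    """
    parts = [part.strip() for part in info.split(",")]
    if len(parts) != 2 or parts[0] != "grammar" or not parts[1]:
        return None
    return parts[1]


category = parse_grammar_info("grammar,items")
```

A summary page could then group productions by the returned category string.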
| 15 | +## Grammar syntax |
| 16 | + |
| 17 | +The syntax for the grammar itself is pretty close to what is described in the [Notation chapter](../src/notation.md), though there are some rendering differences. |
| 18 | + |
| 19 | +A "root" production, marked with `@root`, is one that is not used in any other production. |
| 20 | + |
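To make the idea of a root production concrete, the following Python sketch lists productions that no other production references, which are the candidates for `@root`. It is a simplified approximation under stated assumptions: productions are separated by blank lines, terminals and prose are stripped before looking for names, and suffixes or footnotes are not handled. The function name `find_unreferenced` is hypothetical and not part of `mdbook-spec`.

```python
import re


def find_unreferenced(grammar):
    """Return names of productions not referenced by any other production."""
    defined = []
    referenced = set()
    for block in re.split(r"\n\s*\n", grammar.strip()):
        name, sep, body = block.partition(" ->")
        if not sep:
            continue  # not a production
        name = name.replace("@root", "").strip()
        defined.append(name)
        # Remove backtick terminals and <prose> so their contents are
        # not mistaken for production names.
        body = re.sub(r"`[^`\n]+`", " ", body)
        body = re.sub(r"<[^>\n]+>", " ", body)
        for word in re.findall(r"[A-Za-z0-9_]+", body):
            if word != name:  # a self-reference does not count
                referenced.add(word)
    return [n for n in defined if n not in referenced]


sample = (
    "@root Grammar -> Production+\n"
    "\n"
    "Production -> Name ` ->` Expression\n"
    "\n"
    "Name -> <Alphanumeric or `_`>+\n"
    "\n"
    "Expression -> Name"
)
roots = find_unreferenced(sample)
```

In this sample only `Grammar` is left unreferenced, which matches the production carrying the `@root` marker.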
| 21 | +The syntax for the grammar itself (written in itself, hopefully that's not too confusing) is: |
| 22 | + |
| 23 | +``` |
| 24 | +Grammar -> Production+ |
| 25 | +
|
| 26 | +BACKTICK -> U+0060 |
| 27 | +
|
| 28 | +LF -> U+000A |
| 29 | +
|
| 30 | +Production -> `@root`? Name ` ->` Expression |
| 31 | +
|
| 32 | +Name -> <Alphanumeric or `_`>+ |
| 33 | +
|
| 34 | +Expression -> Sequence (` `* `|` ` `* Sequence)* |
| 35 | +
|
| 36 | +Sequence -> (` `* AdornedExpr)+ |
| 37 | +
|
| 38 | +AdornedExpr -> ExprRepeat Suffix? Footnote? |
| 39 | +
|
| 40 | +Suffix -> ` _` <not underscore, unless in backtick>* `_` |
| 41 | +
|
| 42 | +Footnote -> `[^` ~[`]` LF]+ `]` |
| 43 | +
|
| 44 | +ExprRepeat -> |
| 45 | + Expr1 `?` |
| 46 | + | Expr1 `*?` |
| 47 | + | Expr1 `*` |
| 48 | + | Expr1 `+?` |
| 49 | + | Expr1 `+` |
| 50 | + | Expr1 `{` Range? `..` Range? `}` |
| 51 | +
|
| 52 | +Range -> [0-9]+ |
| 53 | +
|
| 54 | +Expr1 -> |
| 55 | + Unicode |
| 56 | + | NonTerminal |
| 57 | + | Break |
| 58 | + | Terminal |
| 59 | + | Charset |
| 60 | + | Prose |
| 61 | + | Group |
| 62 | + | NegativeExpression |
| 63 | +
|
| 64 | +Unicode -> `U+` [`A`-`Z` `0`-`9`]{4..4} |
| 65 | +
|
| 66 | +NonTerminal -> Name |
| 67 | +
|
| 68 | +Break -> LF ` `+ |
| 69 | +
|
| 70 | +Terminal -> BACKTICK ~[LF]+ BACKTICK |
| 71 | +
|
| 72 | +Charset -> `[` (` `* Characters)+ ` `* `]` |
| 73 | +
|
| 74 | +Characters -> |
| 75 | + CharacterRange |
| 76 | + | CharacterTerminal |
| 77 | + | CharacterName |
| 78 | +
|
| 79 | +CharacterRange -> BACKTICK <any char> BACKTICK `-` BACKTICK <any char> BACKTICK |
| 80 | +
|
| 81 | +CharacterTerminal -> Terminal |
| 82 | +
|
| 83 | +CharacterName -> Name |
| 84 | +
|
| 85 | +Prose -> `<` ~[`>` LF]+ `>` |
| 86 | +
|
| 87 | +Group -> `(` ` `* Expression ` `* `)` |
| 88 | +
|
| 89 | +NegativeExpression -> `~` ( Charset | Terminal | NonTerminal ) |
| 90 | +``` |
| 91 | + |
| 92 | +The general format is a series of productions separated by blank lines. The expressions are: |
| 93 | + |
| 94 | +| Expression | Example | Description | |
| 95 | +|------------|---------|-------------| |
| 96 | +| Unicode | U+0060 | A single Unicode character. | |
| 97 | +| NonTerminal | FunctionParameters | A reference to another production by name. | |
| 98 | +| Break | | This is used internally by the renderer to detect line breaks and indentation. | |
| 99 | +| Terminal | \`example\` | This is a sequence of exact characters, surrounded by backticks. | |
| 100 | +| Charset | [ \`A\`-\`Z\` \`0\`-\`9\` \`_\` ] | A choice from a set of characters, space-separated. There are three different forms. | |
| 101 | +| CharacterRange | [ \`A\`-\`Z\` ] | A range of characters; each character should be in backticks. | |
| 102 | +| CharacterTerminal | [ \`x\` ] | A single character, surrounded by backticks. | |
| 103 | +| CharacterName | [ LF ] | A nonterminal, referring to another production. | |
| 104 | +| Prose | \<any ASCII character except CR\> | This is an English description of what should be matched, surrounded in angle brackets. | |
| 105 | +| Group | (\`,\` Parameter)+ | This groups an expression for the purpose of precedence, such as applying a repetition operator to a sequence of other expressions. | |
| 106 | +| NegativeExpression | ~[\` \` LF] | Matches anything except the given Charset, Terminal, or NonTerminal. | |
| 107 | +| Sequence | \`fn\` Name Parameters | A sequence of expressions, where they must match in order. | |
| 108 | +| Alternation | Expr1 \| Expr2 | Matches only one of the given expressions, separated by the vertical pipe character. | |
| 109 | +| Suffix | \_except \[LazyBooleanExpression\]\_ | This adds a suffix to the previous expression to provide an additional English description to it, rendered in subscript. This can have limited markdown, but try to avoid anything except basics like links. | |
| 110 | +| Footnote | \[^extern-safe\] | This adds a footnote, which can supply some extra information that may be helpful to the user. The footnote itself should be defined outside of the code block like a normal markdown footnote. | |
| 111 | +| Optional | Expr? | The preceding expression is optional. | |
| 112 | +| Repeat | Expr* | The preceding expression is repeated 0 or more times. | |
| 113 | +| Repeat (non-greedy) | Expr*? | The preceding expression is repeated 0 or more times without being greedy. | |
| 114 | +| RepeatPlus | Expr+ | The preceding expression is repeated 1 or more times. | |
| 115 | +| RepeatPlus (non-greedy) | Expr+? | The preceding expression is repeated 1 or more times without being greedy. | |
| 116 | +| RepeatRange | Expr{2..4} | The preceding expression is repeated a number of times within the specified range. Either bound can be omitted, which works just like Rust ranges. | |
| 117 | + |
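The greedy and non-greedy repetition forms in the table can be illustrated by analogy with Python's `re` module, which uses the same `*`, `*?`, `+`, and `+?` spellings. This is only an analogy: it assumes the grammar's non-greedy operators are meant in the usual regex sense, and it says nothing about how the renderer treats them.

```python
import re

text = "<a><b>"

# Greedy: `.+` consumes as much as possible, so the match runs to the
# last `>` in the input.
greedy = re.match(r"<.+>", text).group()

# Non-greedy: `.+?` consumes as little as possible, so the match stops
# at the first `>`.
lazy = re.match(r"<.+?>", text).group()

# `*?` may match zero repetitions when nothing forces it to continue.
empty = re.match(r"a*?", "aaa").group()
```

Here `greedy` covers the whole input, `lazy` covers only the first bracketed item, and `empty` is the empty string.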
| 118 | +## Automatic linking |
| 119 | + |
| 120 | +The plugin automatically adds markdown link definitions for all the production names on every page. If you want to link directly to a production name, all you need to do is surround it in square brackets, like `[ArrayExpression]`. |
| 121 | + |
| 122 | +In some cases there might be name collisions with the automatic linking of rule names. In that case, disambiguate with the `grammar-` prefix, such as `[Type][grammar-Type]`. You can also do that if you just feel like being more explicit. |
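
As a sketch of what automatic linking amounts to, the Python snippet below emits markdown link definitions for a set of production names, once under the plain name and once under the `grammar-` prefixed label. The function name `link_definitions` and the example URL are hypothetical; the real plugin's output format and paths may differ.

```python
def link_definitions(productions):
    """Build markdown link definitions for production names.

    `productions` maps a production name to the URL of its definition.
    Each name gets a plain label and a `grammar-` prefixed label, so the
    prefixed form can be used when the plain one collides.
    """
    lines = []
    for name, url in sorted(productions.items()):
        lines.append(f"[{name}]: {url}")
        lines.append(f"[grammar-{name}]: {url}")
    return "\n".join(lines)


# The URL below is made up for illustration.
definitions = link_definitions({"Type": "types.md#grammar-Type"})
```

With definitions like these appended to a page, both `[Type]` and `[Type][grammar-Type]` resolve to the same target.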