
Commit 3340922

Merge pull request #1787 from ehuss/railroad-grammar
Add a new grammar renderer
2 parents 9d57724 + dab73ea commit 3340922

87 files changed, +3306 -1805 lines changed

book.toml (+1)

@@ -12,6 +12,7 @@ smart-punctuation = true
 
 [output.html.search.chapter]
 "test-summary.md" = { enable = false }
+"grammar.md" = { enable = false }
 
 [output.html.redirect]
 "/expressions/enum-variant-expr.html" = "struct-expr.html"

docs/authoring.md (+4)

@@ -214,3 +214,7 @@ r[foo.bar.edition2021]
 > [!EDITION-2021]
 > Describe what changed in 2021.
 ```
+
+## Grammar
+
+See [Grammar](grammar.md) for details on how to write grammar rules.

docs/grammar.md (+122, new file)

@@ -0,0 +1,122 @@
# Grammar

The Reference grammar is written in markdown code blocks using a modified BNF-like syntax (with a blend of regex and other arbitrary things). The `mdbook-spec` extension parses these rules and converts them to a renderable format, including railroad diagrams.

The code block should have a lang string with the word "grammar", a comma, and the category of the grammar, like this:

~~~
```grammar,items
ProductionName -> SomeExpression
```
~~~

The category is used to group similar productions on the grammar summary page in the appendix.
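The lang-string convention can be sketched in Rust, the language `mdbook-spec` is written in. This is a minimal illustration, not the extension's actual code: `grammar_category` and `group_by_category` are hypothetical names, and the sketch assumes the info string is exactly `grammar,<category>` with no further comma-separated fields.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper (not mdbook-spec's API): returns the category from a
/// code block's info string, e.g. "grammar,items" -> Some("items").
/// Blocks whose language is not `grammar`, or with no category, yield None.
fn grammar_category(info: &str) -> Option<&str> {
    let (lang, category) = info.split_once(',')?;
    let category = category.trim();
    if lang.trim() == "grammar" && !category.is_empty() {
        Some(category)
    } else {
        None
    }
}

/// Groups production names by category, as a summary page might list them.
/// Each input pair is (info string, production name).
fn group_by_category<'a>(blocks: &[(&'a str, &'a str)]) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut groups: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for &(info, name) in blocks {
        if let Some(category) = grammar_category(info) {
            groups.entry(category).or_default().push(name);
        }
    }
    groups
}

fn main() {
    let blocks = [
        ("grammar,items", "Function"),
        ("grammar,expressions", "ArrayExpression"),
        ("grammar,items", "Struct"),
        ("rust", "not a grammar block"),
    ];
    // BTreeMap iterates categories in sorted order.
    for (category, names) in group_by_category(&blocks) {
        println!("{category}: {}", names.join(", "));
    }
}
```

A sorted map is used only so the grouped output is deterministic; the real summary page may order categories differently.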

## Grammar syntax

The syntax for the grammar itself is pretty close to what is described in the [Notation chapter](../src/notation.md), though there are some rendering differences.

A "root" production, marked with `@root`, is one that is not used in any other production.
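As a rough illustration of the "root" definition, the following sketch flags productions whose `@root` marker disagrees with how they are used. It is hypothetical: the `Production` struct, the `root_mismatches` check, and the choice to ignore a production's references to itself are assumptions, not a description of what `mdbook-spec` validates.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified view of a parsed production: its name, whether
/// it carries the `@root` marker, and the names it references.
struct Production<'a> {
    name: &'a str,
    is_root: bool,
    refs: Vec<&'a str>,
}

/// Names that no *other* production references (self-references ignored).
fn unreferenced<'a>(prods: &[Production<'a>]) -> BTreeSet<&'a str> {
    let referenced: BTreeSet<&'a str> = prods
        .iter()
        .flat_map(|p| p.refs.iter().copied().filter(move |r| *r != p.name))
        .collect();
    prods
        .iter()
        .map(|p| p.name)
        .filter(|n| !referenced.contains(n))
        .collect()
}

/// Productions whose `@root` marker disagrees with actual usage:
/// marked but referenced elsewhere, or unreferenced but not marked.
fn root_mismatches<'a>(prods: &[Production<'a>]) -> Vec<&'a str> {
    let unused = unreferenced(prods);
    prods
        .iter()
        .filter(|p| p.is_root != unused.contains(p.name))
        .map(|p| p.name)
        .collect()
}

fn main() {
    let prods = vec![
        Production { name: "Grammar", is_root: true, refs: vec!["Production"] },
        Production { name: "Production", is_root: false, refs: vec!["Name", "Expression"] },
        Production { name: "Name", is_root: false, refs: vec![] },
        Production { name: "Expression", is_root: false, refs: vec!["Expression"] },
        // Not referenced anywhere and not marked `@root`.
        Production { name: "Orphan", is_root: false, refs: vec![] },
    ];
    println!("mismatched: {:?}", root_mismatches(&prods));
}
```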

The syntax for the grammar itself (written in itself, hopefully that's not too confusing) is:

```
Grammar -> Production+

BACKTICK -> U+0060

LF -> U+000A

Production -> `@root`? Name ` ->` Expression

Name -> <Alphanumeric or `_`>+

Expression -> Sequence (` `* `|` ` `* Sequence)*

Sequence -> (` `* AdornedExpr)+

AdornedExpr -> ExprRepeat Suffix? Footnote?

Suffix -> ` _` <not underscore, unless in backtick>* `_`

Footnote -> `[^` ~[`]` LF]+ `]`

ExprRepeat ->
      Expr1 `?`
    | Expr1 `*?`
    | Expr1 `*`
    | Expr1 `+?`
    | Expr1 `+`
    | Expr1 `{` Range? `..` Range? `}`
    | Expr1

Range -> [`0`-`9`]+

Expr1 ->
      Unicode
    | NonTerminal
    | Break
    | Terminal
    | Charset
    | Prose
    | Group
    | NegativeExpression

Unicode -> `U+` [`A`-`Z` `0`-`9`]{4..4}

NonTerminal -> Name

Break -> LF ` `+

Terminal -> BACKTICK ~[LF]+ BACKTICK

Charset -> `[` (` `* Characters)+ ` `* `]`

Characters ->
      CharacterRange
    | CharacterTerminal
    | CharacterName

CharacterRange -> BACKTICK <any char> BACKTICK `-` BACKTICK <any char> BACKTICK

CharacterTerminal -> Terminal

CharacterName -> Name

Prose -> `<` ~[`>` LF]+ `>`

Group -> `(` ` `* Expression ` `* `)`

NegativeExpression -> `~` ( Charset | Terminal | NonTerminal )
```
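The `ExprRepeat` alternatives can be recognized with a small hand-written parser. This is a minimal sketch, assuming the operator text immediately follows an already-parsed `Expr1`; `RepeatOp`, `parse_repeat`, and `parse_bound` are hypothetical names, not `mdbook-spec`'s parser. The two-character operators are tried first so that `*?` is not read as `*` followed by a stray `?`.

```rust
/// Repetition operators from the `ExprRepeat` production (hypothetical model).
#[derive(Debug, PartialEq)]
enum RepeatOp {
    /// `Expr?`
    Optional,
    /// `Expr*` (greedy) or `Expr*?` (non-greedy)
    Repeat { greedy: bool },
    /// `Expr+` (greedy) or `Expr+?` (non-greedy)
    RepeatPlus { greedy: bool },
    /// `Expr{a..b}`; either bound may be omitted
    RepeatRange(Option<u32>, Option<u32>),
}

/// Parses a repetition operator at the start of `s`, returning it with the
/// unparsed remainder, or `None` if `s` does not start with one.
fn parse_repeat(s: &str) -> Option<(RepeatOp, &str)> {
    // Two-character (non-greedy) operators first.
    if let Some(rest) = s.strip_prefix("*?") {
        return Some((RepeatOp::Repeat { greedy: false }, rest));
    }
    if let Some(rest) = s.strip_prefix("+?") {
        return Some((RepeatOp::RepeatPlus { greedy: false }, rest));
    }
    if let Some(rest) = s.strip_prefix('?') {
        return Some((RepeatOp::Optional, rest));
    }
    if let Some(rest) = s.strip_prefix('*') {
        return Some((RepeatOp::Repeat { greedy: true }, rest));
    }
    if let Some(rest) = s.strip_prefix('+') {
        return Some((RepeatOp::RepeatPlus { greedy: true }, rest));
    }
    // `{` Range? `..` Range? `}`
    let inner = s.strip_prefix('{')?;
    let close = inner.find('}')?;
    let (lo, hi) = inner[..close].split_once("..")?;
    let op = RepeatOp::RepeatRange(parse_bound(lo)?, parse_bound(hi)?);
    Some((op, &inner[close + 1..]))
}

/// Parses an optional bound: "" -> Some(None), "4" -> Some(Some(4)),
/// anything that is not all ASCII digits -> None (a parse error).
fn parse_bound(s: &str) -> Option<Option<u32>> {
    if s.is_empty() {
        Some(None)
    } else if s.bytes().all(|b| b.is_ascii_digit()) {
        s.parse().ok().map(Some)
    } else {
        None
    }
}

fn main() {
    for s in ["?", "*?", "+", "{2..4}", "{..3}", "{1..}", "x"] {
        println!("{s} => {:?}", parse_repeat(s));
    }
}
```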

The general format is a series of productions separated by blank lines. The expressions are:

| Expression | Example | Description |
|------------|---------|-------------|
| Unicode | U+0060 | A single Unicode character, written as its code point. |
| NonTerminal | FunctionParameters | A reference to another production by name. |
| Break | | This is used internally by the renderer to detect line breaks and indentation. |
| Terminal | \`example\` | This is a sequence of exact characters, surrounded by backticks. |
| Charset | [ \`A\`-\`Z\` \`0\`-\`9\` \`_\` ] | A choice from a set of characters, separated by spaces. There are three different forms, listed in the next three rows. |
| CharacterRange | [ \`A\`-\`Z\` ] | A range of characters; each character should be in backticks. |
| CharacterTerminal | [ \`x\` ] | A single character, surrounded by backticks. |
| CharacterName | [ LF ] | A nonterminal, referring to another production. |
| Prose | \<any ASCII character except CR\> | This is an English description of what should be matched, surrounded by angle brackets. |
| Group | (\`,\` Parameter)+ | This groups an expression for the purpose of precedence, such as applying a repetition operator to a sequence of other expressions. |
| NegativeExpression | ~[\` \` LF] | Matches anything except the given Charset, Terminal, or NonTerminal. |
| Sequence | \`fn\` Name Parameters | A sequence of expressions that must match in order. |
| Alternation | Expr1 \| Expr2 | Matches only one of the given expressions, separated by the vertical pipe character. |
| Suffix | \_except \[LazyBooleanExpression\]\_ | This adds a suffix to the previous expression to provide an additional English description, rendered in subscript. This can contain limited markdown, but try to avoid anything beyond basics like links. |
| Footnote | \[^extern-safe\] | This adds a footnote, which can supply extra information that may be helpful to the user. The footnote itself should be defined outside of the code block like a normal markdown footnote. |
| Optional | Expr? | The preceding expression is optional. |
| Repeat | Expr* | The preceding expression is repeated 0 or more times. |
| Repeat (non-greedy) | Expr*? | The preceding expression is repeated 0 or more times without being greedy. |
| RepeatPlus | Expr+ | The preceding expression is repeated 1 or more times. |
| RepeatPlus (non-greedy) | Expr+? | The preceding expression is repeated 1 or more times without being greedy. |
| RepeatRange | Expr{2..4} | The preceding expression is repeated a number of times within the specified range. Either bound can be omitted, just like in Rust range syntax. |
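The three `Charset` forms, and the table's `NegativeExpression` example (a negated charset of a space and `LF`), can be modeled as a membership test on a single character. This sketch is hypothetical: `CharItem`, `charset_matches`, and `resolve_name` are illustrative names, `CharacterName` resolution is limited to the two single-character productions (`LF`, `BACKTICK`) defined in the meta-grammar, and negation is shown only for charsets, not for terminals or nonterminals.

```rust
/// One element of a `Charset`, mirroring the three forms in the table.
enum CharItem<'a> {
    /// CharacterRange, e.g. [ `A`-`Z` ] (inclusive on both ends)
    Range(char, char),
    /// CharacterTerminal, e.g. [ `x` ]
    Terminal(char),
    /// CharacterName, e.g. [ LF ]
    Name(&'a str),
}

/// Hypothetical resolver for CharacterName: only the single-character
/// productions from the meta-grammar are known here.
fn resolve_name(name: &str) -> Option<char> {
    match name {
        "LF" => Some('\u{000A}'),
        "BACKTICK" => Some('\u{0060}'),
        _ => None,
    }
}

/// Returns whether `c` matches the charset; `negated` models `~[...]`.
fn charset_matches(items: &[CharItem<'_>], negated: bool, c: char) -> bool {
    let hit = items.iter().any(|item| match *item {
        CharItem::Range(lo, hi) => (lo..=hi).contains(&c),
        CharItem::Terminal(t) => t == c,
        CharItem::Name(n) => resolve_name(n) == Some(c),
    });
    // A negated charset matches exactly the characters the plain one rejects.
    hit != negated
}

fn main() {
    // [ `A`-`Z` `0`-`9` `_` ] from the Charset row.
    let ident = [CharItem::Range('A', 'Z'), CharItem::Range('0', '9'), CharItem::Terminal('_')];
    // ~[` ` LF] from the NegativeExpression row.
    let not_space_or_lf = [CharItem::Terminal(' '), CharItem::Name("LF")];
    for c in ['Q', '_', 'a', ' ', '\n'] {
        println!(
            "{c:?}: charset={} negated={}",
            charset_matches(&ident, false, c),
            charset_matches(&not_space_or_lf, true, c)
        );
    }
}
```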

## Automatic linking

The plugin automatically adds markdown link definitions for all the production names on every page. To link directly to a production name, all you need to do is surround it with square brackets, like `[ArrayExpression]`.

In some cases the automatic links for rule names might collide with other link names. In that case, disambiguate with the `grammar-` prefix, such as `[Type][grammar-Type]`. You can also do that if you just feel like being more explicit.
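The automatic-linking behavior can be pictured as generating markdown link reference definitions. This is a minimal sketch under stated assumptions: the target has the hypothetical form `<base>#grammar-<Name>` (the real page and anchor layout used by `mdbook-spec` may differ), the plain `[Name]` form is skipped when the page already defines that label, and the `grammar-` form is always emitted for disambiguation. `link_definitions` is a hypothetical name.

```rust
/// Hypothetical sketch: builds link reference definitions for each production
/// name, in both the plain `[Name]` form and the `grammar-Name` form.
/// `existing` holds labels the page already defines; `base` is an assumed
/// target page, not mdbook-spec's actual URL scheme.
fn link_definitions(names: &[&str], existing: &[&str], base: &str) -> String {
    let mut out = String::new();
    for &name in names {
        // Assumed anchor format; the real plugin's anchors may differ.
        let target = format!("{base}#grammar-{name}");
        // Skip the plain form on a collision. Markdown link labels match
        // case-insensitively; ASCII-only folding is a simplification here.
        if !existing.iter().any(|e| e.eq_ignore_ascii_case(name)) {
            out.push_str(&format!("[{name}]: {target}\n"));
        }
        // The prefixed form is always available for disambiguation.
        out.push_str(&format!("[grammar-{name}]: {target}\n"));
    }
    out
}

fn main() {
    // Suppose this page already defines its own `[Type]` link.
    let defs = link_definitions(&["ArrayExpression", "Type"], &["Type"], "grammar.html");
    print!("{defs}");
}
```

With these definitions in place, `[ArrayExpression]` resolves on its own, while `Type` has to be written as `[Type][grammar-Type]`, matching the disambiguation rule described above.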

mdbook-spec/Cargo.lock (+16, generated file; diff not rendered by default)

mdbook-spec/Cargo.toml (+2)

@@ -5,6 +5,7 @@ edition = "2024"
 license = "MIT OR Apache-2.0"
 description = "An mdBook preprocessor to help with the Rust specification."
 repository = "https://github.com/rust-lang/spec/"
+default-run = "mdbook-spec"
 
 # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
 
@@ -15,6 +16,7 @@ once_cell = "1.19.0"
 pathdiff = "0.2.1"
 # Try to keep in sync with mdbook.
 pulldown-cmark = { version = "0.10.3", default-features = false }
+railroad = { version = "0.3.2", default-features = false }
 regex = "1.9.4"
 semver = "1.0.21"
 serde_json = "1.0.113"
