Commit c5c4783

Point to the good ideas
1 parent 950b746 commit c5c4783

1 file changed, +1 -235 lines changed

README.md

+1 -235
@@ -1,238 +1,4 @@
If anyone stumbles here trying to find an incremental parser, here is the 1998 paper ([Efficient and Flexible Incremental Parsing](https://www.researchgate.net/profile/SL_Graham/publication/2377179_Efficient_and_Flexible_Incremental_Parsing/links/004635294e13f23ef1000000/Efficient-and-Flexible-Incremental-Parsing.pdf)) that I should have found before embarking on this failed (expectations not met) project.

# Parsley

## Parsnip

Parsley has been a test bed and a proof of concept for total incremental parsers. However it suffers from severe limitations (mainly revolving around lookaheads, both at the lexeme and production level) which hinder further development and acceptance.

Further development of the concepts and techniques explored in Parsley will occur in [Parsnip](https://github.com/cgrand/parsnip/).

## Introduction

Parsley generates *total and truly incremental parsers*.

Total: a Parsley parser *yields a parse-tree for any input string*.

Truly incremental: a Parsley parser can operate as a text buffer; in the best
case, recomputing the parse-tree after a sequence of edits takes *logarithmic
time* (worst case: it behaves like a restartable parser).

Parsley parsers have *no separate lexer*, which allows for better compositionality
of grammars.

For now Parsley uses the same technique (for lexer-less parsing) as described
in this paper:
Context-Aware Scanning for Parsing Extensible Languages
http://www.umsec.umn.edu/publications/Context-Aware-Scanning-Parsing-Extensible-Language

(I independently rediscovered this technique and dubbed it LR+.)

Without a separate lexer, a language is entirely defined by its grammar.
A grammar is an alternation of keywords (non-terminal names) and other values.
A keyword and another value form a production rule.


## Specifying grammars

A simple grammar is:

    :expr #{"x" ["(" :expr* ")"]}

`x` `()` `(xx)` `((x)())` are recognized by this grammar.
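
To make the claim concrete, here is a small hand-written recognizer for the same language in plain Clojure. It is only a sketch for checking the sample strings: it does not use Parsley and says nothing about how Parsley works internally. The names `match-expr`, `match-exprs` and `recognized?` are made up for this illustration.

```clojure
(declare match-expr)

(defn match-exprs
  "Consumes zero or more exprs starting at index i.
  Returns the index just past the last expr consumed."
  [s i]
  (if-let [j (match-expr s i)]
    (recur s j)
    i))

(defn match-expr
  "Returns the index just past one expr starting at i, or nil if none matches.
  An expr is either \"x\" or \"(\" followed by any number of exprs and \")\"."
  [s i]
  (when (< i (count s))
    (case (nth s i)
      \x (inc i)
      \( (let [j (match-exprs s (inc i))]
           (when (and (< j (count s)) (= \) (nth s j)))
             (inc j)))
      nil)))

(defn recognized?
  "True when the whole string is exactly one expr."
  [s]
  (= (count s) (match-expr s 0)))

;; The four sample strings are accepted, an unbalanced one is not.
(every? recognized? ["x" "()" "(xx)" "((x)())"])
(recognized? "(x")
```

Unlike this recognizer, a Parsley parser is total: it returns a parse-tree even for strings that are not in the language.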

By default the main production of a grammar is the first one.

A production right value is a combination of:

* strings and regexes (terminals -- the set of terminal types is broader and
  even open, more later)
* keywords (non-terminals) which can be suffixed by `*`, `+` or `?` to denote
  repetitions or options.
* sets to denote an alternative
* vectors to denote a sequence. Inside vectors `:*`, `:+` and `:?` are postfix unary
  operators. That is, `["ab" :+]` denotes a non-empty repetition of the `ab`
  string

A production left value is always a keyword. If this keyword is suffixed by `-`,
no node will be generated in the parse-tree for this rule: its child nodes are
inlined in the parent node. Rules with such names are called anonymous rules.
An anonymous rule must be referred to by its base name (without the `-`).

These two grammars specify the same language but the resulting parse-trees will
be different (additional `:expr-rep` nodes):

    :expr #{"x" ["(" :expr* ")"]}

    :expr #{"x" :expr-rep}
    :expr-rep ["(" :expr* ")"]

These two grammars specify the same language and the same parse-trees:

    :expr #{"x" ["(" :expr* ")"]}

    :expr #{"x" :expr-rep}
    :expr-rep- ["(" :expr* ")"]

## Creating parsers

A parser is created using the `parser` or `make-parser` functions.

    (require '[net.cgrand.parsley :as p])
    (def p (p/parser :expr #{"x" ["(" :expr* ")"]}))
    (pprint (p "(x(x))"))

    {:tag :net.cgrand.parsley/root,
     :content
     [{:tag :expr,
       :content
       ["("
        {:tag :expr, :content ["x"]}
        {:tag :expr, :content ["(" {:tag :expr, :content ["x"]} ")"]}
        ")"]}]}

    ; running on malformed input with garbage
    (pprint (p "a(zldxn(dez)"))

    {:tag :net.cgrand.parsley/unfinished,
     :content
     [{:tag :net.cgrand.parsley/unexpected, :content ["a"]}
      {:tag :net.cgrand.parsley/unfinished,
       :content
       ["("
        {:tag :net.cgrand.parsley/unexpected, :content ["zld"]}
        {:tag :expr, :content ["x"]}
        {:tag :net.cgrand.parsley/unexpected, :content ["n"]}
        {:tag :expr,
         :content
         ["("
          {:tag :net.cgrand.parsley/unexpected, :content ["dez"]}
          ")"]}]}]}


## Creating buffers

Creating a buffer, editing it and getting its resulting parse-tree:

    (-> p p/incremental-buffer (p/edit 0 0 "(") (p/edit 1 0 "(x)") p/parse-tree pprint)

    {:tag :net.cgrand.parsley/unfinished,
     :content
     [{:tag :net.cgrand.parsley/unfinished,
       :content
       ["("
        {:tag :expr, :content ["(" {:tag :expr, :content ["x"]} ")"]}]}]}
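
Judging from the calls above, `edit` appears to take an offset, a length to replace and the replacement text. Under that assumption, the following plain-Clojure sketch applies the same edits to an ordinary string, which shows what text the buffer holds when it is parsed. `edit-string` is a hypothetical helper for illustration, not part of Parsley.

```clojure
(defn edit-string
  "Replaces `length` characters of `s` starting at `offset` with `text`.
  Assumed to mirror the argument order of Parsley's `edit` (offset, length, text)."
  [s offset length text]
  (str (subs s 0 offset) text (subs s (+ offset length))))

;; Same two edits as in the buffer example: insert "(" at 0, then "(x)" at 1.
(def buffer-text
  (-> ""
      (edit-string 0 0 "(")
      (edit-string 1 0 "(x)")))

;; buffer-text is now "((x)": one unclosed "(" followed by a complete "(x)",
;; which matches the unfinished node wrapping a finished :expr in the tree above.
buffer-text
```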

Incremental parsing at work:

    => (def p (p/parser :expr #{"x" "\n" ["(" :expr* ")"]}))
    #'net.cgrand.parsley/p
    => (let [line (apply str "\n" (repeat 10 "((x))"))
             input (str "(" (apply str (repeat 1000 line)) ")")
             buf (p/incremental-buffer p)
             buf (p/edit buf 0 0 input)]
         (time (p/parse-tree buf))
         (time (p/parse-tree (-> buf (p/edit 2 0 "(") (p/edit 51002 0 ")"))))
         nil)
    "Elapsed time: 508.834 msecs"
    "Elapsed time: 86.038 msecs"
    nil

Hence, *reparsing the buffer only took a fraction of the original time* (about
86 ms against 509 ms, roughly a sixth) despite the buffer having been modified
at the start and at the end.

## Incremental parsing

The input string is split into _chunks_ (lines by default) and chunks are always
reparsed as a whole, so don't experiment with incremental parsing with 1-line
inputs!
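
Since chunks are lines by default, the chunk an edit falls into depends only on how many newlines precede its offset. A minimal sketch in plain Clojure, applied to the three-line input used in the next example (`line-of-offset` is a hypothetical helper, not a Parsley function, and this says nothing about how Parsley tracks chunks internally):

```clojure
(defn line-of-offset
  "Zero-based index of the line containing `offset`,
  counting a newline character as the end of a line."
  [s offset]
  (count (filter #(= \newline %) (subs s 0 offset))))

;; Offset 6 is the "b" on the second line, so the edit lands in chunk 1.
(line-of-offset "((a)\n(b)\n(c))" 6)
```

With a 1-line input every offset maps to chunk 0, so the whole input is reparsed on each edit.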

Let's look at a slightly more complex example:

    => (def p (p/parser {:main :expr*
                         :space :ws?
                         :make-node (fn [tag content] {:tag tag :content content :id (gensym)})}
                        :ws #"\s+"
                        :expr #{#"\w+" ["(" :expr* ")"]}))

This example introduces the option map: if the first arg to `parser` is a map
(instead of a keyword), it's a map of options. See Options for more.

The important option here is `:make-node`, which redefines how nodes of the
parse-tree are constructed. We use it to add a unique identifier to each
node.

Now let's create a 3-line input and parse it:

    => (def buf (-> p incremental-buffer (edit 0 0 "((a)\n(b)\n(c))")))
    => (-> buf parse-tree pprint)
    nil
    {:tag :net.cgrand.parsley/root,
     :content
     [{:tag :expr,
       :content
       ["("
        {:tag :expr,
         :content ["(" {:tag :expr, :content ["a"], :id G__1806} ")"],
         :id G__1807}
        {:tag :ws, :content ["\n"], :id G__1808}
        {:tag :expr,
         :content ["(" {:tag :expr, :content ["b"], :id G__1809} ")"],
         :id G__1810}
        {:tag :ws, :content ["\n"], :id G__1811}
        {:tag :expr,
         :content ["(" {:tag :expr, :content ["c"], :id G__1812} ")"],
         :id G__1813}
        ")"],
       :id G__1814}],
     :id G__1815}

Now, let's change "b" to "BOO" and parse the buffer again:

    => (-> buf (edit 6 1 "BOO") parse-tree pprint)
    nil
    {:tag :net.cgrand.parsley/root,
     :content
     [{:tag :expr,
       :content
       ["("
        {:tag :expr,
         :content ["(" {:tag :expr, :content ["a"], :id G__1806} ")"],
         :id G__1807}
        {:tag :ws, :content ["\n"], :id G__1818}
        {:tag :expr,
         :content ["(" {:tag :expr, :content ["BOO"], :id G__1819} ")"],
         :id G__1820}
        {:tag :ws, :content ["\n"], :id G__1811}
        {:tag :expr,
         :content ["(" {:tag :expr, :content ["c"], :id G__1812} ")"],
         :id G__1813}
        ")"],
       :id G__1821}],
     :id G__1822}

-----

We can spot that 5 out of the 10 nodes are shared with the previous parse-tree.
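
The count can be checked mechanically. The sketch below walks two trees with `tree-seq` and intersects their node ids. To keep it short, the trees are compact stand-ins for the two outputs above: only `:id` and `:content` are kept, and each id is abbreviated to its numeric suffix. The helpers `n` and `node-ids` are made up for this illustration.

```clojure
(require '[clojure.set :as set])

(defn node-ids
  "Collects the :id of every map node in a tree; string leaves are skipped."
  [tree]
  (->> (tree-seq map? :content tree)
       (filter map?)
       (map :id)
       set))

(defn n
  "Builds a stand-in node with an id and its children."
  [id & content]
  {:id id :content (vec content)})

(def before
  (n 1815
     (n 1814 "("
        (n 1807 "(" (n 1806 "a") ")")
        (n 1808 "\n")
        (n 1810 "(" (n 1809 "b") ")")
        (n 1811 "\n")
        (n 1813 "(" (n 1812 "c") ")")
        ")")))

(def after
  (n 1822
     (n 1821 "("
        (n 1807 "(" (n 1806 "a") ")")
        (n 1818 "\n")
        (n 1820 "(" (n 1819 "BOO") ")")
        (n 1811 "\n")
        (n 1813 "(" (n 1812 "c") ")")
        ")")))

;; Ids present in both trees, i.e. nodes that were not rebuilt.
(def shared (set/intersection (node-ids before) (node-ids after)))

(count shared)
```

The shared ids are 1806, 1807, 1811, 1812 and 1813. Because every id comes from `gensym` at node creation, a repeated id means the node was not constructed again.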


## Options

`:main` specifies the root production; by default this is the first
production of the grammar.

`:root-tag` specifies the tag name to use for the root node
(`:net.cgrand.parsley/root` by default).

`:space` specifies a production which will be interspersed between every symbol
(terminal or not) *except in a sequence created with `unspaced`.*

`:make-node` specifies a function whose arglist is `[tag children-vec]` and which
returns a new node. By default it creates instances of the Node record with keys
`tag` and `content`.

`:make-unexpected` specifies a 1-arg function which converts a string (of
unexpected characters) to a node. By default it delegates to `:make-node`.

`:make-leaf` specifies a 1-arg function which converts a string (token) to a
node; by default it behaves like identity.
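
As a rough illustration of the shapes these options expect, here is an option map written as plain Clojure data. The option names come from the list above; the node representations (plain maps with `:tag`, `:content` and `:text` keys) are assumptions chosen for this sketch, not Parsley's defaults.

```clojure
(def options
  {:main :expr*
   :space :ws?
   ;; [tag children-vec] -> node, here a plain map
   :make-node (fn [tag content] {:tag tag :content content})
   ;; token string -> leaf node (the default is described as identity)
   :make-leaf (fn [s] {:tag :leaf :text s})
   ;; string of unexpected characters -> node
   :make-unexpected (fn [s] {:tag :unexpected :text s})})

;; The functions can be tried on their own, without a parser:
((:make-node options) :expr ["x"])
((:make-leaf options) "x")

;; The map would then be passed as the first argument to `parser`, e.g.
;; (p/parser options :ws #"\s+" :expr #{#"\w+" ["(" :expr* ")"]})
```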
If you are still interested in Parsley, go read the old [README](DONTREADME.md)
