@@ -123,7 +123,7 @@ be a <em>sub</em> language we need to more completely integrate it
123
123
with Felix code.
124
124
125
125
The @{Match} function always matches the whole string. To work around this
126
- we can define do this:
126
+ we can define this:
127
127
@tangle re1.flx
128
128
var anys = ".*";
129
129
regdef anystring = perl(anys);
@@ -140,7 +140,7 @@ repex string generating code you like.
140
140
141
141
@h2 The term tree
142
142
There is a second way to generate a regex by using a combinator tree
143
- or type @[ regex} directly. In fact, the Regdef DSSL grammar just provides a convenient
143
+ or type @{regex} directly. In fact, the Regdef DSSL grammar just provides a convenient
144
144
syntax for generating these trees. Here is the important part of the
145
145
definition from the library:
146
146
@felix
@@ -169,39 +169,135 @@ which have to be cited literally as shown. What if you wanted to load
169
169
the keyword list from a file?
170
170
171
171
You mean, like this?
172
- @felix
173
- var kws : Regdef::regex = Regdef::Alts(load("keywords.txt").split("\n"));
174
- regdef appl = " "* felix (kws) " "+ ffloat;
172
+ @tangle re1.flx
173
+ var data = "proc fun variant";
174
+ var lines = split(data," ");
175
+ var kws = unbox$ map (fun (x:string) => Regdef::String x) lines;
176
+ var kws_r = kws.Regdef::Alts;
177
+ println$ kws_r.str;
175
178
@
176
- Here we constructed the alternatives term directly from a list of strings
177
- loaded from a file, and then lifted it into the grammar. As you can guess
178
- since the parser is building a @{regex} tree anyhow, the @{felix} term is
179
- a kind of escape of quotation which has no semantics, it just avoid
180
- translating the quoted term.
181
-
182
- And as you can see you can put Perl strings directly in there too
183
- using the @{Perl} constructor of the variant. Which of course
184
- is exactly what the @{perl} syntax element of the grammar does!
185
-
186
- SO you basically have a three level language system: a simple
187
- DSSL, the combinatorial form which is properly type checked,
188
- and a super lame <em>do not use except in emergency</em> form
189
- using strings with Perl encoded regexps.
190
-
191
- The primary point to be demonsrated here is the <em>sub</em> part
192
- of the DSSL concept. We have a domain specific language, yes, but
193
- it integrates completely with Felix. This ensure all the power
194
- of the base language is available in the sub language, whilst
195
- the sub language grammar eliminates error prone and hard to read boilerplate
196
- although it might not fully cover all capabilities.
197
179
180
+ @h2 Inline regex
181
+ It is possible to build a regex inline like this:
182
+ @tangle re1.flx
183
+ var cid2 = regexp ( cidlead cidtrail*);
184
+ @
185
+ which in this case is precisely equivalent to saying
186
+ @felix
187
+ regdef cide = cidlead cidtrail*;
188
+ @
189
+ The parser switches to the regex syntax inside the parens.
190
+ Remember, a @{regex} is just a simple variant that builds a term tree.
191
+ The grammar that does all this is defined in the library; that is,
192
+ in user space.
193
+
194
+ @h1 Pattern matching
195
+ It is possible to pattern match with regexps. The ability to do
196
+ this is quite general and is built in to the way the compiler works,
197
+ rather than a feature of the regex DSSL: we enable user defined
198
+ pattern matches using a feature of Felix pattern matching
199
+ known as <em>higher order pattern matching</em>.
200
+
201
+ The way pattern matches work in Felix requires two functions:
202
+ <ol>
203
+ <li>The <em>match checker</em> tests to see if a pattern matches the match argument</li>
204
+ <li>The <em>extractor</em> fetches the argument of the variant constructor matched</li>
205
+ </ol>
206
+
207
+ The way this works with Felix data types is built in to the compiler,
208
+ but there is a way to write your own checker and extractor functions:
209
+ @tangle re1.flx
210
+ // the match checker
211
+ fun _match_ctor_re (r:RE2) (x:string) => x in r;
212
+
213
+ // the extractor
214
+ fun _ctor_arg_re (r:RE2) (x:string) =>
215
+ match Match (r,x) with
216
+ | Some y => y
217
+ // None case shouldn't happen!
218
+ endmatch
219
+ ;
220
+
221
+ // test case
222
+ match "Hello" with
223
+ | re "H(ell)o".RE2 y => println$ y.1;
224
+ endmatch;
225
+ @
198
226
227
+ In this code we invent a new pretend constructor @{re} which takes two arguments.
228
+ The first is the pattern we want to consider, and the second is the data we're
229
+ checking against. In our case the pattern is an @{RE2} and the data is a string;
230
+ but the technique is fully general.
199
231
232
+ The first function has a magic name and is used for checking if the string is in
233
+ the regexp; the second provides the way to get data out of it. Notice the
234
+ extractor is called <em>if, and only if</em> the match checker returns true.
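
The checker and extractor protocol described above can be modelled outside Felix. The following is only a sketch in Python, not the Felix mechanism itself: it assumes Python's standard @{re} module in place of RE2, and the names @{match_ctor_re}, @{ctor_arg_re} and @{match_re} are hypothetical stand-ins for what the Felix compiler does when it sees the pattern.

```python
import re
from typing import Callable, List, Optional

def match_ctor_re(pattern: str, text: str) -> bool:
    """Match checker: True if the whole string matches the pattern."""
    return re.fullmatch(pattern, text) is not None

def ctor_arg_re(pattern: str, text: str) -> List[Optional[str]]:
    """Extractor: the whole match followed by each capture group."""
    m = re.fullmatch(pattern, text)
    # By contract this is only called after the checker has succeeded.
    assert m is not None, "extractor called without a successful check"
    return [m.group(0), *m.groups()]

def match_re(pattern: str, text: str,
             on_match: Callable[[List[Optional[str]]], object],
             default: object = None) -> object:
    """Hypothetical driver: run the extractor only if the checker says yes."""
    if match_ctor_re(pattern, text):
        return on_match(ctor_arg_re(pattern, text))
    return default

# Same test case as the Felix code: pick out capture group 1.
result = match_re("H(ell)o", "Hello", lambda y: y[1])
print(result)
```

The driver never calls the extractor on a failed match, which is why the extractor may treat the no-match case as impossible.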
200
235
236
+ Therefore the word @{re} followed by a term of type @{RE2} will be treated
237
+ as a pattern with one argument.
201
238
239
+ Although the higher order pattern matching feature is not specific to
240
+ regular expressions, it was in fact added to the compiler specifically
241
+ to support that use case.
202
242
243
+ @h1 Iteration
244
+ In Felix we have a concept of an iterator: it corresponds with
245
+ the C++ notion of an input iterator. The library contains
246
+ an iterator allowing regexps to find matching substrings of a string.
247
+ Unlike @{Match} our iterator scans for the first match, and returns.
248
+ If the iterator is called again, it finds the next match.
203
249
250
+ Here's how to use it:
251
+ @tangle re1.flx
252
+ var kw = kws_r.Regdef::render.RE2;
253
+ for xxv in iterator (kw, "proc blah fun proc") do
254
+ println$ xxv.0;
255
+ done
256
+ @
204
257
205
-
258
+ Here's the implementation:
259
+ @felix
260
+ gen iterator (r:RE2, var target:string) () : opt[varray[string]] = {
261
+ var emptystring = "";
262
+ var l = len target;
263
+ var s = StringPiece target;
264
+ var p1 = s.data;
265
+ var p = 0;
266
+ var n = NumberOfCapturingGroups(r)+1;
267
+ var v1 = varray[StringPiece] (n.size,StringPiece emptystring);
268
+ var v2 = varray[string] (n.size,"");
269
+ again:>
270
+ var result = Match(r, s, p, UNANCHORED,v1.stl_begin, n);
271
+ if not result goto endoff;
272
+ for var i in 0 upto n - 1 do set(v2, i.size, string(v1.i)); done
273
+ var p2 = v1.0.data;
274
+ assert(v1.0.len.int > 0); // prevent infinite loop
275
+ p = (p2 - p1).int+v1.0.len.int;
276
+ yield Some v2;
277
+ goto again;
278
+ endoff:>
279
+ return None[varray[string]];
280
+ }
281
+ @
206
282
283
+ This uses a whole lot of features of the Google RE2 system
284
+ which are lifted into Felix, such as the data type @{StringPiece}
285
+ which is roughly a view of a string, and which I am not going
286
+ to explain.
287
+
288
+ Rather, the takeaway is that we can implement high level DSSL
289
+ features based on a C/C++ library, by lifting the library API
290
+ into Felix, more or less secretly, and then use them to implement
291
+ the operations we actually want, with the syntax we actually want.
292
+
293
+ In many case the core system deliberately has features to support
294
+ this kind of modelling technology. In the code above, the key
295
+ piece of magic is the @{yield} operation, which returns a value,
296
+ but also saves the current location in the code, so that re-invoking
297
+ the generator will restart the code with the same context and at the
298
+ same point it previously left off.
299
+
300
+ <em>Yielding Generators</em> are a common feature of many
301
+ languages, including Python and Rust. They are in fact a weak
302
+ form of coroutine.
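
For comparison, here is roughly how the same scanning iterator might look as a Python yielding generator. This is a sketch under assumptions, not a translation of the library code: it uses Python's standard @{re} module rather than RE2, and the name @{iter_matches} is made up for illustration.

```python
import re
from typing import Iterator, List, Optional

def iter_matches(pattern: "re.Pattern[str]",
                 target: str) -> Iterator[List[Optional[str]]]:
    """Yield the groups of each successive unanchored match in target."""
    pos = 0
    while True:
        m = pattern.search(target, pos)
        if m is None:
            return  # no more matches: the generator is exhausted
        if m.end() == m.start():
            # Same guard as the assert in the Felix version.
            raise ValueError("empty match would loop forever")
        # yield hands back a value and suspends here; the next request
        # resumes on the following line with pos and m still intact.
        yield [m.group(0), *m.groups()]
        pos = m.end()

kw = re.compile("proc|fun|variant")
found = [groups[0] for groups in iter_matches(kw, "proc blah fun proc")]
print(found)  # ['proc', 'fun', 'proc']
```

As in the Felix generator, the scan position lives in a local variable that survives between calls, so each resumption continues from just past the previous match.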
207
303