-The basic idea of a `Matcher` is to expose an `Iterator`-like interface for
-iterating through all matches of a pattern in the given haystack.
+The basic idea of a `Searcher` is to expose an interface for
+iterating through all connected string fragments of the haystack while classifying them as either a match or a reject.

-Similar to iterators, depending on the concrete implementation a matcher can have
+This happens in the form of the returned enum value. A `Match` needs to contain the start and end indices of a complete non-overlapping match, while a `Rejects` may be emitted for arbitrary non-overlapping rejected parts of the string, as long as the start and end indices lie on valid utf8 boundaries.
+
+Similar to iterators, depending on the concrete implementation a searcher can have
 additional capabilities that build on each other, which is why they will be
 defined in terms of a three-tier hierarchy:

-- `Matcher<'a>` is the basic trait that all matchers need to implement.
-  It contains a `next_match()` method that returns the `start` and `end` indices of
-  the next non-overlapping match in the haystack, with the search beginning at the front
+- `Searcher<'a>` is the basic trait that all searchers need to implement.
+  It contains a `next()` method that returns the `start` and `end` indices of
+  the next match or reject in the haystack, with the search beginning at the front
   (left) of the string. It also contains a `haystack()` getter for returning the
   actual haystack, which is the source of the `'a` lifetime on the hierarchy.
   The reason for this getter being made part of the trait is twofold:
-  - Every matcher needs to store some reference to the haystack anyway.
+  - Every searcher needs to store some reference to the haystack anyway.
   - Users of this trait will need access to the haystack in order
     for the individual match results to be useful.
-- `ReverseMatcher<'a>` adds a `next_match_back` method, for also allowing to efficiently
-  search for matches in reverse (starting from the right).
+- `ReverseSearcher<'a>` adds a `next_back()` method, for also allowing to efficiently
+  search in reverse (starting from the right).
   However, the results are not required to be equal to the results of
-  `next_match` in reverse, (as would be the case for the `DoubleEndedIterator` trait)
-  as that cannot be efficiently guaranteed for all matchers. (For an example, see further below)
-- Instead `DoubleEndedMatcher<'a>` is provided as a marker trait for expressing
-  that guarantee - If a matcher implements this trait, all results found from the
+  `next()` in reverse, (as would be the case for the `DoubleEndedIterator` trait)
+  because that cannot be efficiently guaranteed for all searchers. (For an example, see further below)
+- Instead `DoubleEndedSearcher<'a>` is provided as a marker trait for expressing
+  that guarantee - If a searcher implements this trait, all results found from the
   left need to be equal to all results found from the right in reverse order.
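
The three tiers described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the RFC's actual definitions: the enum name `SearchStep`, its variants, and the `CharSearcher` type are hypothetical, and `usize` stands in for the `uint` used in the text.

```rust
/// Hypothetical result type: one classified fragment of the haystack.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SearchStep {
    Match(usize, usize),
    Reject(usize, usize),
    Done,
}

/// Tier 1: forward search. `unsafe` because callers trust the returned
/// indices to lie on utf8 boundaries of `haystack()`.
pub unsafe trait Searcher<'a> {
    fn haystack(&self) -> &'a str;
    fn next(&mut self) -> SearchStep;
}

/// Tier 2: search starting from the right.
pub unsafe trait ReverseSearcher<'a>: Searcher<'a> {
    fn next_back(&mut self) -> SearchStep;
}

/// Tier 3: marker trait promising that results from the back are exactly
/// the results from the front in reverse order.
pub trait DoubleEndedSearcher<'a>: ReverseSearcher<'a> {}

/// Illustrative searcher for a single `char`; it classifies one char per step.
pub struct CharSearcher<'a> {
    haystack: &'a str,
    needle: char,
    front: usize,
    back: usize,
}

impl<'a> CharSearcher<'a> {
    pub fn new(haystack: &'a str, needle: char) -> Self {
        CharSearcher { haystack, needle, front: 0, back: haystack.len() }
    }

    fn classify(&self, c: char, start: usize, end: usize) -> SearchStep {
        if c == self.needle {
            SearchStep::Match(start, end)
        } else {
            SearchStep::Reject(start, end)
        }
    }
}

unsafe impl<'a> Searcher<'a> for CharSearcher<'a> {
    fn haystack(&self) -> &'a str {
        self.haystack
    }

    fn next(&mut self) -> SearchStep {
        // Only the part not yet consumed from either end is inspected.
        match self.haystack[self.front..self.back].chars().next() {
            Some(c) => {
                let start = self.front;
                self.front += c.len_utf8();
                self.classify(c, start, self.front)
            }
            None => SearchStep::Done,
        }
    }
}

unsafe impl<'a> ReverseSearcher<'a> for CharSearcher<'a> {
    fn next_back(&mut self) -> SearchStep {
        match self.haystack[self.front..self.back].chars().next_back() {
            Some(c) => {
                let end = self.back;
                self.back -= c.len_utf8();
                self.classify(c, self.back, end)
            }
            None => SearchStep::Done,
        }
    }
}

// A single char matches identically from both ends, so the marker is sound here.
impl<'a> DoubleEndedSearcher<'a> for CharSearcher<'a> {}
```

A single-character pattern is about the simplest case in which all three tiers can be implemented; a substring searcher would typically stop at `ReverseSearcher`.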

 As an important last detail, both
-`Matcher` and `ReverseMatcher` are marked as `unsafe` traits, even though the actual methods
+`Searcher` and `ReverseSearcher` are marked as `unsafe` traits, even though the actual methods
 aren't. This is because every implementation of these traits needs to ensure that all
-indices returned by `next_match` and `next_match_back` lay on valid utf8 boundaries
+indices returned by `next()` and `next_back()` lie on valid utf8 boundaries
 in the haystack.

 Without that guarantee, every single match returned by a matcher would need to be
@@ -171,6 +182,15 @@ Given that most implementations of these traits will likely
 live in the std library anyway, and are thoroughly tested, marking these traits `unsafe`
 doesn't seem like a huge burden to bear for good, optimizable performance.
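
To make the trade-off concrete, here is a minimal sketch of the two ways a consumer could turn returned indices into a string slice. The function names `checked_fragment` and `trusted_fragment` are hypothetical; `str::get` and `str::get_unchecked` are existing standard library methods.

```rust
/// Safe variant: re-validates on every call that both indices are in
/// bounds and lie on utf8 boundaries, returning `None` otherwise.
pub fn checked_fragment(haystack: &str, start: usize, end: usize) -> Option<&str> {
    haystack.get(start..end)
}

/// Unchecked variant: skips the validation entirely.
///
/// # Safety
/// `start..end` must be in bounds and lie on utf8 boundaries of `haystack`.
/// This is the guarantee an `unsafe` searcher trait asks implementors to uphold.
pub unsafe fn trusted_fragment(haystack: &str, start: usize, end: usize) -> &str {
    // SAFETY: the caller promises the range is valid for `haystack`.
    unsafe { haystack.get_unchecked(start..end) }
}
```

With a haystack such as `"aé"`, byte index 2 falls inside the two-byte encoding of `é`, so `checked_fragment` rejects the range `1..2`, while passing it to `trusted_fragment` would be undefined behavior.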

+### The role of the additional default methods
+
+`Pattern`, `Searcher` and `ReverseSearcher` each offer a few additional
+default methods that give better optimization opportunities.
+
+Most consumers of the pattern API will use them to more narrowly constrain
+how they are looking for a pattern, which, given an optimized implementation,
+should lead to mostly optimal code being generated.
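
As a rough illustration of how such a default method might work, the sketch below gives a simplified searcher trait a `next_match()` default that loops over `next()`, and an implementation that overrides it with a faster path. The names `SearchStep`, `next_match`, and `NaiveStrSearcher` are assumptions made for this example; the lifetime parameter and the `unsafe` marking are left out for brevity.

```rust
/// Hypothetical result type: one classified fragment of the haystack.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SearchStep {
    Match(usize, usize),
    Reject(usize, usize),
    Done,
}

pub trait Searcher {
    fn next(&mut self) -> SearchStep;

    /// Default method: skip rejects until a match or the end is reached.
    /// Implementations may override this when they can find matches
    /// without classifying every rejected fragment.
    fn next_match(&mut self) -> Option<(usize, usize)> {
        loop {
            match self.next() {
                SearchStep::Match(a, b) => return Some((a, b)),
                SearchStep::Reject(..) => continue,
                SearchStep::Done => return None,
            }
        }
    }
}

/// Illustrative substring searcher that steps through the haystack one
/// position at a time.
pub struct NaiveStrSearcher<'a> {
    haystack: &'a str,
    needle: &'a str,
    pos: usize,
}

impl<'a> NaiveStrSearcher<'a> {
    pub fn new(haystack: &'a str, needle: &'a str) -> Self {
        // Assumption of this sketch: empty needles are not handled.
        assert!(!needle.is_empty(), "this sketch assumes a non-empty needle");
        NaiveStrSearcher { haystack, needle, pos: 0 }
    }
}

impl<'a> Searcher for NaiveStrSearcher<'a> {
    fn next(&mut self) -> SearchStep {
        let rest = &self.haystack[self.pos..];
        let start = self.pos;
        if rest.starts_with(self.needle) {
            self.pos += self.needle.len();
            SearchStep::Match(start, self.pos)
        } else if let Some(c) = rest.chars().next() {
            // Reject exactly one char so the indices stay on utf8 boundaries.
            self.pos += c.len_utf8();
            SearchStep::Reject(start, self.pos)
        } else {
            SearchStep::Done
        }
    }

    /// Override: jump straight to the next match with `str::find`
    /// instead of emitting every reject in between.
    fn next_match(&mut self) -> Option<(usize, usize)> {
        let offset = self.haystack[self.pos..].find(self.needle)?;
        let start = self.pos + offset;
        self.pos = start + self.needle.len();
        Some((start, self.pos))
    }
}
```

A consumer that only cares about matches would call `next_match()` and get the specialized path where one exists, while a consumer that also needs the rejected fragments keeps calling `next()`.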
+
 ### Example for the issue with double-ended searching

 Let the haystack be the string `"fooaaaaabar"`, and let the pattern be the string `"aa"`.
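
The difference can be observed with today's standard library, using `str::match_indices` and `str::rmatch_indices`. The helper names below are hypothetical and exist only for this illustration.

```rust
/// Non-overlapping matches found from the front, as (start, end) byte indices.
pub fn matches_from_front(haystack: &str, needle: &str) -> Vec<(usize, usize)> {
    haystack
        .match_indices(needle)
        .map(|(start, m)| (start, start + m.len()))
        .collect()
}

/// Non-overlapping matches found from the back, in the order they are found.
pub fn matches_from_back(haystack: &str, needle: &str) -> Vec<(usize, usize)> {
    haystack
        .rmatch_indices(needle)
        .map(|(start, m)| (start, start + m.len()))
        .collect()
}
```

For `"fooaaaaabar"` and `"aa"`, the front search yields `(3, 5)` and `(5, 7)`, whereas the back search yields `(6, 8)` and `(4, 6)`. Reversing one sequence does not produce the other, because the run of five `a`s is partitioned differently depending on the side the search starts from.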
@@ -190,10 +210,11 @@ be considered a different operation than "matching from the back".

 ### Why `(uint, uint)` instead of `&str`

-It would be possible to define `next_match` and `next_match_back` to return an `&str`
-to the match instead of `(uint, uint)`.
+> Note: This section is a bit outdated now

-A concrete matcher impl could then make use of unsafe code to construct such a slice cheaply,
+It would be possible to define `next` and `next_back` to return `&str`s instead of `(uint, uint)` tuples.
+
+A concrete searcher impl could then make use of unsafe code to construct such a slice cheaply,
 and by its very nature it is guaranteed to lie on utf8 boundaries,
 which would also allow not marking the traits as unsafe.

@@ -224,7 +245,7 @@ as the "simple" default design.

 ## New methods on `StrExt`

-With the `Pattern` and `Matcher` traits defined and implemented, the actual `str`
+With the `Pattern` and `Searcher` traits defined and implemented, the actual `str`
 methods will be changed to make use of them:

 ```rust
@@ -245,17 +266,17 @@ pub trait StrExt for ?Sized {