Introduces parse_new, which takes advantage of a generalized sequence
parser and slice patterns to (hopefully) make the mapping between a byte
pattern and the corresponding terminal behavior more explicit.
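As a rough illustration of the idea (not the code in this commit), slice
patterns let each match arm read like the byte sequence it handles. The
`Op` variants and the `dispatch` and `parse_u16` helpers below are
hypothetical names chosen for the sketch:

```rust
/// Hypothetical terminal operations; the names are illustrative only.
#[derive(Debug, PartialEq)]
enum Op {
    CursorUp(u16),
    EraseDisplay,
    Unknown,
}

/// Parse a run of ASCII digits into a count; `None` on anything else.
fn parse_u16(digits: &[u8]) -> Option<u16> {
    std::str::from_utf8(digits).ok()?.parse().ok()
}

/// Map one complete escape sequence to an operation.
fn dispatch(seq: &[u8]) -> Op {
    match seq {
        // "ESC [ A": cursor up by the default count of 1.
        [0x1b, b'[', b'A'] => Op::CursorUp(1),
        // "ESC [ <digits> A": cursor up by an explicit count.
        [0x1b, b'[', digits @ .., b'A'] => {
            parse_u16(digits).map_or(Op::Unknown, Op::CursorUp)
        }
        // "ESC [ 2 J": erase the whole display.
        [0x1b, b'[', b'2', b'J'] => Op::EraseDisplay,
        // Anything else is left unrecognized.
        _ => Op::Unknown,
    }
}
```

Each arm pairs a byte pattern with the behavior it triggers, which is the
kind of explicitness this change is aiming for.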
This initial implementation is running in "dark launch" mode: we run
both parse_new and parse_classic (always taking the latter's output as
canonical), and print a warning when the two don't match.
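The dark-launch arrangement can be sketched roughly as follows. This is a
minimal, generic version under assumed signatures: `parse_dark_launch` is
a hypothetical name, and the two closures stand in for parse_classic and
parse_new, whose real types are not shown here:

```rust
use std::fmt::Debug;

/// Run both parsers, warn on divergence, and always return the classic
/// parser's result as the canonical one.
fn parse_dark_launch<T, E>(
    input: &[u8],
    classic: impl Fn(&[u8]) -> Result<T, E>,
    new: impl Fn(&[u8]) -> Result<T, E>,
) -> Result<T, E>
where
    T: PartialEq + Debug,
    E: PartialEq + Debug,
{
    let canonical = classic(input);
    let candidate = new(input);
    if canonical != candidate {
        // The mismatch is only reported; it never changes the outcome.
        eprintln!(
            "parser mismatch on {:?}: classic={:?} new={:?}",
            input, canonical, candidate
        );
    }
    canonical
}
```

Because the new parser's output is discarded, a disagreement costs only a
warning and some extra parsing work.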
This has resulted in a number of "bug-compatible" changes to the parser
to match the old behavior precisely, both in an effort to ensure that
we're faithfully reproducing the semantics of the old parser and to give
us the opportunity to directly compare the relative merits of the old &
new approaches.
In terms of completeness, this current implementation has a few
limitations:
* Most notably, we don't handle the "redraw all" request given
(currently) by the sequence "\u{1b}[VxD": per the standard, that
parses as `[CSI, .., "V"]` followed by the unrelated bytes `xD`. We
can either extend the parser to recognize this sequence, or, as I
would prefer, change the sequence to "fit" within the standard.
* The sequence enumeration and handling feels pretty good to me, both in
terms of how the existing sequences are handled and ease of adding new
ones (including, as `set_text_mode` demonstrates, the flexibility to
integrate external combinatorial parsers if necessary), especially
with respect to the parameter parsing. However, the parser itself is a
mess: the standard proved less helpful in recognizing the set of
sequences we've encountered "in the wild" than I'd hoped, so I think we
could definitely do better if we revisit it with fresh eyes.
* Not all of the error cases are exactly the same when given "weird"
sequences like "\u{1b}[?m"; both parsers produce an Err::Failure, but
they mark slightly different portions of the input. I believe this to be
roughly acceptable (we'd still make progress towards parsing the entire
input, just while printing out slightly different results for the
unrecognized sequences).
* I based the current general parser on the ECMA standard rather than
ANSI's, even though it lives in `ansi.rs`. So there's potential for
some comedy there.
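To make the "redraw all" limitation concrete, here is a small sketch of
how a 7-bit CSI sequence splits under ECMA-48: parameter bytes
(0x30-0x3F), then intermediate bytes (0x20-0x2F), then a single final
byte (0x40-0x7E). `Csi` and `split_csi` are hypothetical names for
illustration, not the types used in `ansi.rs`:

```rust
/// A CSI control sequence split per ECMA-48.
#[derive(Debug, PartialEq)]
struct Csi<'a> {
    params: &'a [u8],
    intermediates: &'a [u8],
    final_byte: u8,
}

/// Split a leading "ESC [" sequence off `input`, returning the sequence
/// and the unconsumed remainder, or `None` if no complete one is present.
fn split_csi(input: &[u8]) -> Option<(Csi<'_>, &[u8])> {
    let body = match input {
        [0x1b, b'[', body @ ..] => body,
        _ => return None,
    };
    // Parameter bytes: 0x30..=0x3F (digits, ';', '?', and friends).
    let n_params = body
        .iter()
        .take_while(|b| (0x30u8..=0x3f).contains(*b))
        .count();
    let (params, after) = body.split_at(n_params);
    // Intermediate bytes: 0x20..=0x2F.
    let n_inter = after
        .iter()
        .take_while(|b| (0x20u8..=0x2f).contains(*b))
        .count();
    let (intermediates, after) = after.split_at(n_inter);
    // Exactly one final byte (0x40..=0x7E) ends the sequence.
    match after {
        [f @ 0x40..=0x7e, rest @ ..] => Some((
            Csi { params, intermediates, final_byte: *f },
            rest,
        )),
        _ => None,
    }
}
```

Under this reading, "\u{1b}[VxD" ends at `V`, the first byte in the final
range, and `xD` is left over as unrelated input, which is why the current
sequence does not fit the general parser.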
And some work that remains entirely untouched is:
* Integrating a utf-8 parser so we can correctly handle multi-byte
sequences.
* Collapsing the Text/Op hierarchy so the parser can more
directly split incoming inputs rather than "leaking" those details
into parse_str_tail.
* Critically evaluating the efficiency and throughput of the parser,
especially with an eye towards reducing the number of times we scan
over the whole input.
* Replacing the allocating branches (TextOp, DecPrivate*) with
iterators.