-
Notifications
You must be signed in to change notification settings - Fork 320
Refactor KaTeX parsing of inline styles and vlists; normalize survey output a bit more #1722
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
gnprice
wants to merge
24
commits into
zulip:main
Choose a base branch
from
gnprice:pr-katex-refactor
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This makes the output of the survey script more stable as our KaTeX parser gets refactored and otherwise edited.
… equal This helps make the output more stable from run to run, so that it's easier to spot changes (or confirm the absence of changes) when editing the code.
This was here only for a reference in a doc. In general we try to avoid imports of widgets from model code; it's an inversion of layers.
This was here only for references in docs. Better to avoid the layer-inverting import.
If this were missing, it's not clear to me that zero would be an appropriate default. In any case, in an empirical corpus, it's always present. So just require that.
…nStyles Also document the existing _parseSpanInlineStyles method, and describe our plan for eliminating most of its call sites.
This has a small effect on the survey script's list of failure reasons: in the rare case that the "unexpected shape of inline CSS" error appears, it now fires before any of the CSS properties in the same inline style are processed. That can mean fewer entries added to unsupportedInlineCssProperties. This difference is only possible, though, on a KaTeX expression that is going to reach the same hard failure either way. So it has no effect on behavior seen by a user.
Also make the error message for this case a bit more specific.
Until the previous commit, this bit of code was handling both the case where the value was unexpected (for which this message was accurate) and the case where the property itself was unexpected (for which it wasn't). Now the first case is handled elsewhere, so fix the remaining case.
…erties We know at this spot that there are just two specific CSS properties we expect to see, and we'll end up handling them directly rather than through a KatexSpanStyles object. So parse them directly, rather than build a whole KatexSpanStyles object (and then another one with `.filter()`).
All the remaining call sites of _parseSpanInlineStyles would throw anyway if this property were actually found. We only expect it in a specific context, namely a strut.
…anStyles This lets us skip allocating these objects (two in each case -- the second one comes from `.filter()`). We also get to skip parsing the value of `height`, since we don't intend to use it.
There's only a handful of specific properties we expect to see on this type of span; so handle those explicitly. In fact, making this list explicit brings to light that there's one property here which doesn't actually appear on KaTeX's vlist children: height. We'll cut that in a separate non-NFC commit. This will also open up ways to simplify the interaction between this and the margin-handling logic below.
These spans are highly structured; the only properties that go into their inline styles are top, margin-left, and margin-right.
…panNode This just pulls these three pieces of closely-related logic next to each other. That will make it easier to refactor them further. This causes one change in the survey script's list of failure reasons: when the `delimcenter` class occurs with an inline `top` property, we now record the unsupported class before reaching the hard fail for the unsupported property. This has no user-visible effect, though, because it can only happen when the expression is going to reach that hard failure either way.
And inline the effect of the `merge` method, eliminating that method too. This way we get to construct just one KatexSpanStyles object, rather than constructing three of them when inline styles are present.
This map pattern syntax looks an awful lot like it's saying that no other keys should be present -- after all, that's what the corresponding syntax in a list pattern would mean. So I initially read this to mean that this code would ignore the inline styles if any other attribute was present on the element; which wouldn't be desirable logic. In fact it's just saying that this one key should be present and match the given pattern. But there are simpler ways to say that; so use one.
Each of these swathes of logic has no interaction with the others. Splitting them into their own methods makes that structure easy for the reader to see.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Now that the KaTeX support we've had in recent releases is merged into main (with #1559) (*), I've gone and made a number of the refactors that I'd been contemplating for this code but hadn't wanted to do before those were merged. In this branch:
tools/content/check-features katex-check
more stable and easier to compare across changes to the implementation. This helped me validate that on an empirical corpus, none of the above changes caused any KaTeX expressions to stop being successfully parsed.In particular I ran
tools/content/check-features katex-check
near the start of the branch, after the first three commits:8c4106d katex [nfc]: Show line numbers only on unknown hard-fails, not others
d269628 katex [nfc]: Add messages for remaining hard-fail cases seen in corpus
e74557b tools/content [nfc]: Tie-break on reason text when number of failures equal
and again at the end of the branch. I then compared the output with commands like:
There were no changes in the total number of messages or KaTeX expressions where the parser failed. For a small number of expressions, there were changes in the set of soft failures reported before reaching a hard failure; the reasons for those are in a couple of individual commit messages. The full diff was (with a denominator of "28109 of them were KaTeX containing messages and 3370 of those failed"):
(*) One change is still outstanding, in #1720. But it won't much interact with the main logic.
Selected commit messages
8c4106d katex [nfc]: Show line numbers only on unknown hard-fails, not others
This makes the output of the survey script more stable as our KaTeX
parser gets refactored and otherwise edited.
d269628 katex [nfc]: Add messages for remaining hard-fail cases seen in corpus
e74557b tools/content [nfc]: Tie-break on reason text when number of failures equal
This helps make the output more stable from run to run, so that
it's easier to spot changes (or confirm the absence of changes)
when editing the code.
16deb86 katex [nfc]: Cut stray import of widgets library
This was here only for a reference in a doc. In general we try to
avoid imports of widgets from model code; it's an inversion of layers.
46c74e7 katex [nfc]: Fix a variable name to specify its units, namely em
e223229 katex: Require height on pstrut spans
If this were missing, it's not clear to me that zero would be an
appropriate default.
In any case, in an empirical corpus, it's always present.
So just require that.
b62979c katex [nfc]: Separate _parseInlineStyles from constructing a KatexSpanStyles
Also document the existing _parseSpanInlineStyles method, and
describe our plan for eliminating most of its call sites.
a88de14 katex [nfc]: Push more parsing into _parseInlineStyles; return Map
This has a small effect on the survey script's list of failure
reasons: in the rare case that the "unexpected shape of inline CSS"
error appears, it now fires before any of the CSS properties in the
same inline style are processed. That can mean fewer entries added
to unsupportedInlineCssProperties.
This difference is only possible, though, on a KaTeX expression that
is going to reach the same hard failure either way. So it has no
effect on behavior seen by a user.
ade7f66 katex: Fix a misleading log line: unexpected CSS property, not value
Until the previous commit, this bit of code was handling both the case
where the value was unexpected (for which this message was accurate)
and the case where the property itself was unexpected (for which it
wasn't). Now the first case is handled elsewhere, so fix the
remaining case.
9c32bd6 katex [nfc]: Skip building whole KatexSpanStyles for struts' two properties
We know at this spot that there are just two specific CSS properties
we expect to see, and we'll end up handling them directly rather than
through a KatexSpanStyles object. So parse them directly, rather
than build a whole KatexSpanStyles object (and then another one
with
.filter()
).6c2e55c katex [nfc]: Cut vertical-align from generic style properties
All the remaining call sites of _parseSpanInlineStyles would throw
anyway if this property were actually found. We only expect it
in a specific context, namely a strut.
f12c749 katex [nfc]: Check "only has height" directly, without making KatexSpanStyles
This lets us skip allocating these objects (two in each case -- the
second one comes from
.filter()
). We also get to skip parsing thevalue of
height
, since we don't intend to use it.e1a2c9c katex [nfc]: Get pstrut height directly, without making KatexSpanStyles
7443e1a katex [nfc]: Directly handle expected inline styles on vlist child
There's only a handful of specific properties we expect to see on this
type of span; so handle those explicitly.
In fact, making this list explicit brings to light that there's one
property here which doesn't actually appear on KaTeX's vlist children:
height. We'll cut that in a separate non-NFC commit.
This will also open up ways to simplify the interaction between this
and the margin-handling logic below.
8e059b3 katex: Don't expect height on vlist child spans
These spans are highly structured; the only properties that go
into their inline styles are top, margin-left, and margin-right.
e61c6c7 katex [nfc]: Consolidate logic for computing overall styles of KatexSpanNode
This just pulls these three pieces of closely-related logic next to
each other. That will make it easier to refactor them further.
This causes one change in the survey script's list of failure reasons:
when the
delimcenter
class occurs with an inlinetop
property, wenow record the unsupported class before reaching the hard fail for the
unsupported property. This has no user-visible effect, though,
because it can only happen when the expression is going to reach that
hard failure either way.
77b1c2f katex [nfc]: Inline remaining/main use of _parseSpanInlineStyles
And inline the effect of the
merge
method, eliminating thatmethod too.
This way we get to construct just one KatexSpanStyles object, rather
than constructing three of them when inline styles are present.
636ac6a katex [nfc]: Construct vlist child's styles directly, without filter
c744e45 katex [nfc]: Note that heightEm might turn out not to be needed
26ac991 katex [nfc]: Split out _parseStrut, _parseVlist, _parseGenericSpan
Each of these swathes of logic has no interaction with the others.
Splitting them into their own methods makes that structure easy for
the reader to see.