Refactor KaTeX parsing of inline styles and vlists; normalize survey output a bit more #1722

gnprice · 2025-07-19T05:58:36Z

Now that the KaTeX support we've had in recent releases is merged into main (with #1559) (*), I've gone and made a number of the refactors that I'd been contemplating for this code but hadn't wanted to do before those were merged. In this branch:

- Parse each element's inline styles more directly, letting the parsing logic for each type of element apply its own knowledge of which styles are expected. This also means less building and copying of generic KatexSpanStyles objects.
- Simplify some of the vlist parsing code, notably around margins.
- Split the parsing of struts and vlists into their own respective methods.
- At the start of the branch, make a few adjustments so that the output of tools/content/check-features katex-check is more stable and easier to compare across changes to the implementation. This helped me validate that on an empirical corpus, none of the above changes caused any KaTeX expressions to stop being successfully parsed.
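
As a rough illustration of the first point, here is a minimal sketch in Python (not the project's actual Dart code). The inline style is split into a plain map once, and each context then takes exactly the properties it expects and rejects anything left over. All function names here are hypothetical, and the choice of `height` and `vertical-align` as the two strut properties is an assumption based on the commit messages.

```python
from typing import Dict, Optional


def parse_inline_styles(style: str) -> Dict[str, str]:
    """Split an inline `style` attribute into a property -> raw-value map."""
    styles: Dict[str, str] = {}
    for declaration in style.split(";"):
        if not declaration.strip():
            continue  # e.g. the empty piece after a trailing semicolon
        name, colon, value = declaration.partition(":")
        if not colon:
            raise ValueError(f"unexpected shape of inline CSS: {declaration!r}")
        styles[name.strip()] = value.strip()
    return styles


def take_em(styles: Dict[str, str], name: str) -> Optional[float]:
    """Remove `name` from the map; return its length in em, or None if absent."""
    raw = styles.pop(name, None)
    if raw is None:
        return None
    if not raw.endswith("em"):
        raise ValueError(f"unsupported value for {name}: {raw!r}")
    return float(raw[:-2])


def require_no_leftovers(styles: Dict[str, str]) -> None:
    """Anything still in the map is a property this context did not expect."""
    if styles:
        raise ValueError(f"unsupported inline css property: {sorted(styles)[0]}")


def parse_strut_styles(style: str) -> Dict[str, Optional[float]]:
    # Assumption: a strut carries only height and vertical-align.
    styles = parse_inline_styles(style)
    parsed = {
        "height": take_em(styles, "height"),
        "vertical-align": take_em(styles, "vertical-align"),
    }
    require_no_leftovers(styles)
    return parsed


def parse_vlist_child_styles(style: str) -> Dict[str, Optional[float]]:
    # A vlist child carries only top, margin-left, and margin-right.
    styles = parse_inline_styles(style)
    parsed = {
        name: take_em(styles, name)
        for name in ("top", "margin-left", "margin-right")
    }
    require_no_leftovers(styles)
    return parsed
```

Because each context names its own expected properties, no generic styles object has to be built and then filtered down afterwards.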

In particular I ran tools/content/check-features katex-check near the start of the branch, after the first three commits:
8c4106d katex [nfc]: Show line numbers only on unknown hard-fails, not others
d269628 katex [nfc]: Add messages for remaining hard-fail cases seen in corpus
e74557b tools/content [nfc]: Tie-break on reason text when number of failures equal

and again at the end of the branch. I then compared the output with commands like:

$ diff -U2 <(grep -P 'failed|Because' tmp/tex.old) \
           <(grep -P 'failed|Because' tmp/tex.new)

There were no changes in the total number of messages or KaTeX expressions where the parser failed. For a small number of expressions, there were changes in the set of soft failures reported before reaching a hard failure; the reasons for those are in a couple of individual commit messages. The full diff was (with a denominator of "28109 of them were KaTeX containing messages and 3370 of those failed"):

@@ -38,4 +38,6 @@
   Because of unsupported css class: frac-line:
     335 messages failed.
+  Because of unsupported css class: delimcenter:
+    184 messages failed.
   Because of unsupported css class: mtable:
     177 messages failed.
@@ -90,6 +92,4 @@
   Because of unsupported css class: cd-arrow-pad:
     8 messages failed.
-  Because of unsupported inline css property: border-right-width:
-    5 messages failed.
   Because of unsupported css class: boxpad:
     4 messages failed.
@@ -102,4 +102,6 @@
   Because of unsupported css class: rule:
     3 messages failed.
+  Because of unsupported inline css property: border-right-width:
+    3 messages failed.
   Because of unsupported inline css property: border-top-width:
     3 messages failed.
@@ -120,5 +122,5 @@
   Because of unsupported css class: textup:
     2 messages failed.
-  Because of unsupported inline css property: border-right-style:
+  Because of unsupported css class: vertical-separator:
     2 messages failed.
   Because of unsupported css class: mover:

(*) One change is still outstanding, in #1720. But it won't much interact with the main logic.

Selected commit messages

`8c4106d` katex [nfc]: Show line numbers only on unknown hard-fails, not others

This makes the output of the survey script more stable as our KaTeX
parser gets refactored and otherwise edited.

`d269628` katex [nfc]: Add messages for remaining hard-fail cases seen in corpus

`e74557b` tools/content [nfc]: Tie-break on reason text when number of failures equal

This helps make the output more stable from run to run, so that
it's easier to spot changes (or confirm the absence of changes)
when editing the code.
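
The tie-break can be sketched as a two-part sort key: failure count descending, then reason text ascending. This is an illustrative Python sketch only; the function name is hypothetical and the survey script's real code is not shown in this PR description. The sample counts are taken from the diff above.

```python
from typing import Dict, List, Tuple


def sort_failure_reasons(counts: Dict[str, int]) -> List[Tuple[str, int]]:
    """Order reasons by count (largest first), breaking ties on reason text."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


counts = {
    "unsupported inline css property: border-top-width": 3,
    "unsupported css class: rule": 3,
    "unsupported css class: boxpad": 4,
    "unsupported inline css property: border-right-width": 3,
}
ordered = [reason for reason, _ in sort_failure_reasons(counts)]
```

With a key like this, reasons that share a count always come out in the same order regardless of the order in which they were first encountered.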

`16deb86` katex [nfc]: Cut stray import of widgets library

This was here only for a reference in a doc. In general we try to
avoid imports of widgets from model code; it's an inversion of layers.

`46c74e7` katex [nfc]: Fix a variable name to specify its units, namely em

`e223229` katex: Require height on pstrut spans

If this were missing, it's not clear to me that zero would be an
appropriate default.

In any case, in an empirical corpus, it's always present.
So just require that.

`b62979c` katex [nfc]: Separate _parseInlineStyles from constructing a KatexSpanStyles

Also document the existing _parseSpanInlineStyles method, and
describe our plan for eliminating most of its call sites.

`a88de14` katex [nfc]: Push more parsing into _parseInlineStyles; return Map

This has a small effect on the survey script's list of failure
reasons: in the rare case that the "unexpected shape of inline CSS"
error appears, it now fires before any of the CSS properties in the
same inline style are processed. That can mean fewer entries added
to unsupportedInlineCssProperties.

This difference is only possible, though, on a KaTeX expression that
is going to reach the same hard failure either way. So it has no
effect on behavior seen by a user.
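
The ordering difference can be seen in a small Python sketch (not the actual Dart code). The set of supported properties, the `HardFail` class, and both function names are hypothetical stand-ins. The first function checks each property as soon as it is split off; the second builds the whole map before looking at any property.

```python
from typing import Dict, Set

# Hypothetical set of properties the parser knows how to handle.
SUPPORTED_PROPERTIES = {"height", "top", "margin-left", "margin-right"}


class HardFail(Exception):
    """Stands in for the parser giving up on the whole expression."""


def survey_interleaved(style: str, unsupported: Set[str]) -> None:
    """Older ordering: check each property as soon as it is split off."""
    for declaration in style.split(";"):
        if not declaration.strip():
            continue
        name, colon, _value = declaration.partition(":")
        if not colon:
            raise HardFail("unexpected shape of inline CSS")
        if name.strip() not in SUPPORTED_PROPERTIES:
            unsupported.add(name.strip())


def survey_parse_first(style: str, unsupported: Set[str]) -> None:
    """Newer ordering: build the whole map, then look at the properties."""
    styles: Dict[str, str] = {}
    for declaration in style.split(";"):
        if not declaration.strip():
            continue
        name, colon, value = declaration.partition(":")
        if not colon:
            raise HardFail("unexpected shape of inline CSS")
        styles[name.strip()] = value.strip()
    for name in styles:
        if name not in SUPPORTED_PROPERTIES:
            unsupported.add(name)
```

For a malformed style such as `border-right-width:0.04em;oops`, both functions raise `HardFail`, but only the first one records `border-right-width` on the way there.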

`ade7f66` katex: Fix a misleading log line: unexpected CSS property, not value

Until the previous commit, this bit of code was handling both the case
where the value was unexpected (for which this message was accurate)
and the case where the property itself was unexpected (for which it
wasn't). Now the first case is handled elsewhere, so fix the
remaining case.

`9c32bd6` katex [nfc]: Skip building whole KatexSpanStyles for struts' two properties

We know at this spot that there are just two specific CSS properties
we expect to see, and we'll end up handling them directly rather than
through a KatexSpanStyles object. So parse them directly, rather
than build a whole KatexSpanStyles object (and then another one
with .filter()).

`6c2e55c` katex [nfc]: Cut vertical-align from generic style properties

All the remaining call sites of _parseSpanInlineStyles would throw
anyway if this property were actually found. We only expect it
in a specific context, namely a strut.

`f12c749` katex [nfc]: Check "only has height" directly, without making KatexSpanStyles

This lets us skip allocating these objects (two in each case -- the
second one comes from .filter()). We also get to skip parsing the
value of height, since we don't intend to use it.
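
A minimal Python sketch of the idea, assuming the inline styles are already available as a map from property name to raw string (the function name is hypothetical): the check looks only at the keys, so the value of `height` is never parsed.

```python
from typing import Dict


def has_only_height(styles: Dict[str, str]) -> bool:
    """True when `height` is the one and only inline style property."""
    return styles.keys() == {"height"}
```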

`e1a2c9c` katex [nfc]: Get pstrut height directly, without making KatexSpanStyles

`7443e1a` katex [nfc]: Directly handle expected inline styles on vlist child

There's only a handful of specific properties we expect to see on this
type of span; so handle those explicitly.

In fact, making this list explicit brings to light that there's one
property here which doesn't actually appear on KaTeX's vlist children:
height. We'll cut that in a separate non-NFC commit.

This will also open up ways to simplify the interaction between this
and the margin-handling logic below.

`8e059b3` katex: Don't expect height on vlist child spans

These spans are highly structured; the only properties that go
into their inline styles are top, margin-left, and margin-right.

`e61c6c7` katex [nfc]: Consolidate logic for computing overall styles of KatexSpanNode

This just pulls these three pieces of closely-related logic next to
each other. That will make it easier to refactor them further.

This causes one change in the survey script's list of failure reasons:
when the delimcenter class occurs with an inline top property, we
now record the unsupported class before reaching the hard fail for the
unsupported property. This has no user-visible effect, though,
because it can only happen when the expression is going to reach that
hard failure either way.

`77b1c2f` katex [nfc]: Inline remaining/main use of _parseSpanInlineStyles

And inline the effect of the merge method, eliminating that
method too.

This way we get to construct just one KatexSpanStyles object, rather
than constructing three of them when inline styles are present.

`636ac6a` katex [nfc]: Construct vlist child's styles directly, without filter

`c744e45` katex [nfc]: Note that heightEm might turn out not to be needed

`26ac991` katex [nfc]: Split out _parseStrut, _parseVlist, _parseGenericSpan

Each of these swathes of logic has no interaction with the others.
Splitting them into their own methods makes that structure easy for
the reader to see.
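
The resulting shape might look roughly like the following Python sketch. This is an illustration only: the real methods are Dart, and how the parser decides which kind of span it is looking at is an assumption here (the `strut` and `vlist-t` classes are used as the discriminators). The three helpers are placeholders that just label their input.

```python
from typing import List, Tuple


def parse_strut(style: str) -> Tuple[str, str]:
    return ("strut", style)  # placeholder for strut-specific parsing


def parse_vlist(style: str) -> Tuple[str, str]:
    return ("vlist", style)  # placeholder for vlist-specific parsing


def parse_generic_span(style: str) -> Tuple[str, str]:
    return ("generic", style)  # placeholder for everything else


def parse_span(classes: List[str], style: str) -> Tuple[str, str]:
    """Pick exactly one of three independent parsing paths."""
    if "strut" in classes:
        return parse_strut(style)
    if "vlist-t" in classes:
        return parse_vlist(style)
    return parse_generic_span(style)
```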

This map pattern syntax looks an awful lot like it's saying that no other keys should be present -- after all, that's what the corresponding syntax in a list pattern would mean. So I initially read this to mean that this code would ignore the inline styles if any other attribute was present on the element; which wouldn't be desirable logic. In fact it's just saying that this one key should be present and match the given pattern. But there are simpler ways to say that; so use one.


gnprice added 24 commits July 18, 2025 20:54

katex [nfc]: Show line numbers only on unknown hard-fails, not others

8c4106d

This makes the output of the survey script more stable as our KaTeX parser gets refactored and otherwise edited.

katex [nfc]: Add messages for remaining hard-fail cases seen in corpus

d269628

tools/content [nfc]: Tie-break on reason text when number of failures equal

e74557b

This helps make the output more stable from run to run, so that it's easier to spot changes (or confirm the absence of changes) when editing the code.

katex [nfc]: Cut stray import of widgets library

16deb86

This was here only for a reference in a doc. In general we try to avoid imports of widgets from model code; it's an inversion of layers.

binding [nfc]: Cut stray import of a widgets library

3a9ed06

This was here only for references in docs. Better to avoid the layer-inverting import.

katex [nfc]: Fix a variable name to specify its units, namely em

46c74e7

katex: Require height on pstrut spans

e223229

If this were missing, it's not clear to me that zero would be an appropriate default. In any case, in an empirical corpus, it's always present. So just require that.

katex [nfc]: Separate _parseInlineStyles from constructing a KatexSpanStyles

b62979c

Also document the existing _parseSpanInlineStyles method, and describe our plan for eliminating most of its call sites.

katex [nfc]: Factor out _takeStyleEm from _parseSpanInlineStyles

57b980e

Also make the error message for this case a bit more specific.

katex [nfc]: Cut vertical-align from generic style properties

6c2e55c

All the remaining call sites of _parseSpanInlineStyles would throw anyway if this property were actually found. We only expect it in a specific context, namely a strut.

katex [nfc]: Check "only has height" directly, without making KatexSpanStyles

f12c749

This lets us skip allocating these objects (two in each case -- the second one comes from `.filter()`). We also get to skip parsing the value of `height`, since we don't intend to use it.

katex [nfc]: Get pstrut height directly, without making KatexSpanStyles

e1a2c9c

katex: Don't expect height on vlist child spans

8e059b3

These spans are highly structured; the only properties that go into their inline styles are top, margin-left, and margin-right.

katex [nfc]: Inline remaining/main use of _parseSpanInlineStyles

77b1c2f

And inline the effect of the `merge` method, eliminating that method too. This way we get to construct just one KatexSpanStyles object, rather than constructing three of them when inline styles are present.

katex [nfc]: Construct vlist child's styles directly, without filter

636ac6a

katex [nfc]: Dedupe logic for vlist child between margin/no-margin cases

465dc25

katex [nfc]: Note that heightEm might turn out not to be needed

c744e45

katex [nfc]: Split out _parseStrut, _parseVlist, _parseGenericSpan

26ac991

Each of these swathes of logic has no interaction with the others. Splitting them into their own methods makes that structure easy for the reader to see.

gnprice assigned rajveermalviya Jul 19, 2025

gnprice added the "maintainer review" label (PR ready for review by Zulip maintainers) Jul 19, 2025
