KaTeX (1/n): Initial support for displaying basic KaTeX content #1408

Open
wants to merge 7 commits into
base: main

Member

@rajveermalviya rajveermalviya commented Mar 13, 2025

This initial implementation includes:

  • Displaying the characters in their corresponding
    custom text styles (fonts, font weight, font style).

  • Character and symbol sizing.

  • A subset of inline styles.

This results in support for displaying some simple KaTeX functions.

Related: #46

@rajveermalviya rajveermalviya force-pushed the pr-tex-content-1 branch 2 times, most recently from 476f6aa to 5164619 Compare March 13, 2025 20:43
@rajveermalviya rajveermalviya force-pushed the pr-tex-content-1 branch 4 times, most recently from c2a9c4a to babe5c8 Compare April 1, 2025 16:19
@rajveermalviya rajveermalviya changed the title content: Add initial support for displaying some KaTeX content KaTeX (1/n): Initial support for displaying basic KaTeX content Apr 1, 2025
@PIG208 PIG208 requested review from gnprice and PIG208 April 1, 2025 22:17
@PIG208 PIG208 self-assigned this Apr 1, 2025
@PIG208 PIG208 added the maintainer review PR ready for review by Zulip maintainers label Apr 1, 2025
Member

@gnprice gnprice left a comment

Exciting!

Generally this structure looks good. Here's a high-level review — so just a handful of comments mostly about things that I think will help make the changes clearer to understand.

Then once these aspects look good (should be quick, I think), we'll do maintainer review as usual.

Comment on lines 82 to 85
check(globalSettings).getBool(BoolGlobalSetting.renderKatex)
.isFalse();
assert(!BoolGlobalSetting.placeholderIgnore.default_);
assert(!BoolGlobalSetting.renderKatex.default_);
Member

No need to add these feature flags to the tests (or to remove them from the tests when later taking them out) — adding them can be purely a matter of the one line adding the enum value, plus its dartdoc (and blank line above that).

That also means that adding the feature flag can be squashed into the first commit that uses it.

final spanClass = spanClasses[index];
switch (spanClass) {
case 'textbf':
// .textbf { font-weight: bold; }
Member

These are quoting from KaTeX's CSS (or rather the source file for its CSS), and that's an important reference for understanding this code. So let's include a link to that in a comment.

Comment on lines 376 to 377
final String? text;
final List<KatexSpanNode> nodes;
Member

It looks like these two are mutually exclusive: there's always at most one present, and in fact exactly one.

Let's make that invariant explicit in this class, by at least an assertion at the constructor.

It'd be good to also add some dartdoc on these two fields to say how they relate to each other and to the parent node.
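
A minimal sketch of what that could look like, assuming `nodes` becomes nullable so that "exactly one present" can be expressed directly. The field docs and assertion message are illustrative, not taken from the PR:

```dart
/// Hypothetical sketch of a KaTeX span node; the PR's real class may differ.
class KatexSpanNode {
  const KatexSpanNode({this.text, this.nodes})
      // Exactly one of [text] and [nodes] must be present.
      : assert((text != null) != (nodes != null),
            'exactly one of text and nodes must be non-null');

  /// The text this span directly contains, if it is a leaf.
  ///
  /// Non-null exactly when [nodes] is null.
  final String? text;

  /// The child spans of this span, if it is not a leaf.
  ///
  /// Non-null exactly when [text] is null.
  final List<KatexSpanNode>? nodes;
}
```

With assertions enabled, constructing a node with both fields, or with neither, fails at the constructor rather than somewhere later in the widget code.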


final String texSource;
final List<KatexSpanNode>? nodes;
Member

It looks like the next PR #1452 changes this to be a list of KatexNode, a new common base class for KatexSpanNode and some others.

It's probably cleanest if this says KatexNode from the start, then. That makes the conceptual relationship to this parent node (MathBlockNode) somewhat clearer — it's not that a math block necessarily contains only "KaTeX span nodes" as direct children, it can contain "KaTeX nodes" in general, and it's just that at this stage the spans are the only kind of KaTeX node that are yet implemented.

Comment on lines +276 to +345
default:
// TODO handle more CSS properties
assert(debugLog('Unsupported CSS property: $property of type ${expression.runtimeType}'));
}
Member

It seems like this code has three levels of how well it supports a given KaTeX blob:

  • The HTML doesn't follow any structure we've yet anticipated. The parser in this file throws KatexHtmlParseError, producing nodes of null.
  • The HTML contains something we know we don't yet support, but there's no exception; nodes ends up with some kind of parse result.
  • The HTML consists only of constructs we believe are fully supported — meaning we believe the current version of the code will produce widgets that make the intended math show up exactly the same as with KaTeX on the web.

This default case is an example of that middle state.

The middle state is very useful for development: you want to let the parser do its thing as best it currently can, and see how the resulting widgets look, while you work on supporting that case.

But I'd also really like to be able to run the parser in a mode where it will only accept constructs we believe are fully supported. That way we can run it on a corpus of public messages collected by the scripts in tools/content/, and get a survey of what remains to be implemented. That mode could also be useful for turning KaTeX support on by default, but only for expressions we believe will show up exactly right, and for other expressions falling back to the raw TeX like we do in main today.

I think the most effective way to draw that boundary will be to do so starting in this first PR, so that when we include a given construct on the "fully supported" side of it we do so at the same time as we're implementing support for that construct and thinking about it in detail.

Maybe have two experimental flags?

  • One flag enables showing KaTeX at all. Things that are fully supported get rendered; things that are incompletely supported, just like those not supported, get the fallback.
  • A second flag enables showing KaTeX even where it's incomplete — so the behavior this revision has when renderKatex is true. (When the first flag is false, this one would just do nothing.)

Then when support is mostly complete, we might turn the first flag into always-true, while keeping the second flag as experimental (and default-false).
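
To make the proposed behavior concrete, here is a rough sketch of the decision the two flags would drive. Everything in it is hypothetical: `MathDisplay`, `chooseMathDisplay`, and the parameter names are made up for illustration, with `renderIncompleteKatex` standing in for whatever the second flag ends up being called:

```dart
/// What to show for one math expression. (Hypothetical names throughout.)
enum MathDisplay { texSource, katex }

MathDisplay chooseMathDisplay({
  required bool renderKatex, // first flag: show KaTeX at all
  required bool renderIncompleteKatex, // second flag: show incomplete results too
  required bool parseSucceeded, // false when the parser threw and nodes is null
  required bool hasUnsupportedConstruct, // parser saw something known-unsupported
}) {
  // Flag off, or HTML we could not parse at all: fall back to raw TeX.
  if (!renderKatex || !parseSucceeded) return MathDisplay.texSource;
  // The "middle state": only rendered when the second flag opts in.
  if (hasUnsupportedConstruct && !renderIncompleteKatex) {
    return MathDisplay.texSource;
  }
  return MathDisplay.katex;
}
```

Note that the second flag has no effect when the first is false, matching the description above.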

Member Author

OK, added another flag (forceRenderKatex) to control if the KaTeX content should be rendered even if there were some errors encountered while parsing the HTML. Currently, the "ignorable errors" happen either when the parser encounters an unknown CSS class or an unknown CSS property in the inline styles.

It will be used in later commits to parse inline CSS styles
in HTML while parsing KaTeX HTML spans.

These fonts will be used in later commits to show KaTeX
content.

This later avoids a collision for the `TextDirection` type,
which is also defined in `dart:ui`.

…tyles

With this, if the new experimental flag is enabled, the result will be
a really basic rendering of each text character in KaTeX spans.

This adds another experimental flag called `forceRenderKatex`
which, if enabled, ignores any errors generated by the parser
(like when encountering an unsupported CSS class) and tries to do
a "broken" render of the available spans and their styles.
This allows the developer to easily test different KaTeX content
in the wild while support is still in development.
@rajveermalviya
Member Author

Thanks for the review @gnprice. Pushed an update, PTAL.

@rajveermalviya rajveermalviya requested a review from gnprice April 2, 2025 21:43
Member

@gnprice gnprice left a comment

Thanks for the revision!

This resolves the high-level feedback I had — @PIG208, over to you for maintainer review.

Comment on lines +165 to +170
}

// Work around the duplicated case statement with a new switch block,
// to preserve the same order and to keep the cases mirroring the CSS
// definitions in katex.scss .
switch (spanClass) {
Member

Ah, I had wondered about these two separate switch statements 🙂 — this comment is helpful.

Member

@PIG208 PIG208 left a comment

Thanks for working on this! I just went through the entire PR and played with this a bit on my device. It looks pretty good!

With regards to testing, I'm not sure what the current plan is. It doesn't seem very useful to have tests structured basically as a duplicate of all the KaTeX classes we support. Maybe it would be better to test with examples crawled online?

@@ -0,0 +1,21 @@
The MIT License (MIT)
Member

fonts: Add KaTeX custom fonts
    
These fonts will be used in later commits to show KaTeX
content.

For future reference, perhaps we can also mention where we found these fonts in the commit message with a link.

testWidgets('displays TeX source; experimental flag default', (tester) async {
final globalSettings = testBinding.globalStore.settings;
await globalSettings.setBool(BoolGlobalSetting.renderKatex, null);
check(globalSettings.getBool(BoolGlobalSetting.renderKatex)).isFalse();
Member

nit: we can use check(globalSettings).getBool here

@@ -864,7 +906,10 @@ class GlobalTimeNode extends InlineContentNode {

////////////////////////////////////////////////////////////////

-String? _parseMath(dom.Element element, {required bool block}) {
+({List<KatexNode>? spans, bool debugHasError, String texSource})? _parseMath(
Member

Perhaps it would be cleaner to make a class for the return value of this, and add factory constructors for the classes that are constructed from the result?
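
Roughly what I have in mind, as a sketch only. `MathParseResult` and `fromParseResult` are made-up names, the classes here are simplified stand-ins for the real ones, and whether `MathBlockNode` should carry `debugHasError` at all is an assumption:

```dart
/// Stand-in for the PR's KatexNode base class, only so this sketch compiles.
abstract class KatexNode {
  const KatexNode();
}

/// Hypothetical replacement for the record returned by _parseMath.
class MathParseResult {
  const MathParseResult({
    required this.texSource,
    required this.nodes,
    this.debugHasError = false,
  });

  final String texSource;

  /// Null when the KaTeX HTML could not be parsed.
  final List<KatexNode>? nodes;

  final bool debugHasError;
}

/// Simplified stand-in for MathBlockNode.
class MathBlockNode {
  const MathBlockNode({
    required this.texSource,
    required this.nodes,
    this.debugHasError = false, // assumption: the node keeps this field
  });

  /// Builds the node from a parse result, so call sites don't copy fields by hand.
  factory MathBlockNode.fromParseResult(MathParseResult parsed) {
    return MathBlockNode(
      texSource: parsed.texSource,
      nodes: parsed.nodes,
      debugHasError: parsed.debugHasError,
    );
  }

  final String texSource;
  final List<KatexNode>? nodes;
  final bool debugHasError;
}
```

The call site would then shrink to something like `result.add(MathBlockNode.fromParseResult(parsed))`, and a forgotten field becomes harder to miss.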

result.add(MathBlockNode(
-texSource: texSource,
+texSource: parsed.texSource,
+nodes: parsed.spans,
Member

Is it intentional that we leave out debugHasError for this?

//
// Each case in the switch blocks below is a separate CSS class definition
// in the same order as in katex.scss :
// https://github.com/KaTeX/KaTeX/blob/2fe1941b7e6c0603680ef6edd799bd8a8b46871a/src/styles/katex.scss
Member

nit: use just the first 8 characters of the commit hash to shorten the line


KatexSpanStyles? _parseSpanInlineStyles(dom.Element element) {
if (element.attributes case {'style': final styleStr}) {
final stylesheet = css_parser.parse('*{$styleStr}');
Member

Not sure if we need to sanitize styleStr. I guess not: the html package might have done that, and in the worst case this will just fail to parse.

Regardless, it should be helpful to explain why we want to wrap styleStr here.

if (expression is css_visitor.EmTerm && expression.value is num) {
return (expression.value as num).toDouble();
}
return null;
Member

Is this case, i.e. non-em values, something that we eventually want to add support for?

switch (property) {
case 'margin-left':
marginLeftEm = _getEm(expression);
if (marginLeftEm != null) continue;
Member

nit: this continue and the following ones look like no-ops


default:
// TODO handle more CSS properties
assert(debugLog('Unsupported CSS property: $property of type ${expression.runtimeType}'));
Member

Should we also use _logError here?

}

class KatexSpanStyles {
double? heightEm;
Member

Looks like heightEm was introduced here but not used/handled. Is this intended for the next PR?

@gnprice
Member

gnprice commented Apr 3, 2025

With regards to testing

Yeah, good question. It looks like the current version of the PR doesn't add tests for most of the functionality. Let's at a minimum have tests that cover the interesting logic like for sizing and delimsizing.

It doesn't seem very useful to have tests structured basically as a duplicate of all the KaTeX classes we support. Maybe it would be better to test with examples crawled online?

I think these are two complementary types of tests which are both useful:

  • In the test suite, we have examples that are simplified as much as possible — to make them clear to understand, and easy to update for refactorings when needed — while still being complex enough to exercise the desired logic.
  • Outside the test suite, but at times like when we're doing major development on this area, we systematically test a corpus of public real-world examples. When that finds things we'd previously missed, we take those examples and reduce them into new test cases to add to the test suite.

We'll definitely want to be doing the second kind once we're a couple of PRs into this current effort and feel that we've covered the majority of usage; it'll help us prioritize which remaining areas to do next.
