Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvements for QCH preprocessing #64

Closed
wants to merge 5 commits into from

Conversation

PeterFeicht
Copy link
Contributor

I don't know if you had a chance to run the preprocessing step with the new cssless QCH preprocessor yet, but I can tell you it's incredibly slow (like, hours for the complete site on my Intel Core i5-7200U laptop). I tried to improve the performance of that, however, my changes might be somewhat controversial.

I profiled the cssless preprocessing script and noticed that most of the execution time was spent parsing CSS and dumping it to text again and again. So I decided to see whether this could be done using regular expressions instead, since the CSS used on the site is not too complicated. Turns out this works just fine, cutting the execution time of the unit tests in half.

A lot of the CSS parsing happens in premailer, not in our preprocessing script. So the second step I took was to include the premailer source code into the project (premailer hasn't been seriously maintained in several years), so we could modify it. I did the same thing again, using regular expressions instead of full CSS parsing where possible. This again worked great, the execution times for unit tests (which include full preprocessing of two files) is now down to about 1.4s from over 7s at the beginning.

Now the thing that might be controversial is that of course this could break at any time if the site started using CSS that is more complex than a simple regex can handle. I don't see that happening, but feel free to reject this PR if you're concerned about that.

This greatly improves the speed of cssless processing by using regular
expressions instead of a full-blown CSS parser to manipulate the `style`
attribute where possible. The CSS used on the site is simple enough so
this works just fine.

The tests needed some massaging to get them to pass again, because the
new handling caused the following minor changes to the output.

 * Floating point values are always formatted with the decimal point,
   which was omitted previously when the fractional part was zero.
 * Existing CSS declarations are not touched when other declarations are
   added or removed, so there are some differences where spaces were
   added previously but aren't now.
Premailer is used by the cssless preprocessing script to inline CSS
styles, it hasn't been maintained in a while. This adds the source code
of the premailer module so we can make performance improvements to
cssless preprocessing. These are going to be changes specific to this
project, which won't be of any use to upstream premailer.
This adds an option to premailer for dropping style tags even if there
are leftover rules. The cssless preprocessing would drop those tags
after premailer processing anyway, this way we don't even keep them.
This further improves the speed of cssless processing by using regular
expressions in premailer instead of full parsing, if possible. Premailer
is now also used with validation turned off, we just assume that the CSS
we get from the site is valid.

This takes the time for running unit tests from over 7 seconds before
the improvements down to about 1.4 seconds, which should make building a
release more bearable (keep in mind we're talking about 5k files to be
processed, each taking about 0.7 seconds).
Add missing release files, add `doc_html` to `all` target, make
formatting more consistent.
@p12tic
Copy link
Owner

p12tic commented Oct 2, 2018

Thanks for shouting, I agree that the performance of cssless preprocessing is too slow. I don't think that we should compromise the correctness for speed though :). This PR improves the speed of the CSS processing on our side roughly twice just by caching the results as the majority of CSS rules are roughly similar.

The improvements on premailer side look good to me. Ideally they should go to that repo, I'm sure we can work something out as I've contributed a PR there in recent past. Would you mind if I contributed your changes (with appropriate attribution of course) or would you like to do a PR yourself?

@PeterFeicht
Copy link
Contributor Author

I also had the idea of caching parsed CSS styles and tried to put the cssutils objects on their respective DOM elements, unfortunately there's no way to put non-string attributes there.

About the premailer improvements, I don't think (apart from the drop_styles option) that those would be accepted upstream, since they're rather specific to our use case. I made the changes so premailer tests still pass, however, that doesn't mean that they will work with all kinds of valid CSS. Go ahead and make a PR if you think they will accept it.

@PeterFeicht PeterFeicht closed this Oct 2, 2018
@p12tic
Copy link
Owner

p12tic commented Oct 4, 2018

So according to my benchmarks the following results are obtained by processing the first 30 pages sequentially (needs modification of the preprocess_qch.py file):

Original (cppreference-doc @ 824afb2, peterbe/premailer@ebfd310): 71.2s

Caching results on cppreference-doc side (cppreference-doc @ 49f023a, peterbe/premailer@ebfd310): 49.5s (1.4x improvement)

Caching results on both cppreference-doc and premailer (cppreference-doc @ 0ab29ba,
peterbe/premailer@21436d4): 19.6s (2.5x improvement since above, 3.6x improvement total).

The 3.6x improvement is much less than what you achieved, but at least we don't compromise on correctness. It's maybe worth exploring other optimizations.

See these PRs:
peterbe/premailer#198, #67, #66.

@PeterFeicht PeterFeicht deleted the cssless-performance branch October 7, 2018 14:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants