Tectonic is slower than xelatex #452
An option to make tectonic faster is to use a non-cryptographic hash function, like https://github.com/Cyan4973/xxHash.
Yeah... But I am a bit surprised that it is twice as slow. I did make a flamegraph a long time ago where one can see where sha2 shows up. When testing at that time, tectonic was only a tiny bit slower: https://tectonic.newton.cx/t/profiling-tectonic/32/3?u=rekka I should probably try again on the current version.
Here's a Rust implementation of xxHash, for reference: https://crates.io/crates/twox-hash In my measurements from a long time ago, the difference is bigger for small documents, since in that case a lot of time is spent loading the format file, which is decompressed (gzip) and sha2-hashed. Surprisingly, more time is spent computing crc32 than sha2 :) There is some low-hanging fruit in the I/O layer of tectonic for sure.
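For illustration, here is a minimal sketch of what hashing a file with xxHash could look like using the twox-hash crate; the helper name and the idea of hashing the whole file in one go are assumptions for this example, not how tectonic's I/O layer is actually structured:

```rust
use std::hash::Hasher;
use twox_hash::XxHash64;

/// Hypothetical helper: fingerprint a file's contents with xxHash64 instead of sha256.
/// XxHash64 implements std::hash::Hasher, so we just feed it the raw bytes.
fn xxhash_of_file(path: &std::path::Path) -> std::io::Result<u64> {
    let bytes = std::fs::read(path)?;
    let mut hasher = XxHash64::with_seed(0);
    hasher.write(&bytes);
    Ok(hasher.finish())
}
```

xxHash is not collision-resistant against an adversary, but for detecting whether an input file changed between passes that should not matter.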
It is actually rather trivial to remove almost all sha256 computation since most of it is never needed; see pull request #453. Tectonic also does not compress the format file anymore, so that is not an issue.
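To make the "most of it is never needed" point concrete, one way to defer the digest until it is actually requested could look like this; this is only a sketch with made-up types, not the actual change in #453:

```rust
use sha2::{Digest, Sha256};

/// Hypothetical wrapper that defers the expensive digest until it is needed.
struct LazyDigest {
    data: Vec<u8>,
    digest: Option<Vec<u8>>, // computed on first request, then cached
}

impl LazyDigest {
    fn new(data: Vec<u8>) -> Self {
        LazyDigest { data, digest: None }
    }

    /// Compute the sha256 digest on demand; most files never reach this point.
    fn digest(&mut self) -> &[u8] {
        if self.digest.is_none() {
            self.digest = Some(Sha256::digest(&self.data).to_vec());
        }
        self.digest.as_deref().unwrap()
    }
}
```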
I'll try building a flamegraph and perhaps documenting how to profile. While converting a 76 page document using recent builds from master, I saw it take 90 seconds or more on macOS. |
@efx You mean 90 s using only the local cache? Or does this include downloading files from the online bundle?
@efx Yeah, would like to see any profiling you can muster on that — we'll be slower, but that's a lot slower, I imagine. |
I ran both on macOS from a newly compiled version (f8b1590).
The example where I see the biggest slowdown is autogenerated LaTeX from pandoc. It has a number of images and we are using a custom TTF font. My hunch is that this font bottlenecks the PDF generation, as I observed 2-3x slowdowns in a one-page document when using it.
@efx Hm, I've never seen such a slow compilation. Even beamer presentations or figure-heavy documents take only a few seconds to compile for me. Can you remove
This should generate
Interesting. I will see if I can get dtrace permissions / run on another machine tomorrow:
I could not recreate the subsequent slowdowns for the |
I generated a flamegraph while converting my document on a Linux machine: https://github.com/efx/tectonic/blob/run-flamegraph/flamegraph.svg And here is the full log output:
Thanks! Unfortunately, in this case it doesn't seem to be tectonic's overhead. You just have a beast of a document. I'd guess lots of (png?) images. You also mentioned a large font; I do not know how that is handled in the pdf file. Even though there are three TeX passes, together they take only 20% of the runtime. Tectonic's main overhead, the sha256 digest computation, is only 2% of the runtime. I would think that xetex will be about as slow as tectonic. (If you have a chance to compare with the xetex run time, it would be helpful.)

About 80% is spent compressing pdf objects at the output stage, and I believe that xetex has the same default behavior. Both xetex and tectonic use zlib's compression level 9: the slowest compression. There might be some discussion about whether that is worth it, see Figure 1 at https://clearlinux.org/news-blogs/linux-os-data-compression-options-comparing-behavior.

From a brief reading of the xdvipdfmx backend, I'd guess that png images are decompressed and then compressed again as pdf objects. I tested with compression disabled on a file of mine with a lot of pngs, and the compression ratio is above 50:1 on the pdf file, so it might not be a good idea to disable the compression. :)
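As a small illustration of the trade-off being discussed, here is a sketch using the flate2 crate to compress a stream at a configurable zlib level; the helper function and the suggestion that a lower level might be good enough are assumptions for the example, not what xdvipdfmx actually does internally:

```rust
use flate2::{write::ZlibEncoder, Compression};
use std::io::Write;

/// Compress a pdf object stream at a configurable zlib level.
/// Level 9 is the slowest; something like 6 is usually much faster for
/// only slightly larger output (an assumption that would need benchmarking).
fn compress_stream(data: &[u8], level: u32) -> std::io::Result<Vec<u8>> {
    let mut enc = ZlibEncoder::new(Vec::new(), Compression::new(level));
    enc.write_all(data)?;
    enc.finish()
}
```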
Thank you for diagnosing and explaining the problem.
Bingo. It has 99 PNG files being inserted, alongside a logo on each page (74 pages). These images have already been compressed, so it would be nice if there were a configuration option to disable compression of images while generating a PDF.
Currently, xetex (and pdflatex, it seems) decompresses all png images in the following function: tectonic/dpx-pngimage.c, line 154 (commit 84cdc69).
From what I read about the pdf and png formats, this is not necessary for non-interlaced pngs without an alpha channel (transparency), because in that case the png data stream and the pdf image stream are completely compatible. Here's code that takes advantage of this, for reference: img2pdf. Besides saving time, there is another advantage to copying the data directly: it is possible to optimize png compression with tools like oxipng, and the decompression/compression cycle undoes that optimization. It would be easy enough to test if the png format is compatible and just copy the data directly, but there are some issues like gamma correction and color spaces, and I am not too familiar with those.
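As a rough sketch of the compatibility test described above, the relevant IHDR fields can be read straight from the raw PNG bytes; the helper below is hypothetical and deliberately ignores the palette, gamma, and color-space questions just mentioned:

```rust
/// Hypothetical check: could this PNG's IDAT stream be copied into the pdf as-is?
/// IHDR data layout after the 8-byte signature, 4-byte length, and 4-byte type:
/// width(4) height(4) bit depth(1) color type(1) compression(1) filter(1) interlace(1).
fn png_stream_is_pdf_compatible(png: &[u8]) -> bool {
    // 8-byte signature + 4-byte length + 4-byte "IHDR" + 13 bytes of data = 29 bytes
    if png.len() < 29 || &png[..8] != b"\x89PNG\r\n\x1a\n" || &png[12..16] != b"IHDR" {
        return false;
    }
    let color_type = png[25]; // 0 = gray, 2 = truecolor, 3 = indexed, 4/6 have alpha
    let interlace = png[28];  // 0 = none, 1 = Adam7
    let has_alpha = color_type == 4 || color_type == 6;
    interlace == 0 && !has_alpha
}
```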
@malbarbo Now that #453 is merged, could you try rerunning the test on your machine with the version on
Still runs bibtex but that's quick anyway. On
MacBook Pro with an Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz. xelatex version
Just for fun with
@rekka Thanks for working on this, you got really incredible results! I noticed later that I should use
and
and
Although I could not get tectonic to run as fast as xelatex, #453 definitely improved the results. Trying some files with which I have performance problems using tectonic, I found one extreme case that is still not solved by #453: using beamer with the metropolis theme. Tectonic takes 42 seconds to process the following file:

```latex
\documentclass{beamer}
\usetheme{metropolis}
\begin{document}
\begin{frame}
Test
\end{frame}
\end{document}
```

And the culprit is the digest calculation:
@malbarbo Thanks for checking. Glad to hear that for

OK, so just to check the difference between mac and linux, I ran the test again on Ubuntu 18.04 with an Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz (hyperfine supports markdown tables!)
What is going on with xelatex?? So on my Linux machine tectonic is significantly slower than xelatex. Tectonic's speed improves because it's just a faster CPU, but why is xelatex so much faster than on my mac? I mean, it's an older version:
I am quite shocked by the performance for the metropolis theme. I use it for my presentations with the Fira Sans fonts, and on both macOS and Linux it performs just fine. This is the Linux machine:
sha2 computation takes about 5% of the runtime. Here's the flamegraph. Is there something particular to your setup?
@malbarbo In case you want to try it, I replaced sha256 with xxHash for file change tracking on my branch https://github.com/rekka/tectonic/tree/feat-xxhash.
The Fira fonts were not installed on my system. Installing the fonts reduced the time to 5.3s ( Using the feat-xxhash branch the time is reduced to 5s ( When using the feat-xxhash branch without the Fira fonts installed I got 7.5s. In this case, All these numbers correspond to running
Thanks for checking; this is very useful. It must then be reading a lot of font data if Fira is not installed. Hm. Yes, we almost certainly do not need to compute the digest of system fonts. I'll try to look into disabling that too.
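One possible shape for that, purely as a sketch: keep the full digest for bundle files but fingerprint system fonts by cheap metadata only. The enum and helper below are made up for illustration and are not tectonic's actual change-tracking types:

```rust
use std::fs;
use std::path::Path;
use std::time::SystemTime;

/// Hypothetical per-file fingerprint used for rerun detection.
enum FileFingerprint {
    /// Bundle/input files: keep the full content digest.
    Digest([u8; 32]),
    /// System fonts: size + mtime is almost certainly enough and avoids
    /// hashing megabytes of font data on every run.
    Metadata { len: u64, mtime: SystemTime },
}

fn fingerprint_system_font(path: &Path) -> std::io::Result<FileFingerprint> {
    let meta = fs::metadata(path)?;
    Ok(FileFingerprint::Metadata {
        len: meta.len(),
        mtime: meta.modified()?,
    })
}
```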
FWIW, here is some output from a 300-page extraction of source code comments,
Thanks for the report! It's curious that I can't really see any part of the flamegraph that would be caused by tectonic's overhead, so it is a mystery to me why it performs so much worse than xelatex (on Linux only, as far as I can tell).
One thing I notice (at least on @rekka's flamegraph branch) is that when building tectonic/*.c using

I'm not sure if this is just due to the debug info settings in the flamegraph branch. On the master branch, running
@ratmice Good sleuthing!
@ratmice Oh, that's a good point. I wasn't sure how much the extra debug info needed for creating the flamegraph changes the timing. Thanks for testing it. Could you post the xelatex version and system you're running it on? I observe that the run times are like this: xelatex on Ubuntu (texlive 2017) is much faster than tectonic, which is faster than xelatex on macOS (texlive 2019).
@rekka Fedora 29 x86-64, XeTeX 3.14159265-2.6-0.99999 (TeX Live 2018)

One thing worth mentioning is that on x86-64, perf (and thus flamegraph) is capable of running without -fno-omit-frame-pointer by reconstructing call stacks from the DWARF and .eh_frame data, so it is possible to profile with less overhead. I can even profile the system package of xetex by installing the separate debuginfo files. In that case there are no overhead or codegen changes needed for profiling; perhaps profiling captures fewer samples, I'm not sure.

If I get some time I'll try to build tectonic with -O2, and perhaps with the build flags the Fedora package uses, to get a closer comparison. But the versions probably differ, and the package is patched, so it would probably be a better comparison to just build the original xetex version that tectonic derives from if we want to subtract out the difference.
I added an option in a cc-rs pull request to allow us to configure whether frame pointers are omitted. I will try to remember to come up with a
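In the meantime, something along these lines in a build.rs would work with the existing cc crate API; the TECTONIC_PROFILE environment variable and the single source file are made up for this sketch:

```rust
// build.rs (sketch): keep frame pointers in the C code when profiling is requested.
// TECTONIC_PROFILE is a hypothetical opt-in variable for this example.
fn main() {
    let mut build = cc::Build::new();
    build.file("tectonic/dpx-pngimage.c"); // ... and the rest of the C sources

    if std::env::var_os("TECTONIC_PROFILE").is_some() {
        // Makes perf/flamegraph call stacks reliable at a small runtime cost.
        build.flag_if_supported("-fno-omit-frame-pointer");
    }

    build.compile("tectonic_c");
}
```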
This can be observed when generating a pdf for `tests/xenia/paper.tex`. Using `tectonic` from master compiled with `cargo install --path .` and `xelatex` from Debian 9 on an i3-2330M 2.20GHz machine, I got the following results.

Running `perf` shows that at least 25% of the time is spent in the sha2 crate.

Besides using the hash to check if the engine needs to be rerun, what are the other uses? Is the hash of cached files checked? (I skimmed the code but was unable to answer that.)