Skip to content

lemmy_db_views_modlog_combined has a weird performance regression #141006

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
lcnr opened this issue May 14, 2025 · 2 comments
Open

lemmy_db_views_modlog_combined has a weird performance regression #141006

lcnr opened this issue May 14, 2025 · 2 comments
Labels
C-bug Category: This is a bug. E-needs-bisection Call for participation: This issue needs bisection: https://github.com/rust-lang/cargo-bisect-rustc E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example I-prioritize Issue: Indicates that prioritization has been requested for this issue. I-slow Issue: Problems and improvements with respect to performance of generated code. regression-untriaged Untriaged performance or correctness regression. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@lcnr
Copy link
Contributor

lcnr commented May 14, 2025

In case its helpful here is profiling data for lemmy_db_views_modlog_combined using cargo 1.88.0-nightly (7918c7eb5 2025-04-27):

Item Self time % of total time Time Item count Incremental result hashingtime
check_liveness 82.00s 42.409 82.04s 43 4.60µs
mir_built 76.86s 39.752 169.03s 43 562.14µs
layout_of 16.74s 8.656 84.04s 5691 1.26ms

Originally posted by @Nutomic in #132064

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label May 14, 2025
@weiznich
Copy link
Contributor

weiznich commented May 14, 2025

Adding the content from #132064 (comment)

Steps to get these results:

  • Checkout git clone https://github.com/LemmyNet/lemmy.git --recursive --branch modlog-combined-remove-derives
  • Change to the right directory cd lemmy
  • Build with cargo check -p lemmy_db_views_modlog_combined --all-features (both the crate + the --all-features flag is relevant). This should get the dependencies build. For the issue only the lemmy_db_views_modlog_combined crate is relevant.
  • For the timing: Run the build with RUSTFLAGS="-Zself-profile" and process the output with summarize
  • For the samply profile: Run the profiler with [samply](https://github.com/mstange/samply] (it's literally just samply record cargo check -p lemmy_db_views_modlog_combined --all-features). That needs a compiler with debug symbols.
  • The linked gist contains the debug output of a self in the linked function. That output appeared ~8k times

Original comment:


These results are surprising for me as neither check_liveness nor mir_built nor layout_of are related to the trait solving at all. Given your profile these seem to account for ~90% of the build time of that particular crate.

Given these suprising results I checked out the code locally and used rustc 1.88.0-nightly (50aa04180 2025-05-08) to verify these results. I got slightly different numbers but overall the result remains the same:

+---------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| Item                                  | Self time | % of total time | Time     | Item count | Incremental result hashing time |
+---------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| check_liveness                        | 76.09s    | 45.199          | 76.13s   | 43         | 5.11µs                          |
+---------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| mir_built                             | 75.96s    | 45.119          | 161.82s  | 43         | 588.78µs                        |
+---------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| evaluate_obligation                   | 5.20s     | 3.087           | 5.24s    | 10254      | 1.26ms                          |
+---------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| check_match                           | 4.43s     | 2.629           | 9.68s    | 43         | 6.14µs                          |
+---------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| typeck                                | 4.10s     | 2.433           | 7.14s    | 43         | 275.40µs                        |
+---------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| check_well_formed                     | 2.05s     | 1.217           | 155.57s  | 367        | 34.51µs                         |
+---------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| normalize_canonicalized_projection_ty | 185.52ms  | 0.110           | 187.93ms | 943        | 260.83µs                        |
+---------------------------------------+-----------+-----------------+----------+------------+---------------------------------+

It seems like typeck is not the bottleneck here, although it still could be faster than 4s. evaluate_obligation (coherence checking) is also relatively fast with 5s. (It's still up to discussion if 4s and 5s is the best performance that the compiler could deliver there, but that's something that can wait).

Overall the main issues from my point of view are the check_liveness and the mir_built queries, as they result in >90% of the time spend compiling there. A quick search in the compiler indicates that check_liveness seems to be a pass that removes dead code and mir_built seems to be the query pass that builds MIR. Given that the crate in question only has ~500 lines of code (1700 lines if all macros are expanded) I wouldn't expect both passes to take more than a minute, but rather several orders of magnitude less time.

I attach both the recorded mm_profdata file and the chrome profile to the post so that you can checkout that easily on your own.

profile.zip

For me that all indicates:

  • The new trait solver won't fix that issue as it's not at all related to trait solving
  • My personal take away from this is that developers (including me) often see diesel + slow in some issue and automatically assume that it is always typeck that's causing this issue. So for the future: Better verify that typeck is really the issue rather than just assuming it and spend arguing about "non-issues".

EDIT:

I've uploaded a samply profile here: https://share.firefox.dev/4kefIUS. This reports that 87% of the time is spend in this function:

EDIT 2:

I dug a bit further and it seems like most of the time is spend here on normalizing that generic query type again and again and again. https://github.com/rust-lang/rust/blob/master/compiler/rustc_middle/src/ty/inhabitedness/inhabited_predicate.rs#L102
For reference the input to that function is something like https://gist.github.com/weiznich/b17704681fc78d2e12e225416c0a6658 (there are several thousand variants that differ in the number of And( and Or( wrappers in the beginning.)

I'm not sure if it would be possible to cache that so the compiler only needs to do that once and not repeat it thousand times? As I don't know anything about that part of the compiler nor do I have the capacity to fix that on my own there is not much more I can do here. From my perspective that still looks like a rather critical regression that was possibly introduced while the fix for the other issue was done and therefore went unnoticed for quite some time. (Or better the fallout was attribute to the other issue)

@jieyouxu jieyouxu added I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels May 14, 2025
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label May 14, 2025
@jieyouxu jieyouxu added E-needs-bisection Call for participation: This issue needs bisection: https://github.com/rust-lang/cargo-bisect-rustc E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example labels May 14, 2025
@lqd
Copy link
Member

lqd commented May 14, 2025

@lcnr To me this looks like #132064, which I've bisected to #126024, and was improved by #132625 as an alternative to the possible revert in #132075. Only I guess that this looks worse with the new lemmy code (and/or --all-features) it seems.

For this updated example, lemmy_db_views_modlog_combined cargo checks:

Do you think we need a new issue to track this?

(#140944 likely is another duplicate in practice, however that one can be worked around differently.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. E-needs-bisection Call for participation: This issue needs bisection: https://github.com/rust-lang/cargo-bisect-rustc E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example I-prioritize Issue: Indicates that prioritization has been requested for this issue. I-slow Issue: Problems and improvements with respect to performance of generated code. regression-untriaged Untriaged performance or correctness regression. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants