Performance regression due to absence of lto #2008
Comments
So yes, I'm working on this right now. I decided to toss the idea of using -flto for clang-6 or 7 (it's very hard to tell which version you are on). So as long as the user is using [...] It's taking me a minute because a few compiler settings + -flto can actually lead to a slow-down(!) or wrong values on the performance tests. My guess is that too many optimizations + -flto cause too much inlining, and some other algebra optimizations can cause slightly different numerics.
@wds15 can you throw the model and data up into a gist? If not, feel free to email it to me.
This will be fixed with an upstream PR, so reopening this until that happens.
Description
The freshly merged static matrix refactor slows down Stan programs significantly whenever LTO is not turned on.
So as of now, in the static matrix world, we absolutely need the LTO optimisation for good performance. Thus, our makefiles should ideally switch on LTO whenever the compiler is capable of handling it. The user should not be required to add -flto to make/local to get good performance.
Example
See #2007 (comment)
where it is demonstrated that turning on lto gives back the old performance numbers.
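Until the makefiles enable LTO on their own, the manual workaround mentioned above might look roughly like the following make/local fragment. This is only a sketch: it assumes the build picks up CXXFLAGS and LDFLAGS set in make/local, and that your compiler and linker both support -flto. As noted in the comments, some flag combinations together with -flto can slow things down or change numerics slightly, so re-run the performance tests after enabling it.

```make
# make/local (user-side workaround sketch, not the project's official fix)
# Assumption: the build honors CXXFLAGS and LDFLAGS defined in this file.

# Enable link-time optimization when compiling each translation unit.
CXXFLAGS += -flto

# Pass the same flag at link time so the linker actually performs LTO.
LDFLAGS += -flto
```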
Expected Output
Results should be produced as fast as they were before the static matrix refactor.
Current Version:
v3.3.0