Use division by 100 in to_string for integers
#5691
base: main
Branch updated from 672f1db to 7ea6121.
I think I would be more comfortable with this if we didn't have to drag the Ryu tables into … . It would be fairly simple to completely reimplement the digit tables with a constexpr helper that emits the digits.
Do we want that? For now I've inserted a predefined array, which should be better for throughput.
After doing this, and eliminating …
My concern was around keeping Ryu-derived code in separate files. If we generate the tables with a freshly written constexpr function, then there's no question that they aren't a derived work. That said, the digit tables are very obviously data/facts with no creativity.
OK, generated the table.
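For reference, here is a minimal sketch of the kind of constexpr generator discussed above, assuming the usual 200-character layout where entry `i` holds the two ASCII digits of `i` ("00", "01", ..., "99"). The names `make_digit_pairs` and `digit_pairs` are hypothetical; this is not the code that landed in the PR.

```cpp
#include <array>

// Hypothetical helper: builds the digit-pair table "00" "01" ... "99" at
// compile time from scratch, so no Ryu-derived data is involved.
constexpr std::array<char, 200> make_digit_pairs() {
    std::array<char, 200> table{};
    for (int i = 0; i < 100; ++i) {
        table[2 * i]     = static_cast<char>('0' + i / 10); // tens digit of i
        table[2 * i + 1] = static_cast<char>('0' + i % 10); // ones digit of i
    }
    return table;
}

// Evaluated entirely at compile time; entry i lives at [2 * i, 2 * i + 1].
inline constexpr std::array<char, 200> digit_pairs = make_digit_pairs();
```

Because the table is produced by a constant expression, the compiler emits the same bytes a hand-written array would, so throughput should match the predefined-array version.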
⚙️ Optimization
Resolves #3857. Divides by 100 instead of by 10, as proposed, so that each iteration emits two digits at once from a digit-pair table.
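The core idea can be sketched as follows. This is an illustrative reimplementation, not `std::_UIntegral_to_buff`; the function names and the `digit_pairs` table layout are assumptions. Each iteration takes `value % 100`, copies the two matching characters from the table, and so roughly halves the number of divisions compared to a divide-by-10 loop.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical digit-pair table: entry i occupies chars [2 * i, 2 * i + 1].
constexpr std::array<char, 200> make_digit_pairs() {
    std::array<char, 200> table{};
    for (int i = 0; i < 100; ++i) {
        table[2 * i]     = static_cast<char>('0' + i / 10);
        table[2 * i + 1] = static_cast<char>('0' + i % 10);
    }
    return table;
}
inline constexpr std::array<char, 200> digit_pairs = make_digit_pairs();

// Writes the decimal digits of `value` backwards, ending just before `end`,
// and returns a pointer to the first digit written.
inline char* uint_to_buff_by_100(char* end, std::uint64_t value) {
    char* next = end;
    while (value >= 100) {
        const auto pair = static_cast<std::size_t>(value % 100) * 2; // two digits per step
        value /= 100;
        next -= 2;
        std::memcpy(next, digit_pairs.data() + pair, 2);
    }
    if (value >= 10) { // two digits left
        next -= 2;
        std::memcpy(next, digit_pairs.data() + static_cast<std::size_t>(value) * 2, 2);
    } else {           // one digit left (also handles value == 0)
        *--next = static_cast<char>('0' + value);
    }
    return next;
}

// Hypothetical to_string-style wrapper: format into a stack buffer, then copy.
inline std::string to_string_by_100(std::uint64_t value) {
    char buffer[20]; // 18446744073709551615 has 20 digits
    char* const end   = buffer + sizeof(buffer);
    char* const first = uint_to_buff_by_100(end, value);
    assert(first >= buffer); // never writes more than 20 digits
    return std::string(first, end);
}
```

Writing backwards from the end of a stack buffer avoids counting digits up front; the wrapper then copies the finished `[first, end)` range into the string.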
There's a similar place in `to_chars`, skipped for now.
🏁 Benchmark
Large and small numbers, like the numbers naturally seen when counting things.
Generated via a log-normal distribution, as @statementreply suggested.
Picked some arbitrary parameters so that the values approximately fit in the integer ranges.
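A generator along these lines produces such inputs. It is a sketch under assumptions: the PR's actual parameters and random engine are not given here, so `m` (mean of the underlying normal distribution), `s` (its standard deviation), and the seed are arbitrary, and samples that do not fit in the target type are simply rejected. `make_lognormal_values` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <random>
#include <vector>

// Hypothetical benchmark input generator: many small values plus a long tail
// of large ones, as a log-normal distribution naturally produces.
template <class Int>
std::vector<Int> make_lognormal_values(std::size_t count, double m, double s, unsigned seed) {
    assert(s > 0.0); // std::lognormal_distribution requires s > 0
    std::mt19937_64 gen(seed);
    std::lognormal_distribution<double> dist(m, s);
    constexpr double limit = static_cast<double>(std::numeric_limits<Int>::max());

    std::vector<Int> values;
    values.reserve(count);
    while (values.size() < count) {
        const double x = dist(gen);
        if (x < limit) { // reject samples that would overflow Int
            values.push_back(static_cast<Int>(x));
        }
    }
    return values;
}
```

Rejection sampling keeps the shape of the distribution below the type's maximum instead of piling clamped values at the top of the range.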
Also benchmarked `std::_UIntegral_to_buff` separately, to see how much the optimization helps on its own, avoiding the #1024 limitation.
⏱️ Benchmark results
i5-1235U P cores:
i5-1235U E cores:
🥉 Results interpretation
I'm not even sure if this is worth doing.
Allocating the string and copying the result into it takes roughly half of the time, so the effect of micro-optimizing digit generation is small.
However, the internal function alone does seem to show an improvement. This suggests that the #1024 improvement would help here. Performance may also be limited by failed store-to-load forwarding, since individual character stores are followed by a bulk memcpy; in that case, the improvement may be partly negated by a longer stall.
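To make the store-to-load forwarding hypothesis concrete, the following sketch shows the two phases of a `to_string`-style function, using the standard `std::to_chars` rather than the MSVC STL internals. On x86 in particular, a wide load that reads bytes written by several narrower stores still sitting in the store buffer generally cannot be forwarded and must wait for those stores to commit; that is the kind of stall speculated about above. `to_string_via_buffer` is a hypothetical name, and this is not how the PR measures anything.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <string>
#include <system_error>

// Phase 1: digits are stored one or two bytes at a time into a stack buffer.
// Phase 2: the finished range is copied in bulk into a freshly allocated string.
// If phase 2 issues wide loads over bytes that phase 1 just stored with narrow
// stores, store-to-load forwarding can fail and those loads stall.
inline std::string to_string_via_buffer(std::uint64_t value) {
    char buffer[20]; // enough for 18446744073709551615
    const std::to_chars_result result = std::to_chars(buffer, buffer + sizeof(buffer), value);
    assert(result.ec == std::errc{}); // cannot fail: the buffer is large enough
    return std::string(buffer, result.ptr); // allocation + bulk copy
}
```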