gh-129275: avoid temporary buffer in dec_as_long() #129630

skirpichev · 2025-02-04T06:38:58Z

According to the documentation: "If rdata is non-NULL, it MUST be allocated by one of libmpdec’s allocation functions and rlen MUST be correct. If necessary, the function will resize rdata. Resizing is slow and should not occur if rlen has been obtained by a call to mpd_sizeinbase."

So the possible resizing in mpd_qexport_u32/16() is there to guard against broken log10() implementations (mpd_sizeinbase() uses log10()).

Issue: Avoid temporary allocation in dec_as_long() #129275

The temporary memory allocation slows down conversion for small integers (~2 digits):

Benchmark	ref	patch
int(Decimal(1<<7))	499 ns	495 ns: 1.01x faster
int(Decimal(1<<300))	2.16 us	2.06 us: 1.05x faster
Geometric mean	(ref)	1.02x faster

Benchmark hidden because not significant (2): int(Decimal(1<<38)), int(Decimal(1<<3000))

# bench_Decimal-to-int.py

import pyperf
from decimal import Decimal

values = ['1<<7', '1<<38', '1<<300', '1<<3000']

runner = pyperf.Runner()
for v in values:
    d = Decimal(eval(v))
    bn = 'int(Decimal('+v+'))'
    runner.bench_func(bn, int, d)


skirpichev · 2025-02-04T08:33:39Z

Edit: For context, here are Serhiy's concerns about using mpd_sizeinbase(), from the old PR:

gh-127937: convert decimal module to use import API for ints (PEP 757) #127925 (comment)
gh-127937: convert decimal module to use import API for ints (PEP 757) #127925 (review)

In our case (the base is a power of 2) we can avoid using mpd_sizeinbase() and implement a more accurate version. Here is the mpd_sizeinbase() code:

cpython/Modules/_decimal/libmpdec/mpdecimal.c

Lines 8084 to 8113 in 65ae3d5

    
size_t
mpd_sizeinbase(const mpd_t *a, uint32_t base)
{
    double x;
    size_t digits;
    double upper_bound;

    assert(mpd_isinteger(a));
    assert(base >= 2);

    if (mpd_iszero(a)) {
        return 1;
    }

    digits = a->digits+a->exp;

#ifdef CONFIG_64
    /* ceil(2711437152599294 / log10(2)) + 4 == 2**53 */
    if (digits > 2711437152599294ULL) {
        return SIZE_MAX;
    }

    upper_bound = (double)((1ULL<<53)-1);
#else
    upper_bound = (double)(SIZE_MAX-1);
#endif

    x = (double)digits / log10(base);
    return (x > upper_bound) ? SIZE_MAX : (size_t)x + 1;
}
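To make the estimate easier to experiment with, here is a rough Python sketch of the same computation, compared against the exact number of base-2**shift digits. The names sizeinbase() and exact_size() are made up for this illustration, and the SIZE_MAX/upper_bound overflow handling of the C function is left out, so this is not a faithful port.

```python
import math
from decimal import Decimal


def sizeinbase(d: Decimal, base: int) -> int:
    """Sketch of the estimate computed by mpd_sizeinbase() (no overflow checks)."""
    assert d == d.to_integral_value() and base >= 2
    if d == 0:
        return 1
    _sign, digit_tuple, exp = d.as_tuple()
    digits = len(digit_tuple) + exp       # a->digits + a->exp
    x = digits / math.log10(base)         # digits / log10(base)
    return int(x) + 1                     # (size_t)x + 1


def exact_size(n: int, shift: int) -> int:
    """Exact number of base-2**shift digits needed for |n| (at least 1)."""
    bits = abs(n).bit_length()
    return max(1, -(-bits // shift))      # ceiling division


if __name__ == "__main__":
    shift = 30
    for n in (1 << 7, 1 << 38, 1 << 300, 1 << 3000):
        est = sizeinbase(Decimal(n), 2 ** shift)
        print(n.bit_length(), est, exact_size(n, shift))
```

For the four benchmark values the estimate and the exact size can be compared directly; the estimate should never be smaller than the exact size, assuming log10() behaves.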

Essentially, it computes ndigits * log2(10)/shift. This should also be a correct bound:

(size_t)(3.321928094887363*((ndigits + shift - 1)/shift))

For shift=30 and ndigits ~ 1<<53 (the upper_bound in the typical case), it will overestimate the size by just 1 digit.
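A small sketch of the rewrite behind "ndigits * log2(10)/shift": for base = 2**shift, log10(base) = shift * log10(2), so digits / log10(base) is the same quantity as digits * log2(10) / shift, and 3.321928094887363 is log2(10). The helper names are hypothetical, and the sketch only checks this algebraic step, not the proposed bound itself.

```python
import math

# The constant used in the expression above; it should agree with log2(10).
LOG2_10 = 3.321928094887363


def estimate_via_log10(ndigits: int, shift: int) -> float:
    """digits / log10(base) with base = 2**shift, as in mpd_sizeinbase()."""
    return ndigits / math.log10(2 ** shift)


def estimate_via_log2(ndigits: int, shift: int) -> float:
    """The same quantity written as ndigits * log2(10) / shift."""
    return ndigits * LOG2_10 / shift


if __name__ == "__main__":
    for ndigits in (3, 12, 91, 904):
        print(ndigits,
              estimate_via_log10(ndigits, 30),
              estimate_via_log2(ndigits, 30))
```

Both forms should agree up to floating-point rounding for shift=30 and shift=15.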

erlend-aasland

Based on the previous PR and discussions, this looks correct to me. I'd like Serhiy's opinion on it, though, since he expressed concerns earlier.

skirpichev · 2025-05-17T10:26:28Z

@erlend-aasland, it seems that Serhiy is still against this change; see the issue thread.

So, I'm going to close this in a few days, unless someone else can argue further.

skirpichev · 2025-05-24T01:28:58Z

Thanks to all for the review.

bedevere-app bot mentioned this pull request Feb 4, 2025

Avoid temporary allocation in dec_as_long() #129275

Closed

skirpichev added the skip news label Feb 4, 2025

skirpichev marked this pull request as ready for review February 4, 2025 08:33

bedevere-app bot added the awaiting review label Feb 4, 2025

erlend-aasland requested a review from serhiy-storchaka May 12, 2025 18:51

erlend-aasland approved these changes May 12, 2025


bedevere-app bot added awaiting merge and removed awaiting review labels May 12, 2025

skirpichev closed this May 24, 2025

skirpichev deleted the dec_as_long-nocopy/129275 branch May 24, 2025 01:29
