gh-102471, PEP 757: Add PyLong import and export API #121339
Conversation
See also issue #111415.
Just my 2c.
The gmpy2 code, used for benchmarking, can be found in my fork:
https://github.com/skirpichev/gmpy/tree/trying-py-import-export
Objects/longobject.c (outdated)
PyUnstable_Long_Export(PyObject *obj, PyUnstable_LongExport *long_export)
{
    if (!PyLong_Check(obj)) {
        PyErr_Format(PyExc_TypeError, "expect int, got %T", obj);
        return -1;
    }
    PyLongObject *self = (PyLongObject*)obj;

    long_export->obj = (PyLongObject*)Py_NewRef(obj);
    long_export->negative = _PyLong_IsNegative(self);
    long_export->ndigits = _PyLong_DigitCount(self);
    if (long_export->ndigits == 0) {
        long_export->ndigits = 1;
    }
    long_export->digits = self->long_value.ob_digit;
    return 0;
}
As this mostly gives direct access to the PyLongObject, it's almost as fast as using the private internals before.
Old code:
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=10**2' 'mpz(x)'
1000000 loops, best of 20: 232 nsec per loop
$ python -m timeit -r11 -s 'from gmpy2 import mpz;x=10**100' 'mpz(x)'
500000 loops, best of 11: 500 nsec per loop
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=10**1000' 'mpz(x)'
100000 loops, best of 20: 2.53 usec per loop
With proposed API:
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=10**2' 'mpz(x)'
1000000 loops, best of 20: 258 nsec per loop
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=10**100' 'mpz(x)'
500000 loops, best of 20: 528 nsec per loop
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=10**1000' 'mpz(x)'
100000 loops, best of 20: 2.56 usec per loop
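For reference, a rough sketch of the consuming side of this export on the gmpy2/GMP end (the helper name is illustrative, the release call is the one discussed later in this thread, and the nail computation assumes the usual 30-bit digits stored in 32-bit words):

#include <Python.h>
#include <gmp.h>

/* Sketch only: feed the exported digit array straight into GMP.
   mpz_from_pylong() is an illustrative name; PyUnstable_Long_ReleaseExport()
   is the release function mentioned further down this thread. */
static int
mpz_from_pylong(mpz_t result, PyObject *obj)
{
    PyUnstable_LongExport exp;
    if (PyUnstable_Long_Export(obj, &exp) < 0) {
        return -1;
    }
    /* Least significant digit first, native endianness, skipping the unused
       high bits of each digit as "nails". */
    mpz_import(result, (size_t)exp.ndigits, -1, sizeof(Py_digit), 0,
               8 * sizeof(Py_digit) - PyLong_SHIFT, exp.digits);
    if (exp.negative) {
        mpz_neg(result, result);
    }
    PyUnstable_Long_ReleaseExport(&exp);
    return 0;
}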
Objects/longobject.c (outdated)
PyObject*
PyUnstable_Long_Import(int negative, size_t ndigits, Py_digit *digits)
{
    return (PyObject*)_PyLong_FromDigits(negative, ndigits, digits);
}
But this is something I would like to avoid. This requires allocation of a temporary buffer and using memcpy. Can we offer a writable layout so mpz_export can fill its digits directly?
Benchmarks for old code:
$ python -m timeit -r11 -s 'from gmpy2 import mpz;x=mpz(10**2)' 'int(x)'
2000000 loops, best of 11: 111 nsec per loop
$ python -m timeit -r11 -s 'from gmpy2 import mpz;x=mpz(10**100)' 'int(x)'
500000 loops, best of 11: 475 nsec per loop
$ python -m timeit -r11 -s 'from gmpy2 import mpz;x=mpz(10**1000)' 'int(x)'
100000 loops, best of 11: 2.39 usec per loop
With new API:
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**2)' 'int(x)'
2000000 loops, best of 20: 111 nsec per loop
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**100)' 'int(x)'
500000 loops, best of 20: 578 nsec per loop
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**1000)' 'int(x)'
100000 loops, best of 20: 2.53 usec per loop
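For contrast, a sketch of the int ← mpz direction being criticized here: mpz_export() first writes into a temporary buffer, and PyUnstable_Long_Import() then copies it again (the helper name and the zero shortcut are illustrative):

#include <Python.h>
#include <gmp.h>

/* Sketch of the temporary-buffer path: one extra allocation plus one extra
   copy inside PyUnstable_Long_Import(). */
static PyObject *
pylong_from_mpz(const mpz_t value)
{
    if (mpz_sgn(value) == 0) {
        return PyLong_FromLong(0);  /* zero exports no digits, handle it apart */
    }
    size_t ndigits = (mpz_sizeinbase(value, 2) + PyLong_SHIFT - 1) / PyLong_SHIFT;
    Py_digit *tmp = PyMem_Malloc(ndigits * sizeof(Py_digit));
    if (tmp == NULL) {
        return PyErr_NoMemory();
    }
    mpz_export(tmp, &ndigits, -1, sizeof(Py_digit), 0,
               8 * sizeof(Py_digit) - PyLong_SHIFT, value);
    PyObject *res = PyUnstable_Long_Import(mpz_sgn(value) < 0, ndigits, tmp);
    PyMem_Free(tmp);
    return res;
}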
This requires allocation of a temporary buffer and using memcpy.
Right, PyLongObject has to manage its own memory.
Can we offer a writable layout so mpz_export can fill its digits directly?
That sounds strange from the Python point of view and makes the internals "less opaque". I would prefer to leak fewer implementation details.
Right, PyLongObject has to manage its own memory.
I'm not trying to change that. More complete proposal: vstinner#4
gmpy2 patch: https://github.com/skirpichev/gmpy/tree/trying-py-import-export-v2
New benchmarks:
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**2)' 'int(x)'
2000000 loops, best of 20: 111 nsec per loop
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**100)' 'int(x)'
500000 loops, best of 20: 509 nsec per loop
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**1000)' 'int(x)'
100000 loops, best of 20: 2.44 usec per loop
I would prefer to leak fewer implementation details.
I don't think this leaks anything. It doesn't leak memory management details: PyLong_Import will just allocate the memory, and writing the digits will be the job of mpz_export, as before.
Without this, there seems to be a noticeable performance regression for integers in the intermediate range: up to 20% vs 7% on my branch.
Edit: currently, the proposed PyUnstable_Long_ReleaseImport() matches PyUnstable_Long_ReleaseExport(). Perhaps it could be one function (say, PyUnstable_Long_ReleaseDigitArray()), but I'm unsure - maybe it puts some constraints on the internals of the PyLongObject.
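For comparison, the no-copy flow could look roughly like this, written against the PyLongWriter_Create(negative, ndigits, &digits) signature that appears in a later revision further down this thread (the helper name and zero shortcut are illustrative, as in the previous sketch):

#include <Python.h>
#include <gmp.h>

/* Sketch of the no-extra-copy direction: CPython allocates the digit buffer
   and mpz_export() writes into it directly. */
static PyObject *
pylong_from_mpz_nocopy(const mpz_t value)
{
    if (mpz_sgn(value) == 0) {
        return PyLong_FromLong(0);
    }
    size_t ndigits = (mpz_sizeinbase(value, 2) + PyLong_SHIFT - 1) / PyLong_SHIFT;
    void *digits;
    PyLongWriter *writer = PyLongWriter_Create(mpz_sgn(value) < 0,
                                               (Py_ssize_t)ndigits, &digits);
    if (writer == NULL) {
        return NULL;
    }
    /* No temporary buffer and no second copy: GMP fills CPython's digits. */
    mpz_export(digits, &ndigits, -1, sizeof(Py_digit), 0,
               8 * sizeof(Py_digit) - PyLong_SHIFT, value);
    return PyLongWriter_Finish(writer);
}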
CC @tornaria, as Sage people might be interested in this feature.
CC @oscarbenjamin, you may want this for python-flint.
I updated my PR:
Misc/NEWS.d/next/C API/2024-07-03-17-26-53.gh-issue-102471.XpmKYk.rst (outdated)
Absolutely. Currently python-flint uses a hex-string as an intermediate format when converting between large int and …
@skirpichev: I added a PyLongWriter API similar to what @encukou proposed. Example:

PyLongObject *
_PyLong_FromDigits(int negative, Py_ssize_t digit_count, digit *digits)
{
    PyLongWriter *writer = PyLongWriter_Create();
    if (writer == NULL) {
        return NULL;
    }
    if (negative) {
        PyLongWriter_SetSign(writer, -1);
    }
    Py_digit *writer_digits = PyLongWriter_AllocDigits(writer, digit_count);
    if (writer_digits == NULL) {
        goto error;
    }
    memcpy(writer_digits, digits, digit_count * sizeof(digit));
    return (PyLongObject*)PyLongWriter_Finish(writer);

error:
    PyLongWriter_Discard(writer);
    return NULL;
}

It can be exercised from Python:

>>> import _testcapi; _testcapi.pylong_import(0, [100, 0, 0]) is 100
True
I marked the PR as a draft until we agree on the API.
Yes, that looks better and should fix the speed regression. I'll try to benchmark that, perhaps tomorrow. But the cost is 5 (!) public functions and one new struct, in addition to …
I updated the PR to remove the …
My concern is to avoid the problem capi-workgroup/api-evolution#36: avoid exposing …
@skirpichev: Would it be useful to add a …
For users, I think it'd be great if they know (in the docs) whether something can be NULL or not, especially since we have a similar interface for unicode objects.
digit *digits = PyMem_Malloc((size_t)ndigits * sizeof(digit));
if (digits == NULL) {
    return PyErr_NoMemory();
The docs say that we should check for NULL, but I agree that we could update them.
digit *digits = PyMem_Malloc((size_t)ndigits * sizeof(digit));
if (digits == NULL) {
    return PyErr_NoMemory();
Maybe the bug is somewhere else? Newly added functions usually say explicitly whether they set an exception on error.
    int overflow;
#if SIZEOF_LONG == 8
    long value = PyLong_AsLongAndOverflow(obj, &overflow);
#else
    // Windows has 32-bit long, so use 64-bit long long instead
    long long value = PyLong_AsLongLongAndOverflow(obj, &overflow);
#endif
    Py_BUILD_ASSERT(sizeof(value) == sizeof(int64_t));
Another reason for using this API was the "no-error" contract. Maybe we can specify that in the docs as a CPython implementation detail? IIUC, we are free to change such things in new releases without a deprecation period.
Note also that the gmpy2 benchmarks measure not just the CPython side, but the whole conversion path (int -> gmpy2.mpz in this case).
If *export_long->digits* is not ``NULL``, :c:func:`PyLong_FreeExport` must
be called when the export is no longer needed.
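Consumer-side, that rule looks roughly like this (a sketch only; the value/digits field names follow the PyLongExport struct of this PR, and the digit layout constants are the same as in the earlier sketches):

#include <Python.h>
#include <gmp.h>

/* Sketch: either use the compact int64_t value (nothing to free), or use the
   digit array and then call PyLong_FreeExport(). */
static int
mpz_set_pylong(mpz_t result, PyObject *obj)
{
    PyLongExport export_long;
    if (PyLong_Export(obj, &export_long) < 0) {
        return -1;
    }
    if (export_long.digits == NULL) {
        /* Compact case: export_long.value holds the number and FreeExport is
           not required (the cast assumes a 64-bit long for brevity). */
        mpz_set_si(result, (long)export_long.value);
        return 0;
    }
    mpz_import(result, (size_t)export_long.ndigits, -1, sizeof(Py_digit), 0,
               8 * sizeof(Py_digit) - PyLong_SHIFT, export_long.digits);
    if (export_long.negative) {
        mpz_neg(result, result);
    }
    PyLong_FreeExport(&export_long);
    return 0;
}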
.. impl-detail::

   This function always succeeds if *obj* is a Python :class:`int` object
   or a subclass.

Let's see if we can restore it that way. It might be helpful e.g. for Sage, which doesn't support PyPy.
I would prefer to not add this note. It was controversial during PEP 757 design.
@serhiy-storchaka @encukou: What do you think? Would you be ok to declare that the PyLong_Export() function cannot fail if the argument is a Python int?
It was controversial during PEP 757 design.
It was proposed unconditionally, not as CPython's implementation detail.
I'm fine with it, as an implementation detail.
@picnixz: I addressed your review. Please review the updated PR.
Final nits and LGTM. Since I'm on mobile I don't know whether the spaces are correct or not so please check it manually.
Co-authored-by: Bénédikt Tran <[email protected]>
Thanks for all the reviews. I think the change is now ready to be merged; I addressed all comments. I plan to merge the PR on Friday.
I would appreciate this, but... isn't an approval from another core developer now required for this type of PR?
Please also update Doc/data/refcounts.dat.
    int overflow;
#if SIZEOF_LONG == 8
    long value = PyLong_AsLongAndOverflow(obj, &overflow);
#else
    // Windows has 32-bit long, so use 64-bit long long instead
    long long value = PyLong_AsLongLongAndOverflow(obj, &overflow);
#endif
    Py_BUILD_ASSERT(sizeof(value) == sizeof(int64_t));
This code is so close to the PyLong implementation that I think we should use _PyLong_IsCompact() + _PyLong_CompactValue() to be sure that it matches the specification.
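A minimal sketch of that suggestion as a fast path inside PyLong_Export() (the helper name is illustrative, field names follow the PyLongExport struct of this PR, and _PyLong_IsCompact()/_PyLong_CompactValue() are the internal helpers named above):

/* Sketch: compact ints hold at most one digit, so the value always fits in
   int64_t and the AndOverflow round trip can be skipped. */
static int
export_compact(PyObject *obj, PyLongExport *export_long)
{
    PyLongObject *self = (PyLongObject *)obj;
    if (!_PyLong_IsCompact(self)) {
        return 0;   /* caller falls back to the digit-array export */
    }
    export_long->digits = NULL;     /* signals the compact form to the consumer */
    export_long->negative = 0;      /* unused in the compact case */
    export_long->ndigits = 0;       /* unused in the compact case */
    export_long->value = (int64_t)_PyLong_CompactValue(self);
    return 1;                       /* handled */
}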
    if (export_long->ndigits == 0) {
        export_long->ndigits = 1;
    }
Why?
It's to make the API easier to use: the consumer doesn't have to bother with the ndigits==0 special case.
In fact, it's assumed (see e.g. _PyLong_New) that the digit array always has at least one digit (and it's initialized for 0 too). Also, if you pass ndigits==0 to mpz_import() as the count parameter, it just calls malloc(0).
But I think it's safe to just drop this check. This condition is unreachable here, as 0 is handled in the !overflow case.
Suggested change (remove these lines):

    if (export_long->ndigits == 0) {
        export_long->ndigits = 1;
    }
PyLongWriter*
PyLongWriter_Create(int negative, Py_ssize_t ndigits, void **digits)
{
    if (ndigits <= 0) {
Why not allow 0?
@gpshead asked to reject this case: https://discuss.python.org/t/pep-757-c-api-to-import-export-python-integers/63895/74
Actually, he asked to document that case (i.e. when ndigits==0).
We don't think it's a good idea to bloat the docs with this edge case (different functions should be used for small integers). The PEP has a dedicated section discussing import for small integers, suggesting different functions. (In fact, the whole API is about import/export for big integers.)
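To illustrate the "different functions for small integers" point, a dispatch sketch (the helper names are illustrative; the big path refers to a writer-based conversion like the earlier sketches, while small values go through a plain constructor):

#include <Python.h>
#include <gmp.h>

static PyObject *pylong_from_mpz_nocopy(const mpz_t value);  /* earlier sketch */

/* Sketch: small values never reach the writer API, so the ndigits == 0 case
   simply does not arise on the big-integer path. */
static PyObject *
int_from_mpz(const mpz_t value)
{
    if (mpz_fits_slong_p(value)) {
        return PyLong_FromLong(mpz_get_si(value));   /* small path */
    }
    return pylong_from_mpz_nocopy(value);            /* big path, ndigits >= 1 */
}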
Looks good, though I have a few suggestions.
Also, please test that PyLong_FreeExport can be used after PyLong_Export fills in the compact value. AFAICS, the tests now skip it if they can.
If *export_long->digits* is not ``NULL``, :c:func:`PyLong_FreeExport` must
be called when the export is no longer needed.
I'm fine with it, as an implementation detail.
@serhiy-storchaka and @encukou: I addressed your reviews. Please review the updated PR.
I added …
Done.
LGTM.
Please do not forget to edit the commit message before merging.
Ok, I merged the PR. Big thanks to everyone who reviewed this PR, which got 272 comments and 54 commits! Special thanks to @skirpichev, who wrote a big part of this work.
…1339)
Co-authored-by: Sergey B Kirpichev <[email protected]>
Co-authored-by: Steve Dower <[email protected]>
Co-authored-by: Bénédikt Tran <[email protected]>
IIUIC, anonymous unions are now allowed. Maybe we could use them in the export API in this way?

typedef struct PyLongExport2 {
    union {
        int64_t compact_value;
        struct {
            uint8_t negative;
            Py_ssize_t ndigits;
            const void *digits;
            Py_uintptr_t _reserved;
        } digit_array;
    };
} PyLongExport2;

This structure has a smaller size. Also, such an API makes a clearer distinction between the alternative views.
How do you know whether compact_value or digit_array must be used?
By PyLong_Export's return value: errors are < 0, and nonnegative values stand for the possible export types.
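For illustration, consuming that union-based proposal might look like this; PyLong_Export2() and the specific nonnegative result codes are made up for the sketch, only the "errors < 0, nonnegative values select the export type" convention comes from the comment above:

/* Hypothetical consumer of the proposed PyLongExport2. */
static int
use_export2(PyObject *obj, PyLongExport2 *ex)
{
    int kind = PyLong_Export2(obj, ex);   /* hypothetical function name */
    if (kind < 0) {
        return -1;                        /* error, exception set */
    }
    if (kind == 0) {
        /* compact view: read ex->compact_value */
    }
    else {
        /* digit-array view: read ex->digit_array.negative / .ndigits / .digits,
           then release the export as usual */
    }
    return kind;
}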
PEP 757 had to go through the C API Working Group and then the Steering Council. I don't think this change is worth restarting that validation process.
Add PyLong_Export() and PyLong_Import() functions and PyLong_LAYOUT structure.
📚 Documentation preview 📚: https://cpython-previews--121339.org.readthedocs.build/