Skip to content

Avoid mutating value field of constructed Integer #40556

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: develop
Choose a base branch
from

Conversation

user202729
Copy link
Contributor

Problem

Run the following code

a = [Integer(i) for i in range(100)]
try:
	b = pow(2^100000, 3^100000, 5^100000)
finally:
	del a

it will take a while. Press Ctrl-C before it finishes. Then it will hang for a while before printing KeyboardInterrupt.

Meanwhile, the similar-looking code

a = [int(i) for i in range(100)]
try:
	b = pow(2^100000, 3^100000, 5^100000)
finally:
	del a

does not suffer from the problem.

Reason

There are several places in SageMath code that does something like the following

cdef Integer x = <Integer>PY_NEW(Integer)
sig_on()
mpz_powm(x.value, ...)  # mutate x.value, slow
sig_off()
return x

If the user press ctrl-C, the destructor for x will be called, but because mpz_powm is interrupted, mpz_powm may be in an inconsistent state, so the destructor might do the wrong thing.

The band-aid fix

The band-aid is described in #24986 : a function sig_occurred() is implemented in cysignals that checks whether a signal thrown by sig_on() is currently being processed, and the destructor of Integer checks:

if sig_occurred():
    leak this Integer object just in case
else:
    put this Integer object into a common object pool

The issue

There are two problems with this.

  • This leaks integers indiscriminately. In particular, in the example above, all the 100 Integer objects in a are leaked, even though only one is being mutated, being a local variable in the powering method.

    At any point in the code, we know exactly which object is currently being mutated, as such this indiscriminate leak is unnecessary.

  • sig_occurred() is slow: in the example above, it runs the garbage collector each time it's called. So, the reason for the hang is that after the user presses Ctrl-C, Python needs to run the garbage collector 100 times before reporting KeyboardInterrupt.

The actual fix

I look at the example reported in #24986coerce_actions.pyx. The problematic doctest is

E = EllipticCurve(GF(5), [4,0])
P = E([2,1,1])
from sage.doctest.util import ensure_interruptible_after
with ensure_interruptible_after(0.001): 2^(10^8) * P

by some analysis, I determine that the responsible function is _pow_long.

As such, this pull request makes sure to not call the destructor when the user interrupts.

After this pull request, we can remove the sig_occurred() workaround introduced in #24986 .

Other reported issues

📝 Checklist

  • The title is concise and informative.
  • The description explains in detail what this PR is about.
  • I have linked a relevant issue or discussion.
  • I have created tests covering the changes.
  • I have updated the documentation and checked the documentation preview.

⌛ Dependencies

Copy link

github-actions bot commented Aug 9, 2025

Documentation preview for this PR (built with commit 9a5e056; changes) is ready! 🎉
This preview will update shortly after each push to this PR.

Copy link
Collaborator

@tscrim tscrim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is #31340 an issue again? What about #33081?

Comment on lines 7559 to 7563
# Check that it has exactly one limb allocated
# This is assumed later in fast_tp_new() :issue:`33081`
cdef mpz_ptr dummy = global_dummy_Integer.value
if dummy._mp_alloc == 1 and dummy._mp_size == 0:
if dummy._mp_alloc == 0 and dummy._mp_size == 0:
return True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is no longer true I believe. Are you sure fast_tp_new() is working correctly too?

Copy link
Contributor Author

@user202729 user202729 Aug 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as far as I know, these aren't an issue.

In old GMP version, every mpz object owns an allocated pointer.

As such, the SageMath code for fast_tp_new, when it have to create a new Integer object, must allocate a new pointer with new_mpz._mp_d = <mp_ptr>check_malloc(GMP_LIMB_BITS >> 3). So all's good.

In newer GMP version, newly allocated mpz object no longer need to hold an integer. So the old code breaks as follows:

  1. The dummy integer does not hold any pointer, thus has alloc = 0.
  2. When fast_tp_new creates a new integer by memcpy the dummy integer, the new object still has alloc = 0, yet the _mp_d member holds an allocated pointer.
  3. Subsequent operation on the new integer, such as the mpz_set_pylong, thought that alloc = 0, so it allocates a new pointer and override what we allocated just now. The leak happens here.

With this pull request, before step 3, there's nothing allocated, so there's no leak. Furthermore, the mpz object obtained by memcpy is already valid, so there's no issue with invalid mpz object representation as in #33081 .

A possible disadvantage of this pull request is that it will not support GMP version < 6.2, but I think that is already sufficiently old. Specifically 6.2.0 was released in 2020-01-17. https://web.archive.org/web/20201101053000/https://gmplib.org/#STATUS

Or do you think maintaining compatibility with versions < 6.2 is important? In that case we could use gmp version macros—that said, currently the CI etc. doesn't test this version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spkg-configure.m4 for gmp already checks for >= 6.2.1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...and the meson build only looks for a pkg-config file, introduced in v6.2.0

@user202729 user202729 requested a review from orlitzky August 17, 2025 12:39
@orlitzky
Copy link
Contributor

Integer.factorial(27) still works on musl

@orlitzky
Copy link
Contributor

How does the fix relate to the stated problem of Ctrl-C taking a long time? (It still takes a long time for me when Integer is used.)

@user202729
Copy link
Contributor Author

How does the fix relate to the stated problem of Ctrl-C taking a long time? (It still takes a long time for me when Integer is used.)

I explained in the first message. This by itself won't fix the issue, but once similar changes are made throughout the codebase so that no Integer object has its value field mutated by an interruptible GMP function, we could remove the sig_occurred() check in the destructor of Integer, and then it would be fast.

Removing the check in the destructor right now may lead to inconsistent state, as explained in the previous issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants