-
-
Notifications
You must be signed in to change notification settings - Fork 648
Avoid mutating value field of constructed Integer #40556
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
Documentation preview for this PR (built with commit 9a5e056; changes) is ready! 🎉 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/sage/rings/integer.pyx
Outdated
# Check that it has exactly one limb allocated | ||
# This is assumed later in fast_tp_new() :issue:`33081` | ||
cdef mpz_ptr dummy = global_dummy_Integer.value | ||
if dummy._mp_alloc == 1 and dummy._mp_size == 0: | ||
if dummy._mp_alloc == 0 and dummy._mp_size == 0: | ||
return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is no longer true I believe. Are you sure fast_tp_new()
is working correctly too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as far as I know, these aren't an issue.
In old GMP version, every mpz object owns an allocated pointer.
As such, the SageMath code for fast_tp_new
, when it have to create a new Integer
object, must allocate a new pointer with new_mpz._mp_d = <mp_ptr>check_malloc(GMP_LIMB_BITS >> 3)
. So all's good.
In newer GMP version, newly allocated mpz object no longer need to hold an integer. So the old code breaks as follows:
- The dummy integer does not hold any pointer, thus has alloc = 0.
- When fast_tp_new creates a new integer by memcpy the dummy integer, the new object still has
alloc = 0
, yet the_mp_d
member holds an allocated pointer. - Subsequent operation on the new integer, such as the
mpz_set_pylong
, thought thatalloc = 0
, so it allocates a new pointer and override what we allocated just now. The leak happens here.
With this pull request, before step 3, there's nothing allocated, so there's no leak. Furthermore, the mpz
object obtained by memcpy
is already valid, so there's no issue with invalid mpz
object representation as in #33081 .
A possible disadvantage of this pull request is that it will not support GMP version < 6.2, but I think that is already sufficiently old. Specifically 6.2.0 was released in 2020-01-17. https://web.archive.org/web/20201101053000/https://gmplib.org/#STATUS
Or do you think maintaining compatibility with versions < 6.2 is important? In that case we could use gmp version macros—that said, currently the CI etc. doesn't test this version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spkg-configure.m4 for gmp already checks for >= 6.2.1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...and the meson build only looks for a pkg-config file, introduced in v6.2.0
|
How does the fix relate to the stated problem of Ctrl-C taking a long time? (It still takes a long time for me when |
I explained in the first message. This by itself won't fix the issue, but once similar changes are made throughout the codebase so that no Removing the check in the destructor right now may lead to inconsistent state, as explained in the previous issues. |
Problem
Run the following code
it will take a while. Press Ctrl-C before it finishes. Then it will hang for a while before printing
KeyboardInterrupt
.Meanwhile, the similar-looking code
does not suffer from the problem.
Reason
There are several places in SageMath code that does something like the following
If the user press ctrl-C, the destructor for
x
will be called, but becausempz_powm
is interrupted,mpz_powm
may be in an inconsistent state, so the destructor might do the wrong thing.The band-aid fix
The band-aid is described in #24986 : a function
sig_occurred()
is implemented in cysignals that checks whether a signal thrown bysig_on()
is currently being processed, and the destructor ofInteger
checks:The issue
There are two problems with this.
This leaks integers indiscriminately. In particular, in the example above, all the 100
Integer
objects ina
are leaked, even though only one is being mutated, being a local variable in the powering method.At any point in the code, we know exactly which object is currently being mutated, as such this indiscriminate leak is unnecessary.
sig_occurred()
is slow: in the example above, it runs the garbage collector each time it's called. So, the reason for the hang is that after the user presses Ctrl-C, Python needs to run the garbage collector 100 times before reportingKeyboardInterrupt
.The actual fix
I look at the example reported in #24986 —
coerce_actions.pyx
. The problematic doctest isby some analysis, I determine that the responsible function is
_pow_long
.As such, this pull request makes sure to not call the destructor when the user interrupts.
After this pull request, we can remove the
sig_occurred()
workaround introduced in #24986 .Other reported issues
Strange slowdown on
str()
after handled FLINT exception #39224Nested signal handling leads to excessive slowdown of
sig_occurred()
cysignals#215 (comment)Throttle garbage collector frequency in sig_occurred cysignals#227
📝 Checklist
⌛ Dependencies