Skip to content

Debug build assertion failure with native threads attempting to acquire GIL on termination #131012

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
arielb1 opened this issue Mar 9, 2025 · 16 comments
Labels
3.13 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) pending The issue will be closed if no feedback is provided type-bug An unexpected behavior, bug, or error

Comments

@arielb1
Copy link

arielb1 commented Mar 9, 2025

Bug description:

I'm writing PyO3/pyo3#4874, which tries to avoid Rust crashing on Python interpreter termination when there are native threads attempting to acquire the GIL.

To test it, I created a test that constantly hammers the GIL on a daemon thread, and on debug builds, I get this assertion failure fairly reliably on Python 3.13:

Python/pystate.c:345: void unbind_gilstate_tstate(PyThreadState *): Assertion `tstate == tstate_tss_get(&(tstate->interp->runtime)->autoTSSkey)' failed

It looks like zapthreads is attempting to zap a native thread that does not currently hold the GIL.

Would it help if I'll reproduce this in a C example?

CPython versions tested on:

3.13

Operating systems tested on:

Linux

@arielb1 arielb1 added the type-bug An unexpected behavior, bug, or error label Mar 9, 2025
@arielb1
Copy link
Author

arielb1 commented Mar 9, 2025

So I think the GIL-hammering thread is adding itself to the list of threads after the call to _PyThreadState_RemoveExcept. I'll stand up an Ubuntu with rr to confirm.

[the test is not really simulating a production workload I have, it's just hammering the GIL to make sure that Python + Rust is stable in that case].

@eendebakpt
Copy link
Contributor

@arielb1 A minimal example reproducing the issue (in python or C) would be helpful. And would it be possible for you to test on the current main branch?

@eendebakpt eendebakpt added the 3.13 bugs and security fixes label Mar 9, 2025
@picnixz picnixz added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Mar 9, 2025
@ZeroIntensity
Copy link
Member

How are you hammering the GIL? If it's via PyGILState_Ensure, then that's a known issue, but there's nothing we can do to fix it. There's a new API being discussed for safely acquiring the GIL (or really, attaching a thread state) at finalization.

@arielb1
Copy link
Author

arielb1 commented Mar 9, 2025 via email

@ZeroIntensity
Copy link
Member

The Rust code has an exception handler that turns the pthread_exit into a hang, but the tstate issue is separate

There wouldn't be any hang or exit (that would happen at somewhere like a Py_END_ALLOW_THREADS when a thread state tries to get attached). The problem is that PyGILState_Ensure has no handling at all for when Python is shutting down; it just crashes, like you're seeing here.

@ZeroIntensity ZeroIntensity added the pending The issue will be closed if no feedback is provided label Mar 9, 2025
@arielb1
Copy link
Author

arielb1 commented Mar 9, 2025 via email

@ZeroIntensity
Copy link
Member

ZeroIntensity commented Mar 9, 2025

It's probably because PyGILState_Ensure is setting some value to tell the interpreter that it exists (but sending your code would be helpful), then the thread gets hung, and then the main thread crashes while trying to destroy it because it's only partially initialized.

Bottom line is that you can't call PyGILState_Ensure while finalizing, in any way shape or form :(

@arielb1
Copy link
Author

arielb1 commented Mar 9, 2025 via email

@ZeroIntensity
Copy link
Member

Yeah, and it will probably be fixed with the new API. But PyGILState_Ensure has several other issues anyway. I'm in favor of completely deprecating it once we get something better in place.

@arielb1
Copy link
Author

arielb1 commented Mar 9, 2025

But existing extensions will keep using it for a long time, I personally believe it should try to be safe, and safely hanging should be safe here.

If it's not safe, then every Python program using native code in a daemon thread has a chance of crashing on exit.

@arielb1
Copy link
Author

arielb1 commented Mar 9, 2025

So I think (but have no confirmation) that at this code section

cpython/Python/pystate.c

Lines 1593 to 1605 in 98fa4a4

/* We serialize concurrent creation to protect global state. */
HEAD_LOCK(interp->runtime);
// Initialize the new thread state.
interp->threads.next_unique_id += 1;
uint64_t id = interp->threads.next_unique_id;
init_threadstate(tstate, interp, id, whence);
// Add the new thread state to the interpreter.
PyThreadState *old_head = interp->threads.head;
add_threadstate(interp, (PyThreadState *)tstate, old_head);
HEAD_UNLOCK(interp->runtime);

It needs to detect that the interpreter is shutting down, release runtime_lock, and do something that triggers an hang/pthread_exit,

Of course, this won't help existing Python versions

@ZeroIntensity
Copy link
Member

You should take a look at #124622 for prior discussion (and problems to solve) here.

@arielb1
Copy link
Author

arielb1 commented Mar 9, 2025

I mean, API-wise, we should have an PyGILState_EnsureOrFail, and we should define PyGILState_Ensure in terms of it as following:

PyGILState_STATE PyGILState_Ensure() {
    PyGILState_EXTENDED_STATE state = PyGILState_EnsureOrFail();
    if (state == PyGILState_FAILED) {
        hang_or_exit_thread();
    }
    return (PyGILState_STATE)state;
}

I don't think anyone disagrees about this?

There's the problem of where should Python stop allowing thread states to be created you talked about in the other issue.

@ZeroIntensity
Copy link
Member

Yes, that's essentially what the PR does.

@arielb1
Copy link
Author

arielb1 commented Mar 9, 2025

So the crash issue is #124619

@ZeroIntensity
Copy link
Member

Looks like it. I'm going to close this for now. Feel free to send your code (in C, I'm no good with Rust) and I'll reopen it if I determine that this isn't PyGILState_Ensure shenanigans.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) pending The issue will be closed if no feedback is provided type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

4 participants