[3.13] gh-128679: Fix tracemalloc.stop() race conditions#128897
[3.13] gh-128679: Fix tracemalloc.stop() race conditions#128897vstinner merged 6 commits intopython:3.13from
Conversation
tracemalloc_alloc(), tracemalloc_realloc(), PyTraceMalloc_Track(), PyTraceMalloc_Untrack() and _PyTraceMalloc_TraceRef() now check tracemalloc_config.tracing after calling TABLES_LOCK(). _PyTraceMalloc_Stop() now protects more code with TABLES_LOCK(), especially setting tracemalloc_config.tracing to 1. Add a test using PyTraceMalloc_Track() to test tracemalloc.stop() race condition.
a92f15b to
545053f
Compare
|
@ZeroIntensity: Here is a simplified change for the 3.12 and 3.13 branch. Would you mind to review it? |
ZeroIntensity
left a comment
There was a problem hiding this comment.
There's a couple other (unmodified) functions where I'm seeing lockless reads:
tracemalloc_get_traceback_PyMem_DumpTraceback_PyTraceMalloc_GetTraces_PyTraceMalloc_GetTracedMemory_PyTraceMalloc_ResetPeak
| @@ -1376,16 +1395,21 @@ _PyTraceMalloc_TraceRef(PyObject *op, PyRefTracerEvent event, void* Py_UNUSED(ig | |||
| int res = -1; | |||
|
|
|||
| TABLES_LOCK(); | |||
There was a problem hiding this comment.
We also read tracing without a lock before this.
There was a problem hiding this comment.
Checking tracing without the lock is a workaround for the Python finalization. TABLES_LOCK() can no longer be called after _PyTraceMalloc_Fini() (the lock is destroyed), whereas _PyTraceMalloc_TraceRef() is still called after _PyTraceMalloc_Fini().
tracing is checked again once we acquired the lock.
The issue was fixed properly in the main branch by calling (void)PyRefTracer_SetTracer(NULL, NULL); in _PyTraceMalloc_Stop(). Since this change is designed to be backported to Python 3.12, I didn't include this fix.
Once this change is merged in 3.13 and 3.12, we can remove the unprotected tracing read in 3.13 in _PyTraceMalloc_TraceRef() by adding (void)PyRefTracer_SetTracer(NULL, NULL);.
|
Ok, I completed my PR to cover most (almost all?) functions, instead of focusing on alloc/realloc/free and Track/Untrack functions. @ZeroIntensity: Please review the updated PR. |
Fixing properly In the main branch, I redesigned the code to be able to hold the lock for all tables operations. For that, I had to modify the behavior:
I'm open to consider fixing |
|
It looks like tests somehow were broken. |
So TABLES_LOCK() can be called in _PyMem_DumpTraceback() even if tracemalloc is not tracing.
Fixed. Well... Fixing more functions require to backport more changes. I backported the |
ZeroIntensity
left a comment
There was a problem hiding this comment.
LGTM. With the exception of a few special cases that you've mentioned, everything is properly locked! Due to the changes to the lifecycle and the size of the change, I think we should run buildbots before merging.
|
🤖 New build scheduled with the buildbot fleet by @ZeroIntensity for commit 54c8f9e 🤖 If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. |
|
|
I mark this PR as a draft because of the FreeBSD crash. Oh, I can also reproduce the |
|
So far, I cannot reproduce the crash on Linux. |
|
I can trigger a crash if I insert a delay in diff --git a/Objects/obmalloc.c b/Objects/obmalloc.c
index b103deb01ca..c20eb47e8a8 100644
--- a/Objects/obmalloc.c
+++ b/Objects/obmalloc.c
@@ -859,7 +859,14 @@ set_allocator_unlocked(PyMemAllocatorDomain domain, PyMemAllocatorEx *allocator)
{
switch(domain)
{
- case PYMEM_DOMAIN_RAW: _PyMem_Raw = *allocator; break;
+ case PYMEM_DOMAIN_RAW:
+ _PyMem_Raw.ctx = allocator->ctx;
+ (void)sched_yield();
+ _PyMem_Raw.malloc = allocator->malloc;
+ _PyMem_Raw.calloc = allocator->calloc;
+ _PyMem_Raw.realloc = allocator->realloc;
+ _PyMem_Raw.free = allocator->free;
+ break;
case PYMEM_DOMAIN_MEM: _PyMem = *allocator; break;
case PYMEM_DOMAIN_OBJ: _PyObject = *allocator; break;
/* ignore unknown domain */The write is protected by a lock (write), but |
|
Only debug build are affected, not release build. I modified the test in the main branch to skip it on debug build: #128988 |
…ython#128988) There is a race condition between PyMem_SetAllocator() and PyMem_RawMalloc()/PyMem_RawFree(). While PyMem_SetAllocator() write is protected by a lock, PyMem_RawMalloc()/PyMem_RawFree() reads are not protected by a lock. PyMem_RawMalloc()/PyMem_RawFree() can be called with an old context and the new function pointer. On a release build, it's not an issue since the context is not used. On a debug build, the debug hooks use the context and so can crash. (cherry picked from commit 9bc1964)
|
Thanks @vstinner for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12. |
|
Sorry, @vstinner, I could not cleanly backport this to |
|
GH-129022 is a backport of this pull request to the 3.12 branch. |
…129022) [3.13] gh-128679: Fix tracemalloc.stop() race conditions (#128897) tracemalloc_alloc(), tracemalloc_realloc(), PyTraceMalloc_Track(), PyTraceMalloc_Untrack() and _PyTraceMalloc_TraceRef() now check tracemalloc_config.tracing after calling TABLES_LOCK(). _PyTraceMalloc_Stop() now protects more code with TABLES_LOCK(), especially setting tracemalloc_config.tracing to 1. Add a test using PyTraceMalloc_Track() to test tracemalloc.stop() race condition. Call _PyTraceMalloc_Init() at Python startup. (cherry picked from commit 6b47499)
tracemalloc_alloc(), tracemalloc_realloc(), PyTraceMalloc_Track(), PyTraceMalloc_Untrack() and _PyTraceMalloc_TraceRef() now check tracemalloc_config.tracing after calling TABLES_LOCK().
_PyTraceMalloc_Stop() now protects more code with TABLES_LOCK(), especially setting tracemalloc_config.tracing to 1.
Add a test using PyTraceMalloc_Track() to test tracemalloc.stop() race condition.