gh-128679: fix race condition in tracemalloc #128695
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
As far as I can tell, `tracemalloc_config.tracing` isn't thread-safe even with the GIL, because it's read while the GIL is released. It's probably even worse on free-threading. (I think there's an issue somewhere about tracemalloc being non-thread-safe for FT.)
So, we need to either be more strict with the rules on when you can access it, or use atomic reads and writes. Something like `_Py_atomic_load_int_relaxed` should do.
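A minimal sketch of the atomic read/write idea, written with standard C11 `<stdatomic.h>` rather than CPython's internal `_Py_atomic_*` helpers. The names `tracing_flag`, `set_tracing` and `is_tracing` are hypothetical stand-ins for illustration, not the code in this PR.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for tracemalloc_config.tracing. */
static atomic_int tracing_flag = 0;

/* Writer side: publish the new state with a relaxed atomic store. */
static void set_tracing(bool enabled)
{
    atomic_store_explicit(&tracing_flag, enabled ? 1 : 0,
                          memory_order_relaxed);
}

/* Reader side: a relaxed atomic load is free of data races even when
   the caller holds no lock at all. */
static bool is_tracing(void)
{
    return atomic_load_explicit(&tracing_flag, memory_order_relaxed) != 0;
}
```

Relaxed ordering only makes the flag itself race-free. It does not order accesses to any other data, so the tables that the flag guards would still need their own synchronization.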
/* stop tracing Python memory allocations */
/* stop tracing Python memory allocations,
   but not while something might be in the middle of an operation */
TABLES_LOCK();
If we're going to lock writes, we need to lock reads as well, but see my other comment.
Reading without a lock, like the first check in `PyTraceMalloc_Track()`, is fine as an early out in that particular case (though it absolutely necessitates the second check after the GIL is acquired). Atomic operations should still be used wherever the access might happen outside GIL protection; that should then be safe when relying on the GIL. Free-threading safety would need more changes.
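The check-then-recheck pattern described here can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: a plain `pthread` mutex named `big_lock` stands in for the GIL, `tracing_flag` stands in for `tracemalloc_config.tracing`, and `track`, `start_tracing`, `stop_tracing` and the counter `traces_recorded` are all hypothetical. The return values 0 and -2 are illustrative.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins: big_lock plays the role of the GIL,
   tracing_flag the role of tracemalloc_config.tracing. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int tracing_flag = 0;
static size_t traces_recorded = 0;   /* protected by big_lock */

static void start_tracing(void)
{
    pthread_mutex_lock(&big_lock);
    atomic_store_explicit(&tracing_flag, 1, memory_order_relaxed);
    pthread_mutex_unlock(&big_lock);
}

static void stop_tracing(void)
{
    pthread_mutex_lock(&big_lock);
    atomic_store_explicit(&tracing_flag, 0, memory_order_relaxed);
    traces_recorded = 0;             /* drop all recorded traces */
    pthread_mutex_unlock(&big_lock);
}

/* Returns 0 if a trace was recorded, -2 if tracing is disabled. */
static int track(void)
{
    /* First check: cheap early out, no lock held. */
    if (!atomic_load_explicit(&tracing_flag, memory_order_relaxed)) {
        return -2;
    }

    pthread_mutex_lock(&big_lock);
    /* Second check: tracing may have been stopped while this thread
       was waiting for the lock. */
    int res = -2;
    if (atomic_load_explicit(&tracing_flag, memory_order_relaxed)) {
        traces_recorded++;
        res = 0;
    }
    pthread_mutex_unlock(&big_lock);
    return res;
}
```

The first check is only an optimization. Correctness rests on the second check, which runs under the same lock that `stop_tracing` holds while it clears the flag and the recorded traces.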
We can't really mix and match atomics. If something relies on a lock for thread-safety, then you must hold that lock in order to read or write to it. It's not safe to rely on the GIL for some reads, and then rely on a different lock for others. It might be OK in some cases with the GIL because of implicit lock-ordering, but that can break very easily.
In this specific case, this would race with `PyTraceMalloc_Track` if it was called without a thread state (i.e., the GIL), because that doesn't hold the tables lock when reading, so there's no synchronization between the two operations. It's only slightly OK because the GIL will serialize it for you in almost every case; there are very few people using `PyTraceMalloc_Track` while their thread state is detached. But that's beside the point: we should go for a more robust fix here rather than relying on lock-ordering with the GIL. (We'll also kill two birds with one stone, since it will fix it for both the default build and for free-threading.)
I would either go with atomic reads and writes, or a compare-exchange if there end up being problems with inconsistent state.
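For the compare-exchange option, a rough sketch of what a guarded state transition could look like, using C11 `atomic_compare_exchange_strong`. The three states and the `transition` helper are assumptions made for illustration; tracemalloc itself is not known to define them.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical tracing states; not taken from tracemalloc. */
enum { TRACING_OFF = 0, TRACING_ON = 1, TRACING_STOPPING = 2 };

static atomic_int tracing_state = TRACING_OFF;

/* Attempt the transition `from` -> `to`. Succeeds only if the current
   state is exactly `from`, so when several threads race on the same
   transition, exactly one of them wins. */
static bool transition(int from, int to)
{
    int expected = from;
    return atomic_compare_exchange_strong(&tracing_state, &expected, to);
}
```

A stop routine built this way would call `transition(TRACING_ON, TRACING_STOPPING)` and tear down its tables only if the call returned true, so a concurrent second stop could not tear them down twice.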
`tracemalloc_alloc()`: do not add a trace if tracing was turned off while the GIL was being acquired.
`PyTraceMalloc_Track()`: check for tracing again after acquiring the GIL, in case tracing was turned off while the GIL was being acquired.
`tracemalloc_config.tracing = 0` in `_PyTraceMalloc_Stop()`: set it while holding `table_lock`, to avoid turning tracing off in the middle of critical operations.