
gh-128679: fix race condition in tracemalloc #128695

Open: wants to merge 2 commits into base branch main.
Conversation

@tom-pytel tom-pytel commented Jan 9, 2025

  • Add an emergency fallback in tracemalloc_alloc() so that no trace is added if tracing was turned off while the GIL was being acquired.
  • Add a secondary check in PyTraceMalloc_Track() that re-tests tracing after the GIL is acquired, in case tracing was turned off in the meantime.
  • Protect the tracemalloc_config.tracing = 0 write in _PyTraceMalloc_Stop() with table_lock, so tracing cannot be turned off in the middle of a critical operation.


bedevere-app bot commented Jan 9, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@ZeroIntensity ZeroIntensity (Member) left a comment

As far as I can tell, tracemalloc_config.tracing isn't thread-safe even with the GIL, because it's read while the GIL is released. It's probably even worse on free-threading. (I think there's an issue somewhere about tracemalloc being non-thread-safe for FT.)

So, we need to either be more strict with the rules on when you can access it, or use atomic reads and writes. Something like _Py_atomic_load_int_relaxed should do.

This review comment is attached to the changed lines in _PyTraceMalloc_Stop():

-    /* stop tracing Python memory allocations */
+    /* stop tracing Python memory allocations,
+       but not while something might be in the middle of an operation */
+    TABLES_LOCK();
@ZeroIntensity (Member) commented:

If we're going to lock writes, we need to lock reads as well, but see my other comment.

@tom-pytel (Author) commented:

An unlocked read, like the first check in PyTraceMalloc_Track(), is fine as an early out in that particular case, though it absolutely requires the second check after the GIL is acquired. Atomic operations should be used wherever reads can happen outside GIL protection; with that in place, relying on the GIL elsewhere should be safe. Full free-threading safety would need more changes than this.

@ZeroIntensity (Member) commented:

We can't really mix and match atomics. If something relies on a lock for thread-safety, then you must hold that lock in order to read or write to it. It's not safe to rely on the GIL for some reads, and then rely on a different lock for others. It might be OK in some cases with the GIL because of implicit lock-ordering, but that can break very easily.

In this specific case, this would race with PyTraceMalloc_Track if it were called without a thread state (i.e., without the GIL), because that path doesn't hold the tables lock when reading, so there's no synchronization between the two operations. It's only mostly OK because the GIL will serialize the accesses in almost every case; very few people call PyTraceMalloc_Track while their thread state is detached. But that's beside the point: we should go for a more robust fix here rather than relying on lock-ordering with the GIL. That also kills two birds with one stone, fixing it for both the default build and free-threading.

I would either go with atomic reads and writes, or a compare-exchange if there ends up being some problems with inconsistent state.
