Skip to content

Race in concurrent iteration over range iterators #129068

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Yhg1s opened this issue Jan 20, 2025 · 1 comment
Open

Race in concurrent iteration over range iterators #129068

Yhg1s opened this issue Jan 20, 2025 · 1 comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-free-threading type-bug An unexpected behavior, bug, or error

Comments

@Yhg1s
Copy link
Member

Yhg1s commented Jan 20, 2025

Bug report

Bug description:

Concurrent iteration over range iterators (not range objects themselves) is not thread-safe, because range iterators are currently implemented as two pieces of state (the next value and the number of values left) that aren't updated atomically. ThreadSanitizer correctly identifies these races as well. See tests in PR #128637, and an early attempt at fixing the issues in 1533d1d.

This also affects bytecode specialisation (FOR_ITER_RANGE), although in PR #128798 we're only specialising if the iterator is uniquely referenced (and so we don't care about other threads).

CPython versions tested on:

CPython main branch

Operating systems tested on:

No response

Linked PRs

@Yhg1s Yhg1s added topic-free-threading type-bug An unexpected behavior, bug, or error labels Jan 20, 2025
@picnixz picnixz added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Jan 31, 2025
@colesbury
Copy link
Contributor

I wonder if we should apply the same technique to fast_range_iter that we do in the bytecode, where we don't need to care about other threads if _PyObject_IsUniquelyReferenced(iter) is true. We could fall back to a locking code path if that's not the case. (Apologies, if this is something you've already suggested previously).

I'd want to wait for Matt's LOAD_FAST optimizations to land first, because otherwise I think we'll slow down cases like:

range_it = range(N):
for v in range_it:
  ...

That pattern exists a lot in the pyperformance benchmarks and probably in some real world code too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-free-threading type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

3 participants