gh-115999: Make list, tuple and range iteration more thread-safe. #128637
Conversation
Make tuple and range iteration more thread-safe, and actually test concurrent iteration. (This is prep work for enabling specialization of FOR_ITER in free-threaded builds.) The basic premise is:

- Iterating over a shared *iterable* (list, tuple or range) should be safe, not involve data races, and behave like iteration normally does.
- Using a shared *iterator* should not crash or involve data races, and should only produce items regular iteration would produce. It is *not* guaranteed to produce all items, or produce each item only once.

Providing stronger guarantees is possible for some of these iterators, but it's not always straightforward and can significantly hamper the common case. Since iterators in general aren't shared between threads, and it's simply impossible to concurrently use many iterators (like generators), it's better to make sharing iterators without explicit synchronization clearly wrong.

Specific issues fixed in order to make the tests pass:

- List iteration could occasionally fail an assertion when a shared list was shrunk and an item past the new end was retrieved concurrently.
- Tuple iteration could occasionally crash when the iterator's reference to the tuple was cleared on exhaustion. As with list iteration, in free-threaded builds we can't safely and efficiently clear the iterator's reference to the iterable (doing it safely would mean extra, slow refcount operations), so just keep the iterable reference around.
- Fast range iterators (for integers that fit in C longs) shared between threads would sometimes produce values past the end of the range, because the iterators use two pieces of state that we can't efficiently update atomically. Rewriting the iterators to have a single piece of state is possible, but probably means more math for each iteration and may not be worth it.
- Long range iterators (for other numbers) shared between threads would crash catastrophically in a variety of ways. This now uses a critical section. Rewriting this to be more efficient is probably possible, but since it deals with arbitrary Python objects it's difficult to get right.

There seem to be no more existing races in list_get_item_ref, so drop it from the TSan suppressions list.
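The first guarantee above — every thread iterating a shared *iterable* with its own iterator behaves like normal iteration — can be exercised with a small sketch. This is a hypothetical illustration loosely modeled on the PR's tests; `NUMITEMS` and the worker shape are made up here, not taken from the patch:

```python
import threading

NUMITEMS = 1000
seq = list(range(NUMITEMS))  # the shared iterable
per_thread_results = []

def worker(out):
    # Each thread builds its own iterator over the shared list, so it
    # should observe every item, in order, exactly once.
    for item in seq:
        out.append(item)

threads = []
for _ in range(4):
    out = []
    per_thread_results.append(out)
    threads.append(threading.Thread(target=worker, args=(out,)))
for t in threads:
    t.start()
for t in threads:
    t.join()

for out in per_thread_results:
    assert out == list(range(NUMITEMS))
```

Sharing a single *iterator* object between the workers instead would fall under the weaker second guarantee: no crashes or invented items, but no promise that every item is produced, or produced only once.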
```diff
-    Py_INCREF(r->len);
-    return r->len;
+    PyObject *len;
+    Py_BEGIN_CRITICAL_SECTION(r);
```
Out of curiosity, is using an atomic operation not enough for this? (Maybe with `_Py_TryIncref`?)
It's not enough to make this an atomic read without also making the writes in the other critical sections atomic (in addition to using TryIncrefCompare here); it's still a data race to write non-atomically even if all reads are atomic. I didn't really want to do all that work just to avoid a critical section in this function, when this isn't a particularly common or performance-sensitive function, and an uncontested critical section is probably just as fast anyway.
It turns out list_get_item_ref was actually correct, and the real problem was an incorrect assert. The fast path still contains notionally unsafe uses of memcpy/memmove, so add list_get_item_ref back to the TSan suppressions file.
Use critical sections for the long range iterators, and fix build failures because labels can't technically be at the end of compound statements (a statement must follow the label, even if it's empty).
```python
        """Test iteration over a shared container"""
        seq = self.make_testdata(NUMITEMS)
        results = []
        start = threading.Event()
```
If you are trying to synchronize when the workers start running, I think `threading.Barrier` more closely matches that behavior. Same comment applies below.
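For comparison, a minimal sketch of the suggested `threading.Barrier` approach (the worker body here is hypothetical): unlike an `Event`, which needs some thread to call `set()`, a barrier releases all workers at the moment the last one arrives.

```python
import threading

NUM_WORKERS = 4
barrier = threading.Barrier(NUM_WORKERS)
results = []
lock = threading.Lock()

def worker(n):
    # Every worker blocks here until all NUM_WORKERS threads have
    # arrived; then they are all released simultaneously.
    barrier.wait()
    with lock:
        results.append(n)

threads = [threading.Thread(target=worker, args=(i,))
           for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert sorted(results) == list(range(NUM_WORKERS))
```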
```diff
@@ -937,7 +937,7 @@ list_ass_slice_lock_held(PyListObject *a, Py_ssize_t ilow, Py_ssize_t ihigh, PyO
     }
     for (k = 0; k < n; k++, ilow++) {
         PyObject *w = vitem[k];
-        item[ilow] = Py_XNewRef(w);
+        FT_ATOMIC_STORE_PTR_RELAXED(item[ilow], Py_XNewRef(w));
```
I think this should be `FT_ATOMIC_STORE_PTR_RELEASE`. We generally need release ordering when writing pointers that may be loaded and dereferenced outside of the lock, to ensure that the previously written contents are visible before the pointer itself.
```c
  fail:
    ;  // A statement must follow the label before Py_END_CRITICAL_SECTION.
    Py_END_CRITICAL_SECTION();
    return result;
}
```
```c
static PyObject *
longrangeiter_setstate(longrangeiterobject *r, PyObject *state)
```
If we need to wrap functions in critical sections, I think we should generally convert them to Argument Clinic, if possible, and use the `@critical_section` annotation to keep the boilerplate mostly in a generated file. That's not yet possible for `longrangeiter_next` (because it's a `tp_iternext`, not a `PyMethodDef` entry), but it's doable for the other functions.
```c
#ifdef Py_GIL_DISABLED
    it->stop = stop;
#endif
```
I'm not enthusiastic about the idea of growing the iterator size. Can we unconditionally replace `len` with `stop`?
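As an aside, the "single piece of state" rewrite that the description deems possibly not worth it can be sketched in Python. This is a hypothetical illustration, not the C implementation: the class name is invented, and it assumes `itertools.count.__next__` hands out each index exactly once (true under the GIL; not something free-threaded builds promise):

```python
import itertools

class SingleStateRangeIter:
    """Hypothetical sketch: derive each value from one monotonically
    increasing counter instead of mutating two pieces of state."""

    def __init__(self, start, stop, step):
        self._start, self._stop, self._step = start, stop, step
        self._index = itertools.count()  # the single piece of mutable state

    def __iter__(self):
        return self

    def __next__(self):
        # Extra math per item: the trade-off the description mentions.
        value = self._start + next(self._index) * self._step
        if (self._step > 0 and value >= self._stop) or \
           (self._step < 0 and value <= self._stop):
            raise StopIteration
        return value

assert list(SingleStateRangeIter(0, 10, 3)) == [0, 3, 6, 9]
assert list(SingleStateRangeIter(5, 0, -2)) == [5, 3, 1]
```

Because each index is claimed exactly once, concurrent consumers could never produce a value past the end of the range; the cost is a multiplication and addition on every step.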