gh-115999: Add free-threaded specialization for FOR_ITER #128798

Yhg1s · 2025-01-13T19:18:35Z

Add free-threaded versions of the existing specializations for FOR_ITER (lists, tuples, fast range iterators and generators), without significantly affecting their thread-safety. (Iterating over shared lists/tuples/ranges should be fine like before. Reusing iterators between threads is not fine, like before. Sharing generators between threads is a recipe for significant crashes, like before.)
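As a rough illustration of the "fine like before" case in the description, here is a minimal sketch in which several threads iterate over one shared list, each through its own iterator. The names `shared`, `worker` and `results` are made up for this example and do not come from the PR.

```python
import threading

shared = list(range(1_000))   # one list object, read by every thread
results = []                  # each thread appends its total here

def worker():
    total = 0
    # The `for` statement calls iter(shared) itself, so every thread gets
    # its own private list iterator; only the list is shared.
    for item in shared:
        total += item
    results.append(total)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```

Passing a single iterator object (or a single generator) to all four threads instead would be the "not fine" case the description warns about.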

Issue: Make the specializing interpreter thread-safe in --disable-gil builds #115999

Yhg1s · 2025-01-13T19:20:39Z

I still need to add tests for the specialization (rather than the thread-safety, which are in #128637), but otherwise the PR is good enough to review I think.

mpage · 2025-01-14T00:34:39Z

Why not use the approach that you suggested in discord where we only specialize for the case where the iterator is uniquely referenced by the current thread? It seems like that should cover the overwhelmingly common case, would be simpler to implement, and the resulting instructions would be faster.

Yhg1s · 2025-01-14T00:53:33Z

I realised that approach wouldn't work for the list referenced by the list iterator (which is the bulk of the work for list iteration) because we're not doing any refcount operations on it (on purpose) and _PyObject_IsUniquelyReferenced() will only do the right thing if all threads hold strong references... and then I think I forgot we should still be able to use that approach for the iterators themselves. I'll see if I can poke holes into that idea tomorrow.

Objects/tupleobject.c

Python/bytecodes.c

…ly to uniquely referenced iterators. This handles the common case of 'for item in seq' (where 'seq' is a list, tuple or range object) and 'for item in generator_function()', but not, for example, 'g = gen(...); for item in g:'.
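A small sketch of the two loop shapes that commit message distinguishes. The function names are invented for illustration, and whether a given loop is actually specialized is an interpreter detail that cannot be observed from the results; both shapes produce the same values.

```python
def squares(n):
    """A plain generator function, used only for this illustration."""
    for i in range(n):
        yield i * i

def temporary_iterators(seq, n):
    out = []
    # The iterator created for `seq` is only referenced by the loop itself.
    for item in seq:
        out.append(item)
    # Same for the generator object: nothing else holds a reference to it.
    for item in squares(n):
        out.append(item)
    return out

def named_generator(n):
    g = squares(n)      # `g` is a second reference to the generator object
    out = []
    for item in g:      # per the commit message, this form is not covered
        out.append(item)
    return out

print(temporary_iterators([1, 2], 3))
print(named_generator(3))
```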

Yhg1s · 2025-01-21T15:09:03Z

I've added a test that exercises the deopt paths, although it will only pass under ThreadSanitizer once #128637 has gone in. The test is a little wacky because for list/range/tuple iterators it's not easy to leak the uniquely referenced iterators. (For generators it could just use a weakref.) I wrote the test to make sure the deopt paths were actually safe, but I'm not sure those tests make sense long-term. We could keep them around while we fiddle with specialization, but drop them once we're sure the iterators can't leak (or once we change the entire approach to iterator specialization).

mpage · 2025-01-23T19:13:39Z

Python/bytecodes.c

+// For free-threaded Python, the loop exit can happen at any point during item
+// retrieval, so separate ops don't make much sense.


It doesn't seem like this comment applies to the tuple specialization. Can you delete it if not?

mpage · 2025-01-23T19:24:57Z

Python/bytecodes.c

+// For free-threaded Python, the loop exit can happen at any point during item
+// retrieval, so separate ops don't make much sense.


Can you expand a little bit on why this doesn't make sense? It'd be nice to keep the structure of the ops the same between the two builds.

Python/bytecodes.c

Include/internal/pycore_list.h

Tools/cases_generator/analyzer.py

Python/bytecodes.c

Python/specialize.c

Lib/test/test_free_threading/test_iteration_deopt.py

markshannon · 2025-01-28T11:38:22Z

Include/internal/pycore_opcode_metadata.h

@@ -2055,7 +2055,7 @@ const struct opcode_metadata _PyOpcode_opcode_metadata[266] = {
    [FORMAT_WITH_SPEC] = { true, INSTR_FMT_IX, HAS_ERROR_FLAG | HAS_ESCAPES_FLAG },
    [FOR_ITER] = { true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_JUMP_FLAG | HAS_ERROR_FLAG | HAS_ERROR_NO_POP_FLAG | HAS_ESCAPES_FLAG },
    [FOR_ITER_GEN] = { true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_DEOPT_FLAG },
-    [FOR_ITER_LIST] = { true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_JUMP_FLAG | HAS_EXIT_FLAG },
+    [FOR_ITER_LIST] = { true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_JUMP_FLAG | HAS_DEOPT_FLAG | HAS_EXIT_FLAG | HAS_ESCAPES_FLAG },


Why does this need the HAS_ESCAPES_FLAG? How does it escape?

The mechanism for fetching an item from a shared list includes calling _Py_TryIncrefCompare, which has a path which DECREFs.

Which path? I'm confused as to why an incref would need to decref.

When the reference is removed from the array that backs the list between the time it's retrieved and it's returned.

cpython/Include/internal/pycore_object.h

Lines 595 to 610 in 180ee43

    /* Tries to incref the object op and ensures that *src still points to it. */
    static inline int
    _Py_TryIncrefCompare(PyObject **src, PyObject *op)
    {
        if (_Py_TryIncrefFast(op)) {
            return 1;
        }
        if (!_Py_TryIncRefShared(op)) {
            return 0;
        }
        if (op != _Py_atomic_load_ptr(src)) {
            Py_DECREF(op);
            return 0;
        }
        return 1;
    }

Added in #114512.

I don't see how that works. What happens if *src is modified after op != _Py_atomic_load_ptr(src) but before the function returns?

Regardless, we want to keep FOR_ITER_LIST non-escaping in the default build.

@colesbury do you remember why lists use _Py_TryXGetRef (in list_get_item_ref) or why it matters that the src ptr still refers to the same item? Can the object be invalid in some way that still requires us to DECREF it?

There are two hazards that _Py_TryXGetRef handles:

1. The object op may be deallocated between the initial load and the incref. The try-incref, in cooperation with the allocator, handles this case. It returns zero if op has zero refcount.

2. The memory block for op may be deallocated and reallocated for a different object in between the initial load and incref. In that case the try-incref succeeds, but the subsequent check fails and we need to decref op.

It does not matter if *src is modified after op != _Py_atomic_load_ptr(src). op will still be a valid reference to an object that was the next element in the list at some point during the operation. It's always been possible for the list to be concurrently modified between the execution of FOR_ITER and subsequent code that uses the result.

See https://peps.python.org/pep-0703/#optimistically-avoiding-locking
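The two hazards can be walked through with a single-threaded toy model. This is only a sketch: `FakeObject`, `Slot`, `try_incref` and `try_incref_compare` are hypothetical stand-ins, the owner-thread fast path (`_Py_TryIncrefFast`) and all atomics are left out, and real races are replaced by setting up the "after the race" state by hand.

```python
class FakeObject:
    """Toy object with an explicit reference count (not a real PyObject)."""
    def __init__(self, name, refcount=1):
        self.name = name
        self.refcount = refcount

class Slot:
    """Stands in for one cell of the array that backs a list."""
    def __init__(self, value):
        self.value = value

def try_incref(op):
    # Hazard 1: an object whose refcount already hit zero must not be revived.
    if op.refcount == 0:
        return False
    op.refcount += 1
    return True

def try_incref_compare(slot, op):
    if not try_incref(op):
        return False
    # Hazard 2: the incref succeeded, but the slot no longer holds `op`,
    # so the reference we just took has to be given back.
    if slot.value is not op:
        op.refcount -= 1   # models the Py_DECREF on the failure path
        return False
    return True

item = FakeObject("item")
ok = try_incref_compare(Slot(item), item)

dead = FakeObject("dead", refcount=0)
hazard1 = try_incref_compare(Slot(dead), dead)

stale = FakeObject("object now living in the reused memory")
hazard2 = try_incref_compare(Slot(FakeObject("new element")), stale)

print(ok, item.refcount, hazard1, hazard2, stale.refcount)
```

The decrement on the failure path is the step that makes the real function count as escaping in the discussion above.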

Regardless, we want to keep FOR_ITER_LIST non-escaping in the default build

Is there any performance reason to do so?

Just to follow up: this does make JIT code slightly worse for traces covering for loops over lists. Since the _ITER_NEXT_LIST at the top of the loop is now escaping, this will add a new _CHECK_VALIDITY uop to the start of each loop body where none exists currently.

Consider this loop:

    def f():
        x = 0
        for i in list(range(10_000)):
            x += i

This change increases the number of validity checks per loop from 2 to 3. Before:

    _MAKE_WARM
    _SET_IP
    _CHECK_PERIODIC
    _CHECK_VALIDITY <------------- OLD
    _ITER_CHECK_LIST
    _GUARD_NOT_EXHAUSTED_LIST
    _ITER_NEXT_LIST
    _SET_IP
    _STORE_FAST_1
    _CHECK_VALIDITY <------------- OLD
    _LOAD_FAST_0
    _LOAD_FAST_1
    _GUARD_BOTH_INT
    _BINARY_OP_ADD_INT
    _SET_IP
    _STORE_FAST_0
    _JUMP_TO_TOP

After:

    _MAKE_WARM
    _SET_IP
    _CHECK_PERIODIC
    _CHECK_VALIDITY <------------- OLD
    _ITER_CHECK_LIST
    _GUARD_NOT_EXHAUSTED_LIST
    _ITER_NEXT_LIST
    _CHECK_VALIDITY_AND_SET_IP <-- NEW
    _STORE_FAST_1
    _CHECK_VALIDITY <------------- OLD
    _LOAD_FAST_0
    _LOAD_FAST_1
    _GUARD_BOTH_INT
    _BINARY_OP_ADD_INT
    _SET_IP
    _STORE_FAST_0
    _JUMP_TO_TOP
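To double-check the "2 to 3" figure, the two traces quoted above can be counted mechanically. This is just a sketch over the uop names as they appear in this comment (with the OLD/NEW annotations dropped); `count_validity_checks` is a helper invented for the example, not a CPython API.

```python
BEFORE = """
_MAKE_WARM _SET_IP _CHECK_PERIODIC _CHECK_VALIDITY _ITER_CHECK_LIST
_GUARD_NOT_EXHAUSTED_LIST _ITER_NEXT_LIST _SET_IP _STORE_FAST_1
_CHECK_VALIDITY _LOAD_FAST_0 _LOAD_FAST_1 _GUARD_BOTH_INT
_BINARY_OP_ADD_INT _SET_IP _STORE_FAST_0 _JUMP_TO_TOP
""".split()

AFTER = """
_MAKE_WARM _SET_IP _CHECK_PERIODIC _CHECK_VALIDITY _ITER_CHECK_LIST
_GUARD_NOT_EXHAUSTED_LIST _ITER_NEXT_LIST _CHECK_VALIDITY_AND_SET_IP
_STORE_FAST_1 _CHECK_VALIDITY _LOAD_FAST_0 _LOAD_FAST_1 _GUARD_BOTH_INT
_BINARY_OP_ADD_INT _SET_IP _STORE_FAST_0 _JUMP_TO_TOP
""".split()

def count_validity_checks(trace):
    """Count uops that perform a validity check, including the fused variant."""
    return sum(1 for uop in trace if uop.startswith("_CHECK_VALIDITY"))

print(count_validity_checks(BEFORE), count_validity_checks(AFTER))
```

The trace length stays the same; the `_SET_IP` that followed `_ITER_NEXT_LIST` is replaced by the fused `_CHECK_VALIDITY_AND_SET_IP`.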

Just to clarify:
FOR_ITER_LIST is escaping because _PyList_GetItemRefNoLock is escaping, and _PyList_GetItemRefNoLock is escaping because _Py_TryIncrefCompareStackRef is escaping.

Either we need to find an alternative approach or make _Py_TryIncrefCompareStackRef non-escaping.

Yhg1s · 2025-03-06T15:17:20Z

@markshannon This is the FOR_ITER specialization you wanted to take another look at.

markshannon

It's a shame about FOR_ITER_LIST becoming escaping.

We need to make _Py_TryIncrefCompareStackRef non-escaping, but not in this PR.
Otherwise we'll end up with everything escaping, as more code uses _Py_TryIncrefCompareStackRef.

Python/bytecodes.c

Python/specialize.c

markshannon · 2025-03-07T11:17:16Z

Include/internal/pycore_opcode_metadata.h

Just to clarify:
FOR_ITER_LIST is escaping because _PyList_GetItemRefNoLock is escaping, and _PyList_GetItemRefNoLock is escaping because _Py_TryIncrefCompareStackRef is escaping.

Either we need to find an alternative approach or make _Py_TryIncrefCompareStackRef non-escaping.

Yhg1s · 2025-03-08T01:33:22Z

We need to make _Py_TryIncrefCompareStackRef non-escaping, but not in this PR. Otherwise we'll end up with everything escaping, as more code uses _Py_TryIncrefCompareStackRef.

I think we discussed this in the weekly meeting a month or two ago... I think the only way to avoid it is to delay DECREF'ing objects we've accidentally incorrectly INCREF'ed -- that is to say, objects that were newly allocated from the same memory as the object we were trying to INCREF but we lost the INCREF/DECREF race with another thread on. I think in this case that might not be too bad: if the INCREF fails we deopt anyway, and we could delay that DECREF until that deopt. For other cases where we need to TryIncref, I'm not sure of the impact of delaying (plus, we'd need some mechanism for the delay).
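A toy model of the delayed-DECREF idea described here, purely hypothetical: no such mechanism exists in this PR, and `DeferredDecrefs`, `FakeObject` and `try_incref_compare_deferred` are names invented for the sketch. The point it tries to show is that the failure path only records the stray reference, and the actual decrement happens later, at the deopt.

```python
class FakeObject:
    """Toy object with an explicit reference count (not a real PyObject)."""
    def __init__(self, refcount=1):
        self.refcount = refcount

class DeferredDecrefs:
    """Hypothetical holding area for references taken by mistake."""
    def __init__(self):
        self.pending = []

    def defer(self, op):
        self.pending.append(op)

    def flush(self):
        # Would run at the deopt, where running arbitrary code is acceptable.
        for op in self.pending:
            op.refcount -= 1
        self.pending.clear()

def try_incref_compare_deferred(current_slot_value, op, deferred):
    if op.refcount == 0:
        return False
    op.refcount += 1
    if current_slot_value is not op:
        deferred.defer(op)   # no decrement here, so nothing can be freed yet
        return False
    return True

deferred = DeferredDecrefs()
stale = FakeObject()
result = try_incref_compare_deferred(FakeObject(), stale, deferred)
refcount_before_flush = stale.refcount
deferred.flush()             # models the deopt that follows the failed incref
refcount_after_flush = stale.refcount

print(result, refcount_before_flush, refcount_after_flush)
```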

Yhg1s · 2025-03-10T23:53:48Z

I think the only way to avoid it is to delay DECREF'ing objects we've accidentally incorrectly INCREF'ed -- that is to say, objects that were newly allocated from the same memory as the object we were trying to INCREF but we lost the INCREF/DECREF race with another thread on. I think in this case that might not be too bad: if the INCREF fails we deopt anyway, and we could delay that DECREF until that deopt.

@markshannon let me know if you want that to happen in this PR (but I'd prefer it not).

markshannon · 2025-03-11T15:50:13Z

No need to do it in this PR.

Yhg1s added 2 commits January 13, 2025 01:59

Add free-threaded specialization for lists and tuples.

54c551e

Add specialization for range iterators and generators, both about as thread-safe as without specialization (i.e. not much to none at all).

1433cd3

bedevere-app bot mentioned this pull request Jan 13, 2025

Make the specializing interpreter thread-safe in --disable-gil builds #115999

Closed

Yhg1s requested review from mpage and colesbury January 13, 2025 19:19

Yhg1s added 3 commits January 13, 2025 23:32

Add missing ifdef guard.

a662ecf

Fix copy-paste mistake.

2fef94b

Regen cases.

0870ce7

colesbury reviewed Jan 14, 2025

Objects/tupleobject.c (outdated)

Python/bytecodes.c (outdated)

Python/bytecodes.c (outdated)

Python/bytecodes.c (outdated)

This was referenced Jan 20, 2025

Race in concurrent iteration over range iterators #129068

Open

Race in concurrent list mutation and item retrieval #129069

Open

Yhg1s added 2 commits January 21, 2025 13:32

Fix whitespace for linter.

bb495b0

Yhg1s marked this pull request as ready for review January 21, 2025 15:21

Yhg1s requested a review from markshannon as a code owner January 21, 2025 15:21

bedevere-app bot added the awaiting core review label Jan 21, 2025

Yhg1s added 2 commits January 21, 2025 16:31

Incorporate changes from python#128445 into the free-threaded branch of FOR_ITER_LIST specialization.

940b7c9

Drop redundant assert.

a800d75

mpage requested a review from colesbury January 23, 2025 18:35

mpage reviewed Jan 23, 2025

colesbury reviewed Jan 23, 2025

Yhg1s added 2 commits January 24, 2025 14:21

Don't mark _PyList_GetItemRefNoLock as non-escaping.

1781cad

Address reviewer comments, and drop test_iteration_deopt.

358199a

markshannon reviewed Jan 28, 2025

Yhg1s added 2 commits February 18, 2025 14:05

Merge branch 'main' into for-iter-spec

5c16a5e

Regenerate cases after merge.

07d3033

Yhg1s added 2 commits March 5, 2025 15:31

Merge branch 'main' into for-iter-spec

4326376

Make the free-threaded FOR_ITER work in the tier 2 interpreter/jit.

375399d

Yhg1s added the skip news label Mar 6, 2025

markshannon approved these changes Mar 7, 2025

bedevere-app bot added awaiting merge and removed awaiting core review labels Mar 7, 2025

Yhg1s merged commit de2f7da into python:main Mar 12, 2025
76 checks passed

bedevere-app bot removed the awaiting merge label Mar 12, 2025

brandtbucher · Jan 29, 2025 (edited)
