Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-115999: Add free-threaded specialization for FOR_ITER #128798

Merged
merged 15 commits into from
Mar 12, 2025

Conversation

Yhg1s
Copy link
Member

@Yhg1s Yhg1s commented Jan 13, 2025

Add free-threaded versions of existing specialization for FOR_ITER (list, tuples, fast range iterators and generators), without significantly affecting their thread-safety. (Iterating over shared lists/tuples/ranges should be fine like before. Reusing iterators between threads is not fine, like before. Sharing generators between threads is a recipe for significant crashes, like before.)

@Yhg1s
Copy link
Member Author

Yhg1s commented Jan 13, 2025

I still need to add tests for the specialization (rather than the thread-safety, which are in #128637), but otherwise the PR is good enough to review I think.

@mpage
Copy link
Contributor

mpage commented Jan 14, 2025

Why not use the approach that you suggested in discord where we only specialize for the case where the iterator is uniquely referenced by the current thread? It seems like that should cover the overwhelmingly common case, would be simpler to implement, and the resulting instructions would be faster.

@Yhg1s
Copy link
Member Author

Yhg1s commented Jan 14, 2025

I realised that approach wouldn't work for the list referenced by the list iterator (which is the bulk of the work for list iteration) because we're not doing any refcount operations on it (on purpose) and _PyObject_IsUniquelyReferenced() will only do the right thing if all threads hold strong references... and then I think I forgot we should still be able to use that approach for the iterators themselves. I'll see if I can poke holes into that idea tomorrow.

Yhg1s added 2 commits January 21, 2025 13:32
…ly to

uniquely referenced iterators. This handles the common case of 'for item in
seq' (where 'seq' is a list, tuple or range object) and 'for item in
generator_function()', but not, for example, 'g = gen(...); for item in g:'.
@Yhg1s
Copy link
Member Author

Yhg1s commented Jan 21, 2025

I've added a test that exercises the deopt paths, although for it to pass with ThreadSanitizer relies on #128637 going in first. The test is a little wacky because for list/range/tuple iterators it's not easy to leak the uniquely referenced iterators. (For generators it could just use a weakref.) I wrote the test to make sure the deopt paths were actually safe, but I'm not sure if long-term those tests make sense. We could just keep them around while we fiddle with specialization, but drop them once we're sure the iterators can't leak (or we change the entire approach to iterator specialization.)

@Yhg1s Yhg1s marked this pull request as ready for review January 21, 2025 15:21
@Yhg1s Yhg1s requested a review from markshannon as a code owner January 21, 2025 15:21
@mpage mpage requested a review from colesbury January 23, 2025 18:35
Comment on lines 3114 to 3115
// For free-threaded Python, the loop exit can happen at any point during item
// retrieval, so separate ops don't make much sense.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem like this comment applies to the tuple specialization. Can you delete it if not?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 3031 to 3032
// For free-threaded Python, the loop exit can happen at any point during item
// retrieval, so separate ops don't make much sense.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you expand a little bit on why this doesn't make sense? It'd be nice to keep the structure of the ops the same between the two builds.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -2055,7 +2055,7 @@ const struct opcode_metadata _PyOpcode_opcode_metadata[266] = {
[FORMAT_WITH_SPEC] = { true, INSTR_FMT_IX, HAS_ERROR_FLAG | HAS_ESCAPES_FLAG },
[FOR_ITER] = { true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_JUMP_FLAG | HAS_ERROR_FLAG | HAS_ERROR_NO_POP_FLAG | HAS_ESCAPES_FLAG },
[FOR_ITER_GEN] = { true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_DEOPT_FLAG },
[FOR_ITER_LIST] = { true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_JUMP_FLAG | HAS_EXIT_FLAG },
[FOR_ITER_LIST] = { true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_JUMP_FLAG | HAS_DEOPT_FLAG | HAS_EXIT_FLAG | HAS_ESCAPES_FLAG },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need the HAS_ESCAPES_FLAG? How does it escape?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mechanism for fetching an item from a shared list includes calling _Py_TryIncrefCompare, which has a path which DECREFs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which path? I'm confused as why an incref would need to decref.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the reference is removed from the array that backs the list between the time it's retrieved and it's returned.

/* Tries to incref the object op and ensures that *src still points to it. */
static inline int
_Py_TryIncrefCompare(PyObject **src, PyObject *op)
{
if (_Py_TryIncrefFast(op)) {
return 1;
}
if (!_Py_TryIncRefShared(op)) {
return 0;
}
if (op != _Py_atomic_load_ptr(src)) {
Py_DECREF(op);
return 0;
}
return 1;
}

Added in #114512.

Copy link
Member

@markshannon markshannon Jan 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how that works. What happens if *src is modified after op != _Py_atomic_load_ptr(src) but before the function returns?

Regardless, we want to keep FOR_ITER_LIST non-escaping in the default build.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@colesbury do you remember why lists use _Py_TryXGetRef (in list_get_item_ref) or why it matters that the src ptr still refers to the same item? Can the object be invalid in some way that still requires us to DECREF it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two hazards that _Py_TryXGetRef handles:

  1. The object op may be deallocated between the initial load and the incref. The try-incref, in cooperation with the allocator, handles this case. It returns zero if op has zero refcount.
  2. The memory block for op may be deallocated and reallocated for a different object in between the initial load and incref. In that case the try-incref succeeds, but the subsequent check fails and we need to decref op.

It does not matter if *src is modified after op != _Py_atomic_load_ptr(src). op will still be a valid reference to an object that was the next element in the list at some point during the operation. It's always been possible for the list to be concurrently modified between the execution of FOR_ITER and subsequent code that uses the result.

See https://peps.python.org/pep-0703/#optimistically-avoiding-locking

Regardless, we want to keep FOR_ITER_LIST non-escaping in the default build

Is there any performance reason to do so?

Copy link
Member

@brandtbucher brandtbucher Jan 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless, we want to keep FOR_ITER_LIST non-escaping in the default build

Is there any performance reason to do so?

Just to follow up: this does make JIT code slightly worse for traces covering for loops over lists. Since the _ITER_NEXT_LIST at the top of the loop is now escaping, this will add a new _CHECK_VALIDITY uop to the start of each loop body where none exists currently.

Consider this loop:

def f():
    x = 0
    for i in list(range(10_000)):
        x += i

This change increases the number of validity checks per loop from 2 to 3. Before:

_MAKE_WARM
_SET_IP
_CHECK_PERIODIC
_CHECK_VALIDITY <------------- OLD
_ITER_CHECK_LIST
_GUARD_NOT_EXHAUSTED_LIST
_ITER_NEXT_LIST
_SET_IP
_STORE_FAST_1
_CHECK_VALIDITY <------------- OLD
_LOAD_FAST_0
_LOAD_FAST_1
_GUARD_BOTH_INT
_BINARY_OP_ADD_INT
_SET_IP
_STORE_FAST_0
_JUMP_TO_TOP

After:

_MAKE_WARM
_SET_IP
_CHECK_PERIODIC
_CHECK_VALIDITY <------------- OLD
_ITER_CHECK_LIST
_GUARD_NOT_EXHAUSTED_LIST
_ITER_NEXT_LIST
_CHECK_VALIDITY_AND_SET_IP <-- NEW
_STORE_FAST_1
_CHECK_VALIDITY <------------- OLD
_LOAD_FAST_0
_LOAD_FAST_1
_GUARD_BOTH_INT
_BINARY_OP_ADD_INT
_SET_IP
_STORE_FAST_0
_JUMP_TO_TOP

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify:
FOR_ITER_LIST is escaping because _PyList_GetItemRefNoLock is escaping, and _PyList_GetItemRefNoLock is escaping because _Py_TryIncrefCompareStackRef is escaping.

Either we need to find an alternative approach or make _Py_TryIncrefCompareStackRef non-escaping.

@Yhg1s Yhg1s added the skip news label Mar 6, 2025
@Yhg1s
Copy link
Member Author

Yhg1s commented Mar 6, 2025

@markshannon This is the FOR_ITER specialization you wanted to take another look at.

Copy link
Member

@markshannon markshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a shame about FOR_ITER_LIST becoming escaping.

We need to make _Py_TryIncrefCompareStackRef non-escaping, but not in this PR.
Otherwise we'll end up with everything escaping, as more code uses _Py_TryIncrefCompareStackRef.

@@ -2055,7 +2055,7 @@ const struct opcode_metadata _PyOpcode_opcode_metadata[266] = {
[FORMAT_WITH_SPEC] = { true, INSTR_FMT_IX, HAS_ERROR_FLAG | HAS_ESCAPES_FLAG },
[FOR_ITER] = { true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_JUMP_FLAG | HAS_ERROR_FLAG | HAS_ERROR_NO_POP_FLAG | HAS_ESCAPES_FLAG },
[FOR_ITER_GEN] = { true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_DEOPT_FLAG },
[FOR_ITER_LIST] = { true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_JUMP_FLAG | HAS_EXIT_FLAG },
[FOR_ITER_LIST] = { true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_JUMP_FLAG | HAS_DEOPT_FLAG | HAS_EXIT_FLAG | HAS_ESCAPES_FLAG },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify:
FOR_ITER_LIST is escaping because _PyList_GetItemRefNoLock is escaping, and _PyList_GetItemRefNoLock is escaping because _Py_TryIncrefCompareStackRef is escaping.

Either we need to find an alternative approach or make _Py_TryIncrefCompareStackRef non-escaping.

@Yhg1s
Copy link
Member Author

Yhg1s commented Mar 8, 2025

We need to make _Py_TryIncrefCompareStackRef non-escaping, but not in this PR. Otherwise we'll end up with everything escaping, as more code uses _Py_TryIncrefCompareStackRef.

I think we discussed this in the weekly meeting a month or two ago... I think the only way to avoid it is to delay DECREF'ing objects we've accidentally incorrectly INCREF'ed -- that is to say, objects that were newly allocated from the same memory as the object we were trying to INCREF but we lost the INCREF/DECREF race with another thread on. I think in this case that might not be too bad: if the INCREF fails we deopt anyway, and we could delay that DECREF until that deopt. For other cases where we need to TryIncref, I'm not sure of the impact of delaying (plus, we'd need some mechanism for the delay).

@Yhg1s
Copy link
Member Author

Yhg1s commented Mar 10, 2025

I think the only way to avoid it is to delay DECREF'ing objects we've accidentally incorrectly INCREF'ed -- that is to say, objects that were newly allocated from the same memory as the object we were trying to INCREF but we lost the INCREF/DECREF race with another thread on. I think in this case that might not be too bad: if the INCREF fails we deopt anyway, and we could delay that DECREF until that deopt.

@markshannon let me know if you want that to happen in this PR (but I'd prefer it not).

@markshannon
Copy link
Member

No need to do it in this PR.

@Yhg1s Yhg1s merged commit de2f7da into python:main Mar 12, 2025
76 checks passed
plashchynski pushed a commit to plashchynski/cpython that referenced this pull request Mar 17, 2025
…n#128798)

Add free-threaded versions of existing specialization for FOR_ITER (list, tuples, fast range iterators and generators), without significantly affecting their thread-safety. (Iterating over shared lists/tuples/ranges should be fine like before. Reusing iterators between threads is not fine, like before. Sharing generators between threads is a recipe for significant crashes, like before.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants