-
-
Notifications
You must be signed in to change notification settings - Fork 31.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-115999: Add free-threaded specialization for FOR_ITER #128798
Conversation
thread-safe as without spcialization (i.e. not much to none at all).
I still need to add tests for the specialization (rather than the thread-safety, which are in #128637), but otherwise the PR is good enough to review I think. |
Why not use the approach that you suggested in discord where we only specialize for the case where the iterator is uniquely referenced by the current thread? It seems like that should cover the overwhelmingly common case, would be simpler to implement, and the resulting instructions would be faster. |
I realised that approach wouldn't work for the list referenced by the list iterator (which is the bulk of the work for list iteration) because we're not doing any refcount operations on it (on purpose) and |
…ly to uniquely referenced iterators. This handles the common case of 'for item in seq' (where 'seq' is a list, tuple or range object) and 'for item in generator_function()', but not, for example, 'g = gen(...); for item in g:'.
I've added a test that exercises the deopt paths, although for it to pass with ThreadSanitizer relies on #128637 going in first. The test is a little wacky because for list/range/tuple iterators it's not easy to leak the uniquely referenced iterators. (For generators it could just use a weakref.) I wrote the test to make sure the deopt paths were actually safe, but I'm not sure if long-term those tests make sense. We could just keep them around while we fiddle with specialization, but drop them once we're sure the iterators can't leak (or we change the entire approach to iterator specialization.) |
FOR_ITER_LIST specialization.
Python/bytecodes.c
Outdated
// For free-threaded Python, the loop exit can happen at any point during item | ||
// retrieval, so separate ops don't make much sense. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem like this comment applies to the tuple specialization. Can you delete it if not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Python/bytecodes.c
Outdated
// For free-threaded Python, the loop exit can happen at any point during item | ||
// retrieval, so separate ops don't make much sense. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you expand a little bit on why this doesn't make sense? It'd be nice to keep the structure of the ops the same between the two builds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -2055,7 +2055,7 @@ const struct opcode_metadata _PyOpcode_opcode_metadata[266] = { | |||
[FORMAT_WITH_SPEC] = { true, INSTR_FMT_IX, HAS_ERROR_FLAG | HAS_ESCAPES_FLAG }, | |||
[FOR_ITER] = { true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_JUMP_FLAG | HAS_ERROR_FLAG | HAS_ERROR_NO_POP_FLAG | HAS_ESCAPES_FLAG }, | |||
[FOR_ITER_GEN] = { true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_DEOPT_FLAG }, | |||
[FOR_ITER_LIST] = { true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_JUMP_FLAG | HAS_EXIT_FLAG }, | |||
[FOR_ITER_LIST] = { true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_JUMP_FLAG | HAS_DEOPT_FLAG | HAS_EXIT_FLAG | HAS_ESCAPES_FLAG }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this need the HAS_ESCAPES_FLAG
? How does it escape?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mechanism for fetching an item from a shared list includes calling _Py_TryIncrefCompare
, which has a path which DECREFs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which path? I'm confused as why an incref would need to decref.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the reference is removed from the array that backs the list between the time it's retrieved and it's returned.
cpython/Include/internal/pycore_object.h
Lines 595 to 610 in 180ee43
/* Tries to incref the object op and ensures that *src still points to it. */ | |
static inline int | |
_Py_TryIncrefCompare(PyObject **src, PyObject *op) | |
{ | |
if (_Py_TryIncrefFast(op)) { | |
return 1; | |
} | |
if (!_Py_TryIncRefShared(op)) { | |
return 0; | |
} | |
if (op != _Py_atomic_load_ptr(src)) { | |
Py_DECREF(op); | |
return 0; | |
} | |
return 1; | |
} |
Added in #114512.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how that works. What happens if *src
is modified after op != _Py_atomic_load_ptr(src)
but before the function returns?
Regardless, we want to keep FOR_ITER_LIST
non-escaping in the default build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@colesbury do you remember why lists use _Py_TryXGetRef
(in list_get_item_ref) or why it matters that the src ptr still refers to the same item? Can the object be invalid in some way that still requires us to DECREF it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two hazards that _Py_TryXGetRef
handles:
- The object
op
may be deallocated between the initial load and the incref. The try-incref, in cooperation with the allocator, handles this case. It returns zero ifop
has zero refcount. - The memory block for
op
may be deallocated and reallocated for a different object in between the initial load and incref. In that case the try-incref succeeds, but the subsequent check fails and we need to decrefop
.
It does not matter if *src
is modified after op != _Py_atomic_load_ptr(src)
. op
will still be a valid reference to an object that was the next element in the list at some point during the operation. It's always been possible for the list to be concurrently modified between the execution of FOR_ITER
and subsequent code that uses the result.
See https://peps.python.org/pep-0703/#optimistically-avoiding-locking
Regardless, we want to keep FOR_ITER_LIST non-escaping in the default build
Is there any performance reason to do so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regardless, we want to keep FOR_ITER_LIST non-escaping in the default build
Is there any performance reason to do so?
Just to follow up: this does make JIT code slightly worse for traces covering for
loops over lists. Since the _ITER_NEXT_LIST
at the top of the loop is now escaping, this will add a new _CHECK_VALIDITY
uop to the start of each loop body where none exists currently.
Consider this loop:
def f():
x = 0
for i in list(range(10_000)):
x += i
This change increases the number of validity checks per loop from 2 to 3. Before:
_MAKE_WARM
_SET_IP
_CHECK_PERIODIC
_CHECK_VALIDITY <------------- OLD
_ITER_CHECK_LIST
_GUARD_NOT_EXHAUSTED_LIST
_ITER_NEXT_LIST
_SET_IP
_STORE_FAST_1
_CHECK_VALIDITY <------------- OLD
_LOAD_FAST_0
_LOAD_FAST_1
_GUARD_BOTH_INT
_BINARY_OP_ADD_INT
_SET_IP
_STORE_FAST_0
_JUMP_TO_TOP
After:
_MAKE_WARM
_SET_IP
_CHECK_PERIODIC
_CHECK_VALIDITY <------------- OLD
_ITER_CHECK_LIST
_GUARD_NOT_EXHAUSTED_LIST
_ITER_NEXT_LIST
_CHECK_VALIDITY_AND_SET_IP <-- NEW
_STORE_FAST_1
_CHECK_VALIDITY <------------- OLD
_LOAD_FAST_0
_LOAD_FAST_1
_GUARD_BOTH_INT
_BINARY_OP_ADD_INT
_SET_IP
_STORE_FAST_0
_JUMP_TO_TOP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify:
FOR_ITER_LIST
is escaping because _PyList_GetItemRefNoLock
is escaping, and _PyList_GetItemRefNoLock
is escaping because _Py_TryIncrefCompareStackRef
is escaping.
Either we need to find an alternative approach or make _Py_TryIncrefCompareStackRef
non-escaping.
@markshannon This is the FOR_ITER specialization you wanted to take another look at. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a shame about FOR_ITER_LIST
becoming escaping.
We need to make _Py_TryIncrefCompareStackRef
non-escaping, but not in this PR.
Otherwise we'll end up with everything escaping, as more code uses _Py_TryIncrefCompareStackRef
.
@@ -2055,7 +2055,7 @@ const struct opcode_metadata _PyOpcode_opcode_metadata[266] = { | |||
[FORMAT_WITH_SPEC] = { true, INSTR_FMT_IX, HAS_ERROR_FLAG | HAS_ESCAPES_FLAG }, | |||
[FOR_ITER] = { true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_JUMP_FLAG | HAS_ERROR_FLAG | HAS_ERROR_NO_POP_FLAG | HAS_ESCAPES_FLAG }, | |||
[FOR_ITER_GEN] = { true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_DEOPT_FLAG }, | |||
[FOR_ITER_LIST] = { true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_JUMP_FLAG | HAS_EXIT_FLAG }, | |||
[FOR_ITER_LIST] = { true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_JUMP_FLAG | HAS_DEOPT_FLAG | HAS_EXIT_FLAG | HAS_ESCAPES_FLAG }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify:
FOR_ITER_LIST
is escaping because _PyList_GetItemRefNoLock
is escaping, and _PyList_GetItemRefNoLock
is escaping because _Py_TryIncrefCompareStackRef
is escaping.
Either we need to find an alternative approach or make _Py_TryIncrefCompareStackRef
non-escaping.
I think we discussed this in the weekly meeting a month or two ago... I think the only way to avoid it is to delay DECREF'ing objects we've accidentally incorrectly INCREF'ed -- that is to say, objects that were newly allocated from the same memory as the object we were trying to INCREF but we lost the INCREF/DECREF race with another thread on. I think in this case that might not be too bad: if the INCREF fails we deopt anyway, and we could delay that DECREF until that deopt. For other cases where we need to TryIncref, I'm not sure of the impact of delaying (plus, we'd need some mechanism for the delay). |
@markshannon let me know if you want that to happen in this PR (but I'd prefer it not). |
No need to do it in this PR. |
…n#128798) Add free-threaded versions of existing specialization for FOR_ITER (list, tuples, fast range iterators and generators), without significantly affecting their thread-safety. (Iterating over shared lists/tuples/ranges should be fine like before. Reusing iterators between threads is not fine, like before. Sharing generators between threads is a recipe for significant crashes, like before.)
Add free-threaded versions of existing specialization for FOR_ITER (list, tuples, fast range iterators and generators), without significantly affecting their thread-safety. (Iterating over shared lists/tuples/ranges should be fine like before. Reusing iterators between threads is not fine, like before. Sharing generators between threads is a recipe for significant crashes, like before.)
--disable-gil
builds #115999