Bug report
Bug
I found a race condition on the free-threaded build at 72eca2af59043c78647b0e6be3777a947ea9ef0f happening in list.remove.
The long story short is if whatever PyObject_RichCompareBool does releases the GIL, the critical section no longer protects the list's state from other threads, which may then mutate the list and result in the wrong item being removed later.
This is the current code for list.remove with comments around the problematic logic.
static PyObject *
list_remove_impl(PyListObject *self, PyObject *value)
{
Py_ssize_t i;
for (i = 0; i < Py_SIZE(self); i++) {
// read the contents under lock
PyObject *obj = self->ob_item[i];
Py_INCREF(obj);
// this may release the GIL, suspending the critical section
int cmp = PyObject_RichCompareBool(obj, value, Py_EQ);
Py_DECREF(obj);
// it is possible that obj (which we want to remove) isn't at
// position i anymore as other threads may have mutated
// the list...
if (cmp > 0) {
// ... but we still remove it!
if (list_ass_slice_lock_held(self, i, i+1, NULL) == 0)
Py_RETURN_NONE;
return NULL;
}
}
...
}
I'm not sure whether a crash can occur from that race condition (e.g. i is the position of the last element in the list and another thread pops the last element), but regardless I think there's a correctness issue with that race condition.
It's a bit hard to come up with a reproducer due to the race condition happening under pretty specific circumstances but I can try to make (or generate) one if needed.
Possible fixes
The simplest thing I can think of is re-comparing ob_item[i] with obj after PyObject_RichCompareBool has returned to make sure the state of the list, at least as far as this specific item's position is concerned, hasn't changed, and raising a RuntimeError if not.
Alternatively, I looked at other containers and it seems like deque has a state field which is incremented every time a mutating operation is done on it; if the state value changes in an unexpected manner during mutations, it raises a RuntimeError (I don't think list has anything like that currently).
I already have a branch in my fork that implements the former "fix", but I'm also open to working on the latter if there's interest in it.
CPython versions tested on:
CPython main branch
Operating systems tested on:
macOS
Bug report
Bug
I found a race condition on the free-threaded build at
72eca2af59043c78647b0e6be3777a947ea9ef0fhappening inlist.remove.The long story short is if whatever
PyObject_RichCompareBooldoes releases the GIL, the critical section no longer protects the list's state from other threads, which may then mutate the list and result in the wrong item being removed later.This is the current code for
list.removewith comments around the problematic logic.I'm not sure whether a crash can occur from that race condition (e.g.
iis the position of the last element in the list and another thread pops the last element), but regardless I think there's a correctness issue with that race condition.It's a bit hard to come up with a reproducer due to the race condition happening under pretty specific circumstances but I can try to make (or generate) one if needed.
Possible fixes
The simplest thing I can think of is re-comparing
ob_item[i]withobjafterPyObject_RichCompareBoolhas returned to make sure the state of the list, at least as far as this specific item's position is concerned, hasn't changed, and raising aRuntimeErrorif not.Alternatively, I looked at other containers and it seems like
dequehas astatefield which is incremented every time a mutating operation is done on it; if thestatevalue changes in an unexpected manner during mutations, it raises aRuntimeError(I don't thinklisthas anything like that currently).I already have a branch in my fork that implements the former "fix", but I'm also open to working on the latter if there's interest in it.
CPython versions tested on:
CPython main branch
Operating systems tested on:
macOS