
GH-124715: Move trashcan mechanism into Py_Dealloc #132280


Open: wants to merge 8 commits into main
73 changes: 3 additions & 70 deletions Include/cpython/object.h
@@ -429,81 +429,14 @@ PyAPI_FUNC(void) _Py_NO_RETURN _PyObject_AssertFailed(
const char *function);


/* Trashcan mechanism, thanks to Christian Tismer.

When deallocating a container object, it's possible to trigger an unbounded
chain of deallocations, as each Py_DECREF in turn drops the refcount on "the
next" object in the chain to 0. This can easily lead to stack overflows,
especially in threads (which typically have less stack space to work with).

A container object can avoid this by bracketing the body of its tp_dealloc
function with a pair of macros:

static void
mytype_dealloc(mytype *p)
{
... declarations go here ...

PyObject_GC_UnTrack(p); // must untrack first
Py_TRASHCAN_BEGIN(p, mytype_dealloc)
... The body of the deallocator goes here, including all calls ...
... to Py_DECREF on contained objects. ...
Py_TRASHCAN_END // there should be no code after this
}

CAUTION: Never return from the middle of the body! If the body needs to
"get out early", put a label immediately before the Py_TRASHCAN_END
call, and goto it. Else the call-depth counter (see below) will stay
above 0 forever, and the trashcan will never get emptied.

How it works: The BEGIN macro increments a call-depth counter. So long
as this counter is small, the body of the deallocator is run directly without
further ado. But if the counter gets large, it instead adds p to a list of
objects to be deallocated later, skips the body of the deallocator, and
resumes execution after the END macro. The tp_dealloc routine then returns
without deallocating anything (and so unbounded call-stack depth is avoided).

When the call stack finishes unwinding again, code generated by the END macro
notices this, and calls another routine to deallocate all the objects that
may have been added to the list of deferred deallocations. In effect, a
chain of N deallocations is broken into (N-1)/(Py_TRASHCAN_HEADROOM-1) pieces,
with the call stack never exceeding a depth of Py_TRASHCAN_HEADROOM.

Since the tp_dealloc of a subclass typically calls the tp_dealloc of the base
class, we need to ensure that the trashcan is only triggered on the tp_dealloc
of the actual class being deallocated. Otherwise we might end up with a
partially-deallocated object. To check this, the tp_dealloc function must be
passed as second argument to Py_TRASHCAN_BEGIN().
*/


PyAPI_FUNC(void) _PyTrash_thread_deposit_object(PyThreadState *tstate, PyObject *op);
PyAPI_FUNC(void) _PyTrash_thread_destroy_chain(PyThreadState *tstate);


/* Python 3.10 private API, invoked by the Py_TRASHCAN_BEGIN(). */

/* To avoid raising recursion errors during dealloc trigger trashcan before we reach
* recursion limit. To avoid trashing, we don't attempt to empty the trashcan until
* we have headroom above the trigger limit */
#define Py_TRASHCAN_HEADROOM 50

/* Helper function for Py_TRASHCAN_BEGIN */
PyAPI_FUNC(int) _Py_ReachedRecursionLimitWithMargin(PyThreadState *tstate, int margin_count);

#define Py_TRASHCAN_BEGIN(op, dealloc) \
do { \
PyThreadState *tstate = PyThreadState_Get(); \
if (_Py_ReachedRecursionLimitWithMargin(tstate, 2) && Py_TYPE(op)->tp_dealloc == (destructor)dealloc) { \
_PyTrash_thread_deposit_object(tstate, (PyObject *)op); \
break; \
}
/* The body of the deallocator is here. */
#define Py_TRASHCAN_END \
if (tstate->delete_later && !_Py_ReachedRecursionLimitWithMargin(tstate, 4)) { \
_PyTrash_thread_destroy_chain(tstate); \
} \
} while (0);
/* For backwards compatibility with the old trashcan mechanism */
#define Py_TRASHCAN_BEGIN(op, dealloc)
#define Py_TRASHCAN_END


PyAPI_FUNC(void *) PyObject_GetItemData(PyObject *obj);
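Since Py_TRASHCAN_BEGIN and Py_TRASHCAN_END now expand to nothing, existing third-party deallocators compile unchanged. As a minimal sketch (the type and its field are hypothetical, written for illustration rather than taken from this PR), a deallocator that still uses the bracketing macros now gets its stack protection from Py_DECREF itself:

    #include <Python.h>

    /* Hypothetical extension type: "mylist" and its single field are made up;
     * only the trashcan usage pattern matters here. */
    typedef struct {
        PyObject_HEAD
        PyObject *items;   /* may hold the only reference to a long chain */
    } mylistobject;

    static void
    mylist_dealloc(PyObject *op)
    {
        mylistobject *self = (mylistobject *)op;
        PyObject_GC_UnTrack(self);                /* untrack before clearing fields */
        Py_TRASHCAN_BEGIN(self, mylist_dealloc)   /* now a no-op kept for compatibility */
        Py_XDECREF(self->items);                  /* may start a deep DECREF cascade */
        Py_TYPE(self)->tp_free(op);
        Py_TRASHCAN_END                           /* also a no-op */
    }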
26 changes: 1 addition & 25 deletions Include/internal/pycore_ceval.h
@@ -196,25 +196,6 @@ extern void _PyEval_DeactivateOpCache(void);

/* --- _Py_EnterRecursiveCall() ----------------------------------------- */

#if !_Py__has_builtin(__builtin_frame_address) && !defined(_MSC_VER)
static uintptr_t return_pointer_as_int(char* p) {
return (uintptr_t)p;
}
#endif

static inline uintptr_t
_Py_get_machine_stack_pointer(void) {
#if _Py__has_builtin(__builtin_frame_address)
return (uintptr_t)__builtin_frame_address(0);
#elif defined(_MSC_VER)
return (uintptr_t)_AddressOfReturnAddress();
#else
char here;
/* Avoid compiler warning about returning stack address */
return return_pointer_as_int(&here);
#endif
}

static inline int _Py_MakeRecCheck(PyThreadState *tstate) {
uintptr_t here_addr = _Py_get_machine_stack_pointer();
_PyThreadStateImpl *_tstate = (_PyThreadStateImpl *)tstate;
@@ -249,12 +230,7 @@ PyAPI_FUNC(void) _Py_InitializeRecursionLimits(PyThreadState *tstate);
static inline int _Py_ReachedRecursionLimit(PyThreadState *tstate) {
uintptr_t here_addr = _Py_get_machine_stack_pointer();
_PyThreadStateImpl *_tstate = (_PyThreadStateImpl *)tstate;
if (here_addr > _tstate->c_stack_soft_limit) {
return 0;
}
if (_tstate->c_stack_hard_limit == 0) {
_Py_InitializeRecursionLimits(tstate);
}
assert(_tstate->c_stack_hard_limit != 0);
return here_addr <= _tstate->c_stack_soft_limit;
}

29 changes: 29 additions & 0 deletions Include/internal/pycore_pystate.h
@@ -9,6 +9,7 @@ extern "C" {
#endif

#include "pycore_typedefs.h" // _PyRuntimeState
#include "pycore_tstate.h"


// Values for PyThreadState.state. A thread must be in the "attached" state
@@ -296,6 +297,34 @@ _Py_AssertHoldsTstateFunc(const char *func)
#define _Py_AssertHoldsTstate()
#endif

#if !_Py__has_builtin(__builtin_frame_address) && !defined(_MSC_VER)
static uintptr_t return_pointer_as_int(char* p) {
return (uintptr_t)p;
}
#endif

static inline uintptr_t
_Py_get_machine_stack_pointer(void) {
#if _Py__has_builtin(__builtin_frame_address)
return (uintptr_t)__builtin_frame_address(0);
#elif defined(_MSC_VER)
return (uintptr_t)_AddressOfReturnAddress();
#else
char here;
/* Avoid compiler warning about returning stack address */
return return_pointer_as_int(&here);
#endif
}

static inline intptr_t
_Py_RecursionLimit_GetMargin(PyThreadState *tstate)
{
_PyThreadStateImpl *_tstate = (_PyThreadStateImpl *)tstate;
assert(_tstate->c_stack_hard_limit != 0);
intptr_t here_addr = _Py_get_machine_stack_pointer();
return Py_ARITHMETIC_RIGHT_SHIFT(intptr_t, here_addr - (intptr_t)_tstate->c_stack_soft_limit, PYOS_STACK_MARGIN_SHIFT);
}

#ifdef __cplusplus
}
#endif
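For intuition about the value _Py_RecursionLimit_GetMargin() returns, here is a small standalone sketch; the addresses are made up and the shift of 14 is the 64-bit default-build value of PYOS_STACK_MARGIN_SHIFT defined in the pythonrun.h hunk below, so treat the numbers as assumptions:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        const int margin_shift = 14;                   /* assumed PYOS_STACK_MARGIN_SHIFT */
        intptr_t soft_limit = 0x7f0000100000;          /* hypothetical c_stack_soft_limit */
        intptr_t here_addr  = soft_limit + 40 * 1024;  /* stack pointer 40 KiB above it */

        /* The real code uses Py_ARITHMETIC_RIGHT_SHIFT so that a stack pointer
         * below the soft limit yields a negative margin; for a positive
         * distance a plain right shift is equivalent. */
        intptr_t margin = (here_addr - soft_limit) >> margin_shift;
        printf("margin = %" PRIdPTR "\n", margin);     /* prints 2: two 16 KiB margins left */
        return 0;
    }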
16 changes: 12 additions & 4 deletions Include/pythonrun.h
@@ -26,17 +26,25 @@ PyAPI_DATA(int) (*PyOS_InputHook)(void);
* apart. In practice, that means it must be larger than the C
* stack consumption of PyEval_EvalDefault */
#if defined(_Py_ADDRESS_SANITIZER) || defined(_Py_THREAD_SANITIZER)
# define PYOS_STACK_MARGIN 4096
# define PYOS_LOG_STACK_MARGIN 12
Review comment (Member): You may rename this constant to PYOS_LOG2_STACK_MARGIN. It took me a while to understand that it was the log(n)/log(2) function :-) Or just add a comment explaining what "LOG" stands for here.

#elif defined(Py_DEBUG) && defined(WIN32)
# define PYOS_STACK_MARGIN 4096
# define PYOS_LOG_STACK_MARGIN 12
#elif defined(__wasi__)
/* Web assembly has two stacks, so this isn't really a size */
# define PYOS_STACK_MARGIN 500
# define PYOS_LOG_STACK_MARGIN 9
#else
# define PYOS_STACK_MARGIN 2048
# define PYOS_LOG_STACK_MARGIN 11
#endif
#define PYOS_STACK_MARGIN (1 << PYOS_LOG_STACK_MARGIN)
#define PYOS_STACK_MARGIN_BYTES (PYOS_STACK_MARGIN * sizeof(void *))

#if SIZEOF_VOID_P == 8
#define PYOS_STACK_MARGIN_SHIFT (PYOS_LOG_STACK_MARGIN + 3)
#else
#define PYOS_STACK_MARGIN_SHIFT (PYOS_LOG_STACK_MARGIN + 2)
#endif


#if defined(WIN32)
#define USE_STACKCHECK
#endif
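A quick check of the arithmetic above, assuming the default (non-sanitizer) branch and SIZEOF_VOID_P == 8; the constants are copied by hand for illustration rather than read from the headers:

    #include <stdio.h>

    int main(void)
    {
        const int  log_margin   = 11;                  /* PYOS_LOG_STACK_MARGIN */
        const long margin_words = 1L << log_margin;    /* PYOS_STACK_MARGIN = 2048 */
        const long margin_bytes = margin_words * 8;    /* PYOS_STACK_MARGIN_BYTES on 64-bit */
        const int  shift        = log_margin + 3;      /* PYOS_STACK_MARGIN_SHIFT */

        /* 1 << shift equals the margin in bytes, so shifting a stack distance
         * in bytes right by PYOS_STACK_MARGIN_SHIFT counts whole margins. */
        printf("%ld words = %ld bytes; 1 << %d = %ld\n",
               margin_words, margin_bytes, shift, 1L << shift);
        return 0;
    }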
@@ -0,0 +1,3 @@
Prevents against stack overflows when calling Py_DECREF. Third-party
extension objects no longer need to use the "trashcan" mechanism, as
protection is now built into the ``Py_DECREF`` macro.
Review comment (Member) on lines +1 to +3:
Suggested change
Prevents against stack overflows when calling Py_DECREF. Third-party
extension objects no longer need to use the "trashcan" mechanism, as
protection is now built into the ``Py_DECREF`` macro.
Prevents against stack overflows when calling :c:func:`Py_DECREF`. Third-party
extension objects no longer need to use the "trashcan" mechanism, as
protection is now built into the :c:func:`Py_DECREF` function.
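
To make the failure mode concrete, a sketch written for illustration (the function name and chain length are arbitrary): when a container holds the only reference to another container, repeated hundreds of thousands of times, one Py_DECREF becomes a long deallocation cascade. CPython's own list type already guarded this with the trashcan macros; a third-party container of the same shape is now protected without them.

    #include <Python.h>

    /* Illustrative only: builds a chain of one-element lists, each holding
     * the next, then releases the whole chain with a single Py_DECREF. */
    static int
    drop_deep_chain(void)
    {
        PyObject *inner = PyList_New(0);
        if (inner == NULL) {
            return -1;
        }
        for (int i = 0; i < 500000; i++) {
            PyObject *outer = PyList_New(1);
            if (outer == NULL) {
                Py_DECREF(inner);
                return -1;
            }
            PyList_SET_ITEM(outer, 0, inner);  /* steals the reference to inner */
            inner = outer;
        }
        /* The deferral logic in _Py_Dealloc keeps the C stack bounded while
         * the chain is torn down. */
        Py_DECREF(inner);
        return 0;
    }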

2 changes: 0 additions & 2 deletions Modules/_elementtree.c
@@ -689,7 +689,6 @@ element_dealloc(PyObject *op)

/* bpo-31095: UnTrack is needed before calling any callbacks */
PyObject_GC_UnTrack(self);
Py_TRASHCAN_BEGIN(self, element_dealloc)

if (self->weakreflist != NULL)
PyObject_ClearWeakRefs(op);
@@ -700,7 +699,6 @@ element_dealloc(PyObject *op)

tp->tp_free(self);
Py_DECREF(tp);
Py_TRASHCAN_END
}

/* -------------------------------------------------------------------- */
4 changes: 3 additions & 1 deletion Objects/capsule.c
@@ -287,7 +287,9 @@ static void
capsule_dealloc(PyObject *op)
{
PyCapsule *capsule = _PyCapsule_CAST(op);
PyObject_GC_UnTrack(op);
if (_PyObject_GC_IS_TRACKED(op)) {
PyObject_GC_UnTrack(op);
}
Review comment (Member):
Is this change needed? PyObject_GC_UnTrack() already checks if the object is tracked or not, no?

if (capsule->destructor) {
capsule->destructor(op);
}
2 changes: 0 additions & 2 deletions Objects/descrobject.c
@@ -1311,11 +1311,9 @@ wrapper_dealloc(PyObject *self)
{
wrapperobject *wp = (wrapperobject *)self;
PyObject_GC_UnTrack(wp);
Py_TRASHCAN_BEGIN(wp, wrapper_dealloc)
Py_XDECREF(wp->descr);
Py_XDECREF(wp->self);
PyObject_GC_Del(wp);
Py_TRASHCAN_END
}

static PyObject *
2 changes: 0 additions & 2 deletions Objects/dictobject.c
@@ -3262,7 +3262,6 @@ dict_dealloc(PyObject *self)

/* bpo-31095: UnTrack is needed before calling any callbacks */
PyObject_GC_UnTrack(mp);
Py_TRASHCAN_BEGIN(mp, dict_dealloc)
if (values != NULL) {
if (values->embedded == 0) {
for (i = 0, n = values->capacity; i < n; i++) {
@@ -3282,7 +3281,6 @@ dict_dealloc(PyObject *self)
else {
Py_TYPE(mp)->tp_free((PyObject *)mp);
}
Py_TRASHCAN_END
}


2 changes: 0 additions & 2 deletions Objects/exceptions.c
@@ -150,10 +150,8 @@ BaseException_dealloc(PyObject *op)
// bpo-44348: The trashcan mechanism prevents stack overflow when deleting
// long chains of exceptions. For example, exceptions can be chained
// through the __context__ attributes or the __traceback__ attribute.
Py_TRASHCAN_BEGIN(self, BaseException_dealloc)
(void)BaseException_clear(op);
Py_TYPE(self)->tp_free(self);
Py_TRASHCAN_END
}

static int
2 changes: 0 additions & 2 deletions Objects/frameobject.c
@@ -1916,7 +1916,6 @@ frame_dealloc(PyObject *op)
_PyObject_GC_UNTRACK(f);
}

Py_TRASHCAN_BEGIN(f, frame_dealloc);
/* GH-106092: If f->f_frame was on the stack and we reached the maximum
* nesting depth for deallocations, the trashcan may have delayed this
* deallocation until after f->f_frame is freed. Avoid dereferencing
@@ -1941,7 +1940,6 @@ frame_dealloc(PyObject *op)
Py_CLEAR(f->f_locals_cache);
Py_CLEAR(f->f_overwritten_fast_locals);
PyObject_GC_Del(f);
Py_TRASHCAN_END;
}

static int
2 changes: 0 additions & 2 deletions Objects/listobject.c
@@ -550,7 +550,6 @@ list_dealloc(PyObject *self)
PyListObject *op = (PyListObject *)self;
Py_ssize_t i;
PyObject_GC_UnTrack(op);
Py_TRASHCAN_BEGIN(op, list_dealloc)
if (op->ob_item != NULL) {
/* Do it backwards, for Christian Tismer.
There's a simple test case where somehow this reduces
@@ -569,7 +568,6 @@ list_dealloc(PyObject *self)
else {
PyObject_GC_Del(op);
}
Py_TRASHCAN_END
}

static PyObject *
4 changes: 0 additions & 4 deletions Objects/methodobject.c
@@ -166,10 +166,7 @@ static void
meth_dealloc(PyObject *self)
{
PyCFunctionObject *m = _PyCFunctionObject_CAST(self);
// The Py_TRASHCAN mechanism requires that we be able to
// call PyObject_GC_UnTrack twice on an object.
PyObject_GC_UnTrack(m);
Py_TRASHCAN_BEGIN(m, meth_dealloc);
if (m->m_weakreflist != NULL) {
PyObject_ClearWeakRefs((PyObject*) m);
}
@@ -186,7 +183,6 @@ meth_dealloc(PyObject *self)
assert(Py_IS_TYPE(self, &PyCFunction_Type));
_Py_FREELIST_FREE(pycfunctionobject, m, PyObject_GC_Del);
}
Py_TRASHCAN_END;
}

static PyObject *
29 changes: 24 additions & 5 deletions Objects/object.c
@@ -2908,13 +2908,15 @@ Py_ReprLeave(PyObject *obj)
void
_PyTrash_thread_deposit_object(PyThreadState *tstate, PyObject *op)
{
_PyObject_ASSERT(op, _PyObject_IS_GC(op));
_PyObject_ASSERT(op, !_PyObject_GC_IS_TRACKED(op));
_PyObject_ASSERT(op, Py_REFCNT(op) == 0);
#ifdef Py_GIL_DISABLED
op->ob_tid = (uintptr_t)tstate->delete_later;
#else
_PyGCHead_SET_PREV(_Py_AS_GC(op), (PyGC_Head*)tstate->delete_later);
/* Store the pointer in the refcnt field.
Review comment (Member):
Suggested change
/* Store the pointer in the refcnt field.
/* Store the delete_later pointer in the refcnt field.

* As this object may still be tracked by the GC,
* it is important that we never store 0 (NULL). */
uintptr_t refcnt = (uintptr_t)tstate->delete_later;
*((uintptr_t*)op) = refcnt+1;
#endif
tstate->delete_later = op;
}
@@ -2933,7 +2935,9 @@ _PyTrash_thread_destroy_chain(PyThreadState *tstate)
op->ob_tid = 0;
_Py_atomic_store_ssize_relaxed(&op->ob_ref_shared, _Py_REF_MERGED);
#else
tstate->delete_later = (PyObject*) _PyGCHead_PREV(_Py_AS_GC(op));
uintptr_t refcnt = *((uintptr_t*)op);
Review comment (Member):
The -1 operation is a bit magic, I suggest adding this comment:

Suggested change
uintptr_t refcnt = *((uintptr_t*)op);
/* Get the delete_later pointer from the refcnt field.
* See _PyTrash_thread_deposit_object(). */
uintptr_t refcnt = *((uintptr_t*)op);

tstate->delete_later = (PyObject *)(refcnt - 1);
op->ob_refcnt = 0;
#endif

/* Call the deallocator directly. This used to try to
@@ -2998,13 +3002,25 @@ _PyObject_AssertFailed(PyObject *obj, const char *expr, const char *msg,
}


/*
When deallocating a container object, it's possible to trigger an unbounded
chain of deallocations, as each Py_DECREF in turn drops the refcount on "the
next" object in the chain to 0. This can easily lead to stack overflows.
To avoid that, if the C stack is nearing its limit, instead of calling
dealloc on the object, it is added to a queue to be freed later when the
stack is shallower */
void
_Py_Dealloc(PyObject *op)
{
PyTypeObject *type = Py_TYPE(op);
destructor dealloc = type->tp_dealloc;
#ifdef Py_DEBUG
PyThreadState *tstate = _PyThreadState_GET();
intptr_t margin = _Py_RecursionLimit_GetMargin(tstate);
if (margin < 2) {
_PyTrash_thread_deposit_object(tstate, (PyObject *)op);
return;
}
#ifdef Py_DEBUG
#if !defined(Py_GIL_DISABLED) && !defined(Py_STACKREF_DEBUG)
/* This assertion doesn't hold for the free-threading build, as
* PyStackRef_CLOSE_SPECIALIZED is not implemented */
Expand Down Expand Up @@ -3046,6 +3062,9 @@ _Py_Dealloc(PyObject *op)
Py_XDECREF(old_exc);
Py_DECREF(type);
#endif
if (tstate->delete_later && margin >= 4) {
_PyTrash_thread_destroy_chain(tstate);
}
}


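Read together, the Objects/object.c changes amount to the control flow sketched below. This is a condensed restatement for readers skimming the diff, not the literal implementation: the Py_DEBUG checks, the Py_GIL_DISABLED path, and error handling are omitted.

    /* Condensed sketch of the new _Py_Dealloc flow shown above. */
    static void
    dealloc_sketch(PyThreadState *tstate, PyObject *op)
    {
        intptr_t margin = _Py_RecursionLimit_GetMargin(tstate);
        if (margin < 2) {
            /* Too close to the C stack limit: queue the object on
             * tstate->delete_later instead of recursing further. */
            _PyTrash_thread_deposit_object(tstate, op);
            return;
        }
        Py_TYPE(op)->tp_dealloc(op);   /* may Py_DECREF contained objects */
        if (tstate->delete_later && margin >= 4) {
            /* Enough headroom again: drain the deferred-deallocation list. */
            _PyTrash_thread_destroy_chain(tstate);
        }
    }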