Skip to content

GH-124715: Move trashcan mechanism into Py_Dealloc #132280

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

markshannon
Copy link
Member

@markshannon markshannon commented Apr 8, 2025

@nascheme
Copy link
Member

nascheme commented Apr 8, 2025

This is a good idea, IMHO. I had a similar PR but Mark's is more elegant than mine.

Objects/object.c Outdated
_PyObject_ASSERT(op, Py_REFCNT(op) == 0);
#ifdef Py_GIL_DISABLED
op->ob_tid = (uintptr_t)tstate->delete_later;
#else
_PyGCHead_SET_PREV(_Py_AS_GC(op), (PyGC_Head*)tstate->delete_later);
*((PyObject**)op) = tstate->delete_later;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 2915 is a little obscure. I think a comment explaining that ob_refcnt is being used to create the queue would be helpful.

@markshannon markshannon marked this pull request as ready for review April 9, 2025 12:28
@markshannon markshannon added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Apr 9, 2025
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @markshannon for commit 8aa20f2 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F132280%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Apr 9, 2025
@markshannon
Copy link
Member Author

Initial performance numbers are a bit down. 0.2% - 1.6% slower.

I'm testing another branch that adds PyThreadState * as a parameter to _PyDealloc, which might reduce the impact.

@markshannon
Copy link
Member Author

With faster recursion checks, performance is mixed but no worse overall:

  • Linux ARM: 0.4% slower
  • Linux x86: 1.5% slower
  • Windows x86: 0.9% faster
  • Mac (ARM): 3.8% faster

Although the 1.5% slowdown on linux x86 is concerning.

I also tried passing in the thread state as a parameter to _Py_Dealloc which is a much larger change. Results were a bit worse than this PR:

  • Linux ARM: no change
  • Linux x86: 1.3% slower
  • Windows x86: 1.3% slower
  • Mac (ARM): 3.9% faster

@iritkatriel
Copy link
Member

Maybe perf will be better with just one additional if for a decref that is not a container. So you have an implementation of decref for containers, and do one if to decide whether to apply it or stay with the current one.

@markshannon
Copy link
Member Author

Maybe perf will be better with just one additional if for a decref that is not a container. So you have an implementation of decref for containers, and do one if to decide whether to apply it or stay with the current one.

Do you mean a test in Py_DECREF or in Py_Dealloc?
We really want to avoid adding more branches to Py_DECREF as Py_DECREF is inlined in so many places.

Adding a branch in Py_Dealloc would save the stack checks, but those are cheap. Getting the "margin" is a load, subtract and shift.

Testing for collections is more expensive. PyObject_IS_GC does three loads, two of which are dependent on the first.

@iritkatriel
Copy link
Member

Maybe perf will be better with just one additional if for a decref that is not a container. So you have an implementation of decref for containers, and do one if to decide whether to apply it or stay with the current one.

Do you mean a test in Py_DECREF or in Py_Dealloc?

Sorry, yeah, dealloc. Maybe not worth it.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall design LGTM.

I added a few comments.

@@ -2656,45 +2655,6 @@ subtype_dealloc(PyObject *self)
_Py_DECREF_TYPE(type);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can now remove this empty line.

@@ -26,17 +26,25 @@ PyAPI_DATA(int) (*PyOS_InputHook)(void);
* apart. In practice, that means it must be larger than the C
* stack consumption of PyEval_EvalDefault */
#if defined(_Py_ADDRESS_SANITIZER) || defined(_Py_THREAD_SANITIZER)
# define PYOS_STACK_MARGIN 4096
# define PYOS_LOG_STACK_MARGIN 12
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may rename this constant to PYOS_LOG2_STACK_MARGIN. It took me a while to understand that it was the log(n)/log(2) function :-) Or just add a a comment explaining what "LOG" stands for here.

PyObject_GC_UnTrack(op);
if (_PyObject_GC_IS_TRACKED(op)) {
PyObject_GC_UnTrack(op);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change needed? PyObject_GC_UnTrack() already checks if the object is tracked or not, no?

Comment on lines +1 to +3
Prevents against stack overflows when calling Py_DECREF. Third-party
extension objects no longer need to use the "trashcan" mechanism, as
protection is now built into the ``Py_DECREF`` macro.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Prevents against stack overflows when calling Py_DECREF. Third-party
extension objects no longer need to use the "trashcan" mechanism, as
protection is now built into the ``Py_DECREF`` macro.
Prevents against stack overflows when calling :c:func:`Py_DECREF`. Third-party
extension objects no longer need to use the "trashcan" mechanism, as
protection is now built into the :c:func:`Py_DECREF` function.

@@ -2933,7 +2935,9 @@ _PyTrash_thread_destroy_chain(PyThreadState *tstate)
op->ob_tid = 0;
_Py_atomic_store_ssize_relaxed(&op->ob_ref_shared, _Py_REF_MERGED);
#else
tstate->delete_later = (PyObject*) _PyGCHead_PREV(_Py_AS_GC(op));
uintptr_t refcnt = *((uintptr_t*)op);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -1 operation is a bit magic, I suggest adding this comment:

Suggested change
uintptr_t refcnt = *((uintptr_t*)op);
/* Get the delete_later pointer from the refcnt field.
* See _PyTrash_thread_deposit_object(). */
uintptr_t refcnt = *((uintptr_t*)op);

_PyObject_ASSERT(op, Py_REFCNT(op) == 0);
#ifdef Py_GIL_DISABLED
op->ob_tid = (uintptr_t)tstate->delete_later;
#else
_PyGCHead_SET_PREV(_Py_AS_GC(op), (PyGC_Head*)tstate->delete_later);
/* Store the pointer in the refcnt field.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/* Store the pointer in the refcnt field.
/* Store the delete_later pointer in the refcnt field.

@rhettinger rhettinger removed their request for review April 16, 2025 02:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants