-
-
Notifications
You must be signed in to change notification settings - Fork 31.7k
GH-130415: Improve the JIT's unneeded uop removal pass #132333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be fine? optimize_pop_top_again should reach a fixpoint (there are only a finite number of last we can change to _NOP).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few questions.
The overall design looks good.
@@ -1283,7 +1283,7 @@ class Bar: | |||
load_attr_top = opnames.index("_LOAD_ATTR_NONDESCRIPTOR_WITH_VALUES", 0, call) | |||
load_attr_bottom = opnames.index("_LOAD_ATTR_NONDESCRIPTOR_WITH_VALUES", call) | |||
self.assertEqual(opnames[:load_attr_top].count("_GUARD_TYPE_VERSION"), 1) | |||
self.assertEqual(opnames[call:load_attr_bottom].count("_CHECK_VALIDITY"), 1) | |||
self.assertEqual(opnames[call:load_attr_bottom].count("_CHECK_VALIDITY"), 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have two _CHECK_VALIDITY
s now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There used to be one validity check, after the setattr
call. This adds an additional one after the _POP_TOP
following the call, since in theory the function could return an object with a destructor that invalidates the JIT code when it's popped.
Because of the old structure of this optimization pass, we weren't actually treating _POP_TOP
as escaping, even though it is (this is handled in the default
case, which we fall through to now).
Python/optimizer_analysis.c
Outdated
switch (last->opcode) { | ||
case _POP_TWO_LOAD_CONST_INLINE_BORROW: | ||
last->opcode = _POP_TOP; | ||
goto optimize_pop_top_again; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to optimize again here?
If the previous instruction is _POP_TOP
I see no further optimization to apply here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch. We might want to combine smaller ops into larger ones later (for example, turn _POP_TOP; _POP_TOP_LOAD_CONST_INLINE_BORROW
into _NOP; _POP_TWO_LOAD_CONST_INLINE_BORROW
), which is where I think I was going with this. But that can come later.
last->opcode = _NOP; | ||
buffer[pc].opcode = _NOP; | ||
} | ||
if (last->opcode == _REPLACE_WITH_TRUE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we not optimizing this any more? I would have thought it would go in the same case as _POP_TOP_LOAD_CONST_INLINE_BORROW
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch. Though these should probably just be turned into _POP_TOP_LOAD_CONST_INLINE_BORROW
by the optimizer instead.
I'll leave it this way and open up another PR for that. Having the hardcoded constant makes things more difficult, since we'd need to start writing operands instead of operating purely on opcodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I just added that change as part of this PR.
This improves our ability to remove unused constant and local variable loads during the
remove_unneeded_uops
pass in the JIT's optimizer.It now handles all combinations of:
_COPY
_LOAD_CONST_INLINE
_LOAD_CONST_INLINE_BORROW
_LOAD_FAST
_LOAD_FAST_BORROW
_LOAD_SMALL_INT
...followed by...
_POP_TOP
_POP_TOP_LOAD_CONST_INLINE
_POP_TOP_LOAD_CONST_INLINE_BORROW
_POP_TWO_LOAD_CONST_INLINE_BORROW
(It also fixes an issue where
CHECK_VALIDITY
wasn't being added after these latter four instructions, due to how the oldswitch
-case
was structured.)According to the stats, this results in:
_LOAD_FAST
_POP_TOP_LOAD_CONST_INLINE_BORROW
_POP_TOP_LOAD_CONST_INLINE
@savannahostrowski: We should see even more impact once you start using
_POP_TWO_LOAD_CONST_INLINE_BORROW
in more places. I'm currently working on turning basically all cached class attribute/method loads into constants too, which should help here as well.