-
-
Notifications
You must be signed in to change notification settings - Fork 33.2k
GH-130415: Improve the JIT's unneeded uop removal pass #132333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be fine? optimize_pop_top_again should reach a fixpoint (there are only a finite number of last we can change to _NOP).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few questions.
The overall design looks good.
| load_attr_bottom = opnames.index("_LOAD_ATTR_NONDESCRIPTOR_WITH_VALUES", call) | ||
| self.assertEqual(opnames[:load_attr_top].count("_GUARD_TYPE_VERSION"), 1) | ||
| self.assertEqual(opnames[call:load_attr_bottom].count("_CHECK_VALIDITY"), 1) | ||
| self.assertEqual(opnames[call:load_attr_bottom].count("_CHECK_VALIDITY"), 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have two _CHECK_VALIDITYs now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There used to be one validity check, after the setattr call. This adds an additional one after the _POP_TOP following the call, since in theory the function could return an object with a destructor that invalidates the JIT code when it's popped.
Because of the old structure of this optimization pass, we weren't actually treating _POP_TOP as escaping, even though it is (this is handled in the default case, which we fall through to now).
Python/optimizer_analysis.c
Outdated
| switch (last->opcode) { | ||
| case _POP_TWO_LOAD_CONST_INLINE_BORROW: | ||
| last->opcode = _POP_TOP; | ||
| goto optimize_pop_top_again; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to optimize again here?
If the previous instruction is _POP_TOP I see no further optimization to apply here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch. We might want to combine smaller ops into larger ones later (for example, turn _POP_TOP; _POP_TOP_LOAD_CONST_INLINE_BORROW into _NOP; _POP_TWO_LOAD_CONST_INLINE_BORROW), which is where I think I was going with this. But that can come later.
| last->opcode = _NOP; | ||
| buffer[pc].opcode = _NOP; | ||
| } | ||
| if (last->opcode == _REPLACE_WITH_TRUE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we not optimizing this any more? I would have thought it would go in the same case as _POP_TOP_LOAD_CONST_INLINE_BORROW.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch. Though these should probably just be turned into _POP_TOP_LOAD_CONST_INLINE_BORROW by the optimizer instead.
I'll leave it this way and open up another PR for that. Having the hardcoded constant makes things more difficult, since we'd need to start writing operands instead of operating purely on opcodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I just added that change as part of this PR.
This improves our ability to remove unused constant and local variable loads during the
remove_unneeded_uopspass in the JIT's optimizer.It now handles all combinations of:
_COPY_LOAD_CONST_INLINE_LOAD_CONST_INLINE_BORROW_LOAD_FAST_LOAD_FAST_BORROW_LOAD_SMALL_INT...followed by...
_POP_TOP_POP_TOP_LOAD_CONST_INLINE_POP_TOP_LOAD_CONST_INLINE_BORROW_POP_TWO_LOAD_CONST_INLINE_BORROW(It also fixes an issue where
CHECK_VALIDITYwasn't being added after these latter four instructions, due to how the oldswitch-casewas structured.)According to the stats, this results in:
_LOAD_FAST_POP_TOP_LOAD_CONST_INLINE_BORROW_POP_TOP_LOAD_CONST_INLINE@savannahostrowski: We should see even more impact once you start using
_POP_TWO_LOAD_CONST_INLINE_BORROWin more places. I'm currently working on turning basically all cached class attribute/method loads into constants too, which should help here as well.