-
-
Notifications
You must be signed in to change notification settings - Fork 31.9k
gh-131798: JIT: Split CALL_ISINSTANCE
into several uops
#133339
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I just see an opportunity for further cleanup!
Python/bytecodes.c
Outdated
op(_GUARD_CALLABLE_ISINSTANCE_NULL, (callable, null, unused[oparg] -- callable, null, unused[oparg])) { | ||
DEOPT_IF(!PyStackRef_IsNull(null)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since now we know that the oparg is two, we can split out the two args and get rid of the array. Also, let's give this a better name (since it's part of the same logical family as _GUARD_TOS_NULL
and _GUARD_NOS_NULL
).
op(_GUARD_CALLABLE_ISINSTANCE_NULL, (callable, null, unused[oparg] -- callable, null, unused[oparg])) { | |
DEOPT_IF(!PyStackRef_IsNull(null)); | |
} | |
op(_GUARD_THIRD_NULL, (null, unused, unused -- null, unused, unused)) { | |
DEOPT_IF(!PyStackRef_IsNull(null)); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I agree it makes sense to make it reusable. I also moved it just under _GUARD_NOS_NULL
to help with discoverability
Python/bytecodes.c
Outdated
PyInterpreterState *interp = tstate->interp; | ||
DEOPT_IF(callable_o != interp->callable_cache.isinstance); | ||
} | ||
|
||
op(_CALL_ISINSTANCE, (callable, null, args[oparg] -- res)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here:
op(_CALL_ISINSTANCE, (callable, null, args[oparg] -- res)) { | |
op(_CALL_ISINSTANCE, (callable, null, inst, cls -- res)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inst
is a reserved keyword so I went with inst_
:)
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Python/bytecodes.c
Outdated
if (retval < 0) { | ||
ERROR_NO_POP(); | ||
} | ||
(void)null; // Silence compiler warnings about unused variables | ||
PyStackRef_CLOSE(cls); | ||
PyStackRef_CLOSE(inst_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting that with named stackrefs it won't let me do this:
DEAD(null);
PyStackRef_CLOSE(cls);
PyStackRef_CLOSE(inst_);
(the error is SyntaxError: Input 'null' is not live, but 'inst_' is
)
While with the previous args
version this was fine:
DEAD(null);
PyStackRef_CLOSE(args[0]);
PyStackRef_CLOSE(args[1]);
I guess the cases generator can't reason about arrays? Another reason to use named stackrefs instead :)
CI looks good so: I have made the requested changes; please review again :) |
Thanks for making the requested changes! @brandtbucher: please review the changes made to this pull request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One fix to the test:
Python/bytecodes.c
Outdated
PyInterpreterState *interp = tstate->interp; | ||
DEOPT_IF(callable_o != interp->callable_cache.isinstance); | ||
} | ||
|
||
op(_CALL_ISINSTANCE, (callable, null, inst_, cls -- res)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love the trailing underscore. Not a huge deal, but maybe just rename to instance
or obj
or something ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed it to instance
in be50e24
CALL_ISINSTANCE
into severeal uopsCALL_ISINSTANCE
into several uops
Co-authored-by: Brandt Bucher <[email protected]>
You could do this if you want. Mark wanted to do this, but no one has time to implement it proper. The steps to do this would be the following:
|
Ok I'll give it a try! Seems like a nice project for the weekend (or for PyCon if I can't get it working by then 😄) Thanks for writing down the individual steps for me, it's super helpful :) |
@Fidget-Spinner, I noticed that the optimizer can already be turned off by setting Lines 1280 to 1288 in 98e2c3a
Rather than exposing a new internal api, I think we could simply toggle this variable in the tests. I'm not sure if this is what Mark intended as this would still test more than just the optimizer proper. Though I like that these tests are more end-to-end. Anyway, here's what it'd look like. Let me know if this is something we want to pursue, otherwise I'll go back to thinking how to expose just the optimizer :) diff --git a/Lib/test/test_capi/test_opt.py b/Lib/test/test_capi/test_opt.py
index 651148336f7..b82b36fa2f5 100644
--- a/Lib/test/test_capi/test_opt.py
+++ b/Lib/test/test_capi/test_opt.py
@@ -11,6 +11,7 @@
from test.support import (script_helper, requires_specialization,
import_helper, Py_GIL_DISABLED, requires_jit_enabled,
reset_code)
+from test.support.os_helper import EnvironmentVarGuard
_testinternalcapi = import_helper.import_module("_testinternalcapi")
@@ -458,6 +459,12 @@ def _run_with_optimizer(self, testfunc, arg):
ex = get_first_executor(testfunc)
return res, ex
+ def _run_without_optimizer(self, testfunc, arg):
+ with EnvironmentVarGuard() as env:
+ env["PYTHON_UOPS_OPTIMIZE"] = "0"
+ res = testfunc(arg)
+ ex = get_first_executor(testfunc)
+ return res, ex
def test_int_type_propagation(self):
def testfunc(loops):
@@ -1951,6 +1958,17 @@ def testfunc(n):
x += 1
return x
+ res, ex = self._run_without_optimizer(testfunc, TIER2_THRESHOLD)
+ self.assertEqual(res, TIER2_THRESHOLD)
+ self.assertIsNotNone(ex)
+ uops = get_opnames(ex)
+ self.assertIn("_CALL_ISINSTANCE", uops)
+ self.assertIn("_GUARD_THIRD_NULL", uops)
+ self.assertIn("_GUARD_CALLABLE_ISINSTANCE", uops)
+
+ # Invalidate the executor to force a reoptimization
+ _testinternalcapi.invalidate_executors(testfunc.__code__)
+
res, ex = self._run_with_optimizer(testfunc, TIER2_THRESHOLD)
self.assertEqual(res, TIER2_THRESHOLD)
self.assertIsNotNone(ex) |
@tomasr8 yeah I'm not advocating for removing the end-to-end tests altogether. Rather, the optimizer tests should have a mix of both end-to-end and unit. For example, it would be nice if we could generate an optimized trace without having to even wrap it in a for loop and run till JIT threshold. This actually makes the tests really slow because JIT threshold is quite high right now. |
Got it! Yeah, being able to simplify the tests and make them run faster is definitely worth it :) |
Split
CALL_ISINSTANCE
into two guards and the uop itself.It will be easier to implement #133172 with this in place.