Skip to content

gh-131798: JIT: Split CALL_ISINSTANCE into several uops #133339

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
May 8, 2025

Conversation

tomasr8
Copy link
Member

@tomasr8 tomasr8 commented May 3, 2025

Split CALL_ISINSTANCE into two guards and the uop itself.

It will be easier to implement #133172 with this in place.

Copy link
Member

@brandtbucher brandtbucher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I just see an opportunity for further cleanup!

Comment on lines 4352 to 4354
op(_GUARD_CALLABLE_ISINSTANCE_NULL, (callable, null, unused[oparg] -- callable, null, unused[oparg])) {
DEOPT_IF(!PyStackRef_IsNull(null));
}
Copy link
Member

@brandtbucher brandtbucher May 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since now we know that the oparg is two, we can split out the two args and get rid of the array. Also, let's give this a better name (since it's part of the same logical family as _GUARD_TOS_NULL and _GUARD_NOS_NULL).

Suggested change
op(_GUARD_CALLABLE_ISINSTANCE_NULL, (callable, null, unused[oparg] -- callable, null, unused[oparg])) {
DEOPT_IF(!PyStackRef_IsNull(null));
}
op(_GUARD_THIRD_NULL, (null, unused, unused -- null, unused, unused)) {
DEOPT_IF(!PyStackRef_IsNull(null));
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I agree it makes sense to make it reusable. I also moved it just under _GUARD_NOS_NULL to help with discoverability

PyInterpreterState *interp = tstate->interp;
DEOPT_IF(callable_o != interp->callable_cache.isinstance);
}

op(_CALL_ISINSTANCE, (callable, null, args[oparg] -- res)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here:

Suggested change
op(_CALL_ISINSTANCE, (callable, null, args[oparg] -- res)) {
op(_CALL_ISINSTANCE, (callable, null, inst, cls -- res)) {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inst is a reserved keyword so I went with inst_ :)

@bedevere-app
Copy link

bedevere-app bot commented May 3, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

if (retval < 0) {
ERROR_NO_POP();
}
(void)null; // Silence compiler warnings about unused variables
PyStackRef_CLOSE(cls);
PyStackRef_CLOSE(inst_);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting that with named stackrefs it won't let me do this:

DEAD(null);
PyStackRef_CLOSE(cls);
PyStackRef_CLOSE(inst_);

(the error is SyntaxError: Input 'null' is not live, but 'inst_' is)
While with the previous args version this was fine:

DEAD(null);
PyStackRef_CLOSE(args[0]);
PyStackRef_CLOSE(args[1]);

I guess the cases generator can't reason about arrays? Another reason to use named stackrefs instead :)

@tomasr8
Copy link
Member Author

tomasr8 commented May 3, 2025

CI looks good so: I have made the requested changes; please review again :)

@bedevere-app
Copy link

bedevere-app bot commented May 3, 2025

Thanks for making the requested changes!

@brandtbucher: please review the changes made to this pull request.

@bedevere-app bedevere-app bot requested a review from brandtbucher May 3, 2025 21:10
Copy link
Member

@brandtbucher brandtbucher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One fix to the test:

PyInterpreterState *interp = tstate->interp;
DEOPT_IF(callable_o != interp->callable_cache.isinstance);
}

op(_CALL_ISINSTANCE, (callable, null, inst_, cls -- res)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love the trailing underscore. Not a huge deal, but maybe just rename to instance or obj or something ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed it to instance in be50e24

@brandtbucher brandtbucher changed the title gh-131798: JIT: Split CALL_ISINSTANCE into severeal uops gh-131798: JIT: Split CALL_ISINSTANCE into several uops May 6, 2025
@Fidget-Spinner
Copy link
Member

I think the tests could be more robust if there was a way to run the executors without optimizing the uops. Each test could then first use assertIn to ensure the relevant uops are in fact present and only then run with optimizations enabled. Though dynamically turning off optimizations would require some changes in the interpreter so I'm not sure if it's worth it.

You could do this if you want. Mark wanted to do this, but no one has time to implement it proper.

The steps to do this would be the following:

  1. Export some C API for optimize_uops in _textinternalcapi.c or something.
  2. The API should take a bytecode offset/slice to start trace projection from.
  3. Write our optimizer tests to just use that.

@tomasr8
Copy link
Member Author

tomasr8 commented May 8, 2025

Ok I'll give it a try! Seems like a nice project for the weekend (or for PyCon if I can't get it working by then 😄)

Thanks for writing down the individual steps for me, it's super helpful :)

@tomasr8 tomasr8 requested a review from brandtbucher May 8, 2025 20:18
@brandtbucher brandtbucher merged commit c492ac7 into python:main May 8, 2025
63 of 64 checks passed
@tomasr8 tomasr8 deleted the jit-split-isinstance branch May 8, 2025 21:30
@tomasr8
Copy link
Member Author

tomasr8 commented May 9, 2025

I think the tests could be more robust if there was a way to run the executors without optimizing the uops. Each test could then first use assertIn to ensure the relevant uops are in fact present and only then run with optimizations enabled. Though dynamically turning off optimizations would require some changes in the interpreter so I'm not sure if it's worth it.

You could do this if you want. Mark wanted to do this, but no one has time to implement it proper.

The steps to do this would be the following:

1. Export some C API for `optimize_uops` in `_textinternalcapi.c` or something.

2. The API should take a bytecode offset/slice to start trace projection from.

3. Write our optimizer tests to just use that.

@Fidget-Spinner, I noticed that the optimizer can already be turned off by setting PYTHON_UOPS_OPTIMIZE to '0':

cpython/Python/optimizer.c

Lines 1280 to 1288 in 98e2c3a

char *env_var = Py_GETENV("PYTHON_UOPS_OPTIMIZE");
if (env_var == NULL || *env_var == '\0' || *env_var > '0') {
length = _Py_uop_analyze_and_optimize(frame, buffer,
length,
curr_stackentries, &dependencies);
if (length <= 0) {
return length;
}
}

Rather than exposing a new internal api, I think we could simply toggle this variable in the tests. I'm not sure if this is what Mark intended as this would still test more than just the optimizer proper. Though I like that these tests are more end-to-end.

Anyway, here's what it'd look like. Let me know if this is something we want to pursue, otherwise I'll go back to thinking how to expose just the optimizer :)

diff --git a/Lib/test/test_capi/test_opt.py b/Lib/test/test_capi/test_opt.py
index 651148336f7..b82b36fa2f5 100644
--- a/Lib/test/test_capi/test_opt.py
+++ b/Lib/test/test_capi/test_opt.py
@@ -11,6 +11,7 @@
 from test.support import (script_helper, requires_specialization,
                           import_helper, Py_GIL_DISABLED, requires_jit_enabled,
                           reset_code)
+from test.support.os_helper import EnvironmentVarGuard
 
 _testinternalcapi = import_helper.import_module("_testinternalcapi")
 
@@ -458,6 +459,12 @@ def _run_with_optimizer(self, testfunc, arg):
         ex = get_first_executor(testfunc)
         return res, ex
 
+    def _run_without_optimizer(self, testfunc, arg):
+        with EnvironmentVarGuard() as env:
+            env["PYTHON_UOPS_OPTIMIZE"] = "0"
+            res = testfunc(arg)
+        ex = get_first_executor(testfunc)
+        return res, ex
 
     def test_int_type_propagation(self):
         def testfunc(loops):
@@ -1951,6 +1958,17 @@ def testfunc(n):
                     x += 1
             return x
 
+        res, ex = self._run_without_optimizer(testfunc, TIER2_THRESHOLD)
+        self.assertEqual(res, TIER2_THRESHOLD)
+        self.assertIsNotNone(ex)
+        uops = get_opnames(ex)
+        self.assertIn("_CALL_ISINSTANCE", uops)
+        self.assertIn("_GUARD_THIRD_NULL", uops)
+        self.assertIn("_GUARD_CALLABLE_ISINSTANCE", uops)
+
+        # Invalidate the executor to force a reoptimization
+        _testinternalcapi.invalidate_executors(testfunc.__code__)
+
         res, ex = self._run_with_optimizer(testfunc, TIER2_THRESHOLD)
         self.assertEqual(res, TIER2_THRESHOLD)
         self.assertIsNotNone(ex)

@Fidget-Spinner
Copy link
Member

@tomasr8 yeah I'm not advocating for removing the end-to-end tests altogether. Rather, the optimizer tests should have a mix of both end-to-end and unit.

For example, it would be nice if we could generate an optimized trace without having to even wrap it in a for loop and run till JIT threshold. This actually makes the tests really slow because JIT threshold is quite high right now.

@tomasr8
Copy link
Member Author

tomasr8 commented May 10, 2025

Got it! Yeah, being able to simplify the tests and make them run faster is definitely worth it :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants