8350864: C2: verify structural invariants of the Ideal graph #26362

marc-chevalier · 2025-07-17T07:25:10Z

Some crashes are consequences of earlier misshaped ideal graphs, which could be detected earlier, closer to the source, before the possibly many transformations that lead to the crash.

Let's verify that the ideal graph is well-shaped earlier, then! I propose such a feature here. It runs after IGVN because, at that point, the graph should have been cleaned of any weirdness introduced earlier or during IGVN.

This feature is enabled with the develop flag VerifyIdealStructuralInvariants. I am open to renaming it. The feature is only available in debug builds, and most of the code is not even compiled in product builds, since it uses debug-only functions such as Node::dump or Node::Name.

For now, only local checks are implemented: checks that look only at a node and its neighborhood, wherever it occurs in the graph. Typically: under an If node, we have an IfTrue and an IfFalse. To ease development, each check is implemented in its own class, independently of the others. Nevertheless, one always needs to do the same kind of things: checking that there is an output of a given type, that there are N inputs, that the k-th input has a given type... To make such checks readable and less error-prone than a pile of copy-pasted code that manually traverses the graph, I propose a set of compositional helpers to write patterns that can be matched against the ideal graph. Since these patterns are not tied to a specific graph, they can be allocated once and for all. When a pattern is used, one provides the node (called the center) around which to check whether the pattern holds.
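To make the idea of compositional patterns concrete, here is a minimal sketch on a toy graph. It is not the PR's API: ToyNode, Pattern, has_kind, at_input, has_unique_output, match_all and if_shape_holds are all hypothetical names invented for this illustration, and error reporting is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy stand-in for a C2 node (hypothetical): a kind name, ordered inputs, outputs.
struct ToyNode {
  std::string kind;
  std::vector<ToyNode*> in;
  std::vector<ToyNode*> out;
};

// Makes `def` the next input of `user` and records the reverse (output) edge.
inline void add_input(ToyNode& user, ToyNode& def) {
  user.in.push_back(&def);
  def.out.push_back(&user);
}

// A pattern is independent of any graph; it is evaluated around a center node.
using Pattern = std::function<bool(const ToyNode&)>;

// Holds when the node has the given kind.
Pattern has_kind(std::string kind) {
  return [kind](const ToyNode& n) { return n.kind == kind; };
}

// Holds when the index-th input exists and satisfies `inner`.
Pattern at_input(std::size_t index, Pattern inner) {
  return [index, inner](const ToyNode& n) {
    return index < n.in.size() && n.in[index] != nullptr && inner(*n.in[index]);
  };
}

// Holds when exactly one output of the node satisfies `inner`.
Pattern has_unique_output(Pattern inner) {
  return [inner](const ToyNode& n) {
    int matches = 0;
    for (const ToyNode* o : n.out) {
      if (inner(*o)) { ++matches; }
    }
    return matches == 1;
  };
}

// Holds when every sub-pattern holds around the same node.
Pattern match_all(std::vector<Pattern> parts) {
  return [parts](const ToyNode& n) {
    for (const Pattern& p : parts) {
      if (!p(n)) { return false; }
    }
    return true;
  };
}

// The pattern is built once and reused for every center it is applied to.
bool if_shape_holds(const ToyNode& center) {
  static const Pattern pattern = match_all({
      has_kind("If"),
      has_unique_output(has_kind("IfTrue")),
      has_unique_output(has_kind("IfFalse"))});
  return pattern(center);
}
```

In this sketch the pattern object is a function-local static, which mirrors the "allocated once" idea; the center is only supplied at match time.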

On top of making patterns easier to describe, these helpers allow nice printing in case of error, by showing the path from the center to the violating node. For instance (made up for the purpose of showing the formatting), a violation with a path climbing only inputs:

1 failure for node
 211  OuterStripMinedLoopEnd  === 215 39  [[ 212 198 ]] P=0,948966, C=23799,000000
At node
    209  CountedLoopEnd  === 182 208  [[ 210 197 ]] [lt] P=0,948966, C=23799,000000 !orig=[196] !jvms: StringLatin1::equals @ bci:12 (line 100)
  From path:
    [center] 211  OuterStripMinedLoopEnd  === 215 39  [[ 212 198 ]] P=0,948966, C=23799,000000
      <-(0)- 215  SafePoint  === 210 1 7 1 1 216 37 54 185  [[ 211 ]]  SafePoint  !orig=186 !jvms: StringLatin1::equals @ bci:29 (line 100)
      <-(0)- 210  IfFalse  === 209  [[ 215 216 ]] #0 !orig=198 !jvms: StringLatin1::equals @ bci:12 (line 100)
      <-(0)- 209  CountedLoopEnd  === 182 208  [[ 210 197 ]] [lt] P=0,948966, C=23799,000000 !orig=[196] !jvms: StringLatin1::equals @ bci:12 (line 100)
# OuterStripMinedLoopInvariants:
Unexpected type: CountedLoopEnd.

or with outputs:

1 failure for node
 413  OuterStripMinedLoopEnd  === 417 41  [[ 414 399 ]] P=0,960468, C=22887,000000
At node
    415  OuterStripMinedLoop  === 415 180 414  [[ 415 416 ]]
  From path:
    [center] 413  OuterStripMinedLoopEnd  === 417 41  [[ 414 399 ]] P=0,960468, C=22887,000000
         --> 414  IfTrue  === 413  [[ 415 ]] #1
         --> 415  OuterStripMinedLoop  === 415 180 414  [[ 415 416 ]]
# OuterStripMinedLoopInvariants:
Non-unique output of expected type. Found: 0.

So far, a small set of checks is implemented:

IfProjections: check that If nodes have an IfTrue and an IfFalse
PhiArity: check that Phi nodes have a Region node of the same arity as 0th input
ControlSuccessor: check that control nodes have the right number of successors (usually 1, but 2 for if-related nodes...)
RegionSelfLoop: check that regions are either a copy, or have a self loop as 0th input
CountedLoopInvariants: check the structure around the backedge of a counted loop
OuterStripMinedLoopInvariants: check the structure around OuterStripMinedLoopEnd
MultiBranchNodeOut: check that for MultiBranch, outcnt is smaller than or equal to required_outcnt (it is legitimate to have a smaller number of outputs, especially after some optimizations).
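As an illustration of "one class per check", the following sketch implements a PhiArity-like check on a toy graph, plus a small driver that runs every registered check on every node. ToyNode, ToyCheck, ToyPhiArity and run_checks are hypothetical names for this example only; the real checks work on C2 nodes and report much richer diagnostics.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for a C2 node (hypothetical): a kind name and ordered inputs.
struct ToyNode {
  std::string kind;
  std::vector<const ToyNode*> in;
};

// Each invariant lives in its own class, independently of the others.
class ToyCheck {
public:
  virtual ~ToyCheck() = default;
  virtual const char* name() const = 0;
  // Returns true when the invariant holds around `center` or does not apply.
  virtual bool holds(const ToyNode& center) const = 0;
};

// Phi nodes must have a Region with the same number of inputs as 0th input.
class ToyPhiArity : public ToyCheck {
public:
  const char* name() const override { return "PhiArity"; }
  bool holds(const ToyNode& center) const override {
    if (center.kind != "Phi") {
      return true;  // not applicable to other kinds
    }
    if (center.in.empty() || center.in[0] == nullptr) {
      return false;
    }
    const ToyNode& region = *center.in[0];
    return region.kind == "Region" && region.in.size() == center.in.size();
  }
};

// Runs every check on every node; returns the name of each failed check,
// one entry per (node, check) failure.
std::vector<std::string> run_checks(
    const std::vector<std::unique_ptr<ToyCheck>>& checks,
    const std::vector<const ToyNode*>& nodes) {
  std::vector<std::string> failed;
  for (const ToyNode* n : nodes) {
    for (const auto& check : checks) {
      if (!check->holds(*n)) {
        failed.push_back(check->name());
      }
    }
  }
  return failed;
}
```

Adding a new invariant in this sketch means adding one class and registering it; existing checks are not touched.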

Some of these checks have an additional subtlety: it's OK to have some wrong shapes in dead code, for instance for IfProjections. After a lot of investigation, it seems that some dead loops are not always detected eagerly and can make some control paths survive longer, until they are removed before loop opts. This seems to be by design, to avoid traversing the whole graph every time a region loses an input. Such misshapes seem harmless because they are not reachable from the inputs, and the cost of removing them would be prohibitive. To deal with such cases, when such a check fails, we check whether the failure happened in dead code. The set of unreachable control nodes is lazily computed to answer that, and it is shared across checkers. While computing unreachable nodes is somewhat expensive, it seems to be needed rarely in practice.

This verification has found JDK-8359344 and JDK-8359121. It has been run on tiers 1 to 3, plus some internal testing, and after fixing the above-mentioned issues, everything seems to pass!

Related future work: add more checks, which should be easy.

Less related future work: could we imagine using similar patterns (without the error reporting mechanism) for optimizations, instead of manual traversal? It could make the code easier to understand. We could also imagine optionally using such patterns in idealization to declare which patterns nodes are looking for and, if they have depth greater than 1, automatically adapting the enqueuing strategy without having to tweak PhaseIterGVN::add_users_of_use_to_worklist every time. It could at least cover some basic (but numerous) cases.
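The lazy, shared reachability computation described above can be sketched as follows. This is a toy model under stated assumptions: ToyCfgNode, LazyReachability, is_dead and fill_count are hypothetical names, the edge direction is abstracted into a generic `edges` list, and the whole live set is computed on the first query (which is one possible design, not necessarily the PR's).

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Toy control-flow node (hypothetical): only the edges followed from the root.
struct ToyCfgNode {
  std::vector<const ToyCfgNode*> edges;
};

// Computes the set of nodes reachable from the root on first use, then answers
// every later query from the cached set. One instance is shared by all checks.
class LazyReachability {
public:
  explicit LazyReachability(const ToyCfgNode* root) : _root(root) {}

  bool is_dead(const ToyCfgNode* n) {
    if (!_computed) {
      fill();
    }
    return _live.count(n) == 0;
  }

  // Number of full traversals performed so far (for demonstration only).
  int fill_count() const { return _fill_count; }

private:
  void fill() {
    std::vector<const ToyCfgNode*> worklist{_root};
    _live.insert(_root);
    // The worklist grows while we iterate over it by index.
    for (std::size_t i = 0; i < worklist.size(); ++i) {
      for (const ToyCfgNode* next : worklist[i]->edges) {
        if (next != nullptr && _live.insert(next).second) {
          worklist.push_back(next);
        }
      }
    }
    _computed = true;
    ++_fill_count;
  }

  const ToyCfgNode* _root;
  std::unordered_set<const ToyCfgNode*> _live;
  bool _computed = false;
  int _fill_count = 0;
};
```

If no check ever fails, is_dead is never called and the traversal cost is never paid; after the first failure, dead and live nodes are both answered from the cache.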

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8350864: C2: verify structural invariants of the Ideal graph (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26362/head:pull/26362
$ git checkout pull/26362

Update a local copy of the PR:
$ git checkout pull/26362
$ git pull https://git.openjdk.org/jdk.git pull/26362/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 26362

View PR using the GUI difftool:
$ git pr show -t 26362

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26362.diff

Using Webrev

Link to Webrev Comment

bridgekeeper · 2025-07-17T07:25:39Z

👋 Welcome back mchevalier! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2025-07-17T07:25:56Z

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

openjdk · 2025-07-17T07:26:36Z

@marc-chevalier The following label will be automatically applied to this pull request:

hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2025-07-17T08:48:34Z

Webrevs

iwanowww

Very nice!

Some high-level comments:

IMO it's better to have node-specific invariant checks co-located with corresponding node (as Node::verify() maybe?); it would make it clearer what are the expectations when changing the implementation.
on naming: IMO VerifyIdealGraph would clearly describe what the logic does, fits existing conventions well, and easy to find

marc-chevalier · 2025-07-17T11:17:41Z

IMO it's better to have node-specific invariant checks co-located with corresponding node (as Node::verify() maybe?); it would make it clearer what are the expectations when changing the implementation.

I understand the motivation, but I'm not sure what to do in every case. For instance, when the pattern is not so small (like strip mining), it's hard to associate the invariant with a single node: many are involved, and it's not really describable as the expectation of a single node. Of course, we could split the pattern into many sub-patterns, each centered around one node type, but then we lose the overview of the structure, and each sub-pattern becomes context-free (e.g. an IfFalse must have a CountedLoopEnd input only when it comes before a safepoint before an OuterStripMinedLoopEnd, but not in general).

Another problematic case is the control successor check, which has special handling for some kinds of nodes. That handling could be relocated to the node types in question, but for the general case, the check simply tests whether the node is a CFG node. One could still do that in Node, but then it's not tied to a specific node type, and I feel it bloats the Node class (which is already not so small). I also fear it makes the invariants harder to read, since they would be distributed across many node types: I would need to find the overrides of Verify. When working on a given node, it may be easier to see what I need to guarantee (or which invariant to change), but when working on something else, it makes it harder to find the invariants I can actually rely on, because they could always be overridden in a derived class.

I also have a readability concern. Even if we sort them by node types, then we mix the implementation of all invariants of a given node in a single method, making it extra-hard to understand the big picture, and when looking for overrides, I will find some, but maybe they won't be about the invariant I'm interested in.

And a code/maintenance concern. If I have a default implementation of the control successor check in Node, among other such general checks, I'm tempted to override it in IfNode to accept more than one successor, but then, how do I perform the other general checks that still hold? I can't call Node::Verify, since it would enforce the wrong number of successors. I could put these checks in another method and call it from IfNode::Verify, but that has other annoying consequences: if they are all in the same side method, I can't customize another of these checks in another node type; if each check is in its own method, all called from Node::Verify, I need to repeat the calls in every override of Verify for all the checks added in the future... Overall, it seems risky to maintain. We could also just call Node::Verify and have some handling there to skip some steps for some node types, but I feel that defeats the point of having invariants closer to the type.

Overall, it seems to me that it's beneficial to move checks to the node types if:

the pattern is small and clearly has a privileged node, so we won't be surprised by having the invariant implemented in another node type
the pattern doesn't have special cases for sub-types.

For instance, PhiArity would be a good candidate (about a special kind of node, no context needed, no exception). So, maybe a solution would be to split the checks between two sources: some that are like this, implemented directly in the node, and some that are less local (not about a node, but about bigger shapes) or need more special cases, which we keep standalone. I don't think having two sources of invariants is a problem at all.

marc-chevalier · 2025-07-17T11:19:25Z

on naming: IMO VerifyIdealGraph would clearly describe what the logic does, fits existing conventions well, and easy to find

Sure, fine with me! I'd be curious to see if somebody has other ideas.

vnkozlov · 2025-07-29T16:59:20Z

I am fine with the VerifyIdealGraph flag. The main concern is that we have tons of Verify* flags, but I don't think we use them in CI testing. So we forget about them, they break, and a few years later we remove them, like we did with VerifyOpto.

benoitmaillard

Great work, and great explanation as well! The invariants that are already implemented seem quite useful already, and it seems there is a lot of potential.

Having recently worked on a few missed optimizations related to PhaseIterGVN::add_users_of_use_to_worklist, I agree that it would be interesting to use such patterns for automatic notifications. The way I see it, we would need to somehow "reverse" the patterns, as they would be expressed from the point of view of the node on which the optimization is applied, and would require notification when dependencies change. Probably quite non-trivial, but interesting nonetheless.

I only have a few basic remarks/questions.

src/hotspot/share/opto/graphInvariants.hpp

src/hotspot/share/opto/graphInvariants.cpp

marc-chevalier · 2025-08-14T08:04:28Z

@vnkozlov: That is true. I think the idea was to use it in tests (typically stress tests) once integrated. Or at least, use it for the issues I found thanks to this flag. That should keep it from being totally forgotten, at least.

eme64

Wow, very nice work @marc-chevalier !

Cool that you tried a pattern-matching approach. I really do wonder if we could use that more widely 😊

eme64 · 2025-08-25T13:30:04Z

src/hotspot/share/opto/graphInvariants.hpp

+// An invariant that needs only a local view of the graph, around a given node.
+class LocalGraphInvariant : public ResourceObj {
+public:
+  static constexpr int OutputStep = -1;


Can you please add a quick comment what this is for? After all, it is a public static constant ;)

eme64 · 2025-08-25T13:30:43Z

src/hotspot/share/opto/graphInvariants.hpp

+    bool is_node_dead(const Node*);
+  private:
+    void fill();
+    Unique_Node_List live_nodes;


I think the hotspot convention is to have fields with an underscore _live_nodes. Especially if they are private.

eme64 · 2025-08-25T13:32:37Z

src/hotspot/share/opto/graphInvariants.hpp

+  /* Check whether the invariant is true around the node [center]. The argument [steps] and [path] are initially empty.
+   *
+   * If the check fails steps and path must be filled with the path from the center to the failing node (where it's relevant to show).
+   * Given a list of node


Suggested change

* Given a list of node

* Given a list of nodes

eme64 · 2025-08-25T13:33:16Z

src/hotspot/share/opto/graphInvariants.hpp

+   * - path must have length k, and contain rk ... r1 where ri is:
+   *   - a non-negative integer p for each step such that N{i-1} has Ni as p-th input (we need to follow an input edge)
+   *   - the OUTPUT_STEP value in case N{i-1} has Ni as an output (we need to follow an output edge)
+   * The list are reversed to allow to easily fill them lazily on failure.


Suggested change

* The list are reversed to allow to easily fill them lazily on failure.

* The lists are reversed to allow to easily fill them lazily on failure.
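The reversed-list convention in the comment above can be illustrated with a small sketch: a recursive check records nothing on success and, on failure, appends the failing node first and then one step per caller while unwinding, so both lists end up ordered from the failing node back to the center. All names here (ToyNode, kOutputStep, check_chain, format_path) are hypothetical, and the output format only loosely imitates the example in the description.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Toy stand-in for a C2 node (hypothetical).
struct ToyNode {
  int idx;
  std::string kind;
  std::vector<const ToyNode*> in;
};

// Hypothetical marker meaning "this step followed an output edge".
constexpr int kOutputStep = -1;

// Follows input 0 repeatedly and requires kinds[depth] at each level.
// On failure, `nodes` holds Nk ... N0 (failing node first, center last)
// and `steps` holds rk ... r1, filled lazily while the recursion unwinds.
bool check_chain(const ToyNode& n, const std::vector<std::string>& kinds,
                 std::size_t depth, std::vector<const ToyNode*>& nodes,
                 std::vector<int>& steps) {
  if (n.kind != kinds[depth]) {
    nodes.push_back(&n);  // the failing node is recorded first
    return false;
  }
  if (depth + 1 == kinds.size()) {
    return true;  // whole chain matched, nothing recorded
  }
  if (n.in.empty() || n.in[0] == nullptr) {
    nodes.push_back(&n);
    return false;
  }
  if (!check_chain(*n.in[0], kinds, depth + 1, nodes, steps)) {
    steps.push_back(0);   // we followed input 0 from this node
    nodes.push_back(&n);  // callers append themselves on the way back
    return false;
  }
  return true;
}

// Prints the path from the center to the failing node by reading both
// lists backwards.
std::string format_path(const std::vector<const ToyNode*>& nodes,
                        const std::vector<int>& steps) {
  std::ostringstream os;
  for (std::size_t i = nodes.size(); i-- > 0;) {
    if (i + 1 == nodes.size()) {
      os << "[center] ";
    } else if (steps[i] == kOutputStep) {
      os << "--> ";
    } else {
      os << "<-(" << steps[i] << ")- ";
    }
    os << nodes[i]->idx << " " << nodes[i]->kind << "\n";
  }
  return os.str();
}
```

The benefit shown here is that a successful match never touches the lists; only a failing match pays for building the path.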

eme64 · 2025-08-25T13:35:30Z

src/hotspot/share/opto/graphInvariants.hpp

+   * The parameter [live_nodes] is used to share the lazily computed set of CFG nodes reachable from root. This is because some
+   * checks don't apply to dead code, suppress their error if a violation is detected in dead code.


Does that mean we only cache the result if it is reachable, but not if it is not reachable? Does that mean we may check reachability for non-reachable nodes many many times?

eme64 · 2025-08-25T14:42:14Z

src/hotspot/share/opto/graphInvariants.cpp

+      return result;
+    }
+    assert(counted_loop != nullptr, "sanity");
+    if (is_long) {


Why did you cache the value? Seems is_long is only used once ... and center should not change pointers around.

eme64 · 2025-08-25T14:46:50Z

src/hotspot/share/opto/graphInvariants.cpp

+  ResourceMark rm;
+
+  if (_checks.is_empty()) {
+    return true;
+  }
+
+  VectorSet enqueued;


I would move the ResourceMark to the beginning of the allocations, and do the fast bail-out first.

eme64 · 2025-08-25T14:48:32Z

src/hotspot/share/opto/graphInvariants.cpp

+  // Sometimes, we get weird structure in dead code that will be cleaned up later. It typically happens
+  // when data dies, but control is not cleanup right away, possibly kept alive by un unreachable loop.
+  // Since we don't want to eagerly traverse the whole graph to remove dead code in IGVN, we can accept
+  // weird structure in dead code.
+  // For CFG-related errors, we will compute the set of reachable CFG nodes and decide whether to keep
+  // the issue if the problematic node is reachable. This set of reachable node is thus computed lazily
+  // (and it seems not to happen often in practice), and shared across checks.


Suggested change

// Sometimes, we get weird structure in dead code that will be cleaned up later. It typically happens
// when data dies, but control is not cleanup right away, possibly kept alive by un unreachable loop.
// Since we don't want to eagerly traverse the whole graph to remove dead code in IGVN, we can accept
// weird structure in dead code.
// For CFG-related errors, we will compute the set of reachable CFG nodes and decide whether to keep
// the issue if the problematic node is reachable. This set of reachable node is thus computed lazily
// (and it seems not to happen often in practice), and shared across checks.

// Sometimes, we get weird structures in dead code that will be cleaned up later. It typically happens
// when data dies, but control is not cleaned up right away, possibly kept alive by an unreachable loop.
// Since we don't want to eagerly traverse the whole graph to remove dead code in IGVN, we can accept
// weird structures in dead code.
// For CFG-related errors, we will compute the set of reachable CFG nodes and decide whether to keep
// the issue if the problematic node is reachable. This set of reachable nodes is thus computed lazily
// (and it seems not to happen often in practice), and shared across checks.

eme64 · 2025-08-25T14:51:03Z

src/hotspot/share/opto/graphInvariants.cpp

+      if (in != nullptr && !enqueued.test_set(in->_idx)) {
+        worklist.push(in);
+      }


Why not make a Unique_Node_List? It would already have a VectorSet included, and you could just push without checking if we already pushed the node. Very nice for BFS traversals.
You would then not even pop nodes, but just traverse over the worklist, as it grows.
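The traversal style suggested here can be sketched with a toy list that refuses duplicates, so the caller pushes unconditionally and simply iterates by index while the list grows. ToyNode, ToyUniqueList and count_reachable are hypothetical names; the toy list also ignores null pointers, which is a convenience of this sketch and not a claim about Unique_Node_List.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Toy stand-in for a C2 node (hypothetical): only its inputs matter here.
struct ToyNode {
  std::vector<ToyNode*> in;
};

// A list that keeps insertion order and silently drops duplicates.
class ToyUniqueList {
public:
  void push(ToyNode* n) {
    if (n != nullptr && _members.insert(n).second) {
      _nodes.push_back(n);
    }
  }
  std::size_t size() const { return _nodes.size(); }
  ToyNode* at(std::size_t i) const { return _nodes[i]; }

private:
  std::vector<ToyNode*> _nodes;
  std::unordered_set<ToyNode*> _members;
};

// Breadth-first traversal over inputs: no separate visited set, no pop.
std::size_t count_reachable(ToyNode* start) {
  ToyUniqueList worklist;
  worklist.push(start);
  for (std::size_t i = 0; i < worklist.size(); ++i) {
    ToyNode* n = worklist.at(i);
    for (ToyNode* input : n->in) {
      worklist.push(input);  // duplicates and nulls are filtered by the list
    }
  }
  return worklist.size();
}
```

Once the loop ends, the list itself is the set of visited nodes in breadth-first order, which is often useful to the caller.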

eme64 · 2025-08-25T14:53:12Z

src/hotspot/share/opto/graphInvariants.cpp

+      ttyLocker ttyl;
+      tty->print("%d failure%s for node\n", failures, failures == 1 ? "" : "s");
+      center->dump();
+      tty->print_cr("%s", ss.base());
+      ss.reset();


Do you really want to use the ttyLocker here? I thought we were trying to get away from it because it sometimes leads to lock-priority issues / dead-locks.
Why not just use yet another stringStream ss3, and do it all via that one?
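The alternative suggested here, building the whole report in a string buffer and emitting it in one call instead of holding a lock across several prints, can be sketched as below. std::ostringstream stands in for HotSpot's stringStream, and format_failure_report is a hypothetical helper; the sketch makes no claim about how tty behaves under concurrent writes.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Assembles the complete failure report in memory. The caller can then hand
// the resulting string to the output stream in a single call.
std::string format_failure_report(int failures, const std::string& center_dump,
                                  const std::string& details) {
  std::ostringstream report;
  report << failures << " failure" << (failures == 1 ? "" : "s")
         << " for node\n";
  report << center_dump << "\n";
  report << details << "\n";
  return report.str();
}
```

A caller would format first and print afterwards, so no lock needs to be held while nodes are being dumped into the buffer.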

eme64 · 2025-08-25T14:57:08Z

I am fine with the VerifyIdealGraph flag. The main concern is that we have tons of Verify* flags, but I don't think we use them in CI testing. So we forget about them, they break, and a few years later we remove them, like we did with VerifyOpto.

Yes. What you need is at least a "Hello World" test, where the flag is enabled.
And then we should try to add it to stress and fuzzer tests, so file an RFE for that!

marc-chevalier added 6 commits July 15, 2025 11:42

Verify structural invariants

3b6f09a

Handle NeverBranch in ControlSuccessor

99453b4

Improve printing

9ef48bb

Improve printing and memory footprint

feebdc8

More comments

e09c60e

Fix declaration

944a8fe

openjdk bot added the hotspot-compiler [email protected] label Jul 17, 2025

marc-chevalier marked this pull request as ready for review July 17, 2025 08:43

openjdk bot added the rfr Pull request is ready for review label Jul 17, 2025

iwanowww reviewed Jul 17, 2025


Rename flag as suggested

9117fde

benoitmaillard reviewed Jul 31, 2025


Benoît's comments

700310e

eme64 suggested changes Aug 25, 2025
