gdb/testsuite/gdb.rocm: extract reusable multi-inferior driver helpers#166

Open
spatrang wants to merge 2 commits into amd-staging from users/spatrang/multi-inferior-test-helper

@spatrang spatrang commented Jun 10, 2026

Why this PR

This is preparatory refactoring split out of #131 at a reviewer's request.
During review of #131 (which adds a new multi-inferior stress test), it
was noted that the new test shares most of its driver logic with the
existing gdb.rocm/multi-inferior-gpu.exp. Rather than duplicate that
logic, this PR first extracts the common parts into shared helpers, so
that #131 can reuse them and its diff shrinks to what is genuinely
new.

The tests are kept separate (only the driver logic is shared).

Dependent PR

Summary

Extract the shared non-stop multi-inferior driver logic out of
gdb.rocm/multi-inferior-gpu.exp into two reusable helper procs in
gdb/testsuite/lib/rocm.exp, and convert the existing test to use them.

Helpers added (lib/rocm.exp)

  • rocm_multi_inferior_run_to_kernels {args_list expected} — load the
    program, enable non-stop mode with detach-on-fork off and
    follow-fork-mode parent, plant the breakpoints, run the parent to its
    pre-fork breakpoint, resume in the background, and collect one
    kernel breakpoint stop per child inferior. Returns the list of
    stopped GPU thread ids. The child count can be passed explicitly or
    discovered at runtime.
  • rocm_multi_inferior_drain {threads} — continue each stopped GPU
    inferior to a clean exit, wait for the parent to reach its
    post-waitpid breakpoint, and run the parent to completion.

Behavior note

The extracted driver is intentionally stricter than the inlined
original: it deduplicates GPU stops by inferior, uses literal-matched
regexes, and fails loudly on timeout or a non-zero child exit instead of
hanging. Coverage of the converted test is otherwise unchanged.
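
To illustrate what "deduplicates GPU stops by inferior" means, here is a minimal sketch. The real helper is Tcl code in lib/rocm.exp and is not shown on this page, so this C++ version is only an analogy: the `Stop` struct, its fields, and `dedup_by_inferior` are hypothetical names introduced for the example.

```cpp
#include <set>
#include <string>
#include <vector>

// One breakpoint stop as a test driver might record it. Both fields are
// illustrative; with several inferiors GDB prints thread ids in the
// qualified form "<inferior>.<thread>".
struct Stop {
  int inferior;
  std::string thread_id;
};

// Keep the first stop reported for each inferior and drop later ones,
// so a child that reports more than one stop is still counted once.
std::vector<Stop> dedup_by_inferior(const std::vector<Stop> &stops) {
  std::set<int> seen;
  std::vector<Stop> unique;
  for (const Stop &s : stops) {
    // insert() reports whether the inferior number was new.
    if (seen.insert(s.inferior).second) unique.push_back(s);
  }
  return unique;
}
```

With this approach, the number of collected stops can be compared directly against the expected child count, which is presumably why the stricter driver can fail loudly instead of hanging.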

Files changed

  • gdb/testsuite/lib/rocm.exp — add the two helpers.
  • gdb/testsuite/gdb.rocm/multi-inferior-gpu.exp — convert to use them.

Move the shared non-stop multi-inferior driver logic into two helper
procs in lib/rocm.exp: rocm_multi_inferior_run_to_kernels (set up the
session, run the parent to the pre-fork breakpoint, resume, and collect
one kernel stop per child) and rocm_multi_inferior_drain (continue each
child to a clean exit and run the parent to completion).

Convert multi-inferior-gpu.exp to use them.  The extracted driver is
intentionally stricter than the inlined original: it deduplicates GPU
stops by inferior, uses literal-matched regexes, and fails loudly on
timeout or a non-zero child exit instead of hanging.  A follow-up test
reuses the same helpers.
@spatrang spatrang requested a review from a team as a code owner June 10, 2026 17:06

@lancesix lancesix left a comment

Collaborator

I have not really thought deeply about it, but the thing that bothers me here is that this helper implicitly relies on properties of the source file (the markers where breakpoints are inserted).

If the functions built around those source assumptions are common, I'd expect the source file to be common as well. If we get to a point where we have multiple tests using those helpers, it will get harder to keep the source / tcl bits in sync.

It really feels like the .cpp of multi-inferior-gpu should also be made generic if we go this way.

Comment thread gdb/testsuite/lib/rocm.exp Outdated
@spatrang

Author

> It really feels like the .cpp of multi-inferior-gpu should also be made generic if we go this way.

Done. Rather than add a separate program, I generalized multi-inferior-gpu.cpp in place so that the breakpoint markers the helpers rely on live in a single source file. The child count now comes from argv when given and otherwise defaults to the detected GPU device count, and each child re-execs itself through a "child" argv dispatch. The follow-up stress test (#131) will point standard_testfile at this same source instead of carrying its own copy, so the .cpp and the .exp helpers stay in sync.
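
For readers without the diff at hand, the argv handling described above could look roughly like the sketch below. It is not the code from this PR: the HIP device query is replaced by a hypothetical stub (`detected_device_count`), the kernel launch is omitted, and all function names are assumptions chosen for illustration.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the HIP device query; the real program
// would ask the HIP runtime for the number of devices instead.
int detected_device_count() { return 2; }

// True when the process was started as "<self> child <index>".
bool is_child_invocation(int argc, char **argv) {
  return argc > 2 && std::strcmp(argv[1], "child") == 0;
}

// Child count: argv[1] when given, otherwise the detected device count.
int child_count(int argc, char **argv) {
  if (argc > 1) return std::atoi(argv[1]);
  return detected_device_count();
}

// Fork n children, each of which re-execs this binary as
// "<self> child <index>", then wait for all of them.
// Returns how many children exited with status 0.
int run_parent(const char *self, int n) {
  std::vector<pid_t> pids;
  for (int i = 0; i < n; ++i) {  // a pre-fork marker would sit before this loop
    pid_t pid = fork();
    if (pid == 0) {
      char index[16];
      std::snprintf(index, sizeof index, "%d", i);
      execl(self, self, "child", index, static_cast<char *>(nullptr));
      _exit(127);  // only reached if exec failed
    }
    if (pid > 0) pids.push_back(pid);
  }
  int clean = 0;
  for (pid_t pid : pids) {
    int status = 0;
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status) &&
        WEXITSTATUS(status) == 0)
      ++clean;
  }
  return clean;  // a post-waitpid marker would sit after the wait loop
}

// Entry point of the sketch; a real program would call this from main.
int driver_main(int argc, char **argv) {
  if (is_child_invocation(argc, argv)) {
    // The real child would launch its GPU kernel here.
    std::printf("child %d running\n", std::atoi(argv[2]));
    return 0;
  }
  int n = child_count(argc, argv);
  return run_parent(argv[0], n) == n ? 0 : 1;
}
```

Checking for the "child" argument before parsing a count keeps the two uses of argv[1] from colliding, and keeping both code paths in one binary is what lets the breakpoint markers live in a single source file.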

Generalize multi-inferior-gpu.cpp into a shared driver for the
multi-inferior tests so the breakpoint markers the lib/rocm.exp helpers
rely on live in one program instead of being duplicated.  The child
count is taken from argv when given and otherwise defaults to the number
of GPU devices found at runtime, and each child re-execs itself through
a "child" argv dispatch.

Give rocm_multi_inferior_run_to_kernels default argument values so
callers that want runtime discovery can omit them.
N comes from argv[1] when given; otherwise it defaults to the number
of GPU devices found at runtime (one child per device). The
companion .exp helpers in lib/rocm.exp plant breakpoints on the
pre-fork and post-join source markers and on the kernel. */

Collaborator

s/post-join/post-waitpid.

return "others"
}
}

@lumachad lumachad Jun 16, 2026

Collaborator

This entire rocm.exp change should live in gdb.rocm/ rather than lib/rocm.exp. You just need the common part as a Tcl file, and then include that file in whichever test wants to use it. Changes to lib/rocm.exp are supposed to be for infrastructure purposes or very basic testsuite functionality.


#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

Collaborator

Cosmetic: Group the headers. First sys/, then hip/ then the rest. Or another order, but keep it organized since we're touching this anyway.

4 participants