@pablogsal
Member

@pablogsal pablogsal commented Dec 12, 2025

This adds the ability to automatically profile child processes spawned by the target when using the sampling profiler. When the --subprocesses flag is passed, a background monitor thread polls for new descendants of the target process and spawns separate profiler instances for each Python child it discovers. Each child profiler inherits the sampling options from the parent (interval, duration, thread selection, native frames, async-aware mode, output format) and writes to its own output file with the child's PID appended to the filename.

There is a limit of 100 concurrent child profilers to prevent resource exhaustion when profiling applications that fork heavily. The --subprocesses flag is incompatible with --live mode since the curses interface cannot accommodate multiple concurrent profiler displays. This is useful for profiling applications that use multiprocessing, ProcessPoolExecutor, or other subprocess-based parallelism.

The _GNU_SOURCE macro must be defined before any system headers are
included to enable GNU extensions like process_vm_readv. Moving it
before the extern "C" block ensures it takes effect. The internal
Python headers are also changed from angle brackets to quotes since
they're local to the project.

On macOS, the TARGET_OS_OSX macro may not be defined by older SDKs,
so we now include TargetConditionals.h explicitly and provide a
fallback definition when needed.

This implements platform-specific child process enumeration for use by
the profiler. On Linux it parses /proc/{pid}/stat to build a parent-
child map and then walks the tree from the target PID. On macOS it uses
proc_listchildpids() when available, falling back to scanning all
processes with proc_pidinfo(). On Windows it uses CreateToolhelp32Snapshot
with TH32CS_SNAPPROCESS to iterate through all processes.
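
To make the Linux strategy concrete, here is a minimal pure-Python sketch of the same idea: read the parent PID out of each `/proc/{pid}/stat` record, build a parent-to-children map, and walk it from the target PID. The PR does this in C; the names `parse_stat_line` and `descendants` are hypothetical and exist only for illustration, and the function takes the stat text as input so it can be exercised without a real `/proc`.

```python
from collections import defaultdict, deque


def parse_stat_line(line):
    """Return (pid, ppid) from the text of a /proc/<pid>/stat file.

    The comm field is wrapped in parentheses and may itself contain spaces
    or parentheses, so split around the *last* closing parenthesis.
    """
    head, _, tail = line.rpartition(")")
    pid = int(head.split("(", 1)[0])
    # Fields after comm are: state, ppid, pgrp, ...
    ppid = int(tail.split()[1])
    return pid, ppid


def descendants(stat_lines, root_pid, recursive=True):
    """List children (or all descendants) of root_pid from stat records."""
    children = defaultdict(list)
    for line in stat_lines:
        pid, ppid = parse_stat_line(line)
        children[ppid].append(pid)

    if not recursive:
        return list(children.get(root_pid, []))

    # Breadth-first walk of the parent -> children map.
    found, queue = [], deque([root_pid])
    while queue:
        for child in children.get(queue.popleft(), []):
            found.append(child)
            queue.append(child)
    return found
```

On a real system the `stat_lines` input would come from reading every numeric directory under `/proc`; processes can exit between listing and reading, so a real reader has to tolerate missing files.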

The function returns a Python list of PIDs representing all descendants
of the given process. The recursive parameter controls whether only
direct children or all descendants are returned. This is the building
block needed for the --subprocesses flag in the sampling profiler CLI.

The ChildProcessMonitor class runs a background thread that polls for
new child processes spawned by the target. When it finds a new Python
process, it launches a separate profiler subprocess with the same
sampling options. Each child profiler writes to its own output file
with the child's PID appended to the filename pattern.
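
The polling loop described above can be sketched roughly as follows. This is not the PR's `ChildProcessMonitor`: the class name `ChildMonitorSketch`, its constructor parameters, and the injected `list_children` / `on_new_child` callables are all assumptions made so the example is self-contained. It also simplifies the cap by counting every PID ever seen, whereas the PR limits concurrent child profilers.

```python
import threading


class ChildMonitorSketch:
    """Hypothetical stand-in for a polling child-process monitor."""

    def __init__(self, list_children, on_new_child,
                 poll_interval=0.1, max_children=100):
        self._list_children = list_children  # returns current descendant PIDs
        self._on_new_child = on_new_child    # called once per newly seen PID
        self._poll_interval = poll_interval
        self._max_children = max_children
        self._seen = set()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def poll_once(self):
        """Report PIDs not seen before, up to the configured cap."""
        for pid in self._list_children():
            if pid in self._seen:
                continue
            # Simplification: caps total PIDs seen, not concurrent profilers.
            if len(self._seen) >= self._max_children:
                break
            self._seen.add(pid)
            self._on_new_child(pid)

    def _run(self):
        while not self._stop.is_set():
            self.poll_once()
            # Event.wait doubles as an interruptible sleep.
            self._stop.wait(self._poll_interval)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
```

Using `Event.wait` rather than `time.sleep` lets `stop()` return promptly instead of waiting out a full poll interval.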

Detection uses a fast path on Linux (checking /proc/{pid}/exe for
"python" in the name) before falling back to the full RemoteUnwinder
probe. Non-Python children are silently skipped. There's a limit of
100 concurrent child profilers to avoid runaway resource usage if the
target forks heavily.
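
A rough sketch of the two-stage detection follows. It assumes the name check is used to reject non-Python executables early and that anything else goes to the full probe; the text above does not spell out which way the short-circuit goes, so treat that as a guess. `is_python_process_sketch`, `looks_like_python`, and the injected `full_probe` and `readlink` parameters are hypothetical names; `full_probe` stands in for the RemoteUnwinder-based check.

```python
import os


def looks_like_python(exe_path):
    """Cheap name check: True if the executable's basename mentions 'python'."""
    return "python" in os.path.basename(exe_path).lower()


def is_python_process_sketch(pid, full_probe, readlink=os.readlink):
    """Reject obvious non-Python executables, otherwise run full_probe(pid)."""
    try:
        if not looks_like_python(readlink(f"/proc/{pid}/exe")):
            # Assumed behaviour: a non-matching name skips the expensive probe.
            return False
    except OSError:
        # Can't read the exe link (no /proc, permissions, process gone);
        # fall through to the full probe.
        pass
    return full_probe(pid)
```

A name check like this misses Python embedded in a binary with a different name, which is one reason the full probe remains the authority whenever the link cannot be read.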

The --subprocesses flag is incompatible with --live mode since the curses
interface can't handle multiple profiler outputs simultaneously.

The test suite covers the get_child_pids C function with both recursive
and non-recursive enumeration, the is_python_process detection helper,
the ChildProcessMonitor lifecycle, and end-to-end CLI tests with
--subprocesses for both attach and run modes.

Tests use short-lived subprocesses with controlled lifecycles and
polling-based synchronization rather than fixed sleeps to keep runtime
reasonable. Resource cleanup is handled through reap_children and
explicit process termination in finally blocks.
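
For readers unfamiliar with the pattern, polling-based synchronization usually looks something like the helper below. This is a generic sketch, not the helper used in the PR's tests; the name `wait_until` and its default timeout and interval are assumptions.

```python
import time


def wait_until(predicate, timeout=5.0, interval=0.01):
    """Poll predicate() until it is truthy or the timeout elapses.

    Returns True if the predicate succeeded, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A test would call something like `wait_until(lambda: child_pid in seen_pids)` and then assert on the result, so a fast machine finishes quickly while a slow one still gets the full timeout.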
@pablogsal
Member Author

CC: @lkollar

@pablogsal pablogsal force-pushed the tachyon-subprocesses-atomic branch from ebdb847 to bc5dc46 Compare December 12, 2025 15:20
@pablogsal pablogsal requested a review from ambv December 12, 2025 15:21
@pablogsal pablogsal changed the title gh-138122: Add --children flag to profile child processes in tachyon gh-138122: Add --subprocesses flag to profile child processes in tachyon Dec 12, 2025
@pablogsal
Member Author

pablogsal commented Dec 12, 2025

I know this looks like "Oh no another 2k line PR from Pablo 😆 " (and it kind of is) but a lot of this is autogenerated files, tests and docs :)

@pablogsal pablogsal marked this pull request as ready for review December 12, 2025 15:31

.. code-block:: python
# worker_pool.py
Member

Maybe use :caption:?

    except (OSError, PermissionError):
        # Can't read exe link, fall through to full probe
        pass

Member

Should we also try and fast path the other platforms? e.g. with proc_pidpath() on macOS and QueryFullProcessImageNameW() on Windows before falling back to the full probe?

Member Author

@pablogsal pablogsal Dec 12, 2025

Hummmmm, I can try to give it a go, but in my experience both calls are slower than just querying for the runtime state and checking if it's there

Member Author

Ok, let me experiment with this :)

Member

I played around with it a bit, what about a simpler check, e.g.: https://github.com/pablogsal/cpython/compare/tachyon-subprocesses-atomic...StanFromIreland:cpython:tachyon-subprocesses-atomic-speed?expand=1

Some simple benchmarks show a 2-4x speedup.


Subprocess detection works by periodically scanning for new descendants of
the target process and checking whether each new process is a Python process.
On Linux, this uses a fast check of the executable name followed by a full
Member

Sort of a nit but I'd probably opt to either 1) remove the platform-specific detail here or 2) add details about macOS and Windows as well.

@bedevere-app

bedevere-app bot commented Dec 12, 2025

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

Comment on lines +147 to +162
    if args.outfile:
        # User specified output - add PID to filename
        base, ext = os.path.splitext(args.outfile)
        if ext:
            return f"{base}_{{pid}}{ext}"
        else:
            return f"{args.outfile}_{{pid}}"
    else:
        # Use default pattern based on format
        extension = FORMAT_EXTENSIONS.get(args.format, "txt")
        if args.format == "heatmap":
            return "heatmap_{pid}"
        if args.format == "pstats":
            # pstats defaults to stdout, but for subprocesses we need files
            return "profile.{pid}.pstats"
        return f"{args.format}.{{pid}}.{extension}"
Member

@StanFromIreland StanFromIreland Dec 12, 2025

We are switching between `.` and `_` as the separator before the PID; can we have a consistent output format?



def _build_output_pattern(args):
    if args.outfile:
Member

What if a user provides a filename with braces? I think it would be simpler to just replace the format call with something like self.output_pattern.replace("{pid}", str(child_pid)).
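
A small sketch of the concern, for illustration only: once a user-supplied name containing braces ends up in the pattern, `str.format` tries to interpret those braces as replacement fields, while a plain `str.replace` on the literal `{pid}` token leaves them alone. The function names `expand_with_format` and `expand_with_replace` are hypothetical and not taken from the PR.

```python
def expand_with_format(pattern, pid):
    """Expand the pattern with str.format; every brace pair is a field."""
    return pattern.format(pid=pid)


def expand_with_replace(pattern, pid):
    """Expand only the literal '{pid}' token; other braces are untouched."""
    return pattern.replace("{pid}", str(pid))


if __name__ == "__main__":
    pattern = "out{run}_{pid}.txt"  # user-chosen name that contains braces
    try:
        expand_with_format(pattern, 1234)
    except KeyError as exc:
        print("format failed on unknown field:", exc)
    print(expand_with_replace(pattern, 1234))
```

The trade-off is that `replace` gives up format specifiers such as zero-padding, which the patterns shown in this thread do not appear to use.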
