gh-138122: Add --subprocesses flag to profile child processes in tachyon #142636
The _GNU_SOURCE macro must be defined before any system headers are included to enable GNU extensions like process_vm_readv. Moving it before the extern "C" block ensures it takes effect. The internal Python headers are also changed from angle brackets to quotes since they're local to the project. On macOS, the TARGET_OS_OSX macro may not be defined by older SDKs, so we now include TargetConditionals.h explicitly and provide a fallback definition when needed.
This implements platform-specific child process enumeration for use by
the profiler. On Linux it parses /proc/{pid}/stat to build a parent-
child map and then walks the tree from the target PID. On macOS it uses
proc_listchildpids() when available, falling back to scanning all
processes with proc_pidinfo(). On Windows it uses CreateToolhelp32Snapshot
with TH32CS_SNAPPROCESS to iterate through all processes.
The function returns a Python list of descendant PIDs of the given
process. The recursive parameter controls whether only direct children
or all descendants are included. This is the building block needed for
the --children flag in the sampling profiler CLI.
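As a rough illustration of the Linux strategy described above, here is a pure-Python sketch that parses /proc/{pid}/stat lines into a parent-child map and walks it from a root PID. It is not the C implementation from this PR; the helper names parse_ppid, build_child_map, descendants and read_proc_pairs are hypothetical, and the tree walk is kept separate from /proc access so it can be exercised on synthetic data.

```python
import os
from collections import defaultdict, deque


def parse_ppid(stat_line):
    """Return (pid, ppid) from one /proc/{pid}/stat line.

    The comm field is wrapped in parentheses and may itself contain
    spaces or ')', so everything after the *last* ')' is split instead
    of splitting the whole line on whitespace.
    """
    pid = int(stat_line[:stat_line.index(" ")])
    after_comm = stat_line[stat_line.rfind(")") + 2:]
    fields = after_comm.split()
    # fields[0] is the process state, fields[1] is the parent PID.
    return pid, int(fields[1])


def build_child_map(pairs):
    """Map each parent PID to the list of its direct children."""
    children = defaultdict(list)
    for pid, ppid in pairs:
        children[ppid].append(pid)
    return children


def descendants(children, root, recursive=True):
    """Direct children of root, or all descendants in breadth-first order."""
    if not recursive:
        return list(children.get(root, []))
    result = []
    queue = deque(children.get(root, []))
    while queue:
        pid = queue.popleft()
        result.append(pid)
        queue.extend(children.get(pid, []))
    return result


def read_proc_pairs():
    """Yield (pid, ppid) for every process visible in /proc (Linux only)."""
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                yield parse_ppid(f.read())
        except (OSError, ValueError, IndexError):
            # The process exited between listdir() and open(), or the
            # line could not be decoded or parsed; skip it.
            continue
```

On Linux, descendants(build_child_map(read_proc_pairs()), pid) would list every descendant of pid. Building the whole map from one pass over /proc, rather than looking up children per PID, keeps the walk to a single snapshot.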
The ChildProcessMonitor class runs a background thread that polls for
new child processes spawned by the target. When it finds a new Python
process, it launches a separate profiler subprocess with the same
sampling options. Each child profiler writes to its own output file
with the child's PID appended to the filename pattern.
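The polling loop described above can be sketched as a small class. This is a hypothetical stand-in, not the PR's ChildProcessMonitor: list_children, is_python and launch are assumed callables (for example, a child-PID enumerator, a Python-detection check, and a function that would start a profiler subprocess for one PID).

```python
import threading


class ChildMonitorSketch:
    """Hypothetical polling monitor; not the PR's ChildProcessMonitor."""

    def __init__(self, list_children, is_python, launch, poll_interval=0.1):
        self._list_children = list_children  # () -> iterable of child PIDs
        self._is_python = is_python          # pid -> bool
        self._launch = launch                # pid -> None, e.g. start a profiler
        self._poll_interval = poll_interval
        self._seen = set()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def poll_once(self):
        for pid in self._list_children():
            if pid in self._seen:
                continue
            # Remember every PID, including non-Python ones, so each
            # child is probed at most once and non-Python children are
            # skipped silently on later polls.
            self._seen.add(pid)
            if self._is_python(pid):
                self._launch(pid)

    def _run(self):
        # Event.wait() doubles as an interruptible sleep between polls,
        # so stop() returns promptly instead of waiting a full interval.
        while not self._stop.is_set():
            self.poll_once()
            self._stop.wait(self._poll_interval)
```

Keeping the "seen" set in the monitor means a long-lived non-Python child costs one probe rather than one probe per polling interval.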
Detection uses a fast path on Linux (checking /proc/{pid}/exe for
"python" in the name) before falling back to the full RemoteUnwinder
probe. Non-Python children are silently skipped. There's a limit of
100 concurrent child profilers to avoid runaway resource usage if the
target forks heavily.
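A minimal sketch of the detection order and the concurrency cap described above, assuming a name match on the /proc/{pid}/exe link is treated as a positive result. The names looks_like_python_process, exe_name_looks_like_python and may_start_profiler are hypothetical, and full_probe is an assumed callable standing in for the RemoteUnwinder-based check.

```python
import os
import sys

MAX_CHILD_PROFILERS = 100  # the limit quoted in the description above


def exe_name_looks_like_python(exe_path):
    """Cheap heuristic: does the executable's file name mention 'python'?"""
    return "python" in os.path.basename(exe_path).lower()


def looks_like_python_process(pid, full_probe):
    """Try the cheap Linux check first, then fall back to full_probe(pid)."""
    if sys.platform == "linux":
        try:
            if exe_name_looks_like_python(os.readlink(f"/proc/{pid}/exe")):
                return True
        except OSError:
            # Covers PermissionError (a subclass of OSError) and processes
            # that have already exited: fall through to the full probe.
            pass
    return full_probe(pid)


def may_start_profiler(active_count):
    """Refuse to start another child profiler once the cap is reached."""
    return active_count < MAX_CHILD_PROFILERS
```

Because the fast path can only say "yes", a Python interpreter renamed to something without "python" in its file name still gets detected, just more slowly, by the full probe.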
The --children flag is incompatible with --live mode since the curses
interface can't handle multiple profiler outputs simultaneously.
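One way to express this incompatibility at the argument-parsing level is an argparse mutually exclusive group. The sketch below is a hypothetical, trimmed-down parser (prog="sampler-sketch" and the help strings are invented), not the real sampling profiler CLI; it uses the --subprocesses name from the PR title, which this commit still calls --children.

```python
import argparse


def build_parser():
    # Hypothetical parser containing only the two flags in question.
    parser = argparse.ArgumentParser(prog="sampler-sketch")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--subprocesses", action="store_true",
                       help="also profile Python child processes")
    group.add_argument("--live", action="store_true",
                       help="curses-based live view")
    return parser
```

Passing both flags makes parse_args() print an error naming both options and exit with status 2, so the conflict is rejected before any profiling starts.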
The test suite covers the get_child_pids C function with both recursive and non-recursive enumeration, the is_python_process detection helper, the ChildProcessMonitor lifecycle, and end-to-end CLI tests with --children for both attach and run modes. Tests use short-lived subprocesses with controlled lifecycles and polling-based synchronization rather than fixed sleeps to keep runtime reasonable. Resource cleanup is handled through reap_children and explicit process termination in finally blocks.
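The polling-based synchronization mentioned above usually boils down to a small helper like the one below. This is a generic sketch; wait_until is a hypothetical name, not necessarily what the PR's tests use. The demo starts a short-lived child that signals readiness by creating a marker file and always terminates it in a finally block.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.05):
    """Poll predicate() until it returns a truthy value or time runs out.

    Returns the truthy value, or raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)


if __name__ == "__main__":
    import os
    import subprocess
    import sys
    import tempfile

    marker = os.path.join(tempfile.mkdtemp(), "ready")
    code = f"open({marker!r}, 'w').close(); import time; time.sleep(30)"
    child = subprocess.Popen([sys.executable, "-c", code])
    try:
        wait_until(lambda: os.path.exists(marker))
        print("child signalled readiness")
    finally:
        child.terminate()
        child.wait()
```

Unlike a fixed sleep, the helper returns as soon as the condition holds, so fast machines finish quickly while slow CI runners still get the full timeout.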
CC: @lkollar
I know this looks like "Oh no, another 2k-line PR from Pablo 😆" (and it kind of is), but a lot of this is autogenerated files, tests and docs :)
.. code-block:: python

   # worker_pool.py
Maybe use :caption:?
except (OSError, PermissionError):
    # Can't read exe link, fall through to full probe
    pass
Should we also try and fast path the other platforms? e.g. with proc_pidpath() on macOS and QueryFullProcessImageNameW() on Windows before falling back to the full probe?
Hummmmm, I can try to give it a go, but in my experience both calls are slower than just querying the Runtime state and checking whether it's there.
Ok, let me experiment with this :)
I played around with it a bit, what about a simpler check, e.g.: https://github.com/pablogsal/cpython/compare/tachyon-subprocesses-atomic...StanFromIreland:cpython:tachyon-subprocesses-atomic-speed?expand=1
Some simple benchmarks show a 2-4x speedup.
Subprocess detection works by periodically scanning for new descendants of
the target process and checking whether each new process is a Python process.
On Linux, this uses a fast check of the executable name followed by a full
Sort of a nit but I'd probably opt to either 1) remove the platform-specific detail here or 2) add details about macOS and Windows as well.
if args.outfile:
    # User specified output - add PID to filename
    base, ext = os.path.splitext(args.outfile)
    if ext:
        return f"{base}_{{pid}}{ext}"
    else:
        return f"{args.outfile}_{{pid}}"
else:
    # Use default pattern based on format
    extension = FORMAT_EXTENSIONS.get(args.format, "txt")
    if args.format == "heatmap":
        return "heatmap_{pid}"
    if args.format == "pstats":
        # pstats defaults to stdout, but for subprocesses we need files
        return "profile.{pid}.pstats"
    return f"{args.format}.{{pid}}.{extension}"
We are switching between "." and "_" as the PID separator; can we have a consistent output format?
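One possible consistent scheme, sketched against the snippet above: always put "_" before the PID. The FORMAT_EXTENSIONS contents below are a hypothetical stand-in for the real mapping, and the function takes plain values instead of args; this is only an illustration of the reviewer's request, not what the PR adopted.

```python
import os

# Hypothetical stand-in for the PR's FORMAT_EXTENSIONS mapping.
FORMAT_EXTENSIONS = {"collapsed": "txt", "flamegraph": "html", "pstats": "pstats"}


def build_output_pattern(outfile, fmt):
    """Return a per-child filename pattern that always uses '_' before the PID."""
    if outfile:
        # os.path.splitext("out") returns ("out", ""), so one branch covers
        # filenames with and without an extension.
        base, ext = os.path.splitext(outfile)
        return f"{base}_{{pid}}{ext}"
    if fmt == "heatmap":
        return "heatmap_{pid}"  # no extension, as in the original snippet
    extension = FORMAT_EXTENSIONS.get(fmt, "txt")
    return f"{fmt}_{{pid}}.{extension}"
```

A side effect of the uniform separator is that the if/else on ext in the original becomes unnecessary, since splitext already yields the whole name as base when there is no extension.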
def _build_output_pattern(args):
    if args.outfile:
What if a user provides a filename with braces? I think it would be simpler to just replace the format call with something like self.output_pattern.replace("{pid}", str(child_pid)).
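A small sketch of the problem raised here: the f-string inserts the user's filename literally, so any braces it contains survive into the pattern, and a later str.format(pid=...) call then treats them as format fields. The helper names below are hypothetical; the replace-based version follows the reviewer's suggestion.

```python
def expand_with_format(pattern, pid):
    # Every "{...}" in the pattern is interpreted as a format field.
    return pattern.format(pid=pid)


def expand_with_replace(pattern, pid):
    # Only the literal "{pid}" placeholder is substituted.
    return pattern.replace("{pid}", str(pid))


if __name__ == "__main__":
    # Pattern derived from a user-supplied outfile "results{v2}.html".
    pattern = "results{v2}_{pid}.html"
    print(expand_with_replace(pattern, 1234))
    try:
        expand_with_format(pattern, 1234)
    except KeyError as exc:
        print("str.format failed:", repr(exc))
```

One trade-off: with replace(), a user filename that itself contains "{pid}" would also be substituted. Escaping the user's braces ("{" to "{{", "}" to "}}") before building the pattern is the alternative if str.format is kept.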
This adds the ability to automatically profile child processes spawned by the target when using the sampling profiler. When the --subprocesses flag is passed, a background monitor thread polls for new descendants of the target process and spawns separate profiler instances for each Python child it discovers. Each child profiler inherits the sampling options from the parent (interval, duration, thread selection, native frames, async-aware mode, output format) and writes to its own output file with the child's PID appended to the filename.

There is a limit of 100 concurrent child profilers to prevent resource exhaustion when profiling applications that fork heavily. The --subprocesses flag is incompatible with --live mode since the curses interface cannot accommodate multiple concurrent profiler displays. Subprocess profiling is useful for applications that use multiprocessing, ProcessPoolExecutor, or other subprocess-based parallelism.
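For context, here is a hypothetical workload of the kind this flag targets: a script whose real work happens in ProcessPoolExecutor worker processes, which a single-process profile of the parent would largely miss. It is an illustration only, not the worker_pool.py from the PR's docs, whose contents are not shown here.

```python
from concurrent.futures import ProcessPoolExecutor


def sum_of_squares(n):
    """CPU-bound task executed inside each worker process."""
    return sum(i * i for i in range(n))


def main(tasks=(100_000, 200_000, 300_000, 400_000), workers=4):
    # Each worker is a separate Python child process of this script,
    # i.e. the kind of descendant a subprocess-aware profiler must find.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(sum_of_squares, tasks))


if __name__ == "__main__":
    # The __main__ guard is required for the spawn and forkserver start
    # methods, which re-import this module in each worker.
    print(main())
```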