Description:
In a complex async or multi-threaded application, tasks often get "stuck" due to deadlocks (waiting on a lock that never releases), infinite loops, or hanging network requests. Since Ferret will track "Open Spans" (active functions) in real-time (via Issue 7), we can implement a Watchdog that identifies spans that have been running longer than a safety threshold.
Proposed Solution:
- Stall Detection: The `ferret watch` system scans the list of active spans. If a span's duration exceeds a configured `stall_threshold` (e.g., 10s), it is flagged as "Stalled".
- Stack Trace Dumping: When a stall is detected, Ferret should attempt to capture the exact line number where the code is stuck.
  - For Threads: Use `sys._current_frames()` to find the stack trace of the thread owning the span.
  - For Async: Inspect the `asyncio` task associated with the span.
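The two capture paths could look roughly like the sketch below. It is illustrative only: `thread_stack`, `task_stack`, and `demo` are hypothetical names, not existing Ferret functions, and the sketch assumes each span records the owning thread id or `asyncio.Task`. It relies on `sys._current_frames()`, `asyncio.Task.get_stack()`, and `traceback.format_stack()` from the standard library.

```python
import asyncio
import sys
import threading
import time
import traceback


def thread_stack(thread_id: int) -> list[str]:
    """Return formatted stack lines for a thread, or [] if it has exited."""
    frame = sys._current_frames().get(thread_id)
    if frame is None:
        return []
    return traceback.format_stack(frame)


def task_stack(task: asyncio.Task) -> list[str]:
    """Return formatted stack lines for the point where a task is suspended."""
    lines: list[str] = []
    for frame in task.get_stack():
        # limit=1 keeps only the entry for this frame itself.
        lines.extend(traceback.format_stack(frame, limit=1))
    return lines


def demo() -> tuple[list[str], list[str]]:
    """Capture the stack of a thread blocked on a lock and of a pending task."""
    lock = threading.Lock()
    lock.acquire()  # held by the caller so the worker blocks

    def blocked_worker() -> None:
        with lock:
            pass

    worker = threading.Thread(target=blocked_worker, daemon=True)
    worker.start()
    time.sleep(0.2)  # give the worker time to reach the lock
    t_stack = thread_stack(worker.ident)
    lock.release()
    worker.join()

    async def main() -> list[str]:
        async def waits_forever() -> None:
            await asyncio.get_running_loop().create_future()

        task = asyncio.create_task(waits_forever())
        await asyncio.sleep(0)  # let the task run to its first await
        lines = task_stack(task)
        task.cancel()
        await asyncio.gather(task, return_exceptions=True)
        return lines

    return t_stack, asyncio.run(main())


if __name__ == "__main__":
    for stack in demo():
        print("".join(stack))
```

Each formatted entry carries the filename, line number, and function name, which is the information the stalled-span detail view needs. Note that `sys._current_frames()` is a CPython implementation detail, so other interpreters may not provide it.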
Tasks:
- Configuration (`ferret/core.py`):
  - Add `stall_threshold` (float, seconds) to `Profiler` config.
  - Add `enable_watchdog` (bool).
- Watchdog Thread (`ferret/watchdog.py`):
  - Create a daemon thread that wakes up periodically (e.g., every 5s).
  - Iterate over all active `Span` objects in memory.
  - If `(time.time() - span.start_time) > stall_threshold`:
    - Mark span status as `STALLED`.
    - Capture Context: Grab the current stack trace for that thread/task using `traceback` and `sys._current_frames()`.
    - Emit a `METRIC` or `ERROR` event to BeaverDB containing this stack dump.
- UI Integration (`ferret/tui.py`):
  - Update `ferret watch` to show a "💀 Deadlocks / Stalls" section.
  - Highlight stalled spans in Red in the Live Tree.
  - Allow the user to click a stalled span to see the captured stack trace (telling them exactly where the deadlock is).
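A minimal sketch of the watchdog loop, under stated assumptions: the `Span` dataclass here is a hypothetical stand-in for Ferret's real span type, the `on_stall` callback stands in for emitting the event to BeaverDB, and `interval` is an illustrative name for the wake-up period. The sketch measures durations with `time.monotonic()` so wall-clock adjustments cannot cause false stalls; that only works if spans record their start time with the same clock.

```python
import sys
import threading
import time
import traceback
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical stand-in for Ferret's span type."""
    name: str
    thread_id: int
    start_time: float = field(default_factory=time.monotonic)
    status: str = "OPEN"
    stack: list = field(default_factory=list)


class Watchdog(threading.Thread):
    """Daemon thread that flags spans running longer than stall_threshold."""

    def __init__(self, active_spans, stall_threshold=10.0, interval=5.0,
                 on_stall=print):
        super().__init__(daemon=True, name="ferret-watchdog")
        self.active_spans = active_spans
        self.stall_threshold = stall_threshold
        self.interval = interval
        self.on_stall = on_stall  # placeholder for the BeaverDB event emit
        self._stop_event = threading.Event()

    def scan_once(self):
        """Mark newly stalled spans, attach a stack dump, and return them."""
        now = time.monotonic()
        frames = sys._current_frames()
        stalled = []
        for span in list(self.active_spans):  # copy: the list may change
            if span.status == "STALLED":
                continue  # already reported on an earlier scan
            if now - span.start_time <= self.stall_threshold:
                continue
            span.status = "STALLED"
            frame = frames.get(span.thread_id)
            span.stack = traceback.format_stack(frame) if frame else []
            self.on_stall(span)
            stalled.append(span)
        return stalled

    def run(self):
        # Event.wait returns False on timeout, so this scans every interval
        # until stop() is called.
        while not self._stop_event.wait(self.interval):
            self.scan_once()

    def stop(self):
        self._stop_event.set()
```

Keeping the scan in `scan_once()` separate from the thread loop lets tests exercise stall detection without waiting on a real timer. With a periodic scan, a stall is reported at most one `interval` after the span crosses the threshold.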
Acceptance Criteria:
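- If a profiled function sleeps for 30s (with a 5s threshold), it appears in the "Deadlocks" list once it has run for more than 5s (within one watchdog wake-up interval of crossing the threshold).
- The detailed view shows the filename and line number where the code is hanging (e.g., the line containing `await future` or `lock.acquire()`).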