|
| 1 | +# Precision Grounding + Inspect Overlay (Opus Execution Plan) |
| 2 | + |
| 3 | +## Summary |
| 4 | +- Align grid math across overlay, main, and AI prompts using shared constants. |
| 5 | +- Add a local fine grid around the cursor for precise targeting without full-grid noise.
| 6 | +- Introduce devtools-style inspect overlays (actionable element boxes + metadata). |
| 7 | +- Ensure AI uses the same visual grounding as the user. |
| 8 | + |
| 9 | +## Goals / Non-Goals |
| 10 | +**Goals** |
| 11 | +- User and AI see the same targeting primitives (grid + inspect metadata). |
| 12 | +- Fine precision selection without needing full fine-grid visibility. |
| 13 | +- Deterministic coordinate mapping across renderer/main/AI prompt. |
| 14 | + |
| 15 | +**Non-Goals** |
| 16 | +- Full external app DOM access (we rely on OCR + visual detection). |
| 17 | +- Replacing the grid system entirely. |
| 18 | + |
| 19 | +## Problem |
| 20 | +- Fine dots do not appear around the cursor, preventing high-precision selection. |
| 21 | +- AI coordinate grounding drifts due to mismatched math across modules. |
| 22 | +- AI lacks the same visualization/inspection context the user sees. |
| 23 | + |
| 24 | +## Approach |
| 25 | +1) Shared grid math module used by renderer, main, and AI prompt. |
| 26 | +2) Local fine-grid rendering around cursor in selection mode. |
| 27 | +3) Inspect layer backed by visual-awareness to surface actionable regions. |
| 28 | +4) AI prompt + action executor aligned to overlay math and inspect metadata. |
| 29 | + |
| 30 | +## Key Changes (Planned) |
| 31 | +- `src/shared/grid-math.js`: canonical grid constants + label → pixel conversion. |
| 32 | +- `src/renderer/overlay/overlay.js`: local fine-grid render + shared math usage. |
| 33 | +- `src/renderer/overlay/preload.js`: expose grid math to renderer safely. |
| 34 | +- `src/main/system-automation.js`: unify coordinate mapping. |
| 35 | +- `src/main/ai-service.js`: ground prompts + fine label support. |
| 36 | +- `src/main/index.js`: optional inspect toggle + overlay commands. |
| 37 | +- `src/main/visual-awareness.js`: actionable element detection + metadata surface. |
| 38 | + |
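As an illustration of the label → pixel conversion planned for `src/shared/grid-math.js`, here is a minimal sketch. The constants, the `labelToPixel` name, and the label scheme (letter = column, number = row, optional two-digit fine suffix as in `C3.12`) are assumptions made for this example; the plan does not specify them.

```javascript
// Hypothetical constants; the real values would live in src/shared/grid-math.js.
const COARSE_CELL_PX = 100; // assumed coarse cell size in pixels
const FINE_DIVISIONS = 4;   // assumed fine sub-cells per coarse cell, per axis

// Converts a label such as "C3" or "C3.12" to the pixel center of that cell.
// Assumed scheme: letter = column (A = 0), number = 1-based row,
// optional ".xy" = 1-based fine sub-column x and fine sub-row y.
function labelToPixel(label) {
  const match = /^([A-Z])(\d+)(?:\.(\d)(\d))?$/.exec(label);
  if (!match) throw new Error(`Invalid grid label: ${label}`);

  const col = match[1].charCodeAt(0) - 65;
  const row = Number(match[2]) - 1;
  if (row < 0) throw new Error(`Row must be 1 or greater: ${label}`);

  const left = col * COARSE_CELL_PX;
  const top = row * COARSE_CELL_PX;

  // Coarse label only: return the center of the coarse cell.
  if (match[3] === undefined) {
    return { x: left + COARSE_CELL_PX / 2, y: top + COARSE_CELL_PX / 2 };
  }

  const fineX = Number(match[3]) - 1;
  const fineY = Number(match[4]) - 1;
  if (fineX < 0 || fineX >= FINE_DIVISIONS || fineY < 0 || fineY >= FINE_DIVISIONS) {
    throw new Error(`Fine index out of range: ${label}`);
  }

  // Fine label: return the center of the fine sub-cell.
  const fineSize = COARSE_CELL_PX / FINE_DIVISIONS;
  return { x: left + (fineX + 0.5) * fineSize, y: top + (fineY + 0.5) * fineSize };
}
```

Keeping this function free of Electron or DOM dependencies is one way to let the renderer, main process, and AI prompt builder all import the same conversion.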
| 39 | +## Implementation Plan |
| 40 | +**Phase 1: Grounding & Precision** |
| 41 | +- [x] Shared grid math module and renderer/main integration. |
| 42 | +- [x] Local fine-grid around cursor with snap highlight. |
| 43 | +- [ ] Add label→pixel IPC from main to overlay to guarantee exact mapping. |
| 44 | +- [ ] Add fine label rendering on hover (e.g., `C3.12`) in local grid.
| 45 | + |
| 46 | +**Phase 2: Inspect Overlay (Devtools‑Style)** |
| 47 | +- [ ] Add inspect toggle command and UI indicator. |
| 48 | +- [ ] Visual-awareness pass: actionable region detection (buttons, inputs, links). |
| 49 | +- [ ] Overlay layer draws bounding boxes + tooltip with text/role/confidence. |
| 50 | +- [ ] Selection handoff: click through to element center. |
| 51 | + |
| 52 | +**Phase 3: AI Grounding + Action Execution** |
| 53 | +- [ ] Include inspect metadata + screen size in AI context. |
| 54 | +- [ ] Prefer inspect targets; fall back to the grid only if needed.
| 55 | +- [ ] Add “precision click” action with safety confirmation. |
| 56 | + |
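The "prefer inspect targets, otherwise use the grid" rule in Phase 3 could look roughly like the sketch below. The region shape (`x`, `y`, `width`, `height`, `text`, `confidence`), the `resolveTarget` name, and the `0.6` confidence threshold are all hypothetical; only the text/role/confidence metadata and the click-to-element-center behavior come from the plan.

```javascript
// Picks a click point: the center of the best-matching inspect region if one
// is confident enough, otherwise whatever the grid fallback returns.
// `regions` is an assumed array of { x, y, width, height, text, confidence }.
// `gridFallback` is an assumed callback returning { x, y } from grid math.
function resolveTarget(regions, wantedText, gridFallback, minConfidence = 0.6) {
  const candidates = regions
    .filter((r) => r.text === wantedText && r.confidence >= minConfidence)
    .sort((a, b) => b.confidence - a.confidence);

  if (candidates.length > 0) {
    const best = candidates[0];
    return {
      x: best.x + best.width / 2,
      y: best.y + best.height / 2,
      source: "inspect",
    };
  }

  const point = gridFallback();
  return { x: point.x, y: point.y, source: "grid" };
}
```

Returning a `source` field alongside the coordinates makes it easy to log which path produced a given action target.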
| 57 | +## UX Notes |
| 58 | +- Inspect mode should be visually distinct (e.g., cyan boxes, tooltip anchored). |
| 59 | +- Local fine grid should fade in/out smoothly and never block click-through. |
| 60 | +- Keep overlay redraws within the 16 ms frame budget (60 fps); throttle redraws to pointer-move events.
| 61 | + |
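One common way to throttle redraws to pointer movement is to coalesce events into a single draw per animation frame. The sketch below assumes this approach; `createFrameThrottle` and its parameters are hypothetical names, and the scheduler is injectable so the logic can be exercised outside a browser.

```javascript
// Returns a pointer-move handler that calls `draw` at most once per frame,
// always with the most recent pointer position.
// `schedule` defaults to requestAnimationFrame in the renderer.
function createFrameThrottle(draw, schedule = (cb) => requestAnimationFrame(cb)) {
  let pending = false;
  let latest = null;

  return function onPointerMove(point) {
    latest = point;        // keep only the newest position
    if (pending) return;   // a draw is already queued for this frame
    pending = true;
    schedule(() => {
      pending = false;
      draw(latest);
    });
  };
}
```

In the overlay this handler could be attached with `window.addEventListener("pointermove", ...)`, passing the event's `clientX`/`clientY` as the point.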
| 62 | +## Testing |
| 63 | +**Unit** |
| 64 | +- Grid label conversions (coarse + fine). |
| 65 | +- Shared constants remain consistent across renderer/main/AI. |
| 66 | + |
| 67 | +**Manual** |
| 68 | +- Cursor-local fine dots appear in selection mode and track cursor. |
| 69 | +- Background click-through still works in both modes. |
| 70 | +- Inspect overlay boxes align with visible UI elements.
| 71 | + |
| 72 | +**Regression** |
| 73 | +- Coarse grid rendering. |
| 74 | +- Pulse effect visibility. |
| 75 | +- Safety confirmation flow intact. |
| 76 | + |
| 77 | +## Risks / Mitigations |
| 78 | +- DPI scaling drift → use Electron `screen.getPrimaryDisplay().scaleFactor`. |
| 79 | +- Performance → local fine grid only; throttled draw. |
| 80 | +- Overlay click-through → hide overlay only at click execution. |
| 81 | + |
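For the DPI mitigation, the core arithmetic is a multiplication by the display scale factor. The helper below is a minimal sketch with a hypothetical name (`dipToPhysical`); it assumes a single display whose origin is at (0, 0) and takes the scale factor as a plain argument, which in the main process would come from `screen.getPrimaryDisplay().scaleFactor`.

```javascript
// Converts a point in device-independent pixels (DIP) to physical pixels.
// Assumption: single display with origin (0, 0); multi-display setups would
// also need the display bounds, which this sketch ignores.
function dipToPhysical(point, scaleFactor) {
  if (!(scaleFactor > 0)) throw new Error("scaleFactor must be positive");
  return {
    x: Math.round(point.x * scaleFactor),
    y: Math.round(point.y * scaleFactor),
  };
}
```

Applying the conversion in exactly one place, just before the click is executed, is one way to avoid scaling a coordinate twice.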
| 82 | +## Observability / Debugging |
| 83 | +- Add a debug overlay toggle for grid math readouts. |
| 84 | +- Log label→pixel conversions when in inspect mode. |
| 85 | +- Capture last 10 action targets in memory for post-mortem. |
| 86 | + |
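Keeping the last 10 action targets in memory can be done with a small bounded buffer. The following is a sketch only; `createTargetHistory` and the recorded fields are hypothetical names chosen for illustration.

```javascript
const MAX_TARGETS = 10; // matches the "last 10 action targets" note above

// Creates an in-memory history that keeps only the most recent `limit` targets.
function createTargetHistory(limit = MAX_TARGETS) {
  const entries = [];
  return {
    // Stores a copy of the target with a timestamp; drops the oldest when full.
    record(target) {
      entries.push({ ...target, recordedAt: Date.now() });
      if (entries.length > limit) entries.shift();
    },
    // Returns a copy so callers cannot mutate the internal buffer.
    snapshot() {
      return entries.slice();
    },
  };
}
```

A debug command or crash handler could call `snapshot()` to dump the recent targets for post-mortem review.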
| 87 | +## Opus Notes (Websearch Required) |
| 88 | +- Verify Electron overlay best practices (`setIgnoreMouseEvents` behavior). |
| 89 | +- Validate DPI/scaling guidance for Windows and macOS. |
| 90 | +- Check common patterns for devtools-like overlays. |
| 91 | + |
| 92 | +## Checklist |
| 93 | +- [ ] Shared grid math used everywhere (renderer, main, AI prompt). |
| 94 | +- [ ] Local fine grid visible and performant. |
| 95 | +- [ ] Inspect overlay works and aligns with AI context. |
| 96 | +- [ ] AI actions target inspect regions with correct coordinates. |
| 97 | +- [ ] Tests updated/added and passing. |