
Commit 4c853f4

Add shared grid math and local fine grid; expand PR template
1 parent 05ce620 commit 4c853f4

8 files changed: +308 -78 lines changed


.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,97 @@
+# Precision Grounding + Inspect Overlay (Opus Execution Plan)
+
+## Summary
+- Align grid math across overlay, main, and AI prompts using shared constants.
+- Add local fine grid around the cursor for precise targeting without full-grid noise.
+- Introduce devtools-style inspect overlays (actionable element boxes + metadata).
+- Ensure AI uses the same visual grounding as the user.
+
+## Goals / Non-Goals
+**Goals**
+- User and AI see the same targeting primitives (grid + inspect metadata).
+- Fine precision selection without needing full fine-grid visibility.
+- Deterministic coordinate mapping across renderer/main/AI prompt.
+
+**Non-Goals**
+- Full external app DOM access (we rely on OCR + visual detection).
+- Replacing the grid system entirely.
+
+## Problem
+- Fine dots do not appear around the cursor, preventing high-precision selection.
+- AI coordinate grounding drifts due to mismatched math across modules.
+- AI lacks the same visualization/inspection context the user sees.
+
+## Approach
+1) Shared grid math module used by renderer, main, and AI prompt.
+2) Local fine-grid rendering around cursor in selection mode.
+3) Inspect layer backed by visual-awareness to surface actionable regions.
+4) AI prompt + action executor aligned to overlay math and inspect metadata.
+
+## Key Changes (Planned)
+- `src/shared/grid-math.js`: canonical grid constants + label → pixel conversion.
+- `src/renderer/overlay/overlay.js`: local fine-grid render + shared math usage.
+- `src/renderer/overlay/preload.js`: expose grid math to renderer safely.
+- `src/main/system-automation.js`: unify coordinate mapping.
+- `src/main/ai-service.js`: ground prompts + fine label support.
+- `src/main/index.js`: optional inspect toggle + overlay commands.
+- `src/main/visual-awareness.js`: actionable element detection + metadata surface.
+
+## Implementation Plan
+**Phase 1: Grounding & Precision**
+- [x] Shared grid math module and renderer/main integration.
+- [x] Local fine-grid around cursor with snap highlight.
+- [ ] Add label→pixel IPC from main to overlay to guarantee exact mapping.
+- [ ] Add fine label rendering on hover (C3.12) in local grid.
+
+**Phase 2: Inspect Overlay (Devtools‑Style)**
+- [ ] Add inspect toggle command and UI indicator.
+- [ ] Visual-awareness pass: actionable region detection (buttons, inputs, links).
+- [ ] Overlay layer draws bounding boxes + tooltip with text/role/confidence.
+- [ ] Selection handoff: click through to element center.
+
+**Phase 3: AI Grounding + Action Execution**
+- [ ] Include inspect metadata + screen size in AI context.
+- [ ] Prefer inspect targets; fallback to grid only if needed.
+- [ ] Add “precision click” action with safety confirmation.
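The Phase 3 preference for inspect targets over grid labels could look roughly like the sketch below. It is illustrative only: the `resolveClickPoint` name, the region shape (`x`, `y`, `width`, `height`, `confidence`), and the 0.5 confidence threshold are assumptions, not part of this commit. The grid converter is passed in as `labelToPixels` so the sketch does not depend on the real modules.

```javascript
// Hypothetical sketch: prefer the center of an inspect region, fall back to a grid label.
// The region shape and the minConfidence default are assumptions made for illustration.
function resolveClickPoint({ region, label, labelToPixels, minConfidence = 0.5 }) {
  // Use the inspect target when detection is confident enough.
  if (region && region.confidence >= minConfidence) {
    return {
      x: region.x + region.width / 2,
      y: region.y + region.height / 2,
      source: 'inspect',
    };
  }
  // Otherwise fall back to the grid label, if one was supplied and it parses.
  if (label) {
    const coords = labelToPixels(label);
    if (coords) {
      return { x: coords.x, y: coords.y, source: 'grid' };
    }
  }
  return null; // Nothing usable: the caller should not click.
}
```

Returning `source` alongside the point keeps it visible in logs whether a click came from an inspect region or from the grid fallback.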
+
+## UX Notes
+- Inspect mode should be visually distinct (e.g., cyan boxes, tooltip anchored).
+- Local fine grid should fade in/out smoothly and never block click-through.
+- Keep overlays under 16ms frame budget; throttle redraw to pointer move.
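One way to stay inside the frame budget is to coalesce pointer moves so that at most one redraw runs per frame, always using the latest position. The sketch below is an assumption about how that might be done, not code from this commit; `createFrameThrottle` and its injectable `schedule` parameter are hypothetical names, with `requestAnimationFrame` used when it is available.

```javascript
// Hypothetical sketch: coalesce pointer moves into one draw call per frame.
// `schedule` is injectable for testing; it defaults to requestAnimationFrame
// in a renderer and to a ~16ms timer elsewhere.
function createFrameThrottle(draw, schedule) {
  const scheduleFrame =
    schedule ||
    (typeof requestAnimationFrame === 'function'
      ? requestAnimationFrame
      : (cb) => setTimeout(cb, 16));
  let pending = false; // true while a frame callback is queued
  let latest = null;   // most recent pointer position seen

  return function onPointerMove(position) {
    latest = position;
    if (pending) return; // a draw is already queued; it will pick up `latest`
    pending = true;
    scheduleFrame(() => {
      pending = false;
      draw(latest);
    });
  };
}
```

In a renderer this might be wired as `window.addEventListener('pointermove', (e) => onMove({ x: e.clientX, y: e.clientY }))`, where `onMove` is the function returned by `createFrameThrottle`.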
+
+## Testing
+**Unit**
+- Grid label conversions (coarse + fine).
+- Shared constants remain consistent across renderer/main/AI.
+
+**Manual**
+- Cursor-local fine dots appear in selection mode and track cursor.
+- Background click-through still works in both modes.
+- Inspect overlay alignment with visible UI elements.
+
+**Regression**
+- Coarse grid rendering.
+- Pulse effect visibility.
+- Safety confirmation flow intact.
+
+## Risks / Mitigations
+- DPI scaling drift → use Electron `screen.getPrimaryDisplay().scaleFactor`.
+- Performance → local fine grid only; throttled draw.
+- Overlay click-through → hide overlay only at click execution.
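As a rough illustration of the DPI mitigation, the sketch below converts a point from device-independent pixels to physical pixels using a scale factor such as the one returned by `screen.getPrimaryDisplay().scaleFactor`. It assumes that grid coordinates are expressed in device-independent pixels and that the automation backend expects physical pixels; neither is confirmed by this commit, and `dipToPhysical` is a hypothetical helper name.

```javascript
// Hypothetical sketch: scale a device-independent point to physical pixels.
// In Electron's main process the factor could come from:
//   const { screen } = require('electron');
//   const scaleFactor = screen.getPrimaryDisplay().scaleFactor;
// Whether this conversion is needed depends on the automation backend (assumption).
function dipToPhysical(point, scaleFactor) {
  if (!Number.isFinite(scaleFactor) || scaleFactor <= 0) {
    throw new RangeError(`Invalid scale factor: ${scaleFactor}`);
  }
  return {
    x: Math.round(point.x * scaleFactor),
    y: Math.round(point.y * scaleFactor),
  };
}
```

Keeping the conversion in one pure function means it can be unit tested without a display and applied at a single point, just before the click is executed.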
+
+## Observability / Debugging
+- Add a debug overlay toggle for grid math readouts.
+- Log label→pixel conversions when in inspect mode.
+- Capture last 10 action targets in memory for post-mortem.
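Keeping the last 10 action targets in memory can be done with a small bounded buffer. The sketch below is one possible shape, not the project's implementation; `createTargetHistory` and the recorded fields are hypothetical.

```javascript
// Hypothetical sketch: bounded in-memory history of recent action targets.
function createTargetHistory(limit = 10) {
  const entries = [];
  return {
    // Record a target; the oldest entry is dropped once the limit is exceeded.
    record(label, x, y) {
      entries.push({ label, x, y, at: Date.now() });
      if (entries.length > limit) entries.shift();
    },
    // Return copies so callers cannot mutate the stored history.
    snapshot() {
      return entries.map((entry) => ({ ...entry }));
    },
  };
}
```

A post-mortem dump could then be as simple as `console.log(JSON.stringify(history.snapshot(), null, 2))`.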
+
+## Opus Notes (Websearch Required)
+- Verify Electron overlay best practices (`setIgnoreMouseEvents` behavior).
+- Validate DPI/scaling guidance for Windows and macOS.
+- Check common patterns for devtools-like overlays.
+
+## Checklist
+- [ ] Shared grid math used everywhere (renderer, main, AI prompt).
+- [ ] Local fine grid visible and performant.
+- [ ] Inspect overlay works and aligns with AI context.
+- [ ] AI actions target inspect regions with correct coordinates.
+- [ ] Tests updated/added and passing.

package.json

Lines changed: 2 additions & 1 deletion
@@ -4,7 +4,8 @@
   "description": "GitHub Copilot CLI with headless agent + ultra-thin overlay architecture",
   "main": "src/main/index.js",
   "scripts": {
-    "start": "node scripts/start.js"
+    "start": "node scripts/start.js",
+    "test": "node scripts/test-grid.js"
   },
   "keywords": [
     "copilot",

scripts/test-grid.js

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,18 @@
1+
const assert = require('assert');
2+
const { gridToPixels } = require('../src/main/system-automation');
3+
4+
function expectCoord(label, expectedX, expectedY) {
5+
const result = gridToPixels(label);
6+
assert.strictEqual(result.x, expectedX, `${label} x`);
7+
assert.strictEqual(result.y, expectedY, `${label} y`);
8+
}
9+
10+
expectCoord('A0', 50, 50);
11+
expectCoord('B0', 150, 50);
12+
expectCoord('A1', 50, 150);
13+
expectCoord('C3', 250, 350);
14+
expectCoord('Z0', 2550, 50);
15+
expectCoord('AA0', 2650, 50);
16+
expectCoord('C3.12', 237.5, 362.5);
17+
18+
console.log('gridToPixels tests passed');
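The expected values in this test are consistent with a converter along the lines of the sketch below. The contents of `src/shared/grid-math.js` are not shown on this page, so this is a reconstruction under assumptions: the constant names are hypothetical, and the fine suffix is read as two digits in the range 0-3 (sub-column, then sub-row), which is inferred from the `C3.12` arithmetic in the AI prompt rather than confirmed by the source. Only the function name `labelToScreenCoordinates` and the returned field names come from the diff.

```javascript
// Hypothetical sketch of a label -> pixel converter; NOT the actual grid-math.js.
const COARSE_SPACING = 100;                             // px between coarse grid points
const START_OFFSET = COARSE_SPACING / 2;                // 50px: first coarse point
const FINE_SPACING = 25;                                // px between fine grid points
const FINE_OFFSET = FINE_SPACING / 2;                   // 12.5px: first fine point
const FINE_PER_COARSE = COARSE_SPACING / FINE_SPACING;  // 4 fine cells per coarse cell

// Convert spreadsheet-style column letters to a 0-based index (A=0, Z=25, AA=26).
function columnLettersToIndex(letters) {
  let n = 0;
  for (const ch of letters.toUpperCase()) {
    n = n * 26 + (ch.charCodeAt(0) - 64);
  }
  return n - 1;
}

// Returns coordinates for labels such as "C3" or "C3.12", or null if the label is invalid.
// Assumption: the fine suffix is <sub-column digit><sub-row digit>, each 0-3.
function labelToScreenCoordinates(label) {
  const match = /^([A-Za-z]+)(\d+)(?:\.([0-3])([0-3]))?$/.exec(String(label));
  if (!match) return null;

  const colIndex = columnLettersToIndex(match[1]);
  const rowIndex = parseInt(match[2], 10);

  if (match[3] === undefined) {
    return {
      x: START_OFFSET + colIndex * COARSE_SPACING,
      y: START_OFFSET + rowIndex * COARSE_SPACING,
      colIndex,
      rowIndex,
      isFine: false,
    };
  }

  const fineCol = colIndex * FINE_PER_COARSE + Number(match[3]);
  const fineRow = rowIndex * FINE_PER_COARSE + Number(match[4]);
  return {
    x: FINE_OFFSET + fineCol * FINE_SPACING,
    y: FINE_OFFSET + fineRow * FINE_SPACING,
    colIndex,
    rowIndex,
    fineCol,
    fineRow,
    isFine: true,
  };
}
```

Returning `null` for an unparseable label matches how `gridToPixels` in `src/main/system-automation.js` treats a falsy result as an invalid coordinate and throws.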

src/main/ai-service.js

Lines changed: 7 additions & 5 deletions
@@ -103,6 +103,7 @@ const SYSTEM_PROMPT = `You are Liku, an intelligent AGENTIC AI assistant integra
 - **Format**: "C3" = column C (index 2), row 3 = pixel (250, 350)
 - **Formula**: x = 50 + col_index * 100, y = 50 + row_index * 100
 - A0 ≈ (50, 50), B0 ≈ (150, 50), A1 ≈ (50, 150)
+- **Fine Grid**: Sub-labels like C3.12 refer to 25px subcells inside C3
 
 3. **SYSTEM CONTROL - AGENTIC ACTIONS**: You can execute actions on the user's computer:
 - **Click**: Click at coordinates
@@ -139,10 +140,11 @@ When the user asks you to DO something (click, type, interact), respond with a J
 - \`{"type": "screenshot"}\` - Take screenshot to verify result
 
 ### Grid to Pixel Conversion:
-- A1 → (100, 100), B1 → (200, 100), C1 → (300, 100)
-- A2 → (100, 200), B2 → (200, 200), C2 → (300, 200)
-- Formula: x = 100 + (column_number - 1) * 100, y = 100 + (row_number - 1) * 100
-- Column A=1, B=2, C=3... so C3 = x: 100 + 2*100 = 300, y: 100 + 2*100 = 300
+- A0 → (50, 50), B0 → (150, 50), C0 → (250, 50)
+- A1 → (50, 150), B1 → (150, 150), C1 → (250, 150)
+- Formula: x = 50 + col_index * 100, y = 50 + row_index * 100
+- Column A=0, B=1, C=2... so C3 = x: 50 + 2*100 = 250, y: 50 + 3*100 = 350
+- Fine labels: C3.12 = x: 12.5 + (2*4+1)*25 = 237.5, y: 12.5 + (3*4+2)*25 = 362.5
 
 ## Response Guidelines
 
@@ -1519,7 +1521,7 @@ async function resumeAfterConfirmation(onAction = null, onScreenshot = null, opt
  * Convert grid coordinate to pixel position
  */
 function gridToPixels(coord) {
-  return systemAutomation.gridToPixels(coord, { width: 1920, height: 1080 });
+  return systemAutomation.gridToPixels(coord);
 }
 
 module.exports = {

src/main/system-automation.js

Lines changed: 11 additions & 24 deletions
@@ -7,6 +7,7 @@
 
 const { exec, spawn } = require('child_process');
 const path = require('path');
+const gridMath = require('../shared/grid-math');
 
 // Action types the AI can request
 const ACTION_TYPES = {
@@ -495,32 +496,18 @@ function parseAIActions(aiResponse) {
  * @param {Object} screenSize - {width, height} of the screen
  * @param {number} coarseSpacing - Spacing of coarse grid (default 100)
  */
-function gridToPixels(coord, screenSize, coarseSpacing = 100) {
-  // Parse coordinate: letters for column, numbers for row
-  const match = coord.match(/^([A-Za-z]+)(\d+)$/);
-  if (!match) {
+function gridToPixels(coord) {
+  const coords = gridMath.labelToScreenCoordinates(coord);
+  if (!coords) {
     throw new Error(`Invalid coordinate format: ${coord}`);
   }
-
-  const colStr = match[1].toUpperCase();
-  const row = parseInt(match[2], 10);
-
-  // Convert column letters to number (A=0, B=1, ..., Z=25, AA=26, etc.)
-  let col = 0;
-  for (let i = 0; i < colStr.length; i++) {
-    col = col * 26 + (colStr.charCodeAt(i) - 64);
-  }
-  col--; // Make 0-indexed
-
-  // Calculate pixel position - grid starts at startOffset (50px) to cover full screen
-  // This MUST match overlay.js: startOffset = coarseSpacing / 2
-  const startOffset = coarseSpacing / 2; // 50px for default 100px spacing
-  const x = startOffset + col * coarseSpacing;
-  const y = startOffset + row * coarseSpacing;
-
-  console.log(`[AUTOMATION] gridToPixels: ${coord} -> col=${col}, row=${row} -> (${x}, ${y})`);
-
-  return { x, y, col, row };
+
+  const labelInfo = coords.isFine
+    ? `fineCol=${coords.fineCol}, fineRow=${coords.fineRow}`
+    : `col=${coords.colIndex}, row=${coords.rowIndex}`;
+  console.log(`[AUTOMATION] gridToPixels: ${coord} -> ${labelInfo} -> (${coords.x}, ${coords.y})`);
+
+  return coords;
 }
 
 module.exports = {
