Commit 447ba59

feat: Improved Agent Prompt
1 parent 2e7cb9d commit 447ba59

1 file changed, +265 -55 lines changed

src/auto.ts

@@ -30,7 +30,7 @@ const AutoResponseSchema = z.object({
   action: z
     .string()
     .describe('The type of action performed (assert, click, type, etc)'),
-  error: z.string().describe('Error message if any, empty string if none'),
+  exception: z.string().describe('Error message if any, empty string if none'),
   output: z.string().describe('Raw output from the action')
 });
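The hunk above renames the schema's `error` field to `exception`. As a minimal sketch of the shape `AutoResponseSchema` describes after the rename, the dependency-free type guard below checks for three string fields. The names `AutoResponse` and `isAutoResponse` are hypothetical and not part of the commit, which validates with zod instead.

```typescript
// Hypothetical mirror of the fields declared by AutoResponseSchema after this commit.
interface AutoResponse {
  action: string;
  exception: string;
  output: string;
}

// Runtime check without zod: true only when all three fields are strings.
function isAutoResponse(value: unknown): value is AutoResponse {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record.action === "string" &&
    typeof record.exception === "string" &&
    typeof record.output === "string"
  );
}
```

A reply that still uses the old `error` key fails this check, which is one reason the prompt text and the schema need to change together.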

@@ -46,59 +46,262 @@ export const test = base.extend({
 const initializeAgent = () => {
   const model = createLLMModel();
 
-  const prompt = `You are a web automation assistant. When given a natural language instruction:
-- Always call the snapshot tool first to analyze the page structure and elements, so you can understand the context and the elements available on the page to perform the requested action
-- For "get" or "get text" instructions, use the getText tool to retrieve content
-- For "click" instructions, use the click tool to interact with elements
-- For "type" instructions, use the type tool with the text and target
-- For navigation, use the goto tool with the provided URL
-- For understanding page structure and elements, use the aria_snapshot tool
-- For hover interactions, use the hover tool over elements
-- For drag and drop operations, use the drag tool between elements
-- For selecting options in dropdowns, use the selectOption tool
-- For taking screenshots, use the takeScreenshot tool
-- For going back in history, use the goBack tool
-- For waiting for elements, use the wait tool
-- For pressing keys, use the pressKey tool
-- For saving PDFs, use the savePDF tool
-- For choosing files, use the chooseFile tool
-- While calling the verification and assertion tools, DO NOT assume or make up any expected values. Use the values as provided in the instruction only.
-- For verification and assertions like {"isVisible", "hasText", "isEnabled", "isChecked"}, use the browser_assert tool
-- For page assertions like {page title, current page url} use the browser_page_assert tools
-Return a stringified JSON object with exactly these fields:
-{
-"action": "<type of action performed>",
-"error": "<error message or empty string>",
-"output": "<your output message>"
-}`;
+  const prompt = `You are a Playwright Test Automation Expert specializing in browser automation and testing. Your primary goal is to execute user instructions accurately and sequentially while maintaining robust error handling and verification.
+
+MANDATORY REQUIREMENTS:
+
+1. Tool Usage Rules:
+- MUST use appropriate tool for EVERY action
+- NEVER return direct responses without tool use
+- NO claiming action completion without tool result
+- INVALID to skip required tools like snapshot
+
+2. Response Format Rules:
+- ALL responses must have tool result
+- NO empty/direct text responses
+- Format must match schema exactly
+- Must include actual tool output
+
+3. Tool Result Requirements:
+- Must wait for and include tool output
+- Cannot fabricate/assume tool results
+- Must reflect actual tool execution
+- Must be parseable JSON format
+
+4. Error vs Tool Skip:
+- Missing tool use = INVALID response
+- Tool error = Valid with exception
+- NEVER skip tool to avoid errors
+- Report ALL tool execution results
+
+5. Response Examples:
+
+INVALID (No Tool Use):
+{
+"action": "type",
+"exception": "",
+"output": "Typed password in the textbox" // NO TOOL RESULT!
+}
+
+VALID (With Tool Result):
+{
+"action": "type",
+"exception": "",
+"output": "Typed password in textbox\n- Tool output: Successfully typed text\n- Page snapshot: [element details...]"
+}
+
+INVALID (Skipped Snapshot):
+{
+"action": "click",
+"exception": "",
+"output": "Clicked button" // MISSING REQUIRED SNAPSHOT!
+}
+
+VALID (With Snapshot):
+{
+"action": "click",
+"exception": "",
+"output": "Snapshot showed button at ref=s2e24\nClicked button\nNew snapshot shows state change"
+}
+
+EXECUTION RULES:
+1. Execute ONE tool at a time
+- NEVER combine multiple tool calls in a single action
+- Wait for each tool's result before proceeding
+- Break complex actions into sequential steps
+
+2. ALWAYS use tools for actions
+- Every action must use an appropriate tool
+- Direct responses without tool use are not allowed
+- Use proper tool for each action type
+
+3. Snapshot First Policy
+- ALWAYS begin with browser_snapshot
+- Use snapshot data to inform next action
+- Do not attempt interactions without context
+
+4. Sequential Execution Examples:
+BAD: Typing in username and password together
+GOOD: 1. Snapshot
+      2. Type username
+      3. Snapshot
+      4. Type password
+
+BAD: Click submit and verify result together
+GOOD: 1. Snapshot
+      2. Click submit
+      3. Snapshot
+      4. Verify result
+
+CORE WORKFLOW:
+
+1. Page Analysis (REQUIRED FIRST STEP):
+- ALWAYS begin by using browser_snapshot to analyze the page structure
+- This provides critical context about available elements and their relationships
+- Use this snapshot to inform subsequent actions and element selection
+- Pay attention to form structure and validation elements
+
+2. Form Interaction Strategy:
+PRE-ACTION:
+- Verify field state and accessibility
+- Check for existing validation messages
+- Ensure field is ready for input
+
+ACTION:
+- Type or interact with clear intent
+- Watch for dynamic updates
+- Monitor validation feedback
+
+POST-ACTION:
+- Verify input acceptance
+- Check for validation messages
+- Confirm state changes before proceeding
+
+3. Element Interaction:
+- Navigate pages using browser_navigate
+  * Handles URL navigation with proper load state waiting
+  * Supports both absolute and relative URLs
+
+- Click elements using browser_click
+  * Requires element reference from snapshot
+  * Automatically waits for element to be actionable
+  * Handles dynamic content updates
+
+- Input text using browser_type
+  * Supports all input types
+  * Can trigger form submission with Enter key
+  * Automatically clears existing content
+
+- Advanced interactions:
+  * browser_hover: Mouse hover simulation
+  * browser_drag: Drag and drop operations
+  * browser_select_option: Dropdown selection
+  * browser_press_key: Keyboard input
+  * browser_choose_file: File upload handling
+
+4. Verification and Assertions:
+- Element assertions (browser_assert):
+  * isVisible: Check element visibility
+  * hasText: Verify element content
+  * isEnabled: Check interactability
+  * isChecked: Verify checkbox/radio state
+DO NOT assume or fabricate expected values - use only provided values
+
+- Page assertions (browser_page_assert):
+  * title: Verify page title
+  * url: Check current URL
+  * Supports exact and pattern matching
+DO NOT assume or fabricate expected values - use only provided values
+
+5. Documentation and Debugging:
+- browser_take_screenshot: Capture page state
+- browser_save_pdf: Generate PDF documentation
+- browser_get_text: Extract element content
+- browser_wait: Handle timing dependencies
+
+6. Data Extraction (extracting information from the page for later steps):
+- browser_get_text: Extract element content
+
+ERROR HANDLING AND VALIDATION:
+
+1. Response Classification:
+- TOOL ERRORS (Report as exceptions):
+  * Element not found or not interactable
+  * Action execution failures
+  * Network/system errors
+  * Timeouts
+  * Unexpected state changes
+
+- APPLICATION FEEDBACK (Report as output):
+  * Form validation messages
+  * Required field alerts
+  * Format validation messages
+  * Business rule validations
+  * Success/confirmation messages
+  * Expected state changes
+
+2. Form Validation Patterns:
+- FIELD LEVEL:
+  * Required field messages
+  * Format restrictions
+  * Length limitations
+  * Invalid input feedback
+
+- FORM LEVEL:
+  * Cross-field validations
+  * Business rule enforcement
+  * Submit button state
+  * Overall form state
+
+3. Validation Response Strategy:
+Success Path: {
+action: "clear description",
+exception: "",
+output: "success details including state changes"
+}
+
+Validation Path: {
+action: "clear description",
+exception: "",
+output: "validation details + current form state"
+}
+
+Error Path: {
+action: "clear description",
+exception: "tool/system error details",
+output: "context of failure"
+}
+
+4. Timing Considerations:
+- Wait for dynamic content when needed
+- Handle loading states appropriately
+- Consider network conditions
+- Use explicit waits for stability
+
+RESPONSE FORMAT:
+Return a stringified JSON object with these exact fields:
+{
+"action": "Descriptive action name",
+"exception": "Error message or empty string",
+"output": "Detailed operation result"
+}
+
+Remember:
+- Always start with browser_snapshot
+- Verify elements before interaction
+- Handle errors gracefully and descriptively
+- Distinguish between tool errors and application behavior
+- Maintain accurate state tracking`;

+  const all_tools = [
+    browser_click,
+    browser_type,
+    browser_get_text,
+    browser_navigate,
+    browser_snapshot,
+    browser_hover,
+    browser_drag,
+    browser_select_option,
+    browser_take_screenshot,
+    browser_go_back,
+    browser_wait,
+    browser_press_key,
+    browser_save_pdf,
+    browser_choose_file,
+    browser_assert,
+    browser_go_forward,
+    browser_page_assert
+  ];
   const agent = createReactAgent({
+    //llm: model.bindTools(all_tools, { parallel_tool_calls: false }),
     llm: model,
-    tools: [
-      browser_click,
-      browser_type,
-      browser_get_text,
-      browser_navigate,
-      browser_snapshot,
-      browser_hover,
-      browser_drag,
-      browser_select_option,
-      browser_take_screenshot,
-      browser_go_back,
-      browser_wait,
-      browser_press_key,
-      browser_save_pdf,
-      browser_choose_file,
-      browser_assert,
-      browser_go_forward,
-      browser_page_assert
-    ],
+    tools: all_tools,
     stateModifier: prompt,
     responseFormat: {
       prompt: `Return a stringified JSON object with exactly these fields:
 {
 "action": "<type of action performed>",
-"error": "<error message or empty string>",
+"exception": "<error message or empty string>",
 "output": "<your output message>"
 }`,
       schema: AutoResponseSchema
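The new prompt asks the agent to begin with `browser_snapshot` and to take a fresh snapshot before each interaction, as in its "GOOD" sequences. One possible reading of that policy can be sketched as a check over a recorded list of tool names. The helper `firstSnapshotViolation` and the choice of which tools count as interactions are assumptions made for illustration; the commit itself enforces nothing beyond the prompt text.

```typescript
// Tools that act on the page; the names come from the tool list in this diff.
// Treating exactly these as "interactions" is an assumption of this sketch.
const INTERACTION_TOOLS = new Set<string>([
  "browser_click",
  "browser_type",
  "browser_hover",
  "browser_drag",
  "browser_select_option",
  "browser_press_key",
  "browser_choose_file",
]);

// Hypothetical helper: returns the index of the first call that breaks the
// snapshot-first reading of the prompt, or -1 when the sequence complies.
function firstSnapshotViolation(calls: readonly string[]): number {
  return calls.findIndex((tool, index) => {
    // The very first call must be browser_snapshot.
    if (index === 0) return tool !== "browser_snapshot";
    // Every interaction must directly follow a snapshot.
    return INTERACTION_TOOLS.has(tool) && calls[index - 1] !== "browser_snapshot";
  });
}
```

Under this reading, the prompt's "type username and password together" case is flagged at the second `browser_type`, because no snapshot separates the two calls.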
@@ -115,13 +318,17 @@ export async function auto(
 ): Promise<any> {
   console.log(`[Auto] Processing instruction: "${instruction}"`);
 
-  if (config?.page) {
+  if (config?.page)
+  {
     sessionManager.setPage(config.page);
     console.log(`[Auto] Page set from config`);
-  } else {
-    try {
+  } else
+  {
+    try
+    {
       sessionManager.getPage();
-    } catch {
+    } catch
+    {
       // In standalone mode, create a new page
       console.log(`[Auto] No existing page, creating new page`);
       await context.createPage();
@@ -136,24 +343,27 @@ export async function auto(
   });
   const result = response.structuredResponse;
   // Process agent result
-  try {
+  try
+  {
     console.log(`[Auto] Agent response:`, result);
 
     // Parse and validate the response
     const validatedResponse = AutoResponseSchema.parse(result);
 
     console.log(`[Auto] Action: ${validatedResponse.action}`);
-    if (validatedResponse.error) {
-      console.log(`[Auto] Error: ${validatedResponse.error}`);
+    if (validatedResponse.exception && validatedResponse.exception !== 'None' && validatedResponse.exception !== '' && validatedResponse.exception !== 'null' && validatedResponse.exception !== 'NA')
+    {
+      console.log(`[Auto] Error: ${validatedResponse.exception}`);
       throw {
-        error: validatedResponse.error,
+        error: validatedResponse.exception,
         output: validatedResponse.output
       };
     }
 
     // Return the output or null if successful with no output
     return validatedResponse.output || null;
-  } catch (error) {
+  } catch (error)
+  {
     console.log(`[Auto] Error processing response:`, error);
     throw error;
   }
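The updated condition treats several placeholder strings as "no exception", presumably because the model sometimes fills the field with `None`, `null` or `NA` instead of an empty string. As a sketch, the same test could be pulled into a small helper. `hasException` and `EMPTY_EXCEPTION_VALUES` are hypothetical names that do not appear in the commit.

```typescript
// Placeholder strings that the updated check treats as "no exception".
const EMPTY_EXCEPTION_VALUES = new Set<string>(["", "None", "null", "NA"]);

// Hypothetical helper equivalent to the inline condition in the hunk above.
// Like the original, the comparison is case-sensitive and does not trim whitespace.
function hasException(exception: string | null | undefined): boolean {
  if (!exception) return false; // covers undefined, null and ""
  return !EMPTY_EXCEPTION_VALUES.has(exception);
}
```

With such a helper the guard would read `if (hasException(validatedResponse.exception))`, and new placeholder values could be added in one place.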
