This test plan aims to verify the functionality of cursor-tools commands, with particular focus on:
- Features documented in the README
- Recently changed functionality (from CHANGELOG)
- Different models and providers across commands
From the CHANGELOG, key areas to test include:
- MCP server overrides in the marketplace (both hardcoded and user-configured)
- Updated references to Claude 3.7 Sonnet models (from Claude 3.5)
- Updated @browserbasehq/stagehand dependency (1.13.0 to 1.13.1)
- Browser command state management with
--connect-to
andreload-current
- Default console and network options for browser commands
- New cursor rules directory structure (
.cursor/rules/cursor-tools.mdc
) - Dual-provider architecture for
plan
command --quiet
flag for suppressing stdout while saving to file
-
Required API keys:
- Set up
.cursor-tools.env
with:PERPLEXITY_API_KEY
GEMINI_API_KEY
OPENAI_API_KEY
(for browser commands)ANTHROPIC_API_KEY
(for MCP commands and browser commands)GITHUB_TOKEN
(optional, for GitHub commands)
- Set up
-
For browser command testing:
- Install Playwright:
npm install --global playwright
- Start test server:
pnpm serve-test
(runs on http://localhost:3000) - Test files located in
tests/commands/browser/
- Install Playwright:
For each command, we will:
- Test basic functionality (positive tests)
- Test error handling (negative tests)
- Test with different providers where applicable
- Test with different models where applicable
- Verify output quality and correctness
- Test recent changes and edge cases
- Test with different providers (OpenAI, Anthropic, Perplexity, Gemini, ModelBox, OpenRouter)
- Verify Claude 3.7 Sonnet model works correctly (previously 3.5)
- Compare output quality across models
- Test error handling
- Basic query:
cursor-tools ask "What is cursor-tools?" --provider openai --model o3-mini
- Complex query:
cursor-tools ask "Explain the differences between REST and GraphQL" --provider anthropic --model claude-3-7-sonnet
- Technical query:
cursor-tools ask "How does JavaScript event loop work?" --provider perplexity --model perplexity-online-latest
- ModelBox query:
cursor-tools ask "What are the benefits of TypeScript?" --provider modelbox --model claude-3-7-sonnet
- OpenRouter query:
cursor-tools ask "Explain the SOLID principles" --provider openrouter --model anthropic/claude-3-7-sonnet
- Missing model:
cursor-tools ask "Test question" --provider openai
(should throw ModelNotFoundError) - Missing provider:
cursor-tools ask "Test question" --model o3-mini
(should throw ProviderError) - Invalid provider:
cursor-tools ask "Test question" --provider invalid-provider --model o3-mini
(should throw ProviderError) - Invalid model:
cursor-tools ask "Test question" --provider openai --model invalid-model
(should throw ModelNotFoundError)
- Test with different providers (Perplexity, Gemini, ModelBox, OpenRouter)
- Test saving output to file
- Test with debug flag
- Test quiet flag
- Current technology:
cursor-tools web "Latest TypeScript features" --provider perplexity
- Technical documentation:
cursor-tools web "Next.js app router documentation" --provider gemini
- Saving output:
cursor-tools web "cursor-tools installation guide" --save-to web-results.md
- ModelBox provider:
cursor-tools web "React 19 new features" --provider modelbox --model claude-3-7-sonnet
- OpenRouter provider:
cursor-tools web "Node.js best practices 2024" --provider openrouter
- Debug flag:
cursor-tools web "TypeScript 5.7 features" --debug
- Quiet flag with save:
cursor-tools web "Angular vs React in 2024" --save-to web-compare.md --quiet
- Test with different providers and models
- Test on this repository
- Test with token limit considerations
- Test default Claude 3.7 model (previously 3.5)
- Code explanation:
cursor-tools repo "Explain the MCP server override implementation"
- Architecture question:
cursor-tools repo "How are providers implemented in the codebase?"
- Different models:
cursor-tools repo "What's the structure of browser commands?" --provider openai --model o3-mini
- ModelBox provider:
cursor-tools repo "Explain the dual-provider architecture for plan command" --provider modelbox
- OpenRouter provider:
cursor-tools repo "How is cursor rule integration implemented?" --provider openrouter
- Debug flag:
cursor-tools repo "Analyze error handling in the codebase" --debug
- Quiet flag:
cursor-tools repo "Summarize recent changes" --save-to repo-changes.md --quiet
- Test with different file and thinking providers
- Test with various combinations of models
- Test default thinking provider (OpenAI with o3-mini model)
- Test small and larger planning tasks
- Basic plan:
cursor-tools plan "How to add a new command to cursor-tools?"
- Default providers (verify o3-mini is used):
cursor-tools plan "Add a health check endpoint" --debug
- Split providers:
cursor-tools plan "Implement a health check endpoint" --fileProvider gemini --thinkingProvider openai --thinkingModel o3-mini
- Complex task:
cursor-tools plan "Implement TypeScript strict mode across the codebase" --fileProvider gemini --thinkingProvider anthropic --thinkingModel claude-3-7-sonnet
- ModelBox provider:
cursor-tools plan "Add support for a new AI provider" --fileProvider modelbox --thinkingProvider modelbox
- OpenRouter provider:
cursor-tools plan "Implement a new browser command feature" --fileProvider openrouter --thinkingProvider openrouter
- Quiet flag:
cursor-tools plan "Refactor error handling" --save-to plan-errors.md --quiet
- Test local repository documentation
- Test remote GitHub repository documentation
- Test with different hints for focused documentation
- Test with different models including Claude 3.7 Sonnet
- Local repo:
cursor-tools doc --save-to repo-docs.md
- Remote repo:
cursor-tools doc --from-github=eastlondoner/cursor-tools --save-to cursor-tools-docs.md
- Focused docs:
cursor-tools doc --hint="Focus on browser commands" --save-to browser-docs.md
- With Anthropic provider:
cursor-tools doc --provider anthropic --model claude-3-7-sonnet --save-to anthropic-docs.md
- With specific branch:
cursor-tools doc --from-github=eastlondoner/cursor-tools@main --save-to main-docs.md
- Quiet flag:
cursor-tools doc --from-github=eastlondoner/cursor-tools --save-to docs.md --quiet
- Test PR listing
- Test issue listing
- Test specific PR and issue viewing
- Test remote repository specification
- List PRs:
cursor-tools github pr
- List issues:
cursor-tools github issue
- View specific PR:
cursor-tools github pr 1
(if available) - View specific issue:
cursor-tools github issue 1
(if available) - From remote repo:
cursor-tools github pr --from-github=eastlondoner/cursor-tools
- Quiet flag:
cursor-tools github pr --save-to prs.md --quiet
- Test each browser subcommand (open, act, observe, extract)
- Test with updated Stagehand dependency
- Test with different models including Claude 3.7 Sonnet
- Test default console and network options
- Test state persistence with
--connect-to
andreload-current
- Open webpage:
cursor-tools browser open "https://example.com" --html
- Default console/network:
cursor-tools browser open "http://localhost:3000/test-logging.html"
(verify console/network logs are captured) - Disable console/network:
cursor-tools browser open "http://localhost:3000/test-logging.html" --no-console --no-network
(verify logs are not captured) - Perform action:
cursor-tools browser act "Click on the first link" --url "https://example.com" --model claude-3-7-sonnet-latest
- Default model:
cursor-tools browser act "Click the submit button" --url "http://localhost:3000/test-form.html" --provider anthropic
(verify claude-3-7-sonnet is used) - Observe page:
cursor-tools browser observe "What interactive elements are visible?" --url "https://example.com"
- Extract data:
cursor-tools browser extract "Get all heading text" --url "https://example.com"
cursor-tools browser open "http://localhost:3000/test-state.html" --connect-to=9222
cursor-tools browser act "Click the 'Counter' button" --url=current --connect-to=9222
(verify state is maintained)cursor-tools browser act "Check the counter value" --url=current --connect-to=9222
(verify state persisted from previous action)cursor-tools browser act "Click the 'Counter' button" --url=reload-current --connect-to=9222
(verify reload works while maintaining connection)cursor-tools browser act "Check the counter value" --url=current --connect-to=9222
(verify counter was reset after reload)
- Test MCP search functionality
- Test MCP run with overrides (recent feature)
- Test different MCP servers
- Test both hardcoded and user-configured overrides
- Search:
cursor-tools mcp search "git operations"
- Run command:
cursor-tools mcp run "list files in the current directory"
- Hardcoded override:
cursor-tools mcp run "using google-calendar-mcp list upcoming events"
(testing the eastlondoner fork override)
- Temporarily modify
cursor-tools.config.json
to add user override:{ "mcp": { "overrides": { "test-mcp-server": { "githubUrl": "https://github.com/example/test-mcp-server", "command": "custom-command", "args": [] } } } }
- Run:
cursor-tools mcp run "using test-mcp-server perform test action"
- Verify that the user-configured override is applied (check logs)
- Restore original
cursor-tools.config.json
- Test cursor rules installation with different environment settings
- Test new
.cursor/rules/cursor-tools.mdc
file creation - Test legacy
.cursorrules
file creation
- New cursor rules format:
USE_LEGACY_CURSORRULES=false cursor-tools install .
(verify.cursor/rules/cursor-tools.mdc
is created/updated) - Legacy cursor rules format:
USE_LEGACY_CURSORRULES=true cursor-tools install .
(verify.cursorrules
is created/updated) - Default behavior:
cursor-tools install .
(verify correct file is updated based on default behavior)
- Set up environment with all required API keys
- Start test server for browser testing:
pnpm serve-test
- Execute tests for each command category
- For each test:
- Document command executed
- Record output
- Verify debug output when
--debug
flag is used - Verify no stdout output when
--quiet
flag is used - Confirm file output when
--save-to
is used - Note any errors or unexpected behavior
- Assess response quality and correctness
- Test environment cleanup
- All commands execute without unexpected errors
- Output quality meets expectations
- Different providers and models can be successfully used
- Debug output is informative when
--debug
flag is used - No stdout output occurs when
--quiet
flag is used with--save-to
- File output is correct when
--save-to
is used
- Claude 3.7 Sonnet model works correctly in all commands (replacing Claude 3.5)
- MCP server overrides (both hardcoded and user-configured) function correctly
- Browser commands with Stagehand 1.13.1 function correctly
- Console and network logs are captured by default in browser commands
- Browser state is maintained correctly when using
--connect-to=current
- Browser state is reset correctly when using
--connect-to=reload-current
- Default thinking provider for plan command is OpenAI with o3-mini model
- Cursor rules are installed to the correct location based on
USE_LEGACY_CURSORRULES
- Error messages are user-friendly and informative across all commands