feat: add multimodal support (voice, camera, screen, MCP) #36
Merged
Add comprehensive multimodal features to AGI CLI:

## New CLI Options
- --voice: Enable voice input/output (requires OPENAI_API_KEY)
- --camera: Enable camera video feed
- --screen: Enable screen recording
- --mcp: Load MCP servers from config
- --mcp-config: Custom MCP config path (default: ~/.agi/mcp.json)

## Features
- Voice input with automatic turn detection
- Text-to-speech output
- Camera and screen video buffers
- MCP server integration for extended tools
- All features work together seamlessly

## Usage Examples
agi --voice "What's the time?"
agi --voice --screen "What's on my screen?"
agi --voice --camera --screen --mcp "Help me with my work"

## Related PRs
- agi-api (driver): agi-inc/agents#344
- agi-python: agi-inc/agi-python#8
- agi-node: agi-inc/agi-node#11

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
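The `--mcp` and `--mcp-config` options imply a small loading step: use the explicit path when one is given, otherwise fall back to `~/.agi/mcp.json`, then read the server map in the format shown under "MCP Config Format". The sketch below assumes that flat `{ "server-name": { command, args, env } }` layout; `resolveMcpConfigPath`, `parseMcpConfig`, and `loadMcpConfig` are hypothetical helpers for illustration, not functions from this PR.

```typescript
import { readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// One entry per MCP server, following the documented mcp.json layout (assumed).
interface McpServerConfig {
  command: string;
  args: string[];
  env: Record<string, string>;
}

type McpConfig = Record<string, McpServerConfig>;

// Hypothetical: prefer the --mcp-config value, else fall back to ~/.agi/mcp.json.
function resolveMcpConfigPath(mcpConfig?: string): string {
  return mcpConfig ?? join(homedir(), ".agi", "mcp.json");
}

// Hypothetical: shallow checks on parsed JSON, filling defaults for optional fields.
function parseMcpConfig(raw: unknown, source: string): McpConfig {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error(`${source}: expected an object keyed by server name`);
  }
  const servers: McpConfig = {};
  for (const [name, entry] of Object.entries(raw)) {
    const e = (entry ?? {}) as { command?: unknown; args?: unknown; env?: unknown };
    if (typeof e.command !== "string") {
      throw new Error(`${source}: server "${name}" needs a string "command"`);
    }
    if (e.args !== undefined && !Array.isArray(e.args)) {
      throw new Error(`${source}: server "${name}" has a non-array "args"`);
    }
    servers[name] = {
      command: e.command,
      args: (e.args ?? []) as string[],
      env: (e.env ?? {}) as Record<string, string>,
    };
  }
  return servers;
}

// Hypothetical: read the resolved file from disk, then validate it.
function loadMcpConfig(mcpConfig?: string): McpConfig {
  const path = resolveMcpConfigPath(mcpConfig);
  return parseMcpConfig(JSON.parse(readFileSync(path, "utf8")), path);
}
```

Keeping path resolution separate from parsing means `--mcp-config` only changes which file is read; the same validation applies to the default location.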
Allows users to specify a custom AGI API endpoint URL:
- Added apiUrl to CliArgs interface
- Added --api-url CLI option
- Pass apiUrl to useAgent hook

Usage: agi --api-url http://localhost:8000 "your goal"

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
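Declaring `--api-url` next to the multimodal flags is mostly option declaration. The sketch below uses Node's built-in `parseArgs` from `node:util` (available in Node 18.3 and later) purely as a stand-in, since the PR does not say which argument parser `src/cli.ts` uses; the `CliArgs` shape and the http(s) check on `--api-url` are assumptions for illustration.

```typescript
import { parseArgs } from "node:util";

// Assumed shape; the real CliArgs interface in src/cli.ts may differ.
interface CliArgs {
  goal: string;
  voice: boolean;
  camera: boolean;
  screen: boolean;
  mcp: boolean;
  mcpConfig?: string;
  apiUrl?: string;
}

function parseCliArgs(argv: string[]): CliArgs {
  const { values, positionals } = parseArgs({
    args: argv,
    allowPositionals: true, // the goal is free text after the flags
    options: {
      voice: { type: "boolean" },
      camera: { type: "boolean" },
      screen: { type: "boolean" },
      mcp: { type: "boolean" },
      "mcp-config": { type: "string" },
      "api-url": { type: "string" },
    },
  });

  const apiUrl = values["api-url"];
  if (apiUrl !== undefined) {
    // new URL() throws a TypeError for strings that are not valid URLs.
    const { protocol } = new URL(apiUrl);
    if (protocol !== "http:" && protocol !== "https:") {
      throw new Error(`--api-url must be an http(s) URL, got ${apiUrl}`);
    }
  }

  return {
    goal: positionals.join(" "),
    voice: values.voice ?? false,
    camera: values.camera ?? false,
    screen: values.screen ?? false,
    mcp: values.mcp ?? false,
    mcpConfig: values["mcp-config"],
    apiUrl,
  };
}
```

For example, `parseCliArgs(["--api-url", "http://localhost:8000", "your goal"])` yields `apiUrl: "http://localhost:8000"` and `goal: "your goal"`. Joining the positionals also tolerates a goal that was not quoted in the shell.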
- Update App.tsx to pass voice, camera, screen, mcp, mcpConfig to useAgent
- Update UseAgentOptions interface to accept multimodal options
- Pass all multimodal options to AgentDriver constructor
- Complete end-to-end wiring: CLI args → App → useAgent → AgentDriver → API

Now the --voice, --camera, --screen, --mcp flags are fully functional!

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
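The wiring in this commit (CLI args → App → useAgent → AgentDriver) is a chain of option hand-offs, and a flag only takes effect if every hop forwards it. A minimal sketch, assuming hypothetical `CliArgs`, `UseAgentOptions`, and `DriverOptions` shapes rather than the project's real types, routes the multimodal fields through a single hypothetical helper, `pickMultimodal`, so no hop can silently drop one:

```typescript
// Hypothetical option shapes; the real interfaces in App.tsx, useAgent, and
// AgentDriver may carry more fields.
interface MultimodalOptions {
  voice?: boolean;
  camera?: boolean;
  screen?: boolean;
  mcp?: boolean;
  mcpConfig?: string;
}

interface CliArgs extends MultimodalOptions {
  goal: string;
  apiUrl?: string;
}

interface UseAgentOptions extends MultimodalOptions {
  apiUrl?: string;
}

interface DriverOptions extends MultimodalOptions {
  apiUrl?: string;
}

// Single place that lists the multimodal fields, so every hop forwards the same set.
function pickMultimodal(src: MultimodalOptions): MultimodalOptions {
  const { voice, camera, screen, mcp, mcpConfig } = src;
  return { voice, camera, screen, mcp, mcpConfig };
}

// App.tsx hop: CLI args -> useAgent options (the goal is handled separately).
function toUseAgentOptions(args: CliArgs): UseAgentOptions {
  return { apiUrl: args.apiUrl, ...pickMultimodal(args) };
}

// useAgent hop: hook options -> AgentDriver constructor options.
function toDriverOptions(opts: UseAgentOptions): DriverOptions {
  return { apiUrl: opts.apiUrl, ...pickMultimodal(opts) };
}
```

With this layout, adding another flag means updating `MultimodalOptions` and `pickMultimodal` once instead of editing each hop.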
Add voice, camera, screen, mcp, mcpConfig to the start callback dependency array so React captures the correct values. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
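Why the dependency array matters: React's `useCallback` returns the previously memoized function until one of its dependencies changes, comparing each dependency with `Object.is`. If `voice` is missing from the array, `start` keeps the closure from the first render and reads a stale value. The model below is a plain-TypeScript simplification, not React's implementation; `createCallbackCache` and `simulate` are illustrative names.

```typescript
// Simplified model of useCallback's reuse rule (not React's actual code).
function depsEqual(a: readonly unknown[], b: readonly unknown[]): boolean {
  return a.length === b.length && a.every((value, i) => Object.is(value, b[i]));
}

function createCallbackCache<T>() {
  let cached: { fn: T; deps: readonly unknown[] } | undefined;
  // Called once per "render": reuse the cached fn unless a dependency changed.
  return (fn: T, deps: readonly unknown[]): T => {
    if (cached === undefined || !depsEqual(cached.deps, deps)) {
      cached = { fn, deps };
    }
    return cached.fn;
  };
}

type StartFn = () => string;

// Renders twice, flipping `voice` from false to true between renders.
function simulate(includeVoiceInDeps: boolean): string {
  const memo = createCallbackCache<StartFn>();
  let start: StartFn = () => "";
  for (const voice of [false, true]) {
    // Each render creates a fresh closure over the current `voice` value.
    start = memo(() => `voice=${voice}`, includeVoiceInDeps ? [voice] : []);
  }
  return start();
}
```

`simulate(false)` returns `"voice=false"` even though the second render had `voice` set to `true`, while `simulate(true)` returns `"voice=true"`.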
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove unused mkdirSync, join, and color variable that caused ESLint failures in CI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CI will pass once agi-node 0.5.0 is published to npm. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Merge Order

This PR depends on
The |
# Multimodal Support - AGI CLI Updates
This update adds comprehensive multimodal support to the AGI CLI.
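## New Features

### Voice Mode (`--voice`)
- Voice input with automatic turn detection and text-to-speech output
- Requires the `OPENAI_API_KEY` environment variable

### Camera Mode (`--camera`)
- Enables the camera video feed

### Screen Mode (`--screen`)
- Enables screen recording

### MCP Support (`--mcp`)
- Loads MCP servers from `~/.agi/mcp.json` by default
- Use a different config file with `--mcp-config /path/to/mcp.json`

## Usage Examples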
### Voice Mode
```bash
agi --voice "What's the current time?"
```

### Voice + Screen
```bash
agi --voice --screen "What's on my screen?"
```

### Full Multimodal
```bash
agi --voice --camera --screen "Can you see me and my screen?"
```

### MCP Servers

### Everything Combined
```bash
agi --voice --camera --screen --mcp "Help me with my work"
```

## Configuration
### Environment Variables
- `AGI_API_KEY`: Your AGI API key (required)
- `OPENAI_API_KEY`: OpenAI key for voice features (required for `--voice`)

### MCP Config Format
```json
{
  "server-name": {
    "command": "executable",
    "args": ["arg1", "arg2"],
    "env": {
      "ENV_VAR": "value"
    }
  }
}
```

## CLI Options
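- `--voice`: Enable voice input/output (requires `OPENAI_API_KEY`)
- `--camera`: Enable camera video feed
- `--screen`: Enable screen recording
- `--mcp`: Load MCP servers from config
- `--mcp-config PATH`: Custom MCP config path (default: `~/.agi/mcp.json`)
- `-m, --model`
- `-v, --verbose`
- `--no-confirm`

## Implementation

Changes made:
- Updated `src/cli.ts` to add multimodal options
- Updated `src/hooks/useAgent.ts` to pass multimodal config to the driver

## Testing

## Related PRs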