Skip to content

Latest commit

 

History

History
504 lines (402 loc) · 13.8 KB

plan.md

File metadata and controls

504 lines (402 loc) · 13.8 KB

I've reviewed your document which provides extensive information about your AI navigation system built using Claude, the Model Context Protocol (MCP), and a web driving library. Let me analyze your project and outline a technical recipe to move from your working local environment to a web-hosted solution.

Is This Project Possible?

Yes, this project is definitely possible to host on the web. What you've described is an advanced but achievable technical architecture combining:

  1. A large language model (Claude)
  2. MCP servers for tool connection
  3. Web driving capabilities through the Stagehand MCP server
  4. Web hosting to make this accessible outside your local environment

Your current implementation is working in Cursor (an AI-enhanced extension of VSCode), and your goal to make this accessible via a website is reasonable. The architecture you've described is similar to many AI agent frameworks that are currently deployed online.

Technical Recipe for Web Deployment

Here's a step-by-step guide to move your local implementation to a web-hosted environment:

1. Set Up Your Web Application Framework

# Create a new Next.js project
npx create-next-app@latest web-automation-agent
cd web-automation-agent

# Install dependencies
npm install @anthropic-ai/sdk  # For Claude API
npm install @copilotkit/react-core @copilotkit/react-ui @copilotkit/runtime  # CopilotKit components
npm install langchain-openai  # For LangChain OpenAI adapters
npm install langgraph  # For LangGraph workflow
npm install langchain-mcp-adapters  # For MCP adapters

2. Create Basic Project Structure

web-automation-agent/
├── .env.local         # Environment variables
├── app/
│   ├── api/
│   │   └── agent/
│   │       └── route.ts  # API endpoint for agent
│   ├── components/
│   │   ├── AgentChat.tsx  # Chat UI component
│   │   └── TaskForm.tsx   # Form for submitting tasks
│   └── page.tsx           # Main page
└── lib/
    ├── agent.ts           # Agent implementation
    ├── mcp-config.ts      # MCP configuration
    └── types.ts           # Type definitions

3. Configure Environment Variables

Create a .env.local file with the following variables:

ANTHROPIC_API_KEY=your_claude_api_key
BROWSERBASE_API_KEY=your_browserbase_api_key
BROWSERBASE_PROJECT_ID=your_browserbase_project_id
LINEAR_API_KEY=your_linear_api_key (if using Linear)

4. Implement the Agent API Endpoint

Create app/api/agent/route.ts:

import { NextRequest, NextResponse } from "next/server";
import { Anthropic } from "@anthropic-ai/sdk";
import { MultiServerMCPClient } from "langchain-mcp-adapters/client";
import { createReactAgent } from "langgraph/prebuilt";

// Configure MCP servers
const MCP_CONFIG = {
  stagehand: {
    command: "npx",
    args: ["-y", "@modelcontextprotocol/server-browserbase-stagehand"],
    transport: "stdio",
    env: {
      BROWSERBASE_API_KEY: process.env.BROWSERBASE_API_KEY,
      BROWSERBASE_PROJECT_ID: process.env.BROWSERBASE_PROJECT_ID,
    },
  },
};

export async function POST(req: NextRequest) {
  try {
    const { objective } = await req.json();

    // Initialize Anthropic client
    const anthropic = new Anthropic({
      apiKey: process.env.ANTHROPIC_API_KEY,
    });

    // Setup MCP client and tools
    const mcpClient = new MultiServerMCPClient(MCP_CONFIG);
    await mcpClient.initialize();
    const mcpTools = mcpClient.getTools();

    // Create agent
    const model = new ChatAnthropic({
      model: "claude-3-opus-20240229",
      anthropicApiKey: process.env.ANTHROPIC_API_KEY,
    });
    const agent = createReactAgent(model, mcpTools);

    // Run agent with objective
    const agentResponse = await agent.invoke({
      messages: [
        {
          role: "user",
          content: objective,
        },
      ],
    });

    // Clean up
    await mcpClient.close();

    return NextResponse.json({
      result: agentResponse.messages,
    });
  } catch (error) {
    console.error("Agent error:", error);
    return NextResponse.json(
      { error: "Failed to process request" },
      { status: 500 }
    );
  }
}

5. Create a User Interface Component

Create app/components/AgentChat.tsx:

"use client";

import { useState, useRef, useEffect } from "react";
import { CopilotChat, CopilotKitCSSProperties } from "@copilotkit/react-ui";

export default function AgentChat() {
  const [isProcessing, setIsProcessing] = useState(false);
  const [messages, setMessages] = useState([]);

  const handleSubmit = async (userMessage) => {
    setIsProcessing(true);

    try {
      const response = await fetch("/api/agent", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ objective: userMessage }),
      });

      const data = await response.json();

      if (response.ok) {
        // Add assistant response
        setMessages((prev) => [
          ...prev,
          {
            role: "assistant",
            content: data.result,
          },
        ]);
      } else {
        throw new Error(data.error || "Failed to process request");
      }
    } catch (error) {
      console.error("Error processing message:", error);
      // Add error message
      setMessages((prev) => [
        ...prev,
        {
          role: "assistant",
          content: `Error: ${error.message}`,
        },
      ]);
    } finally {
      setIsProcessing(false);
    }
  };

  return (
    <div className="h-full flex flex-col">
      <CopilotChat
        className="h-full flex flex-col"
        messages={messages}
        onSubmit={handleSubmit}
        isProcessing={isProcessing}
        instructions={
          "I'm an AI assistant that can help you navigate the web and perform tasks."
        }
        labels={{
          title: "Web Automation Assistant",
          initial: "What would you like me to do on the web today?",
        }}
      />
    </div>
  );
}

6. Implement the Main Page

Create app/page.tsx:

import AgentChat from "./components/AgentChat";

export default function Home() {
  return (
    <main className="min-h-screen p-4 md:p-8 flex flex-col">
      <header className="mb-6">
        <h1 className="text-3xl font-bold">Web Automation Assistant</h1>
        <p className="text-gray-600">
          Ask me to perform tasks on the web. I can navigate websites, interact
          with pages, and help you find information.
        </p>
      </header>

      <div className="flex-1 min-h-[70vh]">
        <AgentChat />
      </div>
    </main>
  );
}

7. Create a Server-Side MCP Adapter

For web deployment, we need to modify how MCP servers are initialized. Since we can't run local processes on most hosting platforms, we'll need to use an HTTP/SSE approach instead of stdio.

Create lib/mcp-remote-adapter.ts:

import { ServerParameters } from "langchain-mcp-adapters/client";

// This adapter will connect to remotely hosted MCP servers
export class RemoteMCPAdapter {
  constructor(private config: Record<string, ServerParameters>) {}

  async initialize() {
    // Connect to remote MCP servers via HTTP/SSE
    for (const [name, params] of Object.entries(this.config)) {
      if (params.transport === "stdio") {
        // Convert stdio configs to SSE for remote deployment
        this.config[name] = {
          ...params,
          transport: "sse",
          url: `${process.env.MCP_SERVER_BASE_URL}/${name}/events`,
        };
      }
    }
  }

  getTools() {
    // Implementation to fetch tools from remote servers
  }

  async close() {
    // Close connections to remote servers
  }
}

8. Create a Separate MCP Server Project

Since we need to host the MCP servers independently, create a new project for them:

# Create MCP server project
mkdir mcp-servers
cd mcp-servers

# Initialize project
npm init -y
npm install @modelcontextprotocol/sdk puppeteer browserbase-stagehand

# Create server files
mkdir src
touch src/index.js
touch src/stagehand-server.js

Edit src/stagehand-server.js:

const { Server } = require("@modelcontextprotocol/sdk/server");
const { SSEServerTransport } = require("@modelcontextprotocol/sdk/server/sse");
const { StagehandBrowser } = require("browserbase-stagehand");

// Initialize server
const server = new Server(
  {
    name: "stagehand-server",
    version: "1.0.0",
  },
  {
    capabilities: {
      tools: {},
    },
  }
);

// Configure tools
server.setRequestHandler("tools/list", async () => {
  return {
    tools: [
      {
        name: "navigate",
        description: "Navigate to a URL",
        inputSchema: {
          type: "object",
          properties: {
            url: { type: "string" },
          },
          required: ["url"],
        },
      },
      // Add more tools here for clicking, typing, etc.
    ],
  };
});

// Handle tool execution
server.setRequestHandler("tools/call", async (request) => {
  const browser = new StagehandBrowser({
    apiKey: process.env.BROWSERBASE_API_KEY,
    projectId: process.env.BROWSERBASE_PROJECT_ID,
  });

  try {
    switch (request.params.name) {
      case "navigate":
        await browser.navigate(request.params.arguments.url);
        return {
          content: [
            {
              type: "text",
              text: `Navigated to ${request.params.arguments.url}`,
            },
          ],
        };
      // Handle other tools
      default:
        throw new Error(`Unknown tool: ${request.params.name}`);
    }
  } catch (error) {
    return {
      isError: true,
      content: [{ type: "text", text: `Error: ${error.message}` }],
    };
  } finally {
    await browser.close();
  }
});

// Export function to start server
module.exports = async function startServer(app, routePath) {
  app.get(`${routePath}/events`, (req, res) => {
    const transport = new SSEServerTransport(`${routePath}/messages`, res);
    server.connect(transport);
  });

  app.post(`${routePath}/messages`, (req, res) => {
    if (transport) {
      transport.handlePostMessage(req, res);
    }
  });
};

Create src/index.js to initialize all servers:

const express = require("express");
const cors = require("cors");
const stagehandServer = require("./stagehand-server");

const app = express();
app.use(cors());
app.use(express.json());

// Initialize servers
(async () => {
  await stagehandServer(app, "/stagehand");

  // Start the server
  const PORT = process.env.PORT || 3001;
  app.listen(PORT, () => {
    console.log(`MCP Servers running on port ${PORT}`);
  });
})();

9. Deploy Your Applications

A. Deploy the MCP Servers

The MCP servers need to be deployed first:

  1. Vercel/Render/Railway:

    cd mcp-servers
    vercel deploy  # or equivalent for your platform
  2. Set environment variables in your hosting dashboard:

    • BROWSERBASE_API_KEY
    • BROWSERBASE_PROJECT_ID
  3. Note the URL of your deployed MCP server (e.g., https://mcp-servers.vercel.app)

B. Deploy the Web Application

Now deploy the main web application:

  1. Vercel/Render/Railway:

    cd web-automation-agent
    vercel deploy  # or equivalent for your platform
  2. Set environment variables in your hosting dashboard:

    • ANTHROPIC_API_KEY
    • MCP_SERVER_BASE_URL (the URL from the previous step)

10. Configure CORS and Security

Ensure proper CORS settings in both applications:

For the MCP server (add to src/index.js):

// Configure CORS
app.use(
  cors({
    origin: process.env.CLIENT_ORIGIN || "*",
    methods: ["GET", "POST"],
    allowedHeaders: ["Content-Type"],
  })
);

For the web app, add API route CORS configuration:

// In app/api/agent/route.ts
export const config = {
  runtime: "edge",
  regions: ["iad1"], // Choose regions close to your users
};

11. Testing and Verification

Once deployed, test your application by:

  1. Navigating to your web app URL
  2. Entering a web task like "Scrape all jobs on Disney's careers site"
  3. Monitoring logs for both applications to ensure proper communication

12. Scaling and Monitoring

For production use, add:

  1. Rate limiting to prevent abuse
  2. Error monitoring using a service like Sentry
  3. Usage tracking to monitor API costs
  4. User authentication to restrict access to authorized users

Additional Considerations

Alternative Deployment Options

  1. Docker: Package both applications as Docker containers for more consistent deployment
  2. AWS Lambda: Deploy the API endpoints as serverless functions
  3. Kubernetes: For complex, scalable deployments with multiple MCP servers

Cost Management

  1. Claude API costs: Implement rate limiting and caching to reduce API calls
  2. Browserbase costs: Monitor session usage and implement timeouts
  3. Server costs: Use serverless where possible to pay only for actual usage

Security

  1. API Key Management: Use environment variables and never expose keys in client-side code
  2. Request Validation: Validate all inputs before passing to tools
  3. User Permissions: Implement proper authentication and authorization

Conclusion

Your project is definitely achievable as a web-hosted solution. The technical recipe above provides a comprehensive roadmap to take your local implementation and deploy it as a web service. By following these steps, you'll create a fully functional web-based AI system that can navigate websites and perform tasks based on high-level objectives.

The main challenge in the web deployment compared to your local setup is the transition from stdio transport to HTTP/SSE for MCP communication, but the approach outlined above addresses this by hosting the MCP servers as separate web services that your main application can communicate with via HTTP.

Is there any specific part of this plan you'd like me to expand on further?