Merge Improvements #30

Status: Closed · wants to merge 17 commits
21 changes: 21 additions & 0 deletions Dockerfile
@@ -0,0 +1,21 @@
FROM node:20

# Set the working directory in the container
WORKDIR /app

# Copy package.json and package-lock.json files
COPY package*.json ./

# Install dependencies
RUN npm install

# Copy the rest of the application code
COPY . .

# Expose the port the app runs on
EXPOSE 3000

# Start the app
CMD ["npm", "run", "dev"]


9 changes: 0 additions & 9 deletions LICENSE

This file was deleted.

169 changes: 169 additions & 0 deletions README.md
@@ -17,6 +17,25 @@ You should be able to use this repo to prototype your own multi-agent realtime v
- Start the server with `npm run dev`
- Open your browser to [http://localhost:3000](http://localhost:3000) to see the app. It should automatically connect to the `simpleExample` Agent Set.

## Alternative Docker Setup

- You can also run this in a Docker container.
- Build the Docker image with the following command:

```bash
docker build -t realtime-api-agents-demo .
```

- Run the Docker container with the following command:

```bash
docker run -it --rm -p 3000:3000 \
  -e OPENAI_API_KEY=replace-with-api-key \
  -v local-copy-of/openai-realtime-agents:/app \
  -v /app/node_modules \
  realtime-api-agents-demo
```

## Configuring Agents
Configuration in `src/app/agentConfigs/simpleExample.ts`
```javascript
Expand Down Expand Up @@ -49,6 +68,156 @@ export default agents;

This fully specifies the agent set that was used in the interaction shown in the screenshot above.

### Sequence Diagram

#### SimpleExample Flow

This diagram illustrates the interaction flow defined in `src/app/agentConfigs/simpleExample.ts`.

```mermaid
sequenceDiagram
participant User
participant WebClient as Next.js Client (App.tsx)
participant NextAPI as /api/session
participant RealtimeAPI as OpenAI Realtime API
participant AgentManager as AgentConfig (greeter, haiku)

Note over WebClient: User navigates to the app with ?agentConfig=simpleExample
User->>WebClient: Open Page (Next.js SSR fetches page.tsx/layout.tsx)
WebClient->>WebClient: useEffect loads agent configs (simpleExample)
WebClient->>WebClient: connectToRealtime() called

Note right of WebClient: Fetch ephemeral session
WebClient->>NextAPI: GET /api/session
NextAPI->>RealtimeAPI: POST /v1/realtime/sessions
RealtimeAPI->>NextAPI: Returns ephemeral session token
NextAPI->>WebClient: Returns ephemeral token (JSON)

Note right of WebClient: Start RTC handshake
    WebClient->>RealtimeAPI: POST /v1/realtime?model=gpt-4o-realtime-preview-2024-12-17 (SDP offer)
RealtimeAPI->>WebClient: Returns SDP answer
WebClient->>WebClient: DataChannel "oai-events" established

Note over WebClient: The user speaks or sends text
User->>WebClient: "Hello!" (mic or text)
WebClient->>AgentManager: conversation.item.create (role=user)
WebClient->>RealtimeAPI: data channel event: {type: "conversation.item.create"}
WebClient->>RealtimeAPI: data channel event: {type: "response.create"}

Note left of AgentManager: Agents parse user message
AgentManager->>greeter: "greeter" sees new user message
greeter->>AgentManager: Potentially calls "transferAgents(haiku)" if user says "Yes"
AgentManager-->>WebClient: event: transferAgents => destination_agent="haiku"

Note left of WebClient: data channel function call
WebClient->>WebClient: handleFunctionCall: sets selectedAgentName="haiku"

Note left of AgentManager: "haiku" agent now handles user messages
haiku->>AgentManager: Respond with a haiku
AgentManager->>WebClient: "Here is a haiku…" (assistant role)
WebClient->>User: Display/Play final answer
```
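The user-turn step in the diagram can be sketched in code. This is a hypothetical illustration of the two data-channel events the client emits when the user submits a message; the event shapes are assumptions based on the diagram and the Realtime API event names, not the repo's actual implementation.

```typescript
// Sketch of the two events sent over the "oai-events" data channel for a
// user turn (shapes assumed from the diagram above).
type RealtimeEvent =
  | {
      type: "conversation.item.create";
      item: {
        type: "message";
        role: "user";
        content: { type: "input_text"; text: string }[];
      };
    }
  | { type: "response.create" };

function buildUserTurnEvents(text: string): RealtimeEvent[] {
  return [
    // First, append the user's message to the conversation...
    {
      type: "conversation.item.create",
      item: {
        type: "message",
        role: "user",
        content: [{ type: "input_text", text }],
      },
    },
    // ...then ask the model to respond.
    { type: "response.create" },
  ];
}

// Each event would be serialized and sent on the data channel:
// dataChannel.send(JSON.stringify(event));
```

Sending `conversation.item.create` alone does not trigger a reply; the follow-up `response.create` is what prompts the model to generate one, which is why the diagram shows both events.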

#### FrontDeskAuthentication Flow

This diagram illustrates the interaction flow defined in `src/app/agentConfigs/frontDeskAuthentication/`.

```mermaid
sequenceDiagram
participant User
participant WebClient as Next.js Client (App.tsx)
participant NextAPI as /api/session
participant RealtimeAPI as OpenAI Realtime API
participant AgentManager as Agents (authenticationAgent, tourGuide)

Note over WebClient: User navigates to ?agentConfig=frontDeskAuthentication
User->>WebClient: Open Page
WebClient->>NextAPI: GET /api/session
NextAPI->>RealtimeAPI: POST /v1/realtime/sessions
RealtimeAPI->>NextAPI: Returns ephemeral session
NextAPI->>WebClient: Returns ephemeral token (JSON)

Note right of WebClient: Start RTC handshake
WebClient->>RealtimeAPI: Offer SDP (WebRTC)
RealtimeAPI->>WebClient: SDP answer
WebClient->>WebClient: DataChannel "oai-events" established

Note over WebClient,AgentManager: The user is connected to "authenticationAgent" first
User->>WebClient: "Hello, I need to check in."
WebClient->>AgentManager: conversation.item.create (role=user)
WebClient->>RealtimeAPI: data channel event: {type: "conversation.item.create"}
WebClient->>RealtimeAPI: data channel event: {type: "response.create"}

Note over AgentManager: authenticationAgent prompts for user details
authenticationAgent->>AgentManager: calls authenticate_user_information() (tool function)
AgentManager-->>WebClient: function_call => name="authenticate_user_information"
WebClient->>WebClient: handleFunctionCall => possibly calls your custom backend or a mock to confirm

Note left of AgentManager: Once user is authenticated
authenticationAgent->>AgentManager: calls transferAgents("tourGuide")
AgentManager-->>WebClient: function_call => name="transferAgents" args={destination: "tourGuide"}

WebClient->>WebClient: setSelectedAgentName("tourGuide")
Note over AgentManager: "tourGuide" welcomes the user with a friendly introduction
tourGuide->>AgentManager: "Here's a guided tour..."
AgentManager->>WebClient: conversation.item.create (assistant role)
WebClient->>User: Displays or plays back the tour content
```
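The `handleFunctionCall` step above can be sketched as a simple dispatcher. This is a minimal illustration of the pattern, assuming the function and argument names shown in the diagram (`authenticate_user_information`, `transferAgents`, `destination_agent`); the repo's actual handler differs.

```typescript
// Hypothetical dispatcher for function calls arriving on the data channel.
interface FunctionCall {
  name: string;
  arguments: Record<string, unknown>;
}

function handleFunctionCall(
  call: FunctionCall,
  setSelectedAgentName: (name: string) => void
): string {
  switch (call.name) {
    case "authenticate_user_information":
      // In the real app this would verify details against a backend or mock.
      return "authenticated";
    case "transferAgents": {
      // Route subsequent turns to the destination agent.
      const dest = String(call.arguments["destination_agent"] ?? "");
      setSelectedAgentName(dest);
      return `transferred to ${dest}`;
    }
    default:
      return `unhandled function: ${call.name}`;
  }
}
```

The key design point the diagram illustrates: agent handoff is just a tool call whose side effect is updating the client's selected agent, so the next `response.create` runs under the new agent's instructions.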

#### CustomerServiceRetail Flow

This diagram illustrates the interaction flow defined in `src/app/agentConfigs/customerServiceRetail/`.

```mermaid
sequenceDiagram
participant User
participant WebClient as Next.js Client
participant NextAPI as /api/session
participant RealtimeAPI as OpenAI Realtime API
participant AgentManager as Agents (authentication, returns, sales, simulatedHuman)
    participant o1mini as o1-mini (Escalation Model)

Note over WebClient: User navigates to ?agentConfig=customerServiceRetail
User->>WebClient: Open Page
WebClient->>NextAPI: GET /api/session
NextAPI->>RealtimeAPI: POST /v1/realtime/sessions
RealtimeAPI->>NextAPI: Returns ephemeral session
NextAPI->>WebClient: Returns ephemeral token (JSON)

Note right of WebClient: Start RTC handshake
WebClient->>RealtimeAPI: Offer SDP (WebRTC)
RealtimeAPI->>WebClient: SDP answer
WebClient->>WebClient: DataChannel "oai-events" established

Note over AgentManager: Default agent is "authentication"
User->>WebClient: "Hi, I'd like to return my snowboard."
WebClient->>AgentManager: conversation.item.create (role=user)
WebClient->>RealtimeAPI: {type: "conversation.item.create"}
WebClient->>RealtimeAPI: {type: "response.create"}

authentication->>AgentManager: Requests user info, calls authenticate_user_information()
AgentManager-->>WebClient: function_call => name="authenticate_user_information"
WebClient->>WebClient: handleFunctionCall => verifies details

Note over AgentManager: After user is authenticated
authentication->>AgentManager: transferAgents("returns")
AgentManager-->>WebClient: function_call => name="transferAgents" args={ destination: "returns" }
WebClient->>WebClient: setSelectedAgentName("returns")

Note over returns: The user wants to process a return
returns->>AgentManager: function_call => checkEligibilityAndPossiblyInitiateReturn
AgentManager-->>WebClient: function_call => name="checkEligibilityAndPossiblyInitiateReturn"

Note over WebClient: The WebClient calls /api/chat/completions with model="o1-mini"
WebClient->>o1mini: "Is this item eligible for return?"
o1mini->>WebClient: "Yes/No (plus notes)"

Note right of returns: Returns uses the result from "o1-mini"
returns->>AgentManager: "Return is approved" or "Return is denied"
AgentManager->>WebClient: conversation.item.create (assistant role)
WebClient->>User: Displays final verdict
```
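The escalation step to `o1-mini` can be sketched as building a chat-completions request. The payload shape here is an assumption drawn from the diagram; the actual `/api/chat/completions` route and schema live in the repo.

```typescript
// Hypothetical request body for the eligibility check escalated to o1-mini.
interface EscalationRequest {
  model: string;
  messages: { role: "user"; content: string }[];
}

function buildEligibilityRequest(itemDescription: string): EscalationRequest {
  return {
    model: "o1-mini",
    messages: [
      {
        role: "user",
        content: `Is this item eligible for return? ${itemDescription}`,
      },
    ],
  };
}

// The client would POST this as JSON, e.g.:
// fetch("/api/chat/completions", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(buildEligibilityRequest("snowboard, bought 10 days ago")),
// });
```

Offloading the eligibility decision to a separate reasoning model keeps the low-latency realtime agent responsive while the harder judgment runs out of band.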

### Next steps
- Check out the configs in `src/app/agentConfigs`. The example above is a minimal demo that illustrates the core concepts.
- [frontDeskAuthentication](src/app/agentConfigs/frontDeskAuthentication) Guides the user through a step-by-step authentication flow, confirming each value character-by-character, authenticates the user with a tool call, and then transfers to another agent. Note that the second agent is intentionally "bored" to show how to prompt for personality and tone.
35 changes: 35 additions & 0 deletions reasoning_flask_app/main.py
@@ -0,0 +1,35 @@
from flask import Flask, request, jsonify
import os
from dotenv import load_dotenv
from openai import OpenAI

# Load environment variables (expects OPENAI_API_KEY)
load_dotenv()

app = Flask(__name__)

@app.route("/reason", methods=["POST"])
def reason():
    try:
        data = request.get_json()
        if not data or "query" not in data:
            return jsonify({"error": "Missing 'query' in request"}), 400

        query = data["query"]
        # Read the API key from the environment
        client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
        response = client.chat.completions.create(
            model="o1-mini",
            messages=[
                {"role": "developer", "content": "You are a helpful assistant."},
                {"role": "user", "content": query},
            ],
            max_completion_tokens=150,
        )

return jsonify({"response": response.choices[0].message.content})
except Exception as e:
return jsonify({"error": str(e)}), 500

if __name__ == "__main__":
app.run(host="0.0.0.0", port=5050, debug=True)
36 changes: 29 additions & 7 deletions src/app/App.tsx
@@ -10,6 +10,8 @@ import Image from "next/image";
import Transcript from "./components/Transcript";
import Events from "./components/Events";
import BottomToolbar from "./components/BottomToolbar";
import { ThemeToggle } from "./components/ThemeToggle";
import { ThemeProvider } from "./components/ThemeProvider";

// Types
import { AgentConfig, SessionStatus } from "@/app/types";
@@ -26,6 +28,15 @@ import { createRealtimeConnection } from "./lib/realtimeConnection";
import { allAgentSets, defaultAgentSetKey } from "@/app/agentConfigs";

function App() {
return (
<ThemeProvider>
<AppContent />
<ThemeToggle />
</ThemeProvider>
);
}

function AppContent() {
const searchParams = useSearchParams();

const { transcriptItems, addTranscriptMessage, addTranscriptBreadcrumb } =
@@ -401,23 +412,34 @@ function App() {
}
}, [isAudioPlaybackEnabled]);

  useEffect(() => {
    // Mute or unmute the outgoing microphone track
    if (pcRef.current) {
      pcRef.current.getSenders().forEach((sender) => {
        if (sender.track) {
          sender.track.enabled = isAudioPlaybackEnabled;
        }
      });
    }
  }, [isAudioPlaybackEnabled]);

const agentSetKey = searchParams.get("agentConfig") || "default";

return (
<div className="text-base flex flex-col h-screen bg-gray-100 text-gray-800 relative">
<div className="p-5 text-lg font-semibold flex justify-between items-center">
<div className="text-base flex flex-col h-screen bg-background dark:bg-[#1a1a1a] text-foreground dark:text-white relative">
<div className="p-5 text-lg font-semibold flex justify-between items-center bg-background dark:bg-[#202020] panel">
<div className="flex items-center">
<div onClick={() => window.location.reload()} style={{ cursor: 'pointer' }}>
<Image
src="/openai-logomark.svg"
alt="OpenAI Logo"
width={20}
height={20}
className="mr-2"
className="mr-2 dark:invert"
/>
</div>
<div>
Realtime API <span className="text-gray-500">Agents</span>
Realtime API <span className="text-muted dark:text-muted">Agents</span>
</div>
</div>
<div className="flex items-center">
@@ -428,7 +450,7 @@
<select
value={agentSetKey}
onChange={handleAgentChange}
className="appearance-none border border-gray-300 rounded-lg text-base px-2 py-1 pr-8 cursor-pointer font-normal focus:outline-none"
className="appearance-none border border-[#acacac] dark:border-[#404040] rounded-lg text-base px-2 py-1 pr-8 cursor-pointer font-normal focus:outline-none bg-[#2a2a2a]/5 dark:bg-[#2a2a2a]/20"
>
{Object.keys(allAgentSets).map((agentKey) => (
<option key={agentKey} value={agentKey}>
@@ -456,7 +478,7 @@
<select
value={selectedAgentName}
onChange={handleSelectedAgentChange}
className="appearance-none border border-gray-300 rounded-lg text-base px-2 py-1 pr-8 cursor-pointer font-normal focus:outline-none"
className="appearance-none border border-[#acacac] dark:border-[#404040] rounded-lg text-base px-2 py-1 pr-8 cursor-pointer font-normal focus:outline-none bg-[#2a2a2a]/5 dark:bg-[#2a2a2a]/20"
>
{selectedAgentConfigSet?.map(agent => (
<option key={agent.name} value={agent.name}>
@@ -483,7 +505,7 @@
</div>
</div>

<div className="flex flex-1 gap-2 px-2 overflow-hidden relative">
<div className="flex flex-1 gap-4 px-4 py-4 overflow-hidden relative">
<Transcript
userText={userText}
setUserText={setUserText}