| 1 | +# Realtime API Agents Demo |
| 2 | + |
| 3 | +This is a simple demonstration of more advanced, agentic patterns built on top of the Realtime API. In particular, it demonstrates: |
| 4 | +- Sequential agent handoffs according to a defined agent graph (taking inspiration from [OpenAI Swarm](https://github.com/openai/swarm)) |
| 5 | +- Background escalation to more intelligent models like o1-mini for high-stakes decisions |
| 6 | +- Prompting models to follow a state machine, for example to collect names and phone numbers accurately, confirming each value character by character, in order to authenticate a user. |
| 7 | + |
| 8 | +You should be able to use this repo to prototype your own multi-agent realtime voice app in less than 20 minutes! |
| 9 | + |
| 10 | + |
| 11 | + |
| 12 | +## Setup |
| 13 | + |
| 14 | +- This is a Next.js TypeScript app |
| 15 | +- Install dependencies with `npm i` |
| 16 | +- Add your `OPENAI_API_KEY` to your environment |
| 17 | +- Start the server with `npm run dev` |
| 18 | +- Open your browser to [http://localhost:3000](http://localhost:3000) to see the app. It should automatically connect to the `simpleExample` Agent Set. |
| 19 | + |
| 20 | +## Configuring Agents |
| 21 | +The following configuration lives in `src/app/agentConfigs/simpleExample.ts`: |
| 22 | +```typescript |
| 23 | +import { AgentConfig } from "@/app/types"; |
| 24 | +import { injectTransferTools } from "./utils"; |
| 25 | + |
| 26 | +// Define agents |
| 27 | +const haiku: AgentConfig = { |
| 28 | + name: "haiku", |
| 29 | + publicDescription: "Agent that writes haikus.", // Context for the agent_transfer tool |
| 30 | + instructions: |
| 31 | + "Ask the user for a topic, then reply with a haiku about that topic.", |
| 32 | + tools: [], |
| 33 | +}; |
| 34 | + |
| 35 | +const greeter: AgentConfig = { |
| 36 | + name: "greeter", |
| 37 | + publicDescription: "Agent that greets the user.", |
| 38 | + instructions: |
| 39 | + "Please greet the user and ask them if they'd like a Haiku. If yes, transfer them to the 'haiku' agent.", |
| 40 | + tools: [], |
| 41 | + downstreamAgents: [haiku], |
| 42 | +}; |
| 43 | + |
| 44 | +// Add the transfer tool to each agent so it can hand off to its downstreamAgents |
| 45 | +const agents = injectTransferTools([greeter, haiku]); |
| 46 | + |
| 47 | +export default agents; |
| 48 | +``` |
| 49 | + |
| 50 | +This fully specifies the agent set that was used in the interaction shown in the screenshot above. |
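To make the handoff mechanism concrete, here is a minimal sketch of what a helper like `injectTransferTools` could do. It is not the repo's implementation: the `Tool` and `AgentConfig` shapes, the `agent_transfer` tool name, and the `destination_agent` parameter are assumptions for illustration, and the real definitions live in `@/app/types` and `./utils`.

```typescript
// Hypothetical, simplified shapes; the repo's real types may differ.
interface Tool {
  type: "function";
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, unknown>;
    required: string[];
  };
}

interface AgentConfig {
  name: string;
  publicDescription: string;
  instructions: string;
  tools: Tool[];
  downstreamAgents?: AgentConfig[];
}

// Sketch: every agent that declares downstream agents gets one extra tool
// that lets the model request a handoff to one of them.
function injectTransferToolsSketch(agents: AgentConfig[]): AgentConfig[] {
  return agents.map((agent) => {
    const downstream = agent.downstreamAgents ?? [];
    if (downstream.length === 0) return agent;

    // publicDescription is what the model sees when choosing a destination.
    const choices = downstream
      .map((d) => `- ${d.name}: ${d.publicDescription}`)
      .join("\n");

    const transferTool: Tool = {
      type: "function",
      name: "agent_transfer", // assumed tool name
      description: `Transfer the user to a more specialized agent.\nAvailable agents:\n${choices}`,
      parameters: {
        type: "object",
        properties: {
          destination_agent: {
            type: "string",
            enum: downstream.map((d) => d.name),
          },
        },
        required: ["destination_agent"],
      },
    };

    // Return a copy so the original config objects are left untouched.
    return { ...agent, tools: [...agent.tools, transferTool] };
  });
}

const haikuSketch: AgentConfig = {
  name: "haiku",
  publicDescription: "Agent that writes haikus.",
  instructions: "Ask the user for a topic, then reply with a haiku about that topic.",
  tools: [],
};

const greeterSketch: AgentConfig = {
  name: "greeter",
  publicDescription: "Agent that greets the user.",
  instructions: "Greet the user and offer a haiku.",
  tools: [],
  downstreamAgents: [haikuSketch],
};

const wiredAgents = injectTransferToolsSketch([greeterSketch, haikuSketch]);
```

In this sketch only `greeter` receives a transfer tool, because `haiku` declares no `downstreamAgents`.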
| 51 | + |
| 52 | +### Next steps |
| 53 | +- Check out the configs in `src/app/agentConfigs`. The example above is a minimal demo that illustrates the core concepts. |
| 54 | +- [frontDeskAuthentication](src/app/agentConfigs/frontDeskAuthentication): Guides the user through a step-by-step authentication flow, confirming each value character by character, authenticates the user with a tool call, and then transfers to another agent. Note that the second agent is intentionally "bored" to show how to prompt for personality and tone. |
| 55 | +- [customerServiceRetail](src/app/agentConfigs/customerServiceRetail): Also guides the user through an authentication flow, reads a long offer verbatim from a canned script, and then walks through a complex return flow that requires looking up orders and policies, gathering user context, and checking with `o1-mini` to ensure the return is eligible. To test this flow, say that you'd like to return your snowboard and go through the necessary prompts! |
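One way to picture the step-by-step authentication flow described above is as a small state machine that is serialized into the agent's instructions. The sketch below is an illustration under assumptions: the `ConversationState` shape, the state ids, and the wording of each step are hypothetical and are not taken from the repo's prompts.

```typescript
// Hypothetical shape for one step of a prompt-level state machine.
interface ConversationState {
  id: string;
  description: string;
  instructions: string[];
  transitions: { nextStep: string; condition: string }[];
}

const authStates: ConversationState[] = [
  {
    id: "1_get_first_name",
    description: "Ask for the caller's first name.",
    instructions: [
      "Ask the caller for their first name.",
      "Spell the name back letter by letter and ask the caller to confirm.",
    ],
    transitions: [
      { nextStep: "2_get_phone_number", condition: "Once the first name is confirmed." },
    ],
  },
  {
    id: "2_get_phone_number",
    description: "Collect and confirm the phone number.",
    instructions: [
      "Ask for the phone number.",
      "Repeat it back digit by digit and ask the caller to confirm.",
    ],
    transitions: [
      { nextStep: "3_authenticate", condition: "Once the phone number is confirmed." },
    ],
  },
  {
    id: "3_authenticate",
    description: "Verify the collected values.",
    instructions: ["Call the authentication tool with the confirmed values."],
    transitions: [
      { nextStep: "transfer", condition: "Once authentication succeeds." },
    ],
  },
];

// Embed the states as JSON so the model can follow them one step at a time.
function buildInstructions(states: ConversationState[]): string {
  return [
    "# Conversation States",
    "Follow these states in order. Do not skip a confirmation step.",
    JSON.stringify(states, null, 2),
  ].join("\n");
}

// Sanity check: every transition must point at a known state id or "transfer".
function findDanglingTransitions(states: ConversationState[]): string[] {
  const ids = states.map((s) => s.id);
  const dangling: string[] = [];
  states.forEach((s) =>
    s.transitions.forEach((t) => {
      if (t.nextStep !== "transfer" && ids.indexOf(t.nextStep) === -1) {
        dangling.push(`${s.id} -> ${t.nextStep}`);
      }
    })
  );
  return dangling;
}

const authInstructions = buildInstructions(authStates);
```

Keeping the states as data rather than free-form prose makes it easy to check for broken transitions before the prompt is ever sent to a model.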
| 56 | + |
| 57 | +### Defining your own agents |
| 58 | +- You can copy these to make your own multi-agent voice app! Once you make a new agent set config, add it to `src/app/agentConfigs/index.ts` and you should be able to select it in the UI in the "Scenario" dropdown menu. |
| 59 | +- To see how to define tools and toolLogic, including a background LLM call, see [src/app/agentConfigs/customerServiceRetail/returns.ts](src/app/agentConfigs/customerServiceRetail/returns.ts) |
| 60 | +- To see how to define a detailed personality and tone, and use a prompt state machine to collect user information step by step, see [src/app/agentConfigs/frontDeskAuthentication/authentication.ts](src/app/agentConfigs/frontDeskAuthentication/authentication.ts) |
| 61 | +- To see how to wire up Agents into a single Agent Set, see [src/app/agentConfigs/frontDeskAuthentication/index.ts](src/app/agentConfigs/frontDeskAuthentication/index.ts) |
| 62 | +- If you want help creating your own prompt using these conventions, we've included a metaprompt [here](src/app/agentConfigs/voiceAgentMetaprompt.txt), or you can use our [Voice Agent Metaprompter GPT](https://chatgpt.com/g/g-678865c9fb5c81918fa28699735dd08e-voice-agent-metaprompt-gpt) |
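As a rough picture of how a tool definition and its `toolLogic` might fit together, here is a minimal sketch. The `Tool`, `ToolLogic`, and `AgentConfig` shapes, the `checkEligibility` tool, and the `runTool` dispatcher are all assumptions for illustration; see the linked `returns.ts` for the actual conventions. The background model call is replaced by a synchronous stub (`checkPolicyStub`) so the example runs on its own, whereas a real call to a model such as `o1-mini` would be asynchronous.

```typescript
// Hypothetical, simplified shapes; the repo's real types may differ.
interface Tool {
  type: "function";
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, unknown>;
    required: string[];
  };
}

type ToolLogic = Record<string, (args: Record<string, string>) => unknown>;

interface AgentConfig {
  name: string;
  publicDescription: string;
  instructions: string;
  tools: Tool[];
  toolLogic?: ToolLogic;
}

// Stand-in for the background model call; a real version would send the
// order details and policy text to a model and await its verdict.
function checkPolicyStub(reason: string): { eligible: boolean; rationale: string } {
  const eligible = reason.toLowerCase().indexOf("defect") !== -1;
  return {
    eligible,
    rationale: eligible ? "Defects are covered." : "Needs manual review.",
  };
}

const returnsSketch: AgentConfig = {
  name: "returns",
  publicDescription: "Agent that handles product returns.",
  instructions:
    "Collect the order ID and the reason for the return, then call checkEligibility before promising anything.",
  tools: [
    {
      type: "function",
      name: "checkEligibility",
      description: "Check whether an order is eligible for return.",
      parameters: {
        type: "object",
        properties: {
          orderId: { type: "string" },
          reason: { type: "string" },
        },
        required: ["orderId", "reason"],
      },
    },
  ],
  // toolLogic maps a tool name to the code that runs when the model calls it.
  toolLogic: {
    checkEligibility: (args) => {
      const verdict = checkPolicyStub(args["reason"] ?? "");
      return {
        orderId: args["orderId"] ?? "",
        eligible: verdict.eligible,
        rationale: verdict.rationale,
      };
    },
  },
};

// Dispatch a tool call by name, as an app might do when the model emits one.
function runTool(agent: AgentConfig, name: string, args: Record<string, string>): unknown {
  const logic = agent.toolLogic ? agent.toolLogic[name] : undefined;
  if (!logic) throw new Error(`No toolLogic registered for "${name}"`);
  return logic(args);
}

const eligibilityResult = runTool(returnsSketch, "checkEligibility", {
  orderId: "A-1",
  reason: "Defective binding",
});
```

Separating the tool schema (what the model sees) from `toolLogic` (what the app executes) keeps the prompt-facing description independent of the implementation behind it.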
| 63 | + |
| 64 | +## UI |
| 65 | +- You can select agent scenarios in the Scenario dropdown, and automatically switch to a specific agent with the Agent dropdown. |
| 66 | +- The conversation transcript is on the left, including tool calls, tool call responses, and agent changes. Click to expand non-message elements. |
| 67 | +- The event log is on the right, showing both client and server events. Click to see the full payload. |
| 68 | +- At the bottom, you can disconnect, toggle between automatic voice activity detection and push-to-talk (PTT), turn off audio playback, and toggle logs. |
| 69 | + |
| 70 | +## Core Contributors |
| 71 | +- Noah MacCallum - [noahmacca](https://x.com/noahmacca) |
| 72 | +- Ilan Bigio - [ibigio](https://github.com/ibigio) |