# Reference Client: Realtime API (beta)

This repository contains a reference client (sample library) for connecting
to OpenAI's Realtime API.
**This library is in beta and should not be treated as a final implementation.**
You can use it to easily prototype conversational apps.

# Quickstart

This library is built to be used both server-side (Node.js) and in the browser
(React, Vue), in both JavaScript and TypeScript codebases. While in beta, you will
need to `npm install` the library directly from the GitHub repository.

```shell
$ npm i openai/openai-realtime-api-beta --save
```

```javascript
import { RealtimeClient } from '@openai/realtime-api-beta';

const client = new RealtimeClient({ apiKey: process.env.OPENAI_API_KEY });

// Can set parameters ahead of connecting, either separately or all at once
client.updateSession({ instructions: 'You are a great, upbeat friend.' });
client.updateSession({ voice: 'alloy' });
client.updateSession({
  turn_detection: { type: 'none' }, // or 'server_vad'
  input_audio_transcription: { model: 'whisper-1' },
});

// Set up event handling
client.on('conversation.updated', (event) => {
  const { item, delta } = event;
  const items = client.conversation.getItems();
  /**
   * item is the current item being updated
   * delta can be null or populated
   * you can fetch a full list of items at any time
   */
});

// Connect to Realtime API
await client.connect();

// Send an item and trigger a generation
client.sendUserMessageContent([{ type: 'input_text', text: `How are you?` }]);
```

## Browser (front-end) quickstart

You can use this client directly from the browser, e.g. in React or Vue apps.
**We do not recommend this: your API keys are at risk if you connect to OpenAI directly from the browser.**
To instantiate the client in a browser environment, use:

```javascript
import { RealtimeClient } from '@openai/realtime-api-beta';

const client = new RealtimeClient({
  apiKey: process.env.OPENAI_API_KEY,
  dangerouslyAllowAPIKeyInBrowser: true,
});
```

If you are running your own relay server, e.g. with the
[Realtime Console](https://github.com/openai/openai-realtime-console), you can
instead connect to the relay server URL like so:

```javascript
const client = new RealtimeClient({ url: RELAY_SERVER_URL });
```

# Table of contents

1. [Project structure](#project-structure)
1. [Using the reference client](#using-the-reference-client)
   1. [Sending messages](#sending-messages)
   1. [Sending streaming audio](#sending-streaming-audio)
   1. [Adding and using tools](#adding-and-using-tools)
   1. [Interrupting the model](#interrupting-the-model)
1. [Client events](#client-events)
   1. [Reference Client Utility Events](#reference-client-utility-events)
1. [Server events](#server-events)
1. [Running tests](#running-tests)
1. [Acknowledgements and contact](#acknowledgements-and-contact)

# Project structure

In this library, there are three primitives for interfacing with the Realtime API.
We recommend starting with the `RealtimeClient`, but more advanced users may be
more comfortable working closer to the metal.

1. [`RealtimeClient`](./lib/client.js)
   - Primary abstraction for interfacing with the Realtime API
   - Enables rapid application development with a simplified control flow
   - Has custom `conversation.updated`, `conversation.item.appended`, `conversation.item.completed`, `conversation.interrupted` and `realtime.event` events
   - These events send item deltas and conversation history
1. [`RealtimeAPI`](./lib/api.js)
   - Exists on the client instance as `client.realtime`
   - Thin wrapper over [WebSocket](https://developer.mozilla.org/en-US/docs/Web/API/WebSocket)
   - Use this for connecting to the API, authenticating, and sending items
   - There is **no item validation**; you will have to rely on the API specification directly
   - Dispatches events as `server.{event_name}` and `client.{event_name}`, respectively
1. [`RealtimeConversation`](./lib/conversation.js)
   - Exists on the client instance as `client.conversation`
   - Stores a client-side cache of your current conversation
   - Has **event validation**: validates incoming events to make sure it can cache them properly

# Using the reference client

The client comes packaged with some basic utilities that make it easy to build realtime
apps quickly.

## Sending messages

Sending messages to the server from the user is easy.

```javascript
client.sendUserMessageContent([{ type: 'input_text', text: `How are you?` }]);
// or (empty audio)
client.sendUserMessageContent([
  { type: 'input_audio', audio: new Int16Array(0) },
]);
```

## Sending streaming audio

To send streaming audio, use the `.appendInputAudio()` method. If you're in
`turn_detection: { type: 'none' }` mode, you need to use `.createResponse()`
to tell the model to respond.

```javascript
// Send user audio, must be Int16Array or ArrayBuffer
// Default audio format is pcm16 with a sample rate of 24,000 Hz
// This populates 1s of noise in 0.1s chunks
for (let i = 0; i < 10; i++) {
  const data = new Int16Array(2400);
  for (let n = 0; n < 2400; n++) {
    const value = Math.floor((Math.random() * 2 - 1) * 0x8000);
    data[n] = value;
  }
  client.appendInputAudio(data);
}
// Pending audio is committed and the model is asked to generate
client.createResponse();
```
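
In a browser app, microphone audio from the Web Audio API typically arrives as Float32 samples, while `appendInputAudio()` expects 16-bit PCM. A minimal conversion sketch (the helper name is ours, not part of the library):

```javascript
// Convert Float32 samples in [-1, 1] (e.g. from the Web Audio API)
// into the Int16Array pcm16 format expected by appendInputAudio()
function floatTo16BitPCM(float32) {
  const int16 = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    // Clamp out-of-range samples, then scale to the signed 16-bit range
    const s = Math.max(-1, Math.min(1, float32[i]));
    int16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return int16;
}
```

You could then pass the result straight to `client.appendInputAudio(...)`.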

## Adding and using tools

Working with tools is easy. Just call `.addTool()` and set a callback as the second parameter.
The callback will be executed with the parameters for the tool, and the result will be automatically
sent back to the model.

```javascript
// We can add tools as well, with callbacks specified
client.addTool(
  {
    name: 'get_weather',
    description:
      'Retrieves the weather for a given lat, lng coordinate pair. Specify a label for the location.',
    parameters: {
      type: 'object',
      properties: {
        lat: {
          type: 'number',
          description: 'Latitude',
        },
        lng: {
          type: 'number',
          description: 'Longitude',
        },
        location: {
          type: 'string',
          description: 'Name of the location',
        },
      },
      required: ['lat', 'lng', 'location'],
    },
  },
  async ({ lat, lng, location }) => {
    const result = await fetch(
      `https://api.open-meteo.com/v1/forecast?latitude=${lat}&longitude=${lng}&current=temperature_2m,wind_speed_10m`,
    );
    const json = await result.json();
    return json;
  },
);
```

## Interrupting the model

You may want to manually interrupt the model, especially in `turn_detection: { type: 'none' }` mode.
To do this, use:

```javascript
// id is the id of the item currently being generated
// sampleCount is the number of audio samples that have been heard by the listener
client.cancelResponse(id, sampleCount);
```

This method will cause the model to immediately cease generation, and will also truncate the
item being played by removing all audio after `sampleCount` and clearing the text
response. Using this method, you can interrupt the model and prevent it from "remembering"
anything it has generated beyond the point the user has actually heard.
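
Since the default output format is pcm16 at 24,000 Hz, one way to derive `sampleCount` is from how long the response has been playing (the helper below is ours, for illustration):

```javascript
// Default output audio is pcm16 at 24,000 Hz, so the number of samples the
// listener has heard is elapsed playback time multiplied by the sample rate
function samplesHeard(elapsedMs, sampleRate = 24000) {
  return Math.floor((elapsedMs / 1000) * sampleRate);
}

// e.g. interrupt after 1.5s of playback:
// client.cancelResponse(id, samplesHeard(1500)); // 36,000 samples
```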

# Client events

If you need more manual control and want to send custom client events according
to the [Realtime Client Events API Reference](https://platform.openai.com/docs/api-reference/realtime-client-events),
you can use `client.realtime.send()` like so:

```javascript
// manually send a function call output
client.realtime.send('conversation.item.create', {
  item: {
    type: 'function_call_output',
    call_id: 'my-call-id',
    output: '{"function_succeeded":true}',
  },
});
client.realtime.send('response.create');
```
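
Note that the `output` field of a `function_call_output` item is a string; if you are returning structured data, serializing it with `JSON.stringify` keeps the payload valid JSON. A small sketch (the helper is ours, not part of the library):

```javascript
// Build a function_call_output item; JSON.stringify ensures the output
// string is valid JSON rather than hand-written object syntax
function buildFunctionOutput(callId, result) {
  return {
    type: 'function_call_output',
    call_id: callId,
    output: JSON.stringify(result),
  };
}
```

The result could then be sent as the `item` in a `conversation.item.create` event, as shown above.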

## Reference client utility events

With `RealtimeClient` we have reduced the event overhead from server events to **five**
main events that are most critical for your application control flow. These events
**are not** part of the API specification itself, but wrap logic to make application
development easier.

```javascript
// errors like connection failures
client.on('error', (event) => {
  // do something
});

// in VAD mode, the user starts speaking
// we can use this to stop audio playback of a previous response if necessary
client.on('conversation.interrupted', () => {
  /* do something */
});

// includes all changes to conversations
// delta may be populated
client.on('conversation.updated', ({ item, delta }) => {
  // get all items, e.g. if you need to update a chat window
  const items = client.conversation.getItems();
  switch (item.type) {
    case 'message':
      // system, user, or assistant message (item.role)
      break;
    case 'function_call':
      // always a function call from the model
      break;
    case 'function_call_output':
      // always a response from the user / application
      break;
  }
  if (delta) {
    // Only one of the following will be populated for any given event
    // delta.audio = Int16Array, audio added
    // delta.transcript = string, transcript added
    // delta.arguments = string, function arguments added
  }
});

// only triggered after item added to conversation
client.on('conversation.item.appended', ({ item }) => {
  /* item status can be 'in_progress' or 'completed' */
});

// only triggered after item completed in conversation
// will always be triggered after conversation.item.appended
client.on('conversation.item.completed', ({ item }) => {
  /* item status will always be 'completed' */
});
```
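
As an example of consuming these deltas, here is one way (a sketch, not library code) to accumulate streamed transcript fragments into per-item text, e.g. for rendering a chat window:

```javascript
// Accumulate transcript deltas keyed by item id; call this from a
// conversation.updated handler with its { item, delta } payload
function applyTranscriptDelta(transcripts, item, delta) {
  if (delta && typeof delta.transcript === 'string') {
    transcripts[item.id] = (transcripts[item.id] || '') + delta.transcript;
  }
  return transcripts;
}
```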

# Server events

If you want more control over your application development, you can use the
`realtime.event` event and choose only to respond to **server** events.
The full documentation for these events is available in
the [Realtime Server Events API Reference](https://platform.openai.com/docs/api-reference/realtime-server-events).

```javascript
// all events, can use for logging, debugging, or manual event handling
client.on('realtime.event', ({ time, source, event }) => {
  // time is an ISO timestamp
  // source is 'client' or 'server'
  // event is the raw event payload (json)
  if (source === 'server') {
    doSomething(event);
  }
});
```

# Running tests

You will need to make sure you have a `.env` file with `OPENAI_API_KEY=` set in order
to run tests. From there, running the test suite is easy.

```shell
$ npm test
```

To run tests with debug logs (will log events sent to and received from the WebSocket), use:

```shell
$ npm test -- --debug
```

# Acknowledgements and contact

Thank you for checking out the Realtime API. We would love to hear from you.
Special thanks to the Realtime API team for making this all possible.

- OpenAI Developers / [@OpenAIDevs](https://x.com/OpenAIDevs)
- Jordan Sitkin / API / [@dustmason](https://x.com/dustmason)
- Mark Hudnall / API / [@landakram](https://x.com/landakram)
- Peter Bakkum / API / [@pbbakkum](https://x.com/pbbakkum)
- Atty Eleti / API / [@athyuttamre](https://x.com/athyuttamre)
- Keith Horwood / API + DX / [@keithwhor](https://x.com/keithwhor)