Commit 462f3cd

Initial public commit
0 parents  commit 462f3cd

32 files changed, 5392 additions and 0 deletions

.gitignore

+3
@@ -0,0 +1,3 @@
+.DS_Store
+node_modules/
+.env

.npmignore

+4
@@ -0,0 +1,4 @@
+.DS_Store
+node_modules/
+.env
+test/

.prettierrc

+5
@@ -0,0 +1,5 @@
+{
+  "tabWidth": 2,
+  "useTabs": false,
+  "singleQuote": true
+}

LICENSE

+21
@@ -0,0 +1,21 @@
1+
MIT License
2+
3+
Copyright (c) 2024 OpenAI
4+
5+
Permission is hereby granted, free of charge, to any person obtaining a copy
6+
of this software and associated documentation files (the "Software"), to deal
7+
in the Software without restriction, including without limitation the rights
8+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9+
copies of the Software, and to permit persons to whom the Software is
10+
furnished to do so, subject to the following conditions:
11+
12+
The above copyright notice and this permission notice shall be included in all
13+
copies or substantial portions of the Software.
14+
15+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21+
SOFTWARE.

README.md

+321
@@ -0,0 +1,321 @@
1+
# Reference Client: Realtime API (beta)
2+
3+
This repository contains a reference client (a sample library) for connecting
4+
to OpenAI's Realtime API.
5+
**This library is in beta and should not be treated as a final implementation.**
6+
You can use it to easily prototype conversational apps.
7+
8+
# Quickstart
9+
10+
This library is built to be used both server-side (Node.js) and in the browser (React, Vue),
11+
in both JavaScript and TypeScript codebases. While in beta, to install the library you will
12+
need to `npm install` directly from the GitHub repository.
13+
14+
```shell
15+
$ npm i openai/openai-realtime-api-beta --save
16+
```
17+
18+
```javascript
19+
import { RealtimeClient } from '@openai/realtime-api-beta';
20+
21+
const client = new RealtimeClient({ apiKey: process.env.OPENAI_API_KEY });
22+
23+
// Can set parameters ahead of connecting, either separately or all at once
24+
client.updateSession({ instructions: 'You are a great, upbeat friend.' });
25+
client.updateSession({ voice: 'alloy' });
26+
client.updateSession({
27+
turn_detection: { type: 'none' }, // or 'server_vad'
28+
input_audio_transcription: { model: 'whisper-1' },
29+
});
30+
31+
// Set up event handling
32+
client.on('conversation.updated', (event) => {
33+
const { item, delta } = event;
34+
const items = client.conversation.getItems();
35+
/**
36+
* item is the current item being updated
37+
* delta can be null or populated
38+
* you can fetch a full list of items at any time
39+
*/
40+
});
41+
42+
// Connect to Realtime API
43+
await client.connect();
44+
45+
// Send a item and triggers a generation
46+
client.sendUserMessageContent([{ type: 'input_text', text: `How are you?` }]);
47+
```
48+
49+
## Browser (front-end) quickstart
50+
51+
You can use this client directly from the browser in e.g. React or Vue apps.
52+
**We do not recommend this, your API keys are at risk if you connect to OpenAI directly from the browser.**
53+
In order to instantiate the client in a browser environment, use:
54+
55+
```javascript
56+
import { RealtimeClient } from '@openai/realtime-api-beta';
57+
58+
const client = new RealtimeClient({
59+
apiKey: process.env.OPENAI_API_KEY,
60+
dangerouslyAllowAPIKeyInBrowser: true,
61+
});
62+
```
63+
64+
If you are running your own relay server, e.g. with the
65+
[Realtime Console](https://github.com/openai/openai-realtime-console), you can
66+
instead connect to the relay server URL like so:
67+
68+
```javascript
69+
const client = new RealtimeClient({ url: RELAY_SERVER_URL });
70+
```
71+
72+
# Table of contents
73+
74+
1. [Project structure](#project-structure)
75+
1. [Using the reference client](#using-the-reference-client)
76+
1. [Sending messages](#sending-messages)
77+
1. [Sending streaming audio](#sending-streaming-audio)
78+
1. [Adding and using tools](#adding-and-using-tools)
79+
1. [Interrupting the model](#interrupting-the-model)
80+
1. [Client events](#client-events)
81+
1. [Reference client utility events](#reference-client-utility-events)
82+
1. [Server events](#server-events)
83+
1. [Running tests](#running-tests)
84+
1. [Acknowledgements and contact](#acknowledgements-and-contact)
85+
86+
# Project structure
87+
88+
In this library, there are three primitives for interfacing with the Realtime API.
89+
We recommend starting with the `RealtimeClient`, but more advanced users may be
90+
more comfortable working closer to the metal.
91+
92+
1. [`RealtimeClient`](./lib/client.js)
93+
- Primary abstraction for interfacing with the Realtime API
94+
- Enables rapid application development with a simplified control flow
95+
- Has custom `conversation.updated`, `conversation.item.appended`, `conversation.item.completed`, `conversation.interrupted` and `realtime.event` events
96+
- These events send item deltas and conversation history
97+
1. [`RealtimeAPI`](./lib/api.js)
98+
- Exists on client instance as `client.realtime`
99+
- Thin wrapper over [WebSocket](https://developer.mozilla.org/en-US/docs/Web/API/WebSocket)
100+
- Use this for connecting to the API, authenticating, and sending items
101+
- There is **no item validation**; you will have to rely on the API specification directly
102+
- Dispatches events as `server.{event_name}` and `client.{event_name}`, respectively
103+
1. [`RealtimeConversation`](./lib/conversation.js)
104+
- Exists on client instance as `client.conversation`
105+
- Stores a client-side cache of your current conversation
106+
- Has **event validation**: it validates incoming events to make sure it can cache them properly
107+
108+
# Using the reference client
109+
110+
The client comes packaged with some basic utilities that make it easy to build realtime
111+
apps quickly.
112+
113+
## Sending messages
114+
115+
To send a user message to the server, call `.sendUserMessageContent()` with an array of content parts.
116+
117+
```javascript
118+
client.sendUserMessageContent([{ type: 'input_text', text: `How are you?` }]);
119+
// or (empty audio)
120+
client.sendUserMessageContent([
121+
{ type: 'input_audio', audio: new Int16Array(0) },
122+
]);
123+
```
124+
125+
## Sending streaming audio
126+
127+
To send streaming audio, use the `.appendInputAudio()` method. If you're in `turn_detection: 'disabled'` mode,
128+
then you need to use `.createResponse()` to tell the model to respond.
129+
130+
```javascript
131+
// Send user audio, must be Int16Array or ArrayBuffer
132+
// Default audio format is pcm16 with sample rate of 24,000 Hz
133+
// This populates 1s of noise in 0.1s chunks
134+
for (let i = 0; i < 10; i++) {
135+
const data = new Int16Array(2400);
136+
for (let n = 0; n < 2400; n++) {
137+
const value = Math.floor((Math.random() * 2 - 1) * 0x8000);
138+
data[n] = value;
139+
}
140+
client.appendInputAudio(data);
141+
}
142+
// Pending audio is committed and model is asked to generate
143+
client.createResponse();
144+
```
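
Microphone and Web Audio sources often produce `Float32Array` samples rather than `Int16Array`. The sketch below shows one way to convert such samples to pcm16 and split them into 0.1s chunks before appending them. The helper names `floatTo16BitPCM` and `chunkSamples` are hypothetical and defined here for illustration; they are not presented as part of the library's API.

```javascript
// Hypothetical helpers for illustration; not part of the library.
const SAMPLE_RATE = 24000; // Hz, the default pcm16 sample rate noted above
const CHUNK_SAMPLES = SAMPLE_RATE / 10; // 2400 samples = 0.1s per chunk

// Convert Float32 samples in [-1, 1] to 16-bit signed PCM
function floatTo16BitPCM(float32) {
  const out = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    const s = Math.max(-1, Math.min(1, float32[i])); // clamp out-of-range input
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

// Split an Int16Array into chunks of at most `size` samples.
// slice() copies, so each chunk owns its own underlying buffer.
function chunkSamples(samples, size = CHUNK_SAMPLES) {
  const chunks = [];
  for (let start = 0; start < samples.length; start += size) {
    chunks.push(samples.slice(start, start + size));
  }
  return chunks;
}

// Example input: 0.25s of a 440 Hz tone
const tone = new Float32Array(SAMPLE_RATE / 4);
for (let n = 0; n < tone.length; n++) {
  tone[n] = Math.sin((2 * Math.PI * 440 * n) / SAMPLE_RATE);
}
const chunks = chunkSamples(floatTo16BitPCM(tone));
// Each chunk could then be passed to client.appendInputAudio(chunk)
```

Copying with `slice()` rather than taking a `subarray()` view is a deliberate choice here: a view shares the full underlying buffer, which could be surprising if downstream code reads the buffer directly.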
145+
146+
## Adding and using tools
147+
148+
Working with tools is easy. Just call `.addTool()` and set a callback as the second parameter.
149+
The callback will be executed with the parameters for the tool, and the result will be automatically
150+
sent back to the model.
151+
152+
```javascript
153+
// We can add tools as well, with callbacks specified
154+
client.addTool(
155+
{
156+
name: 'get_weather',
157+
description:
158+
'Retrieves the weather for a given lat, lng coordinate pair. Specify a label for the location.',
159+
parameters: {
160+
type: 'object',
161+
properties: {
162+
lat: {
163+
type: 'number',
164+
description: 'Latitude',
165+
},
166+
lng: {
167+
type: 'number',
168+
description: 'Longitude',
169+
},
170+
location: {
171+
type: 'string',
172+
description: 'Name of the location',
173+
},
174+
},
175+
required: ['lat', 'lng', 'location'],
176+
},
177+
},
178+
async ({ lat, lng, location }) => {
179+
const result = await fetch(
180+
`https://api.open-meteo.com/v1/forecast?latitude=${lat}&longitude=${lng}&current=temperature_2m,wind_speed_10m`,
181+
);
182+
const json = await result.json();
183+
return json;
184+
},
185+
);
186+
```
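
The `required` array in the tool definition above lists the arguments the callback expects. As a minimal sketch, a callback could check for missing arguments before doing any work. The helper `missingRequired` is hypothetical and written for this example; whether the library or the model already guarantees these arguments is not something this sketch assumes.

```javascript
// Hypothetical helper for illustration; not part of the library.
// Returns the names of required parameters that are absent from `args`.
function missingRequired(definition, args) {
  const required = definition.parameters?.required ?? [];
  return required.filter((key) => args[key] === undefined);
}

// A trimmed copy of the tool definition shape used above
const weatherTool = {
  name: 'get_weather',
  parameters: {
    type: 'object',
    properties: {
      lat: { type: 'number' },
      lng: { type: 'number' },
      location: { type: 'string' },
    },
    required: ['lat', 'lng', 'location'],
  },
};

const missing = missingRequired(weatherTool, { lat: 37.77, lng: -122.42 });
// `missing` lists any required arguments the caller did not supply
```

Inside a tool callback, a non-empty result could be returned to the model as an error object instead of calling the external API with incomplete input.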
187+
188+
## Interrupting the model
189+
190+
You may want to manually interrupt the model, especially in `turn_detection: 'disabled'` mode.
191+
To do this, we can use:
192+
193+
```javascript
194+
// id is the id of the item currently being generated
195+
// sampleCount is the number of audio samples that have been heard by the listener
196+
client.cancelResponse(id, sampleCount);
197+
```
198+
199+
This method causes the model to immediately cease generation and truncates the
200+
item being played, removing all audio after `sampleCount` and clearing the text
201+
response. Interrupting this way prevents the model from "remembering"
202+
anything it generated beyond what the user has actually heard.
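
If your audio player reports elapsed playback time rather than a sample position, `sampleCount` can be derived from it. The following is a minimal sketch, assuming the default 24,000 Hz pcm16 output; `samplesHeard` is a hypothetical helper, not a library function.

```javascript
// Hypothetical helper for illustration; not part of the library.
const SAMPLE_RATE = 24000; // Hz, assuming the default pcm16 format

// Convert elapsed playback time in milliseconds to a whole number of samples
function samplesHeard(elapsedMs, sampleRate = SAMPLE_RATE) {
  return Math.max(0, Math.floor((elapsedMs / 1000) * sampleRate));
}

const sampleCount = samplesHeard(1500); // 1.5s of playback
// client.cancelResponse(id, sampleCount);
```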
203+
204+
# Client events
205+
206+
If you need more manual control and want to send custom client events according
207+
to the [Realtime Client Events API Reference](https://platform.openai.com/docs/api-reference/realtime-client-events),
208+
you can use `client.realtime.send()` like so:
209+
210+
```javascript
211+
// manually send a function call output
212+
client.realtime.send('conversation.item.create', {
213+
item: {
214+
type: 'function_call_output',
215+
call_id: 'my-call-id',
216+
output: '{"function_succeeded":true}',
217+
},
218+
});
219+
client.realtime.send('response.create');
220+
```
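
Because `output` is a string, hand-writing JSON is easy to get wrong. A small sketch of building the payload with `JSON.stringify` instead is shown below; `buildFunctionCallOutput` is a hypothetical helper introduced only for this example.

```javascript
// Hypothetical helper for illustration; not part of the library.
// Builds the payload for a 'conversation.item.create' event that
// carries a function call result serialized as a JSON string.
function buildFunctionCallOutput(callId, result) {
  return {
    item: {
      type: 'function_call_output',
      call_id: callId,
      output: JSON.stringify(result),
    },
  };
}

const payload = buildFunctionCallOutput('my-call-id', {
  function_succeeded: true,
});
// client.realtime.send('conversation.item.create', payload);
// client.realtime.send('response.create');
```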
221+
222+
## Reference client utility events
223+
224+
With `RealtimeClient` we have reduced the event overhead from server events to **five**
225+
main events that are most critical for your application control flow. These events
226+
**are not** part of the API specification itself, but wrap logic to make application
227+
development easier.
228+
229+
```javascript
230+
// errors like connection failures
231+
client.on('error', (event) => {
232+
// do thing
233+
});
234+
235+
// in VAD mode, the user starts speaking
236+
// we can use this to stop audio playback of a previous response if necessary
237+
client.on('conversation.interrupted', () => {
238+
/* do something */
239+
});
240+
241+
// includes all changes to conversations
242+
// delta may be populated
243+
client.on('conversation.updated', ({ item, delta }) => {
244+
// get all items, e.g. if you need to update a chat window
245+
const items = client.conversation.getItems();
246+
switch (item.type) {
247+
case 'message':
248+
// system, user, or assistant message (item.role)
249+
break;
250+
case 'function_call':
251+
// always a function call from the model
252+
break;
253+
case 'function_call_output':
254+
// always a response from the user / application
255+
break;
256+
}
257+
if (delta) {
258+
// Only one of the following will be populated for any given event
259+
// delta.audio = Int16Array, audio added
260+
// delta.transcript = string, transcript added
261+
// delta.arguments = string, function arguments added
262+
}
263+
});
264+
265+
// only triggered after item added to conversation
266+
client.on('conversation.item.appended', ({ item }) => {
267+
/* item status can be 'in_progress' or 'completed' */
268+
});
269+
270+
// only triggered after item completed in conversation
271+
// will always be triggered after conversation.item.appended
272+
client.on('conversation.item.completed', ({ item }) => {
273+
/* item status will always be 'completed' */
274+
});
275+
```
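
As one illustration of consuming `conversation.updated`, the sketch below accumulates `delta.transcript` fragments into a running transcript per item. It assumes each item carries an `id` field; `handleConversationUpdated` and the `transcripts` map are hypothetical names for this example, and the handler is written as a plain function so it can be exercised without a connection.

```javascript
// Hypothetical handler for illustration; assumes items expose an `id`.
const transcripts = new Map();

// Appends any transcript delta to the running text for the item
// and returns the transcript accumulated so far.
function handleConversationUpdated({ item, delta }) {
  const current = transcripts.get(item.id) || '';
  if (!delta || typeof delta.transcript !== 'string') {
    return current; // nothing to append for this event
  }
  const next = current + delta.transcript;
  transcripts.set(item.id, next);
  return next;
}

// Simulated events, in the shape described above
handleConversationUpdated({ item: { id: 'item_1' }, delta: { transcript: 'How ' } });
handleConversationUpdated({ item: { id: 'item_1' }, delta: { transcript: 'are you?' } });
// client.on('conversation.updated', handleConversationUpdated);
```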
276+
277+
# Server events
278+
279+
If you want more control over your application development, you can use the
280+
`realtime.event` event and choose only to respond to **server** events.
281+
The full documentation for these events is available on
282+
the [Realtime Server Events API Reference](https://platform.openai.com/docs/api-reference/realtime-server-events).
283+
284+
```javascript
285+
// all events, can use for logging, debugging, or manual event handling
286+
client.on('realtime.event', ({ time, source, event }) => {
287+
// time is an ISO timestamp
288+
// source is 'client' or 'server'
289+
// event is the raw event payload (json)
290+
if (source === 'server') {
291+
doSomething(event);
292+
}
293+
});
294+
```
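
For debugging, it can help to see which server events arrive and how often. Below is a minimal sketch of a counter that ignores client events; it assumes the raw event payload has a `type` field, and `createServerEventLog` is a hypothetical helper rather than something the library provides.

```javascript
// Hypothetical helper for illustration; assumes event payloads have a `type`.
function createServerEventLog() {
  const counts = {};
  return {
    counts,
    // Accepts the same object shape that 'realtime.event' handlers receive
    record({ source, event }) {
      if (source !== 'server') return; // skip client-originated events
      counts[event.type] = (counts[event.type] || 0) + 1;
    },
  };
}

const log = createServerEventLog();
log.record({
  time: new Date().toISOString(),
  source: 'server',
  event: { type: 'response.created' },
});
// client.on('realtime.event', (e) => log.record(e));
```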
295+
296+
# Running tests
297+
298+
You will need to make sure you have a `.env` file with `OPENAI_API_KEY=` set in order
299+
to run tests. From there, running the test suite is easy.
300+
301+
```shell
302+
$ npm test
303+
```
304+
305+
To run tests with debug logs (will log events sent to and received from WebSocket), use:
306+
307+
```shell
308+
$ npm test -- --debug
309+
```
310+
311+
# Acknowledgements and contact
312+
313+
Thank you for checking out the Realtime API. We would love to hear from you.
314+
Special thanks to the Realtime API team for making this all possible.
315+
316+
- OpenAI Developers / [@OpenAIDevs](https://x.com/OpenAIDevs)
317+
- Jordan Sitkin / API / [@dustmason](https://x.com/dustmason)
318+
- Mark Hudnall / API / [@landakram](https://x.com/landakram)
319+
- Peter Bakkum / API / [@pbbakkum](https://x.com/pbbakkum)
320+
- Atty Eleti / API / [@athyuttamre](https://x.com/athyuttamre)
321+
- Keith Horwood / API + DX / [@keithwhor](https://x.com/keithwhor)

dist/index.d.ts

+6
@@ -0,0 +1,6 @@
+import { RealtimeAPI } from './lib/api.js';
+import { RealtimeConversation } from './lib/conversation.js';
+import { RealtimeClient } from './lib/client.js';
+import { RealtimeUtils } from './lib/utils.js';
+export { RealtimeAPI, RealtimeConversation, RealtimeClient, RealtimeUtils };
+//# sourceMappingURL=index.d.ts.map

dist/index.d.ts.map

+1
