Commit f329619

Merge branch 'main' of github.com:openai/openai-agents-python into alex/cleanup-tests

2 parents ea3e8ce + 6503220

71 files changed (+4882, -141 lines)

README.md  +2

@@ -30,6 +30,8 @@ source env/bin/activate
 pip install openai-agents
 ```
 
+For voice support, install with the optional `voice` group: `pip install 'openai-agents[voice]'`.
+
 ## Hello world example
 
 ```python

docs/ref/voice/events.md  +3

# `Events`

::: agents.voice.events

docs/ref/voice/exceptions.md  +3

# `Exceptions`

::: agents.voice.exceptions

docs/ref/voice/input.md  +3

# `Input`

::: agents.voice.input

docs/ref/voice/model.md  +3

# `Model`

::: agents.voice.model

docs/ref/voice/models/openai_model_provider.md  +3

# `OpenAIVoiceModelProvider`

::: agents.voice.models.openai_model_provider

docs/ref/voice/models/openai_stt.md  +3

# `OpenAI STT`

::: agents.voice.models.openai_stt

docs/ref/voice/models/openai_tts.md  +3

# `OpenAI TTS`

::: agents.voice.models.openai_tts

docs/ref/voice/pipeline.md  +3

# `Pipeline`

::: agents.voice.pipeline

docs/ref/voice/pipeline_config.md  +3

# `Pipeline Config`

::: agents.voice.pipeline_config

docs/ref/voice/result.md  +3

# `Result`

::: agents.voice.result

docs/ref/voice/utils.md  +3

# `Utils`

::: agents.voice.utils

docs/ref/voice/workflow.md  +3

# `Workflow`

::: agents.voice.workflow

docs/tracing.md  +8 -1

@@ -35,6 +35,9 @@ By default, the SDK traces the following:
 - Function tool calls are each wrapped in `function_span()`
 - Guardrails are wrapped in `guardrail_span()`
 - Handoffs are wrapped in `handoff_span()`
+- Audio inputs (speech-to-text) are wrapped in a `transcription_span()`
+- Audio outputs (text-to-speech) are wrapped in a `speech_span()`
+- Related audio spans may be parented under a `speech_group_span()`
 
 By default, the trace is named "Agent trace". You can set this name if you use `trace`, or you can configure the name and other properties with the [`RunConfig`][agents.run.RunConfig].

@@ -76,7 +79,11 @@ Spans are automatically part of the current trace, and are nested under the near
 
 ## Sensitive data
 
-Some spans track potentially sensitive data. For example, the `generation_span()` stores the inputs/outputs of the LLM generation, and `function_span()` stores the inputs/outputs of function calls. These may contain sensitive data, so you can disable capturing that data via [`RunConfig.trace_include_sensitive_data`][agents.run.RunConfig.trace_include_sensitive_data].
+Certain spans may capture potentially sensitive data.
+
+The `generation_span()` stores the inputs/outputs of the LLM generation, and `function_span()` stores the inputs/outputs of function calls. These may contain sensitive data, so you can disable capturing that data via [`RunConfig.trace_include_sensitive_data`][agents.run.RunConfig.trace_include_sensitive_data].
+
+Similarly, audio spans include base64-encoded PCM data for input and output audio by default. You can disable capturing this audio data by configuring [`VoicePipelineConfig.trace_include_sensitive_audio_data`][agents.voice.pipeline_config.VoicePipelineConfig.trace_include_sensitive_audio_data].
 
 ## Custom tracing processors
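
As a rough illustration of the two settings above (a minimal sketch; the agent is a placeholder and the exact constructor signatures should be checked against the reference docs):

```python
from agents import Agent, RunConfig
from agents.voice import SingleAgentVoiceWorkflow, VoicePipeline, VoicePipelineConfig

agent = Agent(name="Assistant", instructions="Be brief.")  # placeholder agent

# Keep LLM and function-call inputs/outputs out of traces; pass this as
# `run_config` when invoking the runner.
run_config = RunConfig(trace_include_sensitive_data=False)

# Keep base64-encoded PCM audio out of voice pipeline traces.
pipeline = VoicePipeline(
    workflow=SingleAgentVoiceWorkflow(agent),
    config=VoicePipelineConfig(trace_include_sensitive_audio_data=False),
)
```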


docs/voice/pipeline.md  +75

# Pipelines and workflows

[`VoicePipeline`][agents.voice.pipeline.VoicePipeline] is a class that makes it easy to turn your agentic workflows into a voice app. You pass in a workflow to run, and the pipeline takes care of transcribing input audio, detecting when the audio ends, calling your workflow at the right time, and turning the workflow output back into audio.

```mermaid
graph LR
    %% Input
    A["🎤 Audio Input"]

    %% Voice Pipeline
    subgraph Voice_Pipeline [Voice Pipeline]
        direction TB
        B["Transcribe (speech-to-text)"]
        C["Your Code"]:::highlight
        D["Text-to-speech"]
        B --> C --> D
    end

    %% Output
    E["🎧 Audio Output"]

    %% Flow
    A --> Voice_Pipeline
    Voice_Pipeline --> E

    %% Custom styling
    classDef highlight fill:#ffcc66,stroke:#333,stroke-width:1px,font-weight:700;
```

## Configuring a pipeline

When you create a pipeline, you can set a few things:

1. The [`workflow`][agents.voice.workflow.VoiceWorkflowBase], which is the code that runs each time new audio is transcribed.
2. The [`speech-to-text`][agents.voice.model.STTModel] and [`text-to-speech`][agents.voice.model.TTSModel] models used.
3. The [`config`][agents.voice.pipeline_config.VoicePipelineConfig], which lets you configure things like:
    - A model provider, which can map model names to models
    - Tracing, including whether to disable tracing, whether audio files are uploaded, the workflow name, trace IDs, etc.
    - Settings on the TTS and STT models, like the prompt, language, and data types used
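
A construction sketch touching each of these knobs; the keyword names `stt_model`, `tts_model`, and `config`, as well as the model names, are assumptions for illustration, so check the reference pages for the exact signature:

```python
from agents import Agent
from agents.voice import SingleAgentVoiceWorkflow, VoicePipeline, VoicePipelineConfig

agent = Agent(name="Assistant", instructions="Be polite and concise.")

pipeline = VoicePipeline(
    workflow=SingleAgentVoiceWorkflow(agent),  # 1. code that runs on each transcribed turn
    stt_model="gpt-4o-transcribe",             # 2. speech-to-text model (assumed keyword/name)
    tts_model="gpt-4o-mini-tts",               #    text-to-speech model (assumed keyword/name)
    config=VoicePipelineConfig(                # 3. provider, tracing, and model settings
        workflow_name="Voice assistant",
    ),
)
```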

## Running a pipeline

You can run a pipeline via the [`run()`][agents.voice.pipeline.VoicePipeline.run] method, which lets you pass in audio input in two forms:

1. [`AudioInput`][agents.voice.input.AudioInput] is used when you have a full audio transcript and just want to produce a result for it. This is useful in cases where you don't need to detect when a speaker is done speaking; for example, when you have pre-recorded audio or in push-to-talk apps where it's clear when the user is done speaking.
2. [`StreamedAudioInput`][agents.voice.input.StreamedAudioInput] is used when you might need to detect when a user is done speaking. It allows you to push audio chunks as they are detected, and the voice pipeline will automatically run the agent workflow at the right time, via a process called "activity detection".
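
A rough sketch of both forms. The `AudioInput` path mirrors the quickstart; for `StreamedAudioInput`, the chunk-pushing method is assumed to be named `add_audio` here, so verify it against the input reference:

```python
import numpy as np

from agents.voice import AudioInput, StreamedAudioInput, VoicePipeline


async def run_prerecorded(pipeline: VoicePipeline) -> None:
    # Form 1: a complete recording (e.g. one push-to-talk utterance).
    buffer = np.zeros(24000 * 3, dtype=np.int16)  # 3 seconds of 24 kHz mono silence
    result = await pipeline.run(AudioInput(buffer=buffer))
    async for event in result.stream():
        ...  # handle events as shown in the Results section below


async def run_streamed(pipeline: VoicePipeline, chunks) -> None:
    # Form 2: push chunks as they arrive; the pipeline detects turn boundaries.
    # In a real app you would consume result.stream() concurrently (e.g. in a task).
    streamed_input = StreamedAudioInput()
    result = await pipeline.run(streamed_input)
    for chunk in chunks:
        await streamed_input.add_audio(chunk)  # assumed method name
```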

## Results

The result of a voice pipeline run is a [`StreamedAudioResult`][agents.voice.result.StreamedAudioResult]. This is an object that lets you stream events as they occur. There are a few kinds of [`VoiceStreamEvent`][agents.voice.events.VoiceStreamEvent], including:

1. [`VoiceStreamEventAudio`][agents.voice.events.VoiceStreamEventAudio], which contains a chunk of audio.
2. [`VoiceStreamEventLifecycle`][agents.voice.events.VoiceStreamEventLifecycle], which informs you of lifecycle events like a turn starting or ending.
3. [`VoiceStreamEventError`][agents.voice.events.VoiceStreamEventError], which is an error event.

```python
result = await pipeline.run(input)

async for event in result.stream():
    if event.type == "voice_stream_event_audio":
        ...  # play audio
    elif event.type == "voice_stream_event_lifecycle":
        ...  # handle the lifecycle event
    elif event.type == "voice_stream_event_error":
        ...  # handle the error
```

## Best practices

### Interruptions

The Agents SDK currently does not support any built-in interruption handling for [`StreamedAudioInput`][agents.voice.input.StreamedAudioInput]. Instead, for every detected turn it will trigger a separate run of your workflow. If you want to handle interruptions inside your application, you can listen to the [`VoiceStreamEventLifecycle`][agents.voice.events.VoiceStreamEventLifecycle] events. `turn_started` indicates that a new turn was transcribed and processing is beginning. `turn_ended` triggers after all the audio for the respective turn has been dispatched. You could use these events to mute the speaker's microphone when the model starts a turn and unmute it after you have flushed all the related audio for that turn, as in the sketch below.
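
For instance, a sketch of that mute/unmute pattern (the `mic` object is hypothetical, and the lifecycle payload is assumed to expose the turn marker as `event.event`):

```python
async def pump_events(result, mic) -> None:
    # `result` is the StreamedAudioResult from pipeline.run(); `mic` is any
    # object of yours with mute()/unmute() methods (hypothetical).
    async for event in result.stream():
        if event.type == "voice_stream_event_lifecycle":
            if event.event == "turn_started":
                mic.mute()    # avoid picking up the assistant's own speech
            elif event.event == "turn_ended":
                mic.unmute()  # all audio for this turn has been dispatched
        elif event.type == "voice_stream_event_audio":
            ...  # play event.data
```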

docs/voice/quickstart.md  +191

# Quickstart

## Prerequisites

Make sure you've followed the base [quickstart instructions](../quickstart.md) for the Agents SDK, and set up a virtual environment. Then, install the optional voice dependencies from the SDK:

```bash
pip install 'openai-agents[voice]'
```

## Concepts

The main concept to know about is a [`VoicePipeline`][agents.voice.pipeline.VoicePipeline], which is a 3-step process:

1. Run a speech-to-text model to turn audio into text.
2. Run your code, which is usually an agentic workflow, to produce a result.
3. Run a text-to-speech model to turn the result text back into audio.

```mermaid
graph LR
    %% Input
    A["🎤 Audio Input"]

    %% Voice Pipeline
    subgraph Voice_Pipeline [Voice Pipeline]
        direction TB
        B["Transcribe (speech-to-text)"]
        C["Your Code"]:::highlight
        D["Text-to-speech"]
        B --> C --> D
    end

    %% Output
    E["🎧 Audio Output"]

    %% Flow
    A --> Voice_Pipeline
    Voice_Pipeline --> E

    %% Custom styling
    classDef highlight fill:#ffcc66,stroke:#333,stroke-width:1px,font-weight:700;
```

## Agents

First, let's set up some Agents. This should feel familiar to you if you've built any agents with this SDK. We'll have a couple of Agents, a handoff, and a tool.

```python
import asyncio
import random

from agents import (
    Agent,
    function_tool,
)
from agents.extensions.handoff_prompt import prompt_with_handoff_instructions


@function_tool
def get_weather(city: str) -> str:
    """Get the weather for a given city."""
    print(f"[debug] get_weather called with city: {city}")
    choices = ["sunny", "cloudy", "rainy", "snowy"]
    return f"The weather in {city} is {random.choice(choices)}."


spanish_agent = Agent(
    name="Spanish",
    handoff_description="A spanish speaking agent.",
    instructions=prompt_with_handoff_instructions(
        "You're speaking to a human, so be polite and concise. Speak in Spanish.",
    ),
    model="gpt-4o-mini",
)

agent = Agent(
    name="Assistant",
    instructions=prompt_with_handoff_instructions(
        "You're speaking to a human, so be polite and concise. If the user speaks in Spanish, handoff to the spanish agent.",
    ),
    model="gpt-4o-mini",
    handoffs=[spanish_agent],
    tools=[get_weather],
)
```

## Voice pipeline

We'll set up a simple voice pipeline, using [`SingleAgentVoiceWorkflow`][agents.voice.workflow.SingleAgentVoiceWorkflow] as the workflow.

```python
from agents.voice import SingleAgentVoiceWorkflow, VoicePipeline

pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))
```

## Run the pipeline

```python
import numpy as np
import sounddevice as sd

from agents.voice import AudioInput

# For simplicity, we'll just create 3 seconds of silence
# In reality, you'd get microphone data
buffer = np.zeros(24000 * 3, dtype=np.int16)
audio_input = AudioInput(buffer=buffer)

result = await pipeline.run(audio_input)

# Create an audio player using `sounddevice`
player = sd.OutputStream(samplerate=24000, channels=1, dtype=np.int16)
player.start()

# Play the audio stream as it comes in
async for event in result.stream():
    if event.type == "voice_stream_event_audio":
        player.write(event.data)
```

## Put it all together

```python
import asyncio
import random

import numpy as np
import sounddevice as sd

from agents import (
    Agent,
    function_tool,
    set_tracing_disabled,
)
from agents.voice import (
    AudioInput,
    SingleAgentVoiceWorkflow,
    VoicePipeline,
)
from agents.extensions.handoff_prompt import prompt_with_handoff_instructions


@function_tool
def get_weather(city: str) -> str:
    """Get the weather for a given city."""
    print(f"[debug] get_weather called with city: {city}")
    choices = ["sunny", "cloudy", "rainy", "snowy"]
    return f"The weather in {city} is {random.choice(choices)}."


spanish_agent = Agent(
    name="Spanish",
    handoff_description="A spanish speaking agent.",
    instructions=prompt_with_handoff_instructions(
        "You're speaking to a human, so be polite and concise. Speak in Spanish.",
    ),
    model="gpt-4o-mini",
)

agent = Agent(
    name="Assistant",
    instructions=prompt_with_handoff_instructions(
        "You're speaking to a human, so be polite and concise. If the user speaks in Spanish, handoff to the spanish agent.",
    ),
    model="gpt-4o-mini",
    handoffs=[spanish_agent],
    tools=[get_weather],
)


async def main():
    pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))
    buffer = np.zeros(24000 * 3, dtype=np.int16)
    audio_input = AudioInput(buffer=buffer)

    result = await pipeline.run(audio_input)

    # Create an audio player using `sounddevice`
    player = sd.OutputStream(samplerate=24000, channels=1, dtype=np.int16)
    player.start()

    # Play the audio stream as it comes in
    async for event in result.stream():
        if event.type == "voice_stream_event_audio":
            player.write(event.data)


if __name__ == "__main__":
    asyncio.run(main())
```

If you run this example, the agent will speak to you! Check out the example in [examples/voice/static](https://github.com/openai/openai-agents-python/tree/main/examples/voice/static) to see a demo where you can speak to the agent yourself.

docs/voice/tracing.md  +14

# Tracing

Just like the way [agents are traced](../tracing.md), voice pipelines are also automatically traced.

You can read the tracing doc above for basic tracing information, but you can additionally configure tracing of a pipeline via [`VoicePipelineConfig`][agents.voice.pipeline_config.VoicePipelineConfig].

Key tracing-related fields are:

- [`tracing_disabled`][agents.voice.pipeline_config.VoicePipelineConfig.tracing_disabled]: controls whether tracing is disabled. By default, tracing is enabled.
- [`trace_include_sensitive_data`][agents.voice.pipeline_config.VoicePipelineConfig.trace_include_sensitive_data]: controls whether traces include potentially sensitive data, like audio transcripts. This applies only to the voice pipeline, not to anything that happens inside your workflow.
- [`trace_include_sensitive_audio_data`][agents.voice.pipeline_config.VoicePipelineConfig.trace_include_sensitive_audio_data]: controls whether traces include audio data.
- [`workflow_name`][agents.voice.pipeline_config.VoicePipelineConfig.workflow_name]: the name of the trace workflow.
- [`group_id`][agents.voice.pipeline_config.VoicePipelineConfig.group_id]: the `group_id` of the trace, which lets you link multiple traces.
- [`trace_metadata`][agents.voice.pipeline_config.VoicePipelineConfig.trace_metadata]: additional metadata to include with the trace.
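
Putting those fields together, a minimal configuration sketch (the values are illustrative only):

```python
from agents.voice import VoicePipelineConfig

config = VoicePipelineConfig(
    workflow_name="Customer support voice agent",  # name of the trace workflow
    group_id="conversation-1234",                  # link traces from the same conversation
    trace_metadata={"locale": "en-US"},            # extra metadata attached to the trace
    trace_include_sensitive_data=False,            # drop transcripts from traces
    trace_include_sensitive_audio_data=False,      # drop raw audio from traces
)
```

Pass this object as the pipeline's config when constructing the `VoicePipeline`.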
