[Bug Report] RealtimeSession: Voice stream works, but 'message' events do not fire & session.history() is empty #157

@chinesekeke

Describe the bug
When using @openai/agents-realtime, a RealtimeSession successfully establishes a WebSocket connection (Status 101) and the end-to-end audio stream works (the user can hear the AI's synthesized voice response).

However, the SDK does not deliver any text-based events to the client-side JavaScript. The session.on('message', ...) listener never fires, and calling session.history() after the session ends returns an empty array. This makes it impossible to capture the conversation transcript for logging, memory, or any other application logic.

Debug information

  • Agents SDK version: v0.0.4
  • Runtime environment:
    • Browser: Chrome on macOS
    • Backend Node.js (for providing the clientKey): Node.js v22.16.0
    • Network: Accessing from Mainland China via a popular proxy client (ClashX Pro) with "Enhanced Mode" (TUN) enabled.

Repro steps

  1. Backend Code (server.js): A simple Express server to provide the clientKey.
import express from 'express';
import cors from 'cors';
import { getClientKey } from './openai.js'; // Assumes an openai.js helper from docs

const app = express();
app.use(cors());
app.post('/api/get-client-key', async (req, res) => {
  try {
    const keyData = await getClientKey();
    res.json(keyData);
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});
app.listen(3000, () => console.log('Server running on port 3000'));

  2. Frontend Code (main.js): A minimal script to demonstrate the issue.
import { RealtimeAgent, RealtimeSession } from '@openai/agents-realtime';

const startBtn = document.getElementById('start-btn'); // Assume these exist in HTML
const stopBtn = document.getElementById('stop-btn');
const statusDiv = document.getElementById('status');
let session = null;

async function startConversation() {
  try {
    console.log('1. Preparing session...');
    const agent = new RealtimeAgent({ name: 'TestAgent', instructions: 'Be brief.' });

    const resp = await fetch('http://localhost:3000/api/get-client-key', { method: 'POST' });
    if (!resp.ok) throw new Error('Failed to get clientKey');
    const { clientKey } = await resp.json();
    console.log('2. Got clientKey.');

    session = new RealtimeSession(agent, { model: 'gpt-4o' });

    console.log('3. Attaching event listeners...');
    session.on('message', (e) => {
      // THIS EVENT NEVER FIRES
      console.log("✅ 'message' event fired:", e.data);
    });
    session.on('error', (err) => {
      // This event DID fire once with an RTCErrorEvent during our tests
      console.error('❌ Session error:', err);
    });
    session.on('close', () => {
      console.log('Session closed.');
      const history = session.history();
      console.log('4. Final history on close:', history); // This logs an empty array []
    });

    await session.connect({ apiKey: clientKey });
    console.log('5. Session connected! Please speak.');
    statusDiv.textContent = 'Conversation started. Please speak.';

  } catch (err) {
    console.error("Failed to start conversation:", err);
    statusDiv.textContent = `Failed to start: ${err.message}`;
  }
}

startBtn.addEventListener('click', startConversation);

To Reproduce:

  • Run the backend server with a valid OPENAI_API_KEY.
  • Serve the frontend HTML, which includes the main.js script.
  • Open the page in the browser with the developer console open.
  • Click the start button and have a brief voice conversation with the AI.
  • Observe the browser console logs.
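
The backend in step 1 imports getClientKey from an openai.js helper that is not shown in this report. For anyone trying to reproduce, here is a minimal sketch of what such a helper might look like. It is not the reporter's actual code. It assumes Node.js 18+ (global fetch), the OpenAI REST endpoint POST /v1/realtime/sessions returning an ephemeral key under client_secret.value, and a realtime model name of 'gpt-4o-realtime-preview'; the fetchImpl parameter is a hypothetical hook added only so the function can be exercised without network access.

```javascript
// Hypothetical openai.js helper (NOT from the original report).
// Assumptions: Node.js 18+ with global fetch, OPENAI_API_KEY in the
// environment, and the REST endpoint below returning client_secret.value.
const SESSIONS_URL = 'https://api.openai.com/v1/realtime/sessions';

async function getClientKey({
  apiKey = process.env.OPENAI_API_KEY,
  model = 'gpt-4o-realtime-preview', // assumed model name
  fetchImpl = globalThis.fetch,      // injectable for offline testing
} = {}) {
  if (!apiKey) throw new Error('OPENAI_API_KEY is not set');

  // Ask the API for a short-lived client secret for the browser to use.
  const resp = await fetchImpl(SESSIONS_URL, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ model }),
  });
  if (!resp.ok) {
    throw new Error(`Failed to create realtime session: HTTP ${resp.status}`);
  }

  const data = await resp.json();
  const clientKey = data?.client_secret?.value;
  if (!clientKey) {
    throw new Error('Response did not contain client_secret.value');
  }

  // Shape matches what main.js destructures: const { clientKey } = ...
  return { clientKey };
}
```

To use this from server.js as written above, the function would need to be exported from openai.js (for example with an export keyword in front of the declaration).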

Expected behavior
After the user speaks and the AI responds, the session.on('message', ...) listener should fire multiple times (for both 'user' and 'assistant' roles). The console should log the "✅ 'message' event fired" messages. When the session is closed, the session.history() call should return an array of message objects reflecting the conversation.

Actual behavior
The WebSocket connects successfully (Network tab shows Status 101) and the audio stream works (the user can hear the AI's voice response).

However, the 'message' event listener never fires. The console never logs the "✅ 'message' event fired" message. The session.history() call always returns an empty array []. A low-level RTCErrorEvent was observed once on the 'error' listener during debugging. This may point to a WebRTC data channel failure, possibly caused by the complex network environment (VPN/GFW), but it was seen only once and has not been confirmed as the cause.
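
To narrow down whether any events reach the client at all, one option is to attach a counting logger to several event names at once and compare the counts after a short conversation. The sketch below is a generic helper, not part of the SDK. It only assumes the session object exposes an on(name, handler) method, as the repro code already uses. The event names in the usage comment other than 'message' and 'error' are assumptions and should be checked against the installed SDK version.

```javascript
// Hypothetical debugging helper (not part of @openai/agents-realtime).
// Attaches a logging listener to each named event and returns a live
// object of per-event counts that can be inspected later.
function traceEvents(emitter, eventNames, log = console.log) {
  const counts = {};
  for (const name of eventNames) {
    counts[name] = 0;
    emitter.on(name, (...args) => {
      counts[name] += 1;
      log(`[trace] ${name} #${counts[name]}`, ...args);
    });
  }
  return counts;
}

// Usage sketch inside startConversation(), before session.connect():
//   const counts = traceEvents(session, [
//     'message',          // the event used in this report
//     'error',
//     'history_updated',  // assumed event name, verify against the SDK
//     'transport_event',  // assumed event name, verify against the SDK
//   ]);
//   // ...later, e.g. in the 'close' handler:
//   console.log('Event counts:', counts);
```

If every count stays at zero while audio still plays, that would be consistent with events not arriving over the data channel. If other events fire but 'message' does not, the problem is more likely the event name being listened for.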