
chat({ stream: false }) still streams on the wire (uses runStreamingText + streamToText) #557

@tombeckenham


Summary

Calling chat({ stream: false }) does not actually send stream: false to the provider. Internally it calls runStreamingText and concatenates the SSE stream into a string via streamToText, so the wire request still has Accept: text/event-stream and "stream": true in the body. The only code path that sends a wire-level non-streaming request is chat({ outputSchema }) → runAgenticStructuredOutput → adapter.structuredOutput.

Reproduction

import { chat } from '@tanstack/ai'
import { openRouterText } from '@tanstack/ai-openrouter'

const adapter = openRouterText('x-ai/grok-4.3', { /* … */ })
const text = await chat({
  adapter,
  messages: [{ role: 'user', content: 'hello' }],
  stream: false,
  modelOptions: {
    responseFormat: { type: 'json_schema', jsonSchema: { /* … */ } },
  },
})

Wire capture: the request carries Accept: text/event-stream and "stream": true in the body. OpenRouter responds with SSE, and the SDK concatenates the chunks into a string.
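A quick way to check any captured request for the two signals above is a small classifier over its headers and raw body. This is a hypothetical helper (classifyWireRequest and headerValue are illustration names, not part of @tanstack/ai); it does not intercept SDK traffic, so how the headers and body are captured (for example a logging proxy) is left as an assumption.

```typescript
// Hypothetical helper for inspecting an already-captured request.
// It only classifies data; it does not hook into the SDK or the network.
interface WireClassification {
  acceptsSse: boolean // Accept header mentions text/event-stream
  bodyStream: boolean // JSON body has "stream": true
  isWireStreaming: boolean // either signal indicates a streaming request
}

// HTTP header names are case-insensitive, so compare them lowercased.
function headerValue(headers: Record<string, string>, name: string): string {
  const target = name.toLowerCase()
  for (const [key, value] of Object.entries(headers)) {
    if (key.toLowerCase() === target) return value
  }
  return ''
}

function classifyWireRequest(
  headers: Record<string, string>,
  rawBody: string,
): WireClassification {
  const acceptsSse = headerValue(headers, 'accept')
    .toLowerCase()
    .includes('text/event-stream')
  let bodyStream = false
  try {
    const parsed: unknown = JSON.parse(rawBody)
    bodyStream =
      typeof parsed === 'object' &&
      parsed !== null &&
      (parsed as { stream?: unknown }).stream === true
  } catch {
    // A body that is not JSON cannot carry the stream flag.
  }
  return { acceptsSse, bodyStream, isWireStreaming: acceptsSse || bodyStream }
}
```

For the capture described above, classifyWireRequest({ Accept: 'text/event-stream' }, '{"stream":true}') reports all three fields as true; a genuine stream: false request with a JSON Accept header would report all three as false.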

Root cause (current main)

packages/typescript/ai/src/activities/chat/index.ts:1609-1631 (dispatch):

if (outputSchema) return runAgenticStructuredOutput(options)
if (stream === false) return runNonStreamingText(options)
return runStreamingText(options)

packages/typescript/ai/src/activities/chat/index.ts:1666-1675 (the offender):

function runNonStreamingText(options): Promise<string> {
  const stream = runStreamingText(options)
  return streamToText(stream)
}

runStreamingText → TextEngine.streamModelResponse → adapter.chatStream(...). The OpenRouter adapter's chatStream hardcodes stream: true (packages/typescript/ai-openrouter/src/adapters/text.ts:131). Its structuredOutput is the only place that sends stream: false (packages/typescript/ai-openrouter/src/adapters/text.ts:214).

The adapter interface (packages/typescript/ai/src/activities/chat/adapter.ts:59-120) only defines chatStream, structuredOutput, and optional structuredOutputStream — there is no non-streaming chat() method on adapters today.

Why it matters

Reasoning models under concurrent load (Grok 4.3 via OpenRouter, in our case) can spend 30s+ purely reasoning before emitting any content. Observed wire behavior for the same prompt sent as 6 parallel calls:

  • streaming, no proxy: 25–52s wall-clock each, all clean
  • non-streaming (true wire-level), no proxy: 22–41s wall-clock each, all clean
  • streaming through a proxy with a 30s socket idle timeout: one of the six truncates mid-stream because OpenRouter sends nothing for >30s during the reasoning phase
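The third case can be modelled as a simple gap check: a proxy with an idle timeout drops the connection when the silence between sending the request and the first byte, or between two consecutive chunks, exceeds the timeout. This is a simplified sketch; exceedsIdleTimeout is a hypothetical helper, arrival times are assumed to be milliseconds relative to when the request was sent, and real proxies differ in exactly which socket activity resets their timer.

```typescript
// Hypothetical helper: returns true if any silent gap exceeds timeoutMs.
// Gaps are measured from t = 0 (request sent) to the first byte, then
// between consecutive byte arrivals.
function exceedsIdleTimeout(arrivalsMs: number[], timeoutMs: number): boolean {
  let previous = 0
  for (const t of [...arrivalsMs].sort((a, b) => a - b)) {
    if (t - previous > timeoutMs) return true
    previous = t
  }
  return false
}

// ~35 s of reasoning before the first SSE byte trips a 30 s idle timeout:
exceedsIdleTimeout([35_000, 35_200, 36_000], 30_000) // true
// Chunks arriving at least every 30 s keep the connection alive:
exceedsIdleTimeout([1_000, 20_000, 45_000], 30_000) // false
```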

If chat({ stream: false }) actually sent stream: false, this class of proxy idle-timeout bug would not apply, and fixture/replay paths would deal with a single JSON body instead of an SSE transcript.
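The replay difference can be illustrated with the OpenAI-compatible chat-completions format that OpenRouter uses: a streaming fixture is an SSE transcript of data: lines whose payloads carry choices[0].delta.content and which ends with data: [DONE], while a non-streaming fixture is one JSON object with choices[0].message.content. The fixture strings below are made up for illustration, and real payloads carry more fields (ids, usage, reasoning) that this sketch ignores.

```typescript
// Minimal shapes of the OpenAI-compatible payloads; only the fields read here.
interface ChatCompletionChunk {
  choices: Array<{ delta?: { content?: string | null } }>
}
interface ChatCompletion {
  choices: Array<{ message: { content: string | null } }>
}

// Streaming fixture: concatenate delta.content from every data: event.
function textFromSseFixture(transcript: string): string {
  let text = ''
  for (const line of transcript.split('\n')) {
    if (!line.startsWith('data:')) continue // skips comments, blank lines, other fields
    const payload = line.slice('data:'.length).trim()
    if (payload === '[DONE]') break
    const chunk = JSON.parse(payload) as ChatCompletionChunk
    text += chunk.choices[0]?.delta?.content ?? ''
  }
  return text
}

// Non-streaming fixture: one JSON body, one field to read.
function textFromJsonFixture(body: string): string {
  const completion = JSON.parse(body) as ChatCompletion
  return completion.choices[0]?.message.content ?? ''
}

const sseFixture = [
  'data: {"choices":[{"delta":{"content":"Hel"}}]}',
  '',
  'data: {"choices":[{"delta":{"content":"lo"}}]}',
  '',
  'data: [DONE]',
].join('\n')
const jsonFixture = '{"choices":[{"message":{"content":"Hello"}}]}'
```

Both fixtures decode to "Hello", but only the first needs an event parser and chunk reassembly.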

Proposed fix

  1. Add a non-streaming method to the adapter interface alongside chatStream / structuredOutput:
    chat(options: ChatStreamOptions): Promise<{
      content: string
      reasoning?: string
      toolCalls?: 
      usage?: 
    }>
  2. OpenRouter implements it as a single this.client.chat.send({ chatRequest: { …, stream: false } }) returning result.choices[0].message.content — mirroring structuredOutput minus the schema enforcement.
  3. Rewire runNonStreamingText to call adapter.chat(...) directly instead of runStreamingText + streamToText.
  4. For adapters that don't implement chat(), fall back to current behavior and emit a one-time warning so users know they're not getting wire-level non-streaming.
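Steps 1, 3 and 4 could fit together roughly as follows. This is a sketch under simplified, hypothetical types: SketchAdapter, SketchChatOptions and runNonStreamingTextSketch are illustration names, not @tanstack/ai exports, and tool calls, usage and the real option shape are omitted. The warning is deduplicated per adapter name, which is one possible reading of "one-time".

```typescript
// Hypothetical, simplified types; not the real @tanstack/ai adapter types.
interface SketchChatOptions {
  messages: Array<{ role: 'user' | 'assistant' | 'system'; content: string }>
}
interface SketchChatResult {
  content: string
  reasoning?: string
}
type SketchChunk = { type: 'content'; delta: string } | { type: 'done' }

interface SketchAdapter {
  name: string
  chatStream(options: SketchChatOptions): AsyncIterable<SketchChunk>
  // Proposed optional method (step 1): one request with stream: false on the wire.
  chat?(options: SketchChatOptions): Promise<SketchChatResult>
}

const warnedAdapters = new Set<string>()

async function runNonStreamingTextSketch(
  adapter: SketchAdapter,
  options: SketchChatOptions,
): Promise<string> {
  // Step 3: use the wire-level non-streaming method when the adapter has one.
  if (adapter.chat) {
    const result = await adapter.chat(options)
    return result.content
  }
  // Step 4: fall back to draining the stream, warning once per adapter.
  if (!warnedAdapters.has(adapter.name)) {
    warnedAdapters.add(adapter.name)
    console.warn(
      `[sketch] ${adapter.name} has no chat(); stream: false is emulated over a streaming request`,
    )
  }
  let text = ''
  for await (const chunk of adapter.chatStream(options)) {
    if (chunk.type === 'content') text += chunk.delta
  }
  return text
}
```

Keeping chat() optional means existing adapters keep working unchanged, and only adapters that implement it (OpenRouter first, per step 2) send a true stream: false request.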

Happy to send a PR if the shape sounds right.

Environment

  • @tanstack/ai: 0.14.0
  • @tanstack/ai-openrouter: 0.8.2
  • Node 24.x (Bun 1.x)
  • Provider: x-ai/grok-4.3 via OpenRouter
