Conversation

@zhao-oai
Collaborator

No description provided.

@zhao-oai
Collaborator Author

@codex review please

@chatgpt-codex-connector
Contributor

Codex Review: Didn't find any major issues. 🚀

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@zhao-oai
Collaborator Author

@codex what are the performance implications of this PR

@chatgpt-codex-connector
Contributor

Note

You have reached your Codex usage limits. You can see your limits in the Codex usage dashboard.

@zhao-oai
Collaborator Author

@codex what are the performance implications of this PR

@chatgpt-codex-connector
Contributor

Note

You have reached your Codex usage limits. You can see your limits in the Codex usage dashboard.

@zhao-oai
Collaborator Author

@codex hi

@chatgpt-codex-connector
Contributor

Note

You have reached your Codex usage limits. You can see your limits in the Codex usage dashboard.

@zhao-oai
Collaborator Author

@codex what are the performance implications of this PR

@chatgpt-codex-connector
Contributor

Note

You have reached your Codex usage limits. You can see your limits in the Codex usage dashboard.

@zhao-oai
Collaborator Author

@codex hello

@chatgpt-codex-connector
Contributor

Note

You have reached your Codex usage limits. You can see your limits in the Codex usage dashboard.

@zhao-oai
Collaborator Author

@codex review

@chatgpt-codex-connector
Contributor

Codex Review: Didn't find any major issues. Another round soon, please!

@zhao-oai
Collaborator Author

@codex what are the performance implications of this PR

@chatgpt-codex-connector
Contributor

Key locations

Performance analysis

Request setup

  • Every call now builds a tokenizer (Tokenizer::for_model) and walks the entire messages payload to approximate prompt-token counts before the request is submitted (codex-rs/core/src/chat_completions.rs:47-87 and :121-128).
    • tiktoken_rs::get_bpe_from_model is reasonably fast when the model table is cached, but constructing a tokenizer still allocates and can add several hundred microseconds of CPU per request. On cold paths, or when falling back to Tokenizer::try_default, the cost climbs because the BPE tables must be loaded.
    • The pre-scan token counting is O(total characters in the serialized prompt). Large conversation histories incur proportional CPU time because each string is re-tokenized purely for accounting; this is extra work beyond the existing JSON assembly. A rough sketch follows this list.
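
For concreteness, here is a minimal sketch of what such a pre-request estimation pass could look like, built directly on tiktoken_rs. The ChatMessage shape and the estimate_prompt_tokens name are assumptions for illustration, not the PR's actual code:

```rust
// Illustrative sketch only: approximate prompt tokens before the request is
// sent. The message shape and function name are hypothetical.
use tiktoken_rs::get_bpe_from_model;

struct ChatMessage {
    role: String,
    content: String,
}

fn estimate_prompt_tokens(model: &str, messages: &[ChatMessage]) -> Option<usize> {
    // Building the BPE is the per-request setup cost noted above; it allocates
    // even when the underlying model table is cached.
    let bpe = get_bpe_from_model(model).ok()?;

    // O(total characters): each string is re-encoded purely for accounting,
    // on top of the JSON assembly the client already performs.
    let mut total = 0usize;
    for msg in messages {
        total += bpe.encode_with_special_tokens(&msg.role).len();
        total += bpe.encode_with_special_tokens(&msg.content).len();
    }
    Some(total)
}
```

The cost scales with the byte length of the role and content strings, which is exactly the O(total characters) behavior called out above.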

Streaming path

  • While consuming the SSE stream, every emitted assistant chunk, reasoning chunk, tool-call identifier, name, and argument fragment is re-tokenized to update the heuristic counters (codex-rs/core/src/chat_completions.rs:592-712); see the sketch after this list.
    • This adds a Tokenizer::count call per delta. For tightly chunked providers (dozens or hundreds of deltas per turn) the additional CPU can be noticeable; each call runs a full BPE encode on the fragment even though the text is already being buffered for output.
    • Because the heuristic is skipped only when tokenizer construction fails (rare in practice), most streams will pay this per-delta cost.
  • The stream still performs the same number of allocations for downstream events; the only extra work at completion time is cloning the computed usage into the Completed event, which is negligible compared with the tokenization work (codex-rs/core/src/chat_completions.rs:525-779).
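
A comparable sketch of the per-delta accounting on the streaming side; StreamTokenCounter and on_delta are hypothetical names standing in for the PR's heuristic, which the review describes as one Tokenizer::count call per fragment:

```rust
// Illustrative sketch only: count output tokens as SSE deltas arrive.
// The type and method names are hypothetical.
use std::sync::Arc;
use tiktoken_rs::CoreBPE;

struct StreamTokenCounter {
    tokenizer: Arc<CoreBPE>,
    completion_tokens: i64,
}

impl StreamTokenCounter {
    // Called for every assistant, reasoning, or tool-call fragment; the full
    // BPE encode per delta is where the extra streaming CPU goes.
    fn on_delta(&mut self, fragment: &str) {
        self.completion_tokens +=
            self.tokenizer.encode_with_special_tokens(fragment).len() as i64;
    }
}
```

With dozens or hundreds of deltas per turn, each carrying only a few tokens, the fixed overhead of one encode call per fragment dominates, which is why tightly chunked providers feel this most.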

Memory & concurrency

  • The heuristic holds an Arc<Tokenizer> plus three i64 counters, so the steady-state memory footprint per stream grows by only a few dozen bytes (codex-rs/core/src/chat_completions.rs:39-105); see the sizing sketch below. No additional concurrency primitives were introduced, so there is no locking contention; the only cost is extra CPU time on the task that handles the stream.
  • The added unit test runs only in CI and does not affect production performance, but it does validate that the Completed event now carries a usage estimate (codex-rs/core/tests/chat_completions_sse.rs:188-229).
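
A back-of-the-envelope check of that footprint; which three counters the PR keeps is an assumption here, and the field names are illustrative:

```rust
// Illustrative sketch only: the per-stream state is roughly one Arc pointer
// plus three signed 64-bit counters.
use std::sync::Arc;
use tiktoken_rs::CoreBPE;

struct UsageHeuristic {
    tokenizer: Arc<CoreBPE>, // 8 bytes: a shared pointer, not a copy of the BPE tables
    prompt_tokens: i64,      // 8 bytes
    completion_tokens: i64,  // 8 bytes
    reasoning_tokens: i64,   // 8 bytes
}

fn main() {
    // Prints 32 on a 64-bit target: a few dozen bytes of steady-state memory
    // per stream, in line with the estimate above.
    println!("{} bytes", std::mem::size_of::<UsageHeuristic>());
}
```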

Net effect

  • Costs: Extra CPU both before the request (prompt re-tokenization) and during streaming (per-delta BPE counting). These costs scale with prompt size and the number of streamed chunks, so high-traffic deployments should expect a modest increase in CPU utilization.
  • Benefits: Downstream consumers receive an approximate TokenUsage without waiting for provider-side accounting, which can eliminate separate follow-up calls or heuristics elsewhere in the stack, potentially offsetting work in other services (codex-rs/core/src/chat_completions.rs:525-779).
  • Overall: The change trades additional CPU on the chat-streaming task for richer metadata in the Completed event. Monitor throughput-critical nodes to ensure the added tokenization does not push them over latency or utilization budgets.

View task →
