
Include the token usage for every conversation and workspace #788

Merged: 6 commits into main from ask-llm-token-usage on Jan 29, 2025

Conversation

@aponcedeleonch (Contributor) commented on Jan 27, 2025

Closes: #418

This PR introduces the changes necessary to track the tokens used per request, process them, and return them in the API.

Specific changes:

  • Make sure we process the whole stream and record to the DB at the very end
  • Include the flag `"stream_options": {"include_usage": True}` so the providers respond with the tokens (see the sketch after this list)
  • Add the necessary processing for the API
  • Modify the initial API models to display the tokens and their price correctly
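As context for the second bullet, here is a minimal sketch of what `include_usage` does with an OpenAI-compatible provider (illustrative only, not the PR's actual code; the endpoint and helper name are assumptions): when the flag is set, the server emits one extra final chunk whose `usage` field carries the token counts, which is why the whole stream has to be consumed before recording to the DB.

```python
# Sketch: read the trailing usage chunk from an OpenAI-compatible stream.
# The function name and endpoint are illustrative assumptions.
import json

import httpx


async def stream_and_capture_usage(base_url: str, api_key: str, payload: dict) -> dict:
    payload = {**payload, "stream": True, "stream_options": {"include_usage": True}}
    usage: dict = {}
    async with httpx.AsyncClient() as client:
        async with client.stream(
            "POST",
            f"{base_url}/chat/completions",
            headers={"Authorization": f"Bearer {api_key}"},
            json=payload,
        ) as response:
            async for line in response.aiter_lines():
                if not line.startswith("data: ") or line.endswith("[DONE]"):
                    continue
                chunk = json.loads(line[len("data: "):])
                # Content chunks have "usage": null; only the final extra
                # chunk, sent after all content, carries the real counts.
                if chunk.get("usage"):
                    usage = chunk["usage"]
    return usage  # e.g. {"prompt_tokens": 8, "completion_tokens": 10, ...}
```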

Notable:

  • Ollama and Llama CPP tokens cannot be obtained with the current code. This is not critical, since these are local providers and shouldn't represent any cost in tokens
    • Ollama: if we switched our requests from the Python client to direct HTTP requests, we would be able to obtain the tokens (see the sketch after this list)
    • Llama CPP: I couldn't find a way to get the tokens when streaming the response.
  • Copilot is not returning the used tokens. This is also not critical, since Copilot charges a fixed price rather than per token
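For the Ollama point, a minimal sketch of the direct-HTTP alternative (illustrative; the model name is a placeholder): Ollama's streaming API reports `prompt_eval_count` and `eval_count` on the final chunk, which is what switching away from the Python client would expose.

```python
# Sketch: read token counts from Ollama's raw streaming HTTP API.
import json

import httpx


async def ollama_chat_token_counts(messages: list[dict]) -> dict:
    payload = {"model": "qwen2.5-coder", "messages": messages, "stream": True}
    counts: dict = {}
    async with httpx.AsyncClient() as client:
        async with client.stream(
            "POST", "http://localhost:11434/api/chat", json=payload
        ) as response:
            async for line in response.aiter_lines():
                if not line:
                    continue
                chunk = json.loads(line)
                # Only the last chunk ("done": true) carries the counts.
                if chunk.get("done"):
                    counts = {
                        "input_tokens": chunk.get("prompt_eval_count", 0),
                        "output_tokens": chunk.get("eval_count", 0),
                    }
    return counts
```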

Sample response

curl -Ss 'http://localhost:8989/api/v1/workspaces/tokens/token-usage' | jq
{
  "tokens_by_model": {
    "hosted_vllm/unsloth/Qwen2.5-Coder-32B-Instruct": {
      "provider_type": "vllm",
      "model": "hosted_vllm/unsloth/Qwen2.5-Coder-32B-Instruct",
      "token_usage": {
        "input_tokens": 30,
        "output_tokens": 10,
        "input_cost": 0.0,
        "output_cost": 0.0
      }
    },
    "gpt-4o-mini": {
      "provider_type": "openai",
      "model": "gpt-4o-mini",
      "token_usage": {
        "input_tokens": 8,
        "output_tokens": 10,
        "input_cost": 0.0000012,
        "output_cost": 0.000006
      }
    },
    "claude-3-5-sonnet-latest": {
      "provider_type": "anthropic",
      "model": "claude-3-5-sonnet-latest",
      "token_usage": {
        "input_tokens": 471,
        "output_tokens": 35,
        "input_cost": 0.001413,
        "output_cost": 0.000525
      }
    }
  },
  "token_usage": {
    "input_tokens": 509,
    "output_tokens": 55,
    "input_cost": 0.0014142,
    "output_cost": 0.000531
  }
}
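The numbers above are plain multiplication and addition: each model's `input_cost`/`output_cost` is its token count times a per-token price (for gpt-4o-mini, 8 × 0.00000015 ≈ 0.0000012 and 10 × 0.0000006 ≈ 0.000006), and the top-level `token_usage` sums the per-model entries. A minimal sketch of that aggregation (the class and price table are illustrative, not the PR's actual models; prices are hardcoded here, while the PR reads them from a model-cost file):

```python
# Sketch: per-model token usage plus a workspace-level aggregate.
from dataclasses import dataclass


@dataclass
class TokenUsage:
    input_tokens: int = 0
    output_tokens: int = 0
    input_cost: float = 0.0
    output_cost: float = 0.0

    def add(self, other: "TokenUsage") -> None:
        self.input_tokens += other.input_tokens
        self.output_tokens += other.output_tokens
        self.input_cost += other.input_cost
        self.output_cost += other.output_cost


# Assumed USD prices per token (gpt-4o-mini at $0.15/$0.60 per 1M tokens,
# claude-3-5-sonnet at $3/$15 per 1M tokens).
PRICES = {
    "gpt-4o-mini": (0.15e-6, 0.60e-6),
    "claude-3-5-sonnet-latest": (3e-6, 15e-6),
}


def usage_for(model: str, input_tokens: int, output_tokens: int) -> TokenUsage:
    in_price, out_price = PRICES.get(model, (0.0, 0.0))
    return TokenUsage(
        input_tokens,
        output_tokens,
        input_tokens * in_price,    # 8 * 0.15e-6 ≈ 0.0000012
        output_tokens * out_price,  # 10 * 0.60e-6 ≈ 0.000006
    )


total = TokenUsage()
for model, (tin, tout) in {
    "gpt-4o-mini": (8, 10),
    "claude-3-5-sonnet-latest": (471, 35),
}.items():
    total.add(usage_for(model, tin, tout))
# `total` matches the workspace-level "token_usage" for these two models.
```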

@aponcedeleonch self-assigned this on Jan 27, 2025
@aponcedeleonch marked this pull request as draft on January 27, 2025 18:44
@aponcedeleonch force-pushed the ask-llm-token-usage branch 2 times, most recently from 5a51e0c to 9dd2669 on January 28, 2025 11:18
@aponcedeleonch marked this pull request as ready for review on January 28, 2025 11:21
JAORMX previously approved these changes on Jan 28, 2025

@aponcedeleonch merged commit 1e8c1c9 into main on Jan 29, 2025
4 checks passed
@aponcedeleonch deleted the ask-llm-token-usage branch on January 29, 2025 08:51
lukehinds pushed a commit that referenced this pull request Jan 31, 2025
* Include the token usage for every conversation and workspace

* Moved token recording to DB

* Changed token usage code to get info from file and added GHA to get file periodically

* formatting changes

* Move model cost to dedicated folder

* Fix problems with copilot streaming
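One of the commits above changes the token-usage code to read model cost info from a file fetched periodically by a GitHub Action. A minimal sketch of loading such a file, assuming it follows the layout of LiteLLM's `model_prices_and_context_window.json` (the field names below come from that file; treat the whole snippet as an assumption about the format, not the PR's code):

```python
# Sketch: map model name -> (input price per token, output price per token).
import json
from pathlib import Path


def load_model_costs(path: Path) -> dict[str, tuple[float, float]]:
    data = json.loads(path.read_text())
    costs = {}
    for model, spec in data.items():
        if not isinstance(spec, dict):
            continue  # skip non-model entries such as a sample spec
        costs[model] = (
            spec.get("input_cost_per_token", 0.0),
            spec.get("output_cost_per_token", 0.0),
        )
    return costs
```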