
Include the token usage for every conversation and workspace #788

Merged: 6 commits into main from ask-llm-token-usage on Jan 29, 2025

Conversation

@aponcedeleonch (Contributor) commented on Jan 27, 2025

Closes: #418

This PR introduces the changes necessary to track the tokens used per request, process them, and return them in the API.

Specific changes:

  • Make sure we process the whole stream and record to the DB at the very end
  • Include the flag `"stream_options": {"include_usage": True}` so the providers respond with the tokens (see the sketch after this list)
  • Add the necessary processing for the API
  • Modify the initial API models to display the tokens and their price correctly
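As context for the second bullet, here is a minimal sketch of what `include_usage` does with an OpenAI-compatible provider (illustrative only, not the PR's actual code; the endpoint and helper name are assumptions): when the flag is set, the server emits one extra final chunk whose `usage` field carries the token counts, which is why the whole stream has to be consumed before recording to the DB.

```python
# Sketch: read the trailing usage chunk from an OpenAI-compatible stream.
# The function name and endpoint are illustrative assumptions.
import json

import httpx


async def stream_and_capture_usage(base_url: str, api_key: str, payload: dict) -> dict:
    payload = {**payload, "stream": True, "stream_options": {"include_usage": True}}
    usage: dict = {}
    async with httpx.AsyncClient() as client:
        async with client.stream(
            "POST",
            f"{base_url}/chat/completions",
            headers={"Authorization": f"Bearer {api_key}"},
            json=payload,
        ) as response:
            async for line in response.aiter_lines():
                if not line.startswith("data: ") or line.endswith("[DONE]"):
                    continue
                chunk = json.loads(line[len("data: "):])
                # Content chunks have "usage": null; only the final extra
                # chunk, sent after all content, carries the real counts.
                if chunk.get("usage"):
                    usage = chunk["usage"]
    return usage  # e.g. {"prompt_tokens": 8, "completion_tokens": 10, ...}
```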

Notable:

  • Ollama and Llama CPP tokens cannot be obtained with the current code. This is not critical, since these are local providers and shouldn't represent any cost in tokens
    • Ollama: if we switched our requests from the Python client to direct HTTP requests, we would be able to obtain the tokens (see the sketch after this list)
    • Llama CPP: I couldn't find a way to get the tokens when streaming the response.
  • Copilot is not returning the used tokens. This is also not critical, since Copilot charges a fixed price rather than per token
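For the Ollama point, a minimal sketch of the direct-HTTP alternative (illustrative; the model name is a placeholder): Ollama's streaming API reports `prompt_eval_count` and `eval_count` on the final chunk, which is what switching away from the Python client would expose.

```python
# Sketch: read token counts from Ollama's raw streaming HTTP API.
import json

import httpx


async def ollama_chat_token_counts(messages: list[dict]) -> dict:
    payload = {"model": "qwen2.5-coder", "messages": messages, "stream": True}
    counts: dict = {}
    async with httpx.AsyncClient() as client:
        async with client.stream(
            "POST", "http://localhost:11434/api/chat", json=payload
        ) as response:
            async for line in response.aiter_lines():
                if not line:
                    continue
                chunk = json.loads(line)
                # Only the last chunk ("done": true) carries the counts.
                if chunk.get("done"):
                    counts = {
                        "input_tokens": chunk.get("prompt_eval_count", 0),
                        "output_tokens": chunk.get("eval_count", 0),
                    }
    return counts
```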

Sample response

curl -Ss 'http://localhost:8989/api/v1/workspaces/tokens/token-usage' | jq
{
  "tokens_by_model": {
    "hosted_vllm/unsloth/Qwen2.5-Coder-32B-Instruct": {
      "provider_type": "vllm",
      "model": "hosted_vllm/unsloth/Qwen2.5-Coder-32B-Instruct",
      "token_usage": {
        "input_tokens": 30,
        "output_tokens": 10,
        "input_cost": 0.0,
        "output_cost": 0.0
      }
    },
    "gpt-4o-mini": {
      "provider_type": "openai",
      "model": "gpt-4o-mini",
      "token_usage": {
        "input_tokens": 8,
        "output_tokens": 10,
        "input_cost": 0.0000012,
        "output_cost": 0.000006
      }
    },
    "claude-3-5-sonnet-latest": {
      "provider_type": "anthropic",
      "model": "claude-3-5-sonnet-latest",
      "token_usage": {
        "input_tokens": 471,
        "output_tokens": 35,
        "input_cost": 0.001413,
        "output_cost": 0.000525
      }
    }
  },
  "token_usage": {
    "input_tokens": 509,
    "output_tokens": 55,
    "input_cost": 0.0014142,
    "output_cost": 0.000531
  }
}
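The numbers above are plain multiplication and addition: each model's `input_cost`/`output_cost` is its token count times a per-token price (for gpt-4o-mini, 8 × 0.00000015 ≈ 0.0000012 and 10 × 0.0000006 ≈ 0.000006), and the top-level `token_usage` sums the per-model entries. A minimal sketch of that aggregation (the class and price table are illustrative, not the PR's actual models; prices are hardcoded here, while the PR reads them from a model-cost file):

```python
# Sketch: per-model token usage plus a workspace-level aggregate.
from dataclasses import dataclass


@dataclass
class TokenUsage:
    input_tokens: int = 0
    output_tokens: int = 0
    input_cost: float = 0.0
    output_cost: float = 0.0

    def add(self, other: "TokenUsage") -> None:
        self.input_tokens += other.input_tokens
        self.output_tokens += other.output_tokens
        self.input_cost += other.input_cost
        self.output_cost += other.output_cost


# Assumed USD prices per token (gpt-4o-mini at $0.15/$0.60 per 1M tokens,
# claude-3-5-sonnet at $3/$15 per 1M tokens).
PRICES = {
    "gpt-4o-mini": (0.15e-6, 0.60e-6),
    "claude-3-5-sonnet-latest": (3e-6, 15e-6),
}


def usage_for(model: str, input_tokens: int, output_tokens: int) -> TokenUsage:
    in_price, out_price = PRICES.get(model, (0.0, 0.0))
    return TokenUsage(
        input_tokens,
        output_tokens,
        input_tokens * in_price,    # 8 * 0.15e-6 ≈ 0.0000012
        output_tokens * out_price,  # 10 * 0.60e-6 ≈ 0.000006
    )


total = TokenUsage()
for model, (tin, tout) in {
    "gpt-4o-mini": (8, 10),
    "claude-3-5-sonnet-latest": (471, 35),
}.items():
    total.add(usage_for(model, tin, tout))
# `total` matches the workspace-level "token_usage" for these two models.
```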

@aponcedeleonch self-assigned this on Jan 27, 2025
@aponcedeleonch marked this pull request as draft on January 27, 2025 18:44
@aponcedeleonch force-pushed the ask-llm-token-usage branch 2 times, most recently from 5a51e0c to 9dd2669 on January 28, 2025 11:18
@aponcedeleonch marked this pull request as ready for review on January 28, 2025 11:21
JAORMX previously approved these changes on Jan 28, 2025

@aponcedeleonch merged commit 1e8c1c9 into main on Jan 29, 2025
4 checks passed
@aponcedeleonch deleted the ask-llm-token-usage branch on January 29, 2025 08:51
lukehinds pushed a commit that referenced this pull request Jan 31, 2025
* Include the token usage for every conversation and workspace

* Moved token recording to DB

* Changed token usage code to get info from file and added GHA to get file periodically

* formatting changes

* Move model cost to dedicated folder

* Fix problems with copilot streaming
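One of the commits above changes the token-usage code to read model cost info from a file fetched periodically by a GitHub Action. A minimal sketch of loading such a file, assuming it follows the layout of LiteLLM's `model_prices_and_context_window.json` (the field names below come from that file; treat the whole snippet as an assumption about the format, not the PR's code):

```python
# Sketch: map model name -> (input price per token, output price per token).
import json
from pathlib import Path


def load_model_costs(path: Path) -> dict[str, tuple[float, float]]:
    data = json.loads(path.read_text())
    costs = {}
    for model, spec in data.items():
        if not isinstance(spec, dict):
            continue  # skip non-model entries such as a sample spec
        costs[model] = (
            spec.get("input_cost_per_token", 0.0),
            spec.get("output_cost_per_token", 0.0),
        )
    return costs
```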