Unexpected High Costs for Keep-Alive Requests with Sonnet Model #3548

Open
ledmaster opened this issue Mar 15, 2025 · 0 comments
Issue

Hi, there might be a bug in prompt caching when using Sonnet models via OpenRouter. Aider does not seem to be requesting caching for most of the prompts.

To rule out an OpenRouter issue, I queried Sonnet on OpenRouter directly through litellm, and that charges cache prices correctly.

If the full inputs were being cached, each keep-alive should cost approximately: 14198/1e6 * 0.3 + 2/1e6 * 15 = 0.0042894

But it's costing 0.0378
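For anyone who wants to reproduce the arithmetic, here is a minimal sketch of the calculation above. It assumes the 0.3 and 15 in the formula are USD per million tokens for cache reads and output tokens respectively; the function and constant names are made up for illustration and are not part of aider or litellm.

```python
# Prices in USD per million tokens, taken from the figures used in this issue
# (assumed to be the cache-read and output prices for the Sonnet model).
CACHE_READ_PRICE_PER_MTOK = 0.30
OUTPUT_PRICE_PER_MTOK = 15.00


def keepalive_cost(cached_input_tokens: int, output_tokens: int) -> float:
    """Expected cost of one keep-alive ping if the whole prompt is a cache hit."""
    input_cost = cached_input_tokens / 1e6 * CACHE_READ_PRICE_PER_MTOK
    output_cost = output_tokens / 1e6 * OUTPUT_PRICE_PER_MTOK
    return input_cost + output_cost


expected = keepalive_cost(14198, 2)
observed = 0.0378  # cost per keep-alive reported in this issue
ratio = observed / expected

print(f"expected: ${expected:.7f}")
print(f"observed: ${observed:.4f} ({ratio:.1f}x expected)")
```

With these numbers the observed cost is nearly nine times the fully cached estimate, which suggests most of the prompt is being billed at a non-cached rate.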

I'm trying to debug the litellm requests on my local aider to see if I can find a bug, but you guys know the codebase much better, so this is why I opened an issue.

Here's an image of the costs of the keep-alive by Aider:

Image

Here's one of the transactions:

Image

Here's my .aider.conf

model: openrouter/anthropic/claude-3.7-sonnet
weak-model: openrouter/google/gemini-2.0-flash-lite-preview-02-05:free
timeout: 60
cache-prompts: true
stream: false
cache-keepalive-pings: 6

Thanks! Love aider!

Version and model info

Aider v0.77.0
Main model: openrouter/anthropic/claude-3.7-sonnet with diff edit format, prompt cache, infinite output
Weak model: openrouter/google/gemini-2.0-flash-lite-preview-02-05:free
Git repo: .git with 64 files
Repo-map: using 4096 tokens, files refresh
