fix: re-queue failed DB writes with bounded queue cap #15923
Open
AlexsanderHamir wants to merge 15 commits into litellm_oct_alexsander_staging_three from litellm_mem_leak_on_db_failure (+184 −39)
Conversation
- Add ProxyTimingMiddleware to capture request start time before auth
- Ensures overhead measurement includes authentication, rate limiting, etc.
- Update common_request_processing to use middleware timing
- Middleware positioned before auth to measure complete proxy latency
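A minimal sketch of what such a middleware could look like, assuming the proxy's FastAPI/Starlette stack; the class body and the `proxy_start_time` attribute name are illustrative, not the actual LiteLLM implementation:

```python
# Hypothetical sketch based on the commit description above;
# assumes a Starlette/FastAPI app. Attribute name is illustrative.
import time

from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request


class ProxyTimingMiddleware(BaseHTTPMiddleware):
    """Stamps the request start time before auth/rate-limiting runs,
    so downstream code can measure total proxy overhead."""

    async def dispatch(self, request: Request, call_next):
        # request.state persists for the life of the request, so later
        # handlers (post-auth) can read the pre-auth timestamp.
        request.state.proxy_start_time = time.monotonic()
        return await call_next(request)
```

Registering it so that it wraps the auth middleware ensures the timestamp is captured before any authentication or rate-limit work begins, which is what makes the overhead measurement complete.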
Replace json.dumps with orjson.dumps in HTTP handler to reduce serialization latency for all LLM provider API calls.
Replace json.dumps/loads with orjson in streaming hot paths. Saves ~350ms per 1000-chunk streaming response.
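The swap itself is mechanical; this before/after sketch (payload contents illustrative) shows why it also removes an encode/decode step per call:

```python
# Illustrative before/after for the json -> orjson swap described above;
# the actual call sites live in llm_http_handler.py and the streaming
# iterators, which are not reproduced here.
import json

import orjson

payload = {"model": "gpt-4", "messages": [{"role": "user", "content": "hi"}]}

# Before: str in, str out, with an extra UTF-8 encode before hitting the wire.
body_str = json.dumps(payload)

# After: orjson serializes straight to bytes, which is what the HTTP client
# sends anyway, skipping the intermediate str and the encode step.
body_bytes = orjson.dumps(payload)

# Parsing a streamed chunk: orjson.loads accepts bytes directly,
# so there is no .decode("utf-8") round-trip per chunk.
chunk = b'{"choices": [{"delta": {"content": "hel"}}]}'
parsed = orjson.loads(chunk)
```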
- Move orjson from optional to required dependencies in pyproject.toml
- Fixes ModuleNotFoundError when importing litellm core modules
- orjson is used in llm_http_handler.py, which is imported by core litellm
- Optimize jsonify_object() (24 call sites across the codebase)
- Optimize get_request_status() for metadata parsing
Reduces CPU usage and improves database write latency.
- serialize_object() uses orjson for 3-5x faster dict serialization
- get_prompt_caching_cache_key() skips the encode/decode cycle
- Maintains cache key compatibility
- Maintains circular reference detection
- Preserves default=str fallback behavior
- Used in 15+ files across proxy and integrations
(A sketch of the pattern follows this list.)
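A hedged sketch of how a helper like the ones in these serialization commits might combine orjson with the `default=str` fallback; `safe_dumps` is an illustrative name, and the real functions (jsonify_object(), serialize_object()) may differ:

```python
# Illustrative only: shows orjson with a default=str fallback plus
# graceful handling of unencodable input, per the commit notes.
from typing import Any

import orjson


def safe_dumps(obj: Any) -> str:  # hypothetical name, not LiteLLM's API
    """orjson-backed serialization that keeps json.dumps-style behavior."""
    try:
        # default=str mirrors json.dumps(obj, default=str): orjson calls
        # str() for any type it cannot natively encode (e.g. Decimal or
        # custom classes).
        return orjson.dumps(obj, default=str).decode("utf-8")
    except orjson.JSONEncodeError:
        # orjson raises JSONEncodeError rather than recursing forever on
        # circular references; degrade gracefully instead of crashing.
        return str(obj)
```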
- set_cache: orjson.dumps() returns bytes directly for S3
- get_cache: orjson.loads() parses bytes without a decode step
- 3-5x faster serialization, plus skipping the UTF-8 encode/decode
(See the sketch after this list.)
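Illustratively, assuming a boto3 S3 client (the bucket name and function signatures are placeholders, not the actual cache class methods):

```python
# Simplified stand-ins for the S3 cache methods described above.
import boto3
import orjson

s3 = boto3.client("s3")
BUCKET = "example-litellm-cache"  # placeholder bucket name


def set_cache(key: str, value: dict) -> None:
    # orjson.dumps already returns bytes, which is exactly what
    # put_object's Body expects: no str -> bytes encode step.
    s3.put_object(Bucket=BUCKET, Key=key, Body=orjson.dumps(value))


def get_cache(key: str) -> dict:
    raw = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
    # orjson.loads accepts bytes directly: no .decode("utf-8") needed.
    return orjson.loads(raw)
```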
- Added @lru_cache(1024) to get_cooldown_cache_key()
- Changed all 4 call sites to use the cached method instead of recreating strings
- Replaced f-strings with string concatenation for better performance
Results: get_cooldown_cache_key dropped from 47MB to a few bytes and is no longer a top memory consumer. The memory leak is still present. Next: optimize heavy memory consumers so the leak becomes more obvious. (A sketch of the pattern follows.)
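A sketch of the pattern, with the key format invented for illustration (the real method lives in LiteLLM's cooldown cache code):

```python
# Illustrative sketch of the caching described above; key format is made up.
from functools import lru_cache


@lru_cache(maxsize=1024)
def get_cooldown_cache_key(model_id: str) -> str:
    # Plain concatenation instead of an f-string, per the commit note.
    return "deployment:" + model_id + ":cooldown"
```

Because lru_cache returns the same string object for a repeated argument, hot paths stop allocating a fresh key string on every call, which is consistent with the reported drop from 47MB of key strings to a few bytes.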
When database writes fail, flushed queue items were being dropped, leaving orphaned objects in memory until GC'd, adding pressure on the garbage collector. This change:
- Re-queues failed transactions for retry when the DB recovers
- Caps re-queueing at MAX_QUEUE_SIZE_BEFORE_REQUEUE (5000 items) to prevent unbounded growth during prolonged DB outages
- Drops transactions beyond the cap to prioritize stability over completeness
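A minimal sketch of the described behavior, assuming a list-backed queue and an async DB writer (both stand-ins for LiteLLM's actual update queue):

```python
# Illustrative sketch of the bounded re-queue; types are simplified.
MAX_QUEUE_SIZE_BEFORE_REQUEUE = 5000  # cap from the PR description


async def flush_to_db(queue: list, db_write) -> None:
    """Drain the queue; on DB failure, put items back (up to the cap)
    instead of dropping them, so they retry when the DB recovers."""
    pending = queue[:]
    queue.clear()
    try:
        await db_write(pending)
    except Exception:
        # New items may have arrived while the write was in flight, so
        # compute the remaining headroom before re-queueing.
        free_slots = MAX_QUEUE_SIZE_BEFORE_REQUEUE - len(queue)
        if free_slots > 0:
            queue.extend(pending[:free_slots])
        # Anything beyond the cap is dropped deliberately: bounded memory
        # during a prolonged outage, stability over completeness.
```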
Title
fix: re-queue failed DB writes with bounded queue cap
Relevant issues
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- I have added testing in the tests/litellm/ directory (adding at least 1 test is a hard requirement - see details)
- My PR passes all unit tests on make test-unit
Type
🧹 Refactoring
Changes
Impact