Conversation

@AlexsanderHamir (Collaborator) commented Oct 25, 2025

Title

fix: re-queue failed DB writes with bounded queue cap

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory. Adding at least 1 test is a hard requirement (see details)
  • I have added a screenshot of my new test passing locally
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem

Type

🧹 Refactoring

Changes

  • Re-queues failed transactions for retry when DB recovers
  • Caps re-queueing at MAX_QUEUE_SIZE_BEFORE_REQUEUE (5000 items) to prevent unbounded growth during prolonged DB outages
  • Drops transactions beyond the cap to prioritize stability over completeness (see the sketch below)
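A minimal sketch of the bounded re-queue, assuming a simple list-backed in-memory queue. MAX_QUEUE_SIZE_BEFORE_REQUEUE matches the PR; the class and method names are illustrative, not LiteLLM's actual API:

```python
# Sketch only: MAX_QUEUE_SIZE_BEFORE_REQUEUE comes from the PR; the queue
# class and method names are illustrative, not LiteLLM's actual API.
MAX_QUEUE_SIZE_BEFORE_REQUEUE = 5000

class InMemoryTransactionQueue:
    def __init__(self) -> None:
        self._queue: list = []

    def requeue_failed(self, failed_transactions: list) -> int:
        """Re-queue failed DB writes up to the cap; return the drop count."""
        capacity = MAX_QUEUE_SIZE_BEFORE_REQUEUE - len(self._queue)
        if capacity <= 0:
            # Queue already at the cap: drop everything rather than grow
            # unbounded during a prolonged DB outage.
            return len(failed_transactions)
        kept = failed_transactions[:capacity]
        self._queue.extend(kept)  # retried on the next flush once the DB recovers
        return len(failed_transactions) - len(kept)
```

Dropping past the cap is deliberate: during a long outage, memory stays bounded instead of growing with every failed flush.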

Impact

  • Cascading failures previously caused many queued objects to remain orphaned in memory, leading to excessive garbage collection pressure.
  • Although retries already exist for database failures, some transactions were still being lost. Re-queuing failed data increases the likelihood of successful recovery without data loss.
[Screenshots: new tests passing locally, captured Oct 24, 2025]

- Add ProxyTimingMiddleware to capture request start time before auth
- Ensures overhead measurement includes authentication, rate limiting, etc.
- Update common_request_processing to use middleware timing
- Middleware positioned before auth to measure complete proxy latency
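A minimal sketch of this kind of timing middleware, assuming a Starlette/FastAPI stack; the actual ProxyTimingMiddleware implementation may differ:

```python
# Sketch of a request-timing middleware, assuming Starlette/FastAPI;
# the real ProxyTimingMiddleware implementation may differ.
import time

from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request

class ProxyTimingMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        # Stamp the request before auth and rate limiting run, so the
        # final overhead measurement covers the full proxy pipeline.
        request.state.proxy_start_time = time.time()
        return await call_next(request)
```

On ordering: with Starlette/FastAPI, the last middleware added becomes the outermost layer and runs first, so a middleware meant to run before auth has to be registered outside (after) the auth middleware to capture the true start time.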
Replace json.dumps with orjson.dumps in the HTTP handler to reduce serialization latency for all LLM provider API calls.

Replace json.dumps/loads with orjson in streaming hot paths. Saves ~350ms per 1000-chunk streaming response.
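The swap is mostly mechanical, with one caveat worth showing: orjson.dumps returns bytes rather than str, which is what lets the hot paths skip an encode/decode round trip. A minimal sketch (payload contents are illustrative):

```python
import orjson

# Illustrative payload; real request bodies come from the provider call.
payload = {"model": "gpt-4", "messages": [{"role": "user", "content": "hi"}]}

# orjson.dumps returns bytes, not str. Passing bytes straight to the HTTP
# client avoids the UTF-8 encode that json.dumps(...).encode() would need;
# call .decode() only where a str is genuinely required.
body: bytes = orjson.dumps(payload)

# orjson.loads accepts bytes directly, so streamed chunks can be parsed
# without an intermediate .decode() per chunk.
chunk = b'{"choices": [{"delta": {"content": "hello"}}]}'
parsed = orjson.loads(chunk)
```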
- Move orjson from optional to required dependencies in pyproject.toml
- Fixes ModuleNotFoundError when importing litellm core modules
- orjson is used in llm_http_handler.py, which is imported by core litellm

- Optimize jsonify_object() (24 call sites across the codebase)
- Optimize get_request_status() for metadata parsing

Reduces CPU usage and improves database write latency.

- serialize_object() uses orjson for 3-5x faster dict serialization
- get_prompt_caching_cache_key() skips the encode/decode cycle
- Maintains cache key compatibility
- Maintains circular reference detection
- Preserves the default=str fallback behavior
- Used in 15+ files across proxy and integrations
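A sketch of the fallback being preserved, assuming serialize_object ends in an orjson call with a str() default; the function's real signature may differ:

```python
import datetime
from decimal import Decimal

import orjson

def serialize_object(obj) -> str:
    # default=str mirrors json.dumps(obj, default=str): anything orjson
    # cannot serialize natively (e.g. Decimal, arbitrary objects) falls
    # back to its string representation instead of raising.
    return orjson.dumps(obj, default=str).decode("utf-8")

# datetime is handled natively by orjson; Decimal goes through default=str.
print(serialize_object({"spend": Decimal("0.25"), "at": datetime.datetime.now()}))
```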
- set_cache: orjson.dumps() returns bytes directly for S3
- get_cache: orjson.loads() parses bytes without decode step
- 3-5x faster serialization + skips UTF-8 encode/decode
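A hedged sketch of that round trip, assuming s3_client is a boto3 S3 client; the function and parameter names are placeholders, not the cache class's real interface:

```python
import orjson

def set_cache(s3_client, bucket: str, key: str, value: dict) -> None:
    # orjson.dumps already returns bytes, which is what S3 expects for
    # Body, so the .encode() step json.dumps would require disappears.
    s3_client.put_object(Bucket=bucket, Key=key, Body=orjson.dumps(value))

def get_cache(s3_client, bucket: str, key: str) -> dict:
    raw = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    # orjson.loads parses the bytes directly, skipping the UTF-8 decode.
    return orjson.loads(raw)
```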
- Added @lru_cache(1024) to get_cooldown_cache_key()
- Changed all 4 locations to use the cached method instead of recreating strings
- Replaced f-strings with string concatenation for better performance
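A minimal sketch of the caching change; the key format shown is illustrative, since the real get_cooldown_cache_key may compose its key differently:

```python
from functools import lru_cache

@lru_cache(1024)
def get_cooldown_cache_key(model_id: str) -> str:
    # Plain concatenation instead of an f-string, and lru_cache so repeated
    # lookups for the same model_id return the already-built string rather
    # than allocating a fresh one per call. Key format is illustrative.
    return "deployment:" + model_id + ":cooldown"
```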

Results: get_cooldown_cache_key dropped from 47MB of allocations to a few bytes and is no longer a top consumer. The memory leak is still present.

Next: optimize the remaining heavy memory consumers so the leak becomes easier to spot.
@vercel
vercel bot commented Oct 25, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: litellm
Deployment: Ready (preview available)
Updated (UTC): Oct 25, 2025 9:02pm

When database writes failed, flushed queue items were dropped, leaving orphaned objects in memory until GC'd and adding pressure on the garbage collector.

This change:
- Re-queues failed transactions for retry when DB recovers
- Caps re-queueing at MAX_QUEUE_SIZE_BEFORE_REQUEUE (5000 items) to prevent
  unbounded growth during prolonged DB outages
- Drops transactions beyond cap to prioritize stability over completeness
@AlexsanderHamir force-pushed the litellm_mem_leak_on_db_failure branch from 60daf00 to 22557cf on October 25, 2025 00:30
@AlexsanderHamir force-pushed the litellm_oct_alexsander_staging_three branch 6 times, most recently from 305516f to 33db244 on October 25, 2025 20:11