Description

Fixes #31227: OpenAIEmbeddings could exceed OpenAI's 300,000-token-per-request limit, causing 400 BadRequest errors.

Problem

When embedding large document sets, LangChain would send batches containing more than 300,000 tokens in a single API request, causing this error:

openai.BadRequestError: Error code: 400 - {'error': {'message': 'Requested 673477 tokens, max 300000 tokens per request'}}

The issue occurred because:

  • The code splits texts into chunks of at most embedding_ctx_length (8191) tokens each
  • It then batches chunks by chunk_size (default 1000 chunks per request)
  • It never checked a batch's total token count against OpenAI's 300k per-request limit
  • Result: 1000 chunks × 8191 tokens = 8,191,000 tokens in one request, far over the limit (sketched below)
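
A condensed sketch of the old batching loop (variable names simplified for illustration; the real code lives in _get_len_safe_embeddings):

# tokens: list of tokenized chunks, each at most 8191 token ids long
# chunk_size: defaults to 1000 chunks per request
for i in range(0, len(tokens), chunk_size):
    batch = tokens[i : i + chunk_size]
    # 1000 chunks of up to 8191 tokens each can put ~8.2M tokens in one
    # request, with no check against the 300k per-request limit
    response = client.create(input=batch, **params)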

Solution

This PR implements dynamic batching that respects the 300k token limit:

  1. Added constant: MAX_TOKENS_PER_REQUEST = 300000
  2. Track token counts: Calculate actual tokens for each chunk
  3. Dynamic batching: Instead of fixed chunk_size batches, accumulate chunks until adding the next one would cross the 300k limit (sketched below)
  4. Applied to both sync and async: Fixed both _get_len_safe_embeddings and _aget_len_safe_embeddings
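
A minimal sketch of the token-aware batching (illustrative names, not the exact code; the actual implementation is in _get_len_safe_embeddings and its async twin):

MAX_TOKENS_PER_REQUEST = 300_000  # OpenAI's per-request embedding limit

batches = []
current_batch = []
current_tokens = 0
for chunk in tokens:  # each chunk is a list of token ids (<= 8191 long)
    chunk_len = len(chunk)
    # flush the current batch if adding this chunk would cross the limit
    if current_batch and current_tokens + chunk_len > MAX_TOKENS_PER_REQUEST:
        batches.append(current_batch)
        current_batch, current_tokens = [], 0
    current_batch.append(chunk)
    current_tokens += chunk_len
if current_batch:
    batches.append(current_batch)

# each element of batches now stays under 300k tokens and becomes one API call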

Changes

  • Modified langchain_openai/embeddings/base.py:
    • Added MAX_TOKENS_PER_REQUEST constant
    • Replaced fixed-size batching with token-aware dynamic batching
    • Applied to both sync (line ~478) and async (line ~527) methods
  • Added test in tests/unit_tests/embeddings/test_base.py:
    • test_embeddings_respects_token_limit() - Verifies large document sets are properly batched

Testing

All existing tests pass (280 passed, 4 xfailed, 1 xpassed).

The new test verifies (a sketch follows the list):

  • Large document sets (500 texts × 1000 tokens = 500k tokens) are split into multiple API calls
  • Each API call respects the 300k token limit
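
A sketch of how such a test can be structured. The mock below is illustrative, not the exact test from test_base.py, and it assumes tiktoken is available for tokenization and that the client attribute can be swapped for a mock:

from unittest.mock import MagicMock

from langchain_openai import OpenAIEmbeddings

def test_embeddings_respects_token_limit() -> None:
    per_request_tokens = []

    def fake_create(input, **kwargs):
        # record the total token count of each batch the client receives
        per_request_tokens.append(sum(len(chunk) for chunk in input))
        return {"data": [{"embedding": [0.1] * 1536} for _ in input]}

    embeddings = OpenAIEmbeddings(api_key="test")
    embeddings.client = MagicMock(create=fake_create)

    # ~500 texts of ~1000 tokens each: ~500k tokens total, so the
    # batching must split the work across multiple requests
    texts = ["token " * 1000 for _ in range(500)]
    embeddings.embed_documents(texts)

    assert len(per_request_tokens) > 1
    assert all(n <= 300_000 for n in per_request_tokens)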

Usage

After this fix, users can embed large document sets without errors:

from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_text_splitters import CharacterTextSplitter

# This will now work without exceeding token limits
embeddings = OpenAIEmbeddings()
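# large_documents is assumed to be an already-loaded list of Document objects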
documents = CharacterTextSplitter().split_documents(large_documents)
Chroma.from_documents(documents, embeddings)

Resolves #31227

Kaparthy Reddy added 3 commits on October 18, 2025:
- Add strict parameter to ChatDeepSeek class
- Switch to Beta API endpoint when strict mode is enabled
- Override bind_tools method to add strict: true to tool definitions
- Add comprehensive tests for strict mode functionality

Resolves langchain-ai#32670
- Add robust fallback for response serialization when model_dump() fails
- Use model_dump_json() as fallback for non-OpenAI API responses
- Improve null choices error message with debugging information
- Add tests for vLLM-style responses and improved error messages

Fixes langchain-ai#32252
- Add MAX_TOKENS_PER_REQUEST constant (300,000 tokens)
- Implement dynamic batching in _get_len_safe_embeddings to respect token limits
- Track actual token counts per chunk and batch accordingly
- Apply same fix to async version _aget_len_safe_embeddings
- Add test to verify token limit is respected with large document sets

Fixes langchain-ai#31227