
Local memory and context window length #177

Open
raunak-kondiboyina opened this issue Mar 15, 2025 · 3 comments
Labels
question (Question about using the SDK)

Comments

raunak-kondiboyina commented Mar 15, 2025

Question

  1. Is there a way to avoid asking the same questions again, e.g. by maintaining a local or session memory, so the agent doesn't fall into an infinite loop?
  2. How does the SDK handle the context window? For example, if a tool call fetches data whose size exceeds the input context window, will the SDK automatically summarise the result before making the next LLM call, or will it fail at that point?
    Example:
    openai.BadRequestError: Error code: 400 - {'error': {'message': "Invalid 'input[2].output': string too long. Expected a string with maximum length 256000, but got a string with length 288701 instead.", 'type': 'invalid_request_error', 'param': 'input[2].output', 'code': 'string_above_max_length'}}

I'm getting the above error, but it should be possible to summarise the response and send that instead.

raunak-kondiboyina added the question label on Mar 15, 2025
Muhammadzainattiq commented Mar 16, 2025

  1. There is no built-in way to do this, because it's not a hardcoded value you can simply cache and look up again: two users may ask the same question in different wording. One workaround is to store the FAQs in a separate vector or graph DB and, whenever a query comes in, check its similarity against the existing FAQs and reuse the stored answer if a close match is found (see the first sketch after this list). It still adds significant overhead, though.

  2. You can handle it at the tool level: add a check so that if the output string exceeds the 256000-character limit, you trim or summarize it (see the second sketch below). It's not an error at the SDK level but at the LLM level; the model won't accept a tool result exceeding the limit because it falls outside its context window. Another option is to use an LLM with a longer context window, such as the Gemini series.
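A rough sketch of the FAQ-cache idea in point 1, using OpenAI embeddings and plain cosine similarity instead of a dedicated vector or graph DB. The `faq_entries` store, the `embed` helper, and the 0.9 threshold are illustrative assumptions, not anything provided by the SDK:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    """Embed a question with the OpenAI embeddings API."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

# Pre-computed FAQ store: (question, answer, embedding) triples.
faq_entries = [
    ("How do I reset my password?", "Go to Settings > Security and click Reset."),
]
faqs = [(q, a, embed(q)) for q, a in faq_entries]

def cached_answer(query: str, threshold: float = 0.9) -> str | None:
    """Return a stored FAQ answer if the query is similar enough, else None."""
    qv = embed(query)
    for _question, answer, fv in faqs:
        sim = float(qv @ fv) / (np.linalg.norm(qv) * np.linalg.norm(fv))
        if sim >= threshold:
            return answer  # reuse the cached answer instead of running the agent again
    return None
```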
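And a minimal sketch of the tool-level guard in point 2, assuming the openai-agents Python SDK's `function_tool` decorator. `MAX_TOOL_OUTPUT_CHARS`, the hard truncation, and the `run_backend_query` placeholder are illustrative choices:

```python
from agents import function_tool

MAX_TOOL_OUTPUT_CHARS = 256_000  # the cap reported in the 400 error above

def run_backend_query(query: str) -> str:
    # Placeholder for whatever backend call produces the large payload.
    return "example row\n" * 50_000

@function_tool
def fetch_data(query: str) -> str:
    """Fetch data and keep the returned string within the model's input limit."""
    raw = run_backend_query(query)
    if len(raw) > MAX_TOOL_OUTPUT_CHARS:
        # Hard truncation is the simplest guard; a cheaper LLM call could
        # summarize `raw` here instead if the full content matters.
        raw = raw[:MAX_TOOL_OUTPUT_CHARS]
    return raw
```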

@rm-openai
Collaborator

This SDK doesn't include any tools to trim the context window - we're trying to keep it lightweight, and there isn't one universal way to do that. Some options for you are:

  1. Use an external memory service to provide history context.
  2. When you do result.to_input_list(), trim older messages.
  3. Every so often, summarize the history. When you send inputs to the Runner, send at most N messages plus the summary. (There's a sketch of options 2 and 3 after this list.)
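A rough sketch of options 2 and 3, assuming the openai-agents Python SDK. `MAX_HISTORY_ITEMS`, the `chat` wrapper, and the trimming policy are illustrative choices, not SDK features:

```python
import asyncio
from agents import Agent, Runner

MAX_HISTORY_ITEMS = 20  # illustrative cap on how much history to carry forward

agent = Agent(name="Assistant", instructions="You are a helpful assistant.")

async def chat(user_message: str, history: list) -> tuple[str, list]:
    """Run one turn, then trim the history before the next turn."""
    result = await Runner.run(agent, history + [{"role": "user", "content": user_message}])
    new_history = result.to_input_list()
    if len(new_history) > MAX_HISTORY_ITEMS:
        # Option 2: keep only the most recent items. In practice you'd also make
        # sure tool calls and their outputs aren't split apart, and (option 3)
        # you could prepend a running summary of what was dropped.
        new_history = new_history[-MAX_HISTORY_ITEMS:]
    return result.final_output, new_history

async def main() -> None:
    history: list = []
    reply, history = await chat("What's the capital of France?", history)
    print(reply)

if __name__ == "__main__":
    asyncio.run(main())
```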

@raunak-kondiboyina
Author

> 1. Use an external memory service to provide history context.

Can you recommend any provider?
