Is there a way to avoid asking the same questions again, e.g. by maintaining a local or session memory, so the agent doesn't fall into infinite loops?
How does the SDK handle the context window? For example, if I make a tool call that fetches data whose size exceeds the input context window, will the SDK auto-summarize the result before the next LLM call, or will it break?
Example - openai.BadRequestError: Error code: 400 - {'error': {'message': "Invalid 'input[2].output': string too long. Expected a string with maximum length 256000, but got a string with length 288701 instead.", 'type': 'invalid_request_error', 'param': 'input[2].output', 'code': 'string_above_max_length'}}
I am getting the above error, but it should be possible to summarize the response and send that instead.
There is no built-in way to do this, because it's not hardcoded content you can save in a cache and fetch again whenever required: two users may ask the same question in different wording. One approach is to save the FAQs in a separate vector or graph DB, and whenever a query comes in, check its similarity against the existing FAQs and retrieve the answer if a similar one is found. But that still adds significant overhead.
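A minimal sketch of that idea, assuming the OpenAI embeddings API and substituting a plain in-memory list for the vector/graph DB; `faq_store`, `lookup_faq`, and the 0.85 threshold are illustrative, not part of any SDK:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

# Hypothetical in-memory FAQ store: (question, answer, embedding) triples.
# A real setup would persist these in a vector DB instead.
faq_store = [
    (q, a, embed(q))
    for q, a in [
        ("How do I reset my password?", "Use the 'Forgot password' link on the login page."),
    ]
]

def lookup_faq(query: str, threshold: float = 0.85) -> str | None:
    """Return a cached answer if a stored FAQ is similar enough, else None."""
    q_vec = embed(query)
    best_score, best_answer = 0.0, None
    for _, answer, vec in faq_store:
        # Cosine similarity between the incoming query and a stored question.
        score = float(q_vec @ vec / (np.linalg.norm(q_vec) * np.linalg.norm(vec)))
        if score > best_score:
            best_score, best_answer = score, answer
    return best_answer if best_score >= threshold else None
```

The idea would be to call `lookup_faq` before handing the query to the agent, and only fall through to the LLM on a cache miss.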
You can handle it at the tool level: apply a check there, and if the output string exceeds the 256000-character limit, trim or summarize it.
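Roughly like this, as a sketch using the Agents SDK's `function_tool` decorator; `read_large_file` is an illustrative tool, and plain truncation stands in for a real summarization call:

```python
from agents import function_tool

MAX_TOOL_OUTPUT = 256_000  # the per-string limit from the 400 error above

@function_tool
def read_large_file(path: str) -> str:
    """Read a file, capping the result so it fits the model's string limit."""
    with open(path, encoding="utf-8") as f:
        data = f.read()
    if len(data) > MAX_TOOL_OUTPUT:
        # Trim the tail; a summarization call over `data` could go here
        # instead, as long as the returned string stays under the limit.
        data = data[: MAX_TOOL_OUTPUT - 20] + "\n...[truncated]"
    return data
```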
It's not an error at the SDK level; it's an error at the LLM level. The LLM doesn't allow a tool to return a string exceeding the limit because it would overflow its context window. Another option is to use an LLM with a longer context window, such as the Gemini series.
This SDK doesn't include any tools to trim the context window - we're trying to keep it lightweight, and there isn't one universal way to do that. Some options for you are:
Use an external memory service to provide history context.
When you do result.to_input_list(), trim older messages (see the sketch after this list).
Every so often, summarize the history. When you send inputs to the Runner, send at most N messages + the summary.
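A minimal sketch of options 2 and 3 combined, assuming the Agents SDK's Runner and to_input_list(); the agent definition and MAX_HISTORY_ITEMS are illustrative:

```python
from agents import Agent, Runner

agent = Agent(name="Assistant", instructions="Be concise.")

MAX_HISTORY_ITEMS = 20  # keep only the most recent input items

result = Runner.run_sync(agent, "First question")
history = result.to_input_list()

# Naive trimming: keep the tail of the history. In practice, trim at
# message boundaries so a tool call is never separated from its output,
# and prepend a summary of what was dropped instead of discarding it.
trimmed = history[-MAX_HISTORY_ITEMS:]

result = Runner.run_sync(agent, trimmed + [{"role": "user", "content": "Follow-up question"}])
print(result.final_output)
```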