Is there support for 'unlimited' output tokens? #1353

Open
barapa opened this issue Jan 21, 2025 · 1 comment

Comments

barapa commented Jan 21, 2025

The idea here is that the framework would detect when the stop reason indicates the maximum output token limit was hit, and then continuously re-prompt the model with the correct context so that the final structured result can still be parsed from the combined output.

Similar to the Aider Infinite Output feature:
https://aider.chat/docs/more/infinite-output.html
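For context, here is a minimal sketch of what such a continuation loop could look like outside of BAML, written against the OpenAI Python SDK (the model name, round limit, and prompt wording are placeholders; BAML does not expose this behavior today):

```python
# Sketch of a continuation loop: re-prompt whenever the model stops because it
# hit the output-token cap (finish_reason == "length").
from openai import OpenAI

client = OpenAI()

def generate_with_continuation(prompt: str, output_format: str, max_rounds: int = 5) -> list[str]:
    messages = [{"role": "user", "content": f"{prompt}\n{output_format}"}]
    chunks: list[str] = []
    for _ in range(max_rounds):
        resp = client.chat.completions.create(model="gpt-4o", messages=messages)
        choice = resp.choices[0]
        text = choice.message.content or ""
        chunks.append(text)
        if choice.finish_reason != "length":
            break  # the model finished on its own; no continuation needed
        # Feed the partial reply back as assistant context and ask it to continue.
        messages.append({"role": "assistant", "content": text})
        messages.append({"role": "user", "content": "Continue your response"})
    return chunks  # raw replies of all attempts, to be merged by the caller
```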

hellovai (Contributor) commented:

From Discord:

Sidd — 12/31/24, 9:42 AM
Feature Request: A good way to automatically handle hitting max output tokens.
I basically want to do this:
Attempt #1:

prompt #"{_.role("user")} {{prompt}}
  {{ ctx.output_format }}
"#

// LLM response is too long and hits the model's max output token limit

Attempt #2:

prompt #"
  {_.role("user")} {{prompt}}
  {_.role("assistant")} {{Attempt #1 Raw LLM Reply}}
  {_.role("user")} Continue your response
  {{ ctx.output_format }}
"#

And the parser merges the JSON responses of all n attempts.
This is sort of doable by modifying the baml_client files to return the raw response and checking the finish reason, but if this were implemented in BAML, more people might find it useful.

Sidd — 12/31/24, 9:56 AM
It's meant to operate like the "Continue generating" button in ChatGPT, but for structured responses.
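The simplest version of the merge described above would be plain concatenation of the raw replies followed by a single parse. A sketch, reusing the `generate_with_continuation` helper from the earlier snippet; it only works when each continuation resumes exactly at the character where the previous reply was cut off, which is precisely the assumption the next reply calls into question:

```python
import json

def merge_naively(chunks: list[str]) -> dict:
    # Assumes every continuation picks up exactly where the previous reply
    # stopped, with no preamble and no repeated text.
    return json.loads("".join(chunks))
```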

Vaibhav — 1/2/25, 11:53 AM
That's a great idea 🙂 The reason we haven't done this yet is that the merging is actually a bit more complicated than we predicted. We tried this and found:

the LLM doesn't always continue trivially:

e.g.:

message 1:

<a bunch of previous outputs>
"foo": [ "and then this

message 2:

Continuing from where i left off:
would lead to something good"]
}

or another message 2:

  "foo": [ "and then this would lead to something good"]
}

So we will likely need another parser to handle that, and as of right now we didn't have the bandwidth for this algorithm! (But we will likely cover something here, as we have some ideas for how to resolve this.) Funnily enough, this is very close to the LeetCode problem of the "maximal substring".

There are also scenarios where the LLM outputs the key and then diverges from the previous message.
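One way to handle the second variant above (where the continuation repeats the partial line) is an overlap-trimming merge, close to the suffix/prefix-overlap idea behind the "maximal substring" remark. This is only a minimal sketch: it does not handle preambles like "Continuing from where i left off:" or continuations that diverge from the previous message, which is where a dedicated parser would still be needed:

```python
def merge_with_overlap(prev: str, continuation: str) -> str:
    """Append `continuation` to `prev`, dropping the longest prefix of
    `continuation` that merely repeats the tail of `prev`."""
    continuation = continuation.lstrip()  # ignore leading indentation in the retry
    for k in range(min(len(prev), len(continuation)), 0, -1):
        if prev.endswith(continuation[:k]):
            return prev + continuation[k:]
    return prev + continuation

# Second "message 2" variant from above:
prev = '"foo": [ "and then this'
cont = '  "foo": [ "and then this would lead to something good"]\n}'
print(merge_with_overlap(prev, cont))
# -> "foo": [ "and then this would lead to something good"]
#    }
```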
