Is there support for 'unlimited' output tokens? #1353

Open
barapa opened this issue Jan 21, 2025 · 1 comment

Comments

barapa commented Jan 21, 2025

The idea here is that the framework would detect when the stop reason indicates the maximum output token limit was hit, and then continuously re-prompt the model with the correct context so that the final structured result can still be parsed from the combined output.

Similar to the Aider Infinite Output feature:
https://aider.chat/docs/more/infinite-output.html
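For context, here is a minimal sketch of what such a continuation loop could look like outside of BAML, written against the OpenAI Python SDK (the model name, round limit, and prompt wording are placeholders; BAML does not expose this behavior today):

```python
# Sketch of a continuation loop: re-prompt whenever the model stops because it
# hit the output-token cap (finish_reason == "length").
from openai import OpenAI

client = OpenAI()

def generate_with_continuation(prompt: str, output_format: str, max_rounds: int = 5) -> list[str]:
    messages = [{"role": "user", "content": f"{prompt}\n{output_format}"}]
    chunks: list[str] = []
    for _ in range(max_rounds):
        resp = client.chat.completions.create(model="gpt-4o", messages=messages)
        choice = resp.choices[0]
        text = choice.message.content or ""
        chunks.append(text)
        if choice.finish_reason != "length":
            break  # the model finished on its own; no continuation needed
        # Feed the partial reply back as assistant context and ask it to continue.
        messages.append({"role": "assistant", "content": text})
        messages.append({"role": "user", "content": "Continue your response"})
    return chunks  # raw replies of all attempts, to be merged by the caller
```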

hellovai (Contributor) commented:

From Discord:

Sidd — 12/31/24, 9:42 AM
Feature Request: A good way to automatically handle hitting max output tokens.
I basically want to do this:
Attempt #1:

prompt #"{_.role("user")} {{prompt}}
  {{ ctx.output_format }}
"#

// LLM response is too long and hits the model's max output token limit

Attempt #2:

prompt #"
  {_.role("user")} {{prompt}}
  {_.role("assistant")} {{Attempt #1 Raw LLM Reply}}
  {_.role("user")} Continue your response
  {{ ctx.output_format }}
"#

And the parser merges the JSON responses of all n attempts.
This is sort of doable by modifying the baml_client files to return the raw response and checking the finish reason, but if this were implemented in BAML, more people might find it useful.

Sidd — 12/31/24, 9:56 AM
It's meant to operate like the "Continue generating" button in ChatGPT, but for structured responses.
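The simplest version of the merge described above would be plain concatenation of the raw replies followed by a single parse. A sketch, reusing the `generate_with_continuation` helper from the earlier snippet; it only works when each continuation resumes exactly at the character where the previous reply was cut off, which is precisely the assumption the next reply calls into question:

```python
import json

def merge_naively(chunks: list[str]) -> dict:
    # Assumes every continuation picks up exactly where the previous reply
    # stopped, with no preamble and no repeated text.
    return json.loads("".join(chunks))
```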

Vaibhav — 1/2/25, 11:53 AM
That's a great idea 🙂 The reason we haven't done this yet is that the merging is actually a bit more complicated than we predicted. We tried this and found:

the LLM doesn't always continue trivially:

e.g.:

message 1:

<a bunch of previous outputs>
"foo": [ "and then this

message 2:

Continuing from where i left off:
would lead to something good"]
}

or another message 2:

  "foo": [ "and then this would lead to something good"]
}

So we will likely need another parser to handle that, and as of right now we didn't have the bandwidth for this algorithm! (But we will likely cover something here, as we have some ideas for how to resolve this.) Funnily enough, this is very close to the LeetCode problem of the "maximal substring".

There are also scenarios where the LLM outputs the key and then diverges from the previous message.
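One way to handle the second variant above (where the continuation repeats the partial line) is an overlap-trimming merge, close to the suffix/prefix-overlap idea behind the "maximal substring" remark. This is only a minimal sketch: it does not handle preambles like "Continuing from where i left off:" or continuations that diverge from the previous message, which is where a dedicated parser would still be needed:

```python
def merge_with_overlap(prev: str, continuation: str) -> str:
    """Append `continuation` to `prev`, dropping the longest prefix of
    `continuation` that merely repeats the tail of `prev`."""
    continuation = continuation.lstrip()  # ignore leading indentation in the retry
    for k in range(min(len(prev), len(continuation)), 0, -1):
        if prev.endswith(continuation[:k]):
            return prev + continuation[k:]
    return prev + continuation

# Second "message 2" variant from above:
prev = '"foo": [ "and then this'
cont = '  "foo": [ "and then this would lead to something good"]\n}'
print(merge_with_overlap(prev, cont))
# -> "foo": [ "and then this would lead to something good"]
#    }
```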
