The idea here is that the framework would detect when the stop reason indicates the maximum-output-token limit was hit, and then repeatedly re-prompt the model with the right context so that the final structured result can still be parsed from the output.
prompt #"
{_.role("user")} {{prompt}}
{_.role("assistant")} {{Attempt #1 Raw LLM Reply}}
{_.role("user")} Continue your response
{{ ctx.output_format }}
"#
And the parser merges the JSON responses of all n attempts.
This is sort of doable today by modifying the baml_client files to return the raw response and checking the finish reason, but if it were implemented in BAML itself, more people might find it useful.
Sidd — 12/31/24, 9:56 AM
It's meant to operate like the "Continue generating" button in ChatGPT, but for structured responses.
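The loop being proposed might look roughly like the sketch below. This is a minimal illustration, not real BAML API: `complete` is a hypothetical stand-in for a single raw LLM call that returns the generated text plus the provider-reported finish reason (e.g. `"length"` when the token limit was hit).

```python
def complete_with_continuation(complete, prompt, max_rounds=5):
    """Re-prompt until the model stops for a reason other than the
    output-token limit, accumulating the raw chunks for later merging.

    `complete(messages) -> (text, finish_reason)` is a hypothetical
    wrapper around one raw LLM call.
    """
    messages = [{"role": "user", "content": prompt}]
    chunks = []
    for _ in range(max_rounds):
        text, finish_reason = complete(messages)
        chunks.append(text)
        if finish_reason != "length":
            # The model hit a natural stop; no continuation needed.
            break
        # Feed the partial reply back and ask the model to continue,
        # mirroring the prompt structure sketched above.
        messages.append({"role": "assistant", "content": text})
        messages.append({"role": "user", "content": "Continue your response"})
    return chunks
```

The hard part, as discussed below, is merging `chunks` back into one parseable JSON payload.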
Vaibhav — 1/2/25, 11:53 AM
that's a great idea 🙂 The reason we haven't done this yet is that the merging is actually a bit more complicated than we predicted. We tried this and found:
the LLM doesn't always continue trivially, e.g.:

message 1:
    <a bunch of previous outputs>
    "foo": [ "and then this

message 2 (prepends filler before continuing):
    Continuing from where i left off:
    would lead to something good"]
    }

or another message 2 (restates the truncated line):
    "foo": [ "and then this would lead to something good"]
    }
so we'll likely need another parser to handle that, and as of right now we didn't have the bandwidth for this algorithm! (but we will likely cover something here, as we have some ideas for how to resolve this). Funnily, this is very close to the LeetCode problem of "maximal substring".
Though there are scenarios where the LLM outputs the key and then diverges from the previous message.
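The easier half of that merge can be sketched as an overlap search: find the longest suffix of the first chunk that is also a prefix of the continuation, and drop the duplicated overlap. This is only an assumed approach based on the string problem alluded to above; the "Continuing from where i left off:" filler and the divergence case would still need a smarter parser.

```python
def merge_chunks(a: str, b: str) -> str:
    """Merge a continuation `b` onto a truncated chunk `a`.

    Finds the longest suffix of `a` that is also a prefix of `b`.
    If the model restated part of its earlier output, the overlap is
    dropped; if it continued cleanly, the overlap is empty and the
    chunks are simply concatenated. O(n^2) for clarity; a KMP-style
    failure function would do this in linear time.
    """
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return a + b[k:]
    return a + b
```

For the second example above, where message 2 restates the whole truncated line, the overlap is the entire truncated line, so the merge yields the completed output without duplication.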
Similar to the Aider Infinite Output feature:
https://aider.chat/docs/more/infinite-output.html