Setup model routing config and plan routing to o1 #6189

Open. Wants to merge 25 commits into base: main.
Conversation

@ryanhoangt (Contributor) commented on Jan 10, 2025:

End-user friendly description of the problem this fixes or functionality that this introduces

  • Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Give a summary of what the PR does, explaining any non-trivial design decisions

This PR is to:

  • Set up config for model routing-related features.
  • Implement a prototype for routing to reasoning models if appropriate. The criteria are based on this paper.
[Screenshot: 2025-01-10 at 17:22:40]
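For illustration, here is a minimal sketch of the plan-routing idea described above: a cheap judge model classifies the prompt and, if a step-by-step plan seems needed, the request is routed to a reasoning model. The prompt text, model names, and function are illustrative assumptions, not the code in this PR.

```python
# Hypothetical sketch of plan-based routing, not the PR's actual implementation.
import litellm

JUDGE_MODEL = 'gpt-4o'      # model that classifies prompt complexity
REASONING_MODEL = 'o1'      # model used when a detailed plan is needed
DEFAULT_MODEL = 'claude-3-5-sonnet-20241022'

JUDGE_PROMPT = (
    'Does the following request require a detailed step-by-step plan to solve? '
    'Answer only YES or NO.\n\n'
    '=== BEGIN USER MESSAGE ===\n{message}\n=== END USER MESSAGE ==='
)

def route_model(message: str) -> str:
    """Return the model name to use for this user message."""
    judgment = litellm.completion(
        model=JUDGE_MODEL,
        messages=[{'role': 'user', 'content': JUDGE_PROMPT.format(message=message)}],
    )
    answer = judgment.choices[0].message.content.strip().upper()
    return REASONING_MODEL if answer.startswith('YES') else DEFAULT_MODEL
```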

Link of any specific issues this addresses

@xingyaoww (Collaborator) left a comment:
Awesome! This is a great start for model routing and LGTM!

Router that routes the prompt that is judged by a LLM as complex and requires a step-by-step plan.
"""

JUDGE_MODEL = 'gpt-4o'
Collaborator:
It would be interesting to see if we can experiment with a cheaper model for that 🤔

* Translating high-level requirements into detailed implementation steps and ensuring consistency.

=== BEGIN USER MESSAGE ===
{message}
Collaborator:
We could also experiment with sending o1 the last 5/10 actions/observations 🤔 in case there's some deep reasoning required to figure out the error, etc.
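A rough sketch of that suggestion, assuming the recent history is available as a plain list of (action, observation) strings; the helper name and prompt format are made up for illustration:

```python
# Illustrative only: fold the last N action/observation pairs into the prompt
# sent to the reasoning model, so it can reason over recent errors.
def build_reasoning_prompt(task: str, history: list[tuple[str, str]], n: int = 5) -> str:
    recent = history[-n:]
    steps = [f'ACTION: {action}\nOBSERVATION: {observation}' for action, observation in recent]
    return task + '\n\nRecent steps:\n' + '\n\n'.join(steps)
```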

)

# Replace the model with the reasoning model
kwargs['model'] = self.model_routing_config.reasoning_model
Collaborator:
Is model enough, or also: custom provider, base URL?
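For illustration, a hedged sketch of what overriding more than the model name could look like; the extra routing-config fields below are assumptions, not fields that exist in this PR:

```python
# Hypothetical: also carry provider details when routing to the reasoning model.
def apply_reasoning_overrides(kwargs: dict, routing_config) -> dict:
    kwargs['model'] = routing_config.reasoning_model
    if getattr(routing_config, 'reasoning_base_url', None):   # assumed optional field
        kwargs['base_url'] = routing_config.reasoning_base_url
    if getattr(routing_config, 'reasoning_api_key', None):    # assumed optional field
        kwargs['api_key'] = routing_config.reasoning_api_key
    return kwargs
```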

Collaborator:
We could design the reasoning model not as a part of an LLM instance, but as a second LLM instance in the agent?
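A minimal sketch of that alternative, with placeholder class and attribute names rather than the actual OpenHands types:

```python
# Sketch of the "second LLM instance in the agent" idea; names are placeholders.
class SketchAgent:
    def __init__(self, llm, reasoning_llm=None):
        self.llm = llm                      # default model for ordinary steps
        self.reasoning_llm = reasoning_llm  # optional reasoning model (e.g. o1)

    def complete(self, prompt: str, needs_plan: bool):
        # Fall back to the default LLM when no reasoning model is configured.
        target = self.reasoning_llm if (needs_plan and self.reasoning_llm) else self.llm
        return target.completion(messages=[{'role': 'user', 'content': prompt}])
```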

@ryanhoangt (Contributor, Author) replied on Jan 12, 2025:
> Is model enough, or also: custom provider, base URL?

Yeah, I think we also need to allow users to set these, especially if they aren't using an LLM proxy 🤔

Collaborator:
Using [llm.reasoning_model] will do it implicitly!

@ryanhoangt (Contributor, Author):
@enyst I've refactored it to use separate LLM instances, can you have a look again to see if it matches your suggestion?

(Outdated review comment on config.template.toml, resolved.)
@enyst (Collaborator) left a comment:
I'm so happy to see this, thank you! I do think we are missing some minimal framework to experiment with reasoning models.

About the way to choose another model:
We already have the ability to choose, configure, and use an arbitrary model, for example in evals: we can write the model configuration in toml, in a custom named LLM config section such as [llm.o1], load it with a utility function, and instantiate an LLM from it.

We can use that here. Names are user-defined, and we can, if we want, set in stone a particular name for the reasoning model, e.g. [llm.reasoning_model], or [llm.oh_reasoning_model], or [llm.blueberry] (or strawberry for that matter), whatever name.
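Roughly what that flow could look like, assuming a helper along the lines of the existing utility for loading named LLM configs; the import paths and names below are assumptions and may differ from the actual code:

```python
# Given a named section in config.toml such as:
#
#   [llm.reasoning_model]
#   model = "o1"
#   api_key = "..."
#
# the named config can be loaded and turned into a separate LLM instance.
from openhands.core.config import get_llm_config_arg  # assumed helper name
from openhands.llm.llm import LLM                      # assumed import path

reasoning_llm_config = get_llm_config_arg('reasoning_model')
reasoning_llm = LLM(config=reasoning_llm_config)
```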

@mamoodi (Collaborator) commented on Feb 3, 2025:
@ryanhoangt gentle ping here in case this fell off the radar.


[llm.reasoning_model]
model = "o1"
api_key = ""
Collaborator:
I think we might have here a little too much configurability 😅

It's perfectly fine if we reserve some names for our own features. So the names (of the llm configs) don't need to be configurable themselves, they mean what we say they mean.

We did that with draft_llm:

We can reserve the names reasoning_model and reasoning_judge_model for the reasoning model routing feature, and use them freely as necessary in the code. So we don't need these lines:

reasoning_llm_config_name = 'reasoning_model'
judge_llm_config_name = 'judge_model'

That will also simplify the code below, starting from reading these configs in llm_config.py: I think we don't need to do anything there? They'll be read like any other named configs. And it will save us quite a bit of code complexity elsewhere too?

@ryanhoangt (Contributor, Author):

> I think we might have here a little too much configurability 😅

Yeah, I'm feeling this too. While implementing this I've also been thinking about how to support the Notdiamond router, where we can train a custom router on a set of selected LLMs, so the llm config names are not fixed like the two above. But in this case a reserved name would indeed make more sense, and we probably don't need it to be configurable. I'll try to change that.

@enyst (Collaborator) replied on Feb 4, 2025:

Yes, I hear you. I'm thinking about Notdiamond too, and about the litellm option: there's a routing feature in litellm that we have tried to use twice in the past, and it proved too much complexity given that it doesn't actually support one of the most important things we wanted from it (fallbacks/retries by reading the providers' retry-after headers). Maybe we will look at it again (third time's the charm?) or not.

Anyway, there seem to be two ways we can take on the idea of future routing:

  • ignore it. We have a full feature here, we implement it as necessary, nice enough but we don't necessarily need all the building blocks we're guessing we will need for the most generalizable thing. Cross that bridge when we come to it. (we don't even know exactly what they will need, do we?)
  • keep an abstract class for routing config, and not a lot more (a rough sketch of this option follows below). Maybe the way we took with condensers is a relevant example here: the configs for that do share an ABC, but the subclasses don't have the same attributes, and that's fine. Each will be configured as it needs, maybe with nothing in its config (the NoOpCondenser), or maybe with a bunch of attributes (max_size, whatever, for some specialized condensers which really need a config of their own). Again, cross that last bridge when we come to it.

OK, I started by saying there are two ways, but idk, maybe they're almost the same today. 😅

This routing feature, in this PR, does need to be enabled, so as long as it's enabled, it can do its thing IMHO.
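A hedged sketch of the second option above, mirroring how condenser configs share a base class while each subclass carries only its own fields; pydantic and all names here are illustrative assumptions, not classes from this PR:

```python
# Illustrative only: routing configs share a small base class, and each concrete
# router config declares just the fields it actually needs.
from pydantic import BaseModel

class RouterConfig(BaseModel):
    """Base class for routing configs; subclasses add their own fields."""
    type: str = 'noop'

class NoOpRouterConfig(RouterConfig):
    type: str = 'noop'                        # always use the default LLM

class LLMJudgeRouterConfig(RouterConfig):
    type: str = 'llm_judge'
    judge_llm_config: str = 'judge_model'           # name of the [llm.*] section
    reasoning_llm_config: str = 'reasoning_model'   # routed-to reasoning model
```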
