Generalizes GPT3CompletionModel to work with other providers, adds Anthropic #71
Conversation
…der, updated docs and tests accordingly.
…ffecting later tests
…oved spaces in checked response to identify replacement 'break down' as keyword 'breakdown'.
README.md
Outdated
1. If you haven't already, [make an OpenAI account](https://openai.com/api/) and [create an API key](https://platform.openai.com/api-keys).
1. In your fork's "⚙️ Settings" tab, make a new Actions repository secret with the name `OPENAI_API_KEY` and paste in your API key as the secret.
1. If you haven't already, follow the directions above to create an account and get an API key for your chosen model provider.
1. In your fork's "⚙️ Settings" tab, make a new Actions repository secret with the name `<PROVIDER>_API_KEY` and paste in your API key as the secret. Replace `<PROVIDER>`
We might want to make a PR to add the Anthropic key to the rootstock workflow here:
https://github.com/manubot/rootstock/blob/main/.github/workflows/ai-revision.yaml#L59
If at some point in the future we theoretically support like a dozen or more services, maybe we just instruct the user to update their ai-revision workflow accordingly for whatever services they're using.
Excellent point; I've converted this PR into a draft until I figure out the implications upstream, including the one you raised. I'm wondering if we should relax the requirement that <PROVIDER>_API_KEY exists and has a non-empty value for every provider, and just check that it's valid when we actually use it to query the API.
I don't know how many services we'll end up providing, but ideally we won't have to make PRs in multiple repos to support the changes going forward. Let me think on it; perhaps we can take in a value in a structured format from rootstock for all the AI Editor options, and the definition of that format can be in this repo, too.
I can take care of that small rootstock PR. Per our discussion, we'll add:
- comment above workflow step saying something like "duplicate step as necessary to use different providers"
- rename "open ai key" var to just "ai key"
- add provider env var
d33bs left a comment
Nice job! I wanted to add some comments in case they're helpful along the journey here.
support whichever model providers LangChain supports. That said, we currently support OpenAI and Anthropic models only,
and are working to add support for other model providers.

When using OpenAI models, [our evaluations](https://github.com/pivlab/manubot-ai-editor-evals) show that `gpt-4-turbo`
Slightly outside the bounds of this PR: I wondered if versioning the evals could make sense (perhaps through a DOI per finding or maybe through the poster which was shared). There could come a time (probably sooner than we think) that GPT-4-Turbo isn't available or relevant.
That's a good point; I wonder if we should move the statement about which model was best in evaluation to the https://github.com/pivlab/manubot-ai-editor-evals repo, so that it can be updated without having to keep this repo up to date as well. I suppose @vincerubinetti and @miltondp might have opinions there, since they're the primary contributors on the evals repo.
Co-authored-by: Dave Bunten <[email protected]>
We need to figure out how we're going to handle the additional API key environment variables for new providers, since they require updates to rootstock as @vincerubinetti mentioned, and might quickly get unmanageable as the number of providers we support grows. I'd be in favor of resolving the API key by preferring a provider-specific environment variable, falling back to a single generic one, and validating it only when we actually query the provider's API.
Happy to hear differing opinions, of course!
Regarding the API key discussion: I've started to pull the logic for validating the API key out of the `GPT3CompletionModel` constructor, so that it's only checked when the provider is actually queried. As we discussed, the tool will prioritize provider-specific API keys, falling back to a generic key when a provider-specific one isn't set. Any input on the above is welcome, but for now I'll assume that we're in agreement and continue to work on implementation.
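To make the direction concrete, here's a minimal sketch of that resolution order. The generic `AI_API_KEY` name is an assumption (rootstock may settle on something else), and actual validation would still happen only when the provider is queried:

```python
import os


def resolve_api_key(provider: str) -> str | None:
    """Prefer a provider-specific key, then fall back to a generic one.

    Both variable names below are placeholders: the provider-specific
    pattern follows the existing <PROVIDER>_API_KEY convention, while
    AI_API_KEY stands in for whatever generic name rootstock settles on.
    """
    # e.g. ANTHROPIC_API_KEY when provider == "anthropic"
    provider_key = os.environ.get(f"{provider.upper()}_API_KEY")
    return provider_key or os.environ.get("AI_API_KEY")
```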
Reminder to me to do the rootstock PR when appropriate. Should be a quick change.
…MODEL_PROVIDER to be specified via the environment, LANGUAGE_MODEL to override the provider-specific default model.
…ave to specify keys for each provider.
…and provider-specific keys
force-pushed from e5ef52b to e9a34e5
return [model.id for model in models.data]

except openai.APIError as ex:
What are the conditions under which this exception might occur? I couldn't tell just from looking, so I'm asking to make sure I understand and, if necessary, to suggest adding docs to this effect. Mostly I ask because the local JSON reference seems like something that could get hard to manage over time (older models, newer models, renamings, etc.); we'd be beholden to all data changes upstream. Comment also stands for the Anthropic provider class.
The most likely reason that exception would be thrown is if you don't have a valid API key for the provider; I've added some comments to that effect in the exception handler, so thanks for the clarifying question.
Just to explain things a bit, the reason I added the local model list at all is that, for some reason, you have to have a valid API key to even get the list of models from these providers. The check for whether the specified language model is included in the provider's model list occurs in the GPT3CompletionModel constructor, and thus is shared by many tests that otherwise don't actually query the APIs and thus don't need valid keys. Since we can't assume we have valid API keys in any tests except the runcost-decorated ones, I came up with this mostly to shore up the tests.
I agree that the baked-in model list isn't ideal, but I can somewhat justify it since the provider model lists change maybe two or three times a year and they're only used in cases where the provider API can't be contacted (which, if they were planning to actually use the providers, wouldn't be the case).
I tried to make it not too onerous to update, too: calling persist_provider_model_engines() will query the providers for their latest models and save the model list file. IMO all that's needed is the list of models at the time of the snapshot, not any other information about which models were added, renamed, etc. We could include this as a step in the release process, too, with the API keys needed to make it work coming from the repo secrets.
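For illustration, a rough sketch of what that snapshot step could look like; the file path is an assumption based on the `manubot_ai_editor/ref/*.json` pattern, and only the OpenAI side is shown:

```python
import json
from pathlib import Path

import openai

# Assumed location of the cached model list; the real path may differ.
MODEL_CACHE = Path("manubot_ai_editor/ref/provider_model_engines.json")


def persist_provider_model_engines() -> None:
    """Query each provider for its current models and snapshot them to JSON.

    Requires valid API keys for every provider, so it's meant to be run
    occasionally (e.g. as a release step), not at runtime or in most tests.
    """
    snapshot = {
        "openai": [model.id for model in openai.OpenAI().models.list().data],
        # ...the Anthropic client would be queried here in the same way...
    }
    MODEL_CACHE.write_text(json.dumps(snapshot, indent=2))
```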
FYI, all the model list caching has been removed; after thinking on it, it's just another thing to maintain, and it's only (kind of) needed for the tests.
…_models() now including provider-specific code. Caches are now provider-specific.
… Added missing env vars to docs/env-vars.md
… a logger now to warn about falling back to the model cache.
…default model is in the cached list of models
…he rather than query APIs
…ly warns if API can't be accessed. Adds option for model provider to not allow listing of models by returning None.
pyproject.toml
Outdated
]
packages = [ { include = "manubot_ai_editor", from = "libs" } ]
include = [
"manubot_ai_editor/ref/*.json",
Double checking: does this need to be updated to the new location?
Currently the tests aren't included in the package at all, and since the model list cache file is just for testing, I assumed it shouldn't be included either.
It does beg the question of whether we should include the tests, though. I don't write a lot of packages so I'm unaware of what the norm is, but perhaps we should do some research and see if that's something we want to add, and if so perhaps only for source builds.
with provider_model_engine_json.open("r") as f:
    provider_model_engines = json.load(f)

@classmethod
This made me wonder: will this register properly to the class, given that it's defined outside of a class?
The short answer is yes: Python functions, regardless of whether they're invoked as regular functions or as methods in a class, close over the environment in which they're declared.
In this case the environment includes the locals within patch_model_list_cache(). Once cached_model_list_retriever has been defined in that environment, it retains access to those locals, including provider_model_engines, no matter what context it's invoked from.
EDIT: Also, you don't have to take my word for it; there are tests that use it which pass, indicating provider_model_engines is in scope when the mocked function is invoked.
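For anyone following along, a toy example of the closure behavior being described (the names here are illustrative, not the actual fixture code):

```python
def make_patcher():
    # Local to make_patcher(), analogous to provider_model_engines above.
    cached_model_engines = {"openai": ["gpt-4-turbo"]}

    def cached_model_list_retriever(provider):
        # Even when called from somewhere else entirely (e.g. attached to a
        # class or handed to a mock), this function still sees the locals of
        # the enclosing call to make_patcher().
        return cached_model_engines[provider]

    return cached_model_list_retriever


retriever = make_patcher()
print(retriever("openai"))  # ['gpt-4-turbo']
```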
…ched to ensure that it's not in 'cost' tests
…sts that fail to retrieve model list w/bad keys. Adds pytest-antilru to remove caching effects in tests.
…ider. Updates cached model list.
Since all the tests are passing and I have one approval, I'm going to assume this is ok to merge. Once this is merged and we've updated PyPI so that it's included when installing …
This PR generalizes `GPT3CompletionModel` to work with API clients for other model providers. The class now takes a `model_provider` string parameter, which must be a valid key in the `manubot_ai_editor.models.MODEL_PROVIDERS` dictionary. Explicit references to OpenAI have been generalized to apply to other model providers, e.g. the `openai_api_key` parameter is now just `api_key`. `GPT3CompletionModel` now supports Anthropic as a second model provider, and more can be added by extending the `MODEL_PROVIDERS` dict mentioned previously.
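As a usage sketch (the `title`/`keywords` arguments and the environment-variable handling here are assumptions for illustration; only `model_provider` and `api_key` are confirmed by this PR):

```python
import os

from manubot_ai_editor.models import GPT3CompletionModel, MODEL_PROVIDERS

print(list(MODEL_PROVIDERS))  # e.g. ['openai', 'anthropic']

model = GPT3CompletionModel(
    title="An example manuscript",       # assumed constructor argument
    keywords=["example", "manuscript"],  # assumed constructor argument
    model_provider="anthropic",          # must be a key in MODEL_PROVIDERS
    api_key=os.environ.get("ANTHROPIC_API_KEY"),
)
```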
The PR modifies the "cost" end-to-end test `tests.test_prompt_config.test_prompts_apply_gpt3` to also check Anthropic. To run the tests against both OpenAI and Anthropic, be sure that you've exported both `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` with valid API keys for each, then run `poetry run pytest --runcost` to run the end-to-end tests.

End-to-end test tweaks: Note that the "cost" test always has the potential to break, since the LLM doesn't always obey the prompt's request to insert a special keyword into the text. This morning, the OpenAI test was unable to add "bottle" to the "abstract" section, so I changed it to "violin", which appeared to pass. Also, it was inserting the keyword "breakdown" as "break down", so I modified the test to remove the spaces in the response before checking for the keyword.
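Concretely, the space-stripping check amounts to something like this (a simplified stand-in for the actual assertion in the test):

```python
def contains_keyword(response: str, keyword: str) -> bool:
    # With spaces removed, a response containing "break down" still
    # matches the requested keyword "breakdown".
    return keyword in response.replace(" ", "")


assert contains_keyword("Here is a break down of the results.", "breakdown")
```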
Documentation: I've gone through the README and tried to tweak it to explain that we now support multiple model providers, but it may require further tweaking. Also, I'm unsure if "model provider" is the preferred term for companies like OpenAI and Anthropic that provide APIs to query LLMs, or if we should use something else; feedback appreciated!