Generalizes GPT3CompletionModel to work with other providers, adds Anthropic #71
Conversation
…der, updated docs and tests accordingly.
…ffecting later tests
…oved spaces in checked response to identify replacement 'break down' as keyword 'breakdown'.
README.md
Outdated
1. If you haven't already, [make an OpenAI account](https://openai.com/api/) and [create an API key](https://platform.openai.com/api-keys).
1. In your fork's "⚙️ Settings" tab, make a new Actions repository secret with the name `OPENAI_API_KEY` and paste in your API key as the secret.
1. If you haven't already, follow the directions above to create an account and get an API key for your chosen model provider.
1. In your fork's "⚙️ Settings" tab, make a new Actions repository secret with the name `<PROVIDER>_API_KEY` and paste in your API key as the secret. Replace `<PROVIDER>`
We might want to make a PR to add the Anthropic key to the rootstock workflow here:
https://github.com/manubot/rootstock/blob/main/.github/workflows/ai-revision.yaml#L59
If at some point in the future we theoretically support like a dozen or more services, maybe we just instruct the user to update their ai-revision workflow accordingly for whatever services they're using.
Excellent point; I've converted this PR into a draft until I figure out the implications upstream, including the one you raised. I'm wondering if we should relax the requirement that <PROVIDER>_API_KEY exists and has a non-empty value for every provider, and just check that it's valid when we actually use it to query the API.
I don't know how many services we'll end up providing, but ideally we won't have to make PRs in multiple repos to support the changes going forward. Let me think on it; perhaps we can take in a value in a structured format from rootstock for all the AI Editor options, and the definition of that format can be in this repo, too.
I can take care of that small rootstock PR. Per our discussion, we'll add:
- comment above workflow step saying something like "duplicate step as necessary to use different providers"
- rename "open ai key" var to just "ai key"
- add provider env var
d33bs left a comment
Nice job! I wanted to add some comments in case they're helpful along the journey here.
support whichever model providers LangChain supports. That said, we currently support OpenAI and Anthropic models only,
and are working to add support for other model providers.

When using OpenAI models, [our evaluations](https://github.com/pivlab/manubot-ai-editor-evals) show that `gpt-4-turbo`
Slightly outside the bounds of this PR: I wondered if versioning the evals could make sense (perhaps through a DOI per finding or maybe through the poster which was shared). There could come a time (probably sooner than we think) that GPT-4-Turbo isn't available or relevant.
That's a good point; I wonder if we should move the statement about which model was best in evaluation to the https://github.com/pivlab/manubot-ai-editor-evals repo, so that it can be updated without having to keep this repo up to date as well. I suppose @vincerubinetti and @miltondp might have opinions there, since they're the primary contributors on the evals repo.
Co-authored-by: Dave Bunten <[email protected]>
We need to figure out how we're going to handle the additional API key environment variables for new providers, since they require updates to rootstock as @vincerubinetti mentioned, and might quickly get unmanageable as the number of providers we support grows. I'd be in favor of resolving the API key by preferring a provider-specific environment variable, falling back to a single generic one, and validating it only when we actually query the provider's API.
Happy to hear differing opinions, of course!
Regarding the API key discussion: I've started to pull the logic for validating the API key out of the `GPT3CompletionModel` constructor, so that it's only checked when the provider is actually queried. As we discussed, the tool will prioritize provider-specific API keys, falling back to a generic key when a provider-specific one isn't set. Any input on the above is welcome, but for now I'll assume that we're in agreement and continue to work on implementation.
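To make the direction concrete, here's a minimal sketch of that resolution order. The generic `AI_API_KEY` name is an assumption (rootstock may settle on something else), and actual validation would still happen only when the provider is queried:

```python
import os


def resolve_api_key(provider: str) -> str | None:
    """Prefer a provider-specific key, then fall back to a generic one.

    Both variable names below are placeholders: the provider-specific
    pattern follows the existing <PROVIDER>_API_KEY convention, while
    AI_API_KEY stands in for whatever generic name rootstock settles on.
    """
    # e.g. ANTHROPIC_API_KEY when provider == "anthropic"
    provider_key = os.environ.get(f"{provider.upper()}_API_KEY")
    return provider_key or os.environ.get("AI_API_KEY")
```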
Reminder to me to do the rootstock PR when appropriate. Should be a quick change.
…MODEL_PROVIDER to be specified via the environment, LANGUAGE_MODEL to override the provider-specific default model.
…ave to specify keys for each provider.
…and provider-specific keys
force-pushed from e5ef52b to e9a34e5
return [model.id for model in models.data]

except openai.APIError as ex:
What are the conditions under which this exception might occur? I couldn't tell just from looking, so I'm asking to make sure I understand and, if necessary, to suggest adding docs to this effect. Mostly I ask because the local JSON reference seems like something that could get hard to manage over time (older models, newer models, renamings, etc.); we'd be beholden to all data changes upstream. Comment also stands for the Anthropic provider class.
The most likely reason that exception would be thrown is if you don't have a valid API key for the provider; I've added some comments to that effect in the exception handler, so thanks for the clarifying question.
Just to explain things a bit, the reason I added the local model list at all is that, for some reason, you have to have a valid API key to even get the list of models from these providers. The check for whether the specified language model is included in the provider's model list occurs in the GPT3CompletionModel constructor, and thus is shared by many tests that otherwise don't actually query the APIs and thus don't need valid keys. Since we can't assume we have valid API keys in any tests except the runcost-decorated ones, I came up with this mostly to shore up the tests.
I agree that the baked-in model list isn't ideal, but I can somewhat justify it since the provider model lists change maybe two or three times a year and they're only used in cases where the provider API can't be contacted (which, if they were planning to actually use the providers, wouldn't be the case).
I tried to make it not too onerous to update, too: calling persist_provider_model_engines() will query the providers for their latest models and save the model list file. IMO all that's needed is the list of models at the time of the snapshot, not any other information about which models were added, renamed, etc. We could include this as a step in the release process, too, with the API keys needed to make it work coming from the repo secrets.
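For illustration, a rough sketch of what that snapshot step could look like; the file path is an assumption based on the `manubot_ai_editor/ref/*.json` pattern, and only the OpenAI side is shown:

```python
import json
from pathlib import Path

import openai

# Assumed location of the cached model list; the real path may differ.
MODEL_CACHE = Path("manubot_ai_editor/ref/provider_model_engines.json")


def persist_provider_model_engines() -> None:
    """Query each provider for its current models and snapshot them to JSON.

    Requires valid API keys for every provider, so it's meant to be run
    occasionally (e.g. as a release step), not at runtime or in most tests.
    """
    snapshot = {
        "openai": [model.id for model in openai.OpenAI().models.list().data],
        # ...the Anthropic client would be queried here in the same way...
    }
    MODEL_CACHE.write_text(json.dumps(snapshot, indent=2))
```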
FYI, all the model list caching has been removed; after thinking on it, it's just another thing to maintain, and it's only (kind of) needed for the tests.
…_models() now including provider-specific code. Caches are now provider-specific.
… Added missing env vars to docs/env-vars.md
… a logger now to warn about falling back to the model cache.
…default model is in the cached list of models
…he rather than query APIs
…ly warns if API can't be accessed. Adds option for model provider to not allow listing of models by returning None.
pyproject.toml
Outdated
]
packages = [ { include = "manubot_ai_editor", from = "libs" } ]
include = [
"manubot_ai_editor/ref/*.json",
Double checking: does this need to be updated to the new location?
Currently the tests aren't included in the package at all, and since the model list cache file is just for testing, I assumed it shouldn't be included either.
It does beg the question of whether we should include the tests, though. I don't write a lot of packages so I'm unaware of what the norm is, but perhaps we should do some research and see if that's something we want to add, and if so perhaps only for source builds.
with provider_model_engine_json.open("r") as f:
    provider_model_engines = json.load(f)

@classmethod
This made me wonder: will this register properly to the class, given that it's defined outside of a class?
The short answer is yes: Python functions, regardless of whether they're invoked as regular functions or as methods in a class, close over the environment in which they're declared.
In this case the environment includes the locals within patch_model_list_cache(). Once cached_model_list_retriever has been defined in that environment, it retains access to those locals, including provider_model_engines, no matter what context it's invoked from.
EDIT: Also, you don't have to take my word for it; there are tests that use it which pass, indicating provider_model_engines is in scope when the mocked function is invoked.
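For anyone following along, a toy example of the closure behavior being described (the names here are illustrative, not the actual fixture code):

```python
def make_patcher():
    # Local to make_patcher(), analogous to provider_model_engines above.
    cached_model_engines = {"openai": ["gpt-4-turbo"]}

    def cached_model_list_retriever(provider):
        # Even when called from somewhere else entirely (e.g. attached to a
        # class or handed to a mock), this function still sees the locals of
        # the enclosing call to make_patcher().
        return cached_model_engines[provider]

    return cached_model_list_retriever


retriever = make_patcher()
print(retriever("openai"))  # ['gpt-4-turbo']
```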
…ched to ensure that it's not in 'cost' tests
…sts that fail to retrieve model list w/bad keys. Adds pytest-antilru to remove caching effects in tests.
…ider. Updates cached model list.
Since all the tests are passing and I have one approval, I'm going to assume this is ok to merge. Once this is merged and we've updated PyPI so that it's included when installing …
This PR generalizes `GPT3CompletionModel` to work with API clients for other model providers. The class now takes a `model_provider` string parameter, which must be a valid key in the `manubot_ai_editor.models.MODEL_PROVIDERS` dictionary. Explicit references to OpenAI have been generalized to apply to other model providers, e.g. the `openai_api_key` parameter is now just `api_key`. `GPT3CompletionModel` now supports Anthropic as a second model provider, and more can be added by extending the `MODEL_PROVIDERS` dict mentioned previously.
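As a usage sketch (the `title`/`keywords` arguments and the environment-variable handling here are assumptions for illustration; only `model_provider` and `api_key` are confirmed by this PR):

```python
import os

from manubot_ai_editor.models import GPT3CompletionModel, MODEL_PROVIDERS

print(list(MODEL_PROVIDERS))  # e.g. ['openai', 'anthropic']

model = GPT3CompletionModel(
    title="An example manuscript",       # assumed constructor argument
    keywords=["example", "manuscript"],  # assumed constructor argument
    model_provider="anthropic",          # must be a key in MODEL_PROVIDERS
    api_key=os.environ.get("ANTHROPIC_API_KEY"),
)
```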
The PR modifies the "cost" end-to-end test `tests.test_prompt_config.test_prompts_apply_gpt3` to also check Anthropic. To run the tests against both OpenAI and Anthropic, be sure that you've exported both `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` with valid API keys for each, then run `poetry run pytest --runcost` to run the end-to-end tests.

End-to-end test tweaks: Note that the "cost" test always has the potential to break, since the LLM doesn't always obey the prompt's request to insert a special keyword into the text. This morning, the OpenAI test was unable to add "bottle" to the "abstract" section, so I changed it to "violin", which appeared to pass. Also, it was inserting the keyword "breakdown" as "break down", so I modified the test to remove the spaces in the response before checking for the keyword.
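Concretely, the space-stripping check amounts to something like this (a simplified stand-in for the actual assertion in the test):

```python
def contains_keyword(response: str, keyword: str) -> bool:
    # With spaces removed, a response containing "break down" still
    # matches the requested keyword "breakdown".
    return keyword in response.replace(" ", "")


assert contains_keyword("Here is a break down of the results.", "breakdown")
```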
Documentation: I've gone through the README and tried to tweak it to explain that we now support multiple model providers, but it may require further tweaking. Also, I'm unsure if "model provider" is the preferred term for companies like OpenAI and Anthropic that provide APIs to query LLMs, or if we should use something else; feedback appreciated!