
Conversation

@falquaddoomi
Collaborator

This PR generalizes GPT3CompletionModel to work with API clients for other model providers. The class now takes a model_provider string parameter, which must be a valid key in the manubot_ai_editor.models.MODEL_PROVIDERS dictionary. Explicit references to OpenAI have been generalized to apply to other model providers, e.g. the openai_api_key parameter is now just api_key.

GPT3CompletionModel now supports Anthropic as a second model provider, and more can be added by extending the MODEL_PROVIDERS dict mentioned previously.
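
To make the above concrete, here's a rough sketch of constructing the model against a different provider. The model_provider and api_key parameters and the MODEL_PROVIDERS dict come straight from this PR; the title, keywords, and model_engine arguments shown are illustrative, and the exact signature may differ:

```python
from manubot_ai_editor.models import GPT3CompletionModel, MODEL_PROVIDERS

# the providers currently registered, e.g. "openai" and "anthropic"
print(list(MODEL_PROVIDERS.keys()))

# the same class, now backed by Anthropic instead of OpenAI
model = GPT3CompletionModel(
    title="An example manuscript",
    keywords=["example", "keywords"],
    model_provider="anthropic",  # must be a key in MODEL_PROVIDERS
    model_engine="claude-3-5-sonnet-latest",  # whichever model the provider offers
    api_key="...",  # or omit and rely on the provider's API key env var
)
```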

The PR modifies the "cost" end-to-end test tests.test_prompt_config.test_prompts_apply_gpt3 to also check Anthropic. To run the tests against both OpenAI and Anthropic, be sure that you've exported both OPENAI_API_KEY and ANTHROPIC_API_KEY with valid API keys for each, then run poetry run pytest --runcost to run the end-to-end tests.

End-to-end test tweaks: Note that the "cost" test always has the potential to break, since the LLM doesn't always obey the prompt's request to insert a special keyword into the text. This morning, the OpenAI test was unable to add "bottle" to the "abstract" section, so I changed it to "violin", which appeared to pass. Also, it was inserting the keyword "breakdown" as "break down", so I modified the test to remove the spaces in the response before checking for the keyword.
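
Concretely, the keyword check now works roughly like this (a minimal sketch; the variable names are illustrative, not the actual test code):

```python
revised_text = "... a break down of the results ..."  # example LLM response
keyword = "breakdown"

# strip spaces so "break down" still counts as containing "breakdown"
assert keyword in revised_text.replace(" ", "")
```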

Documentation: I've gone through the README and tried to tweak it to explain that we now support multiple model providers, but it may require further tweaking. Also, I'm unsure if "model provider" is the preferred term for companies like OpenAI and Anthropic that provide APIs to query LLMs, or if we should use something else; feedback appreciated!

README.md Outdated
1. If you haven't already, [make an OpenAI account](https://openai.com/api/) and [create an API key](https://platform.openai.com/api-keys).
1. In your fork's "⚙️ Settings" tab, make a new Actions repository secret with the name `OPENAI_API_KEY` and paste in your API key as the secret.
1. If you haven't already, follow the directions above to create an account and get an API key for your chosen model provider.
1. In your fork's "⚙️ Settings" tab, make a new Actions repository secret with the name `<PROVIDER>_API_KEY` and paste in your API key as the secret. Replace `<PROVIDER>`
Collaborator

We might want to make a PR to add the Anthropic key to the rootstock workflow here:
https://github.com/manubot/rootstock/blob/main/.github/workflows/ai-revision.yaml#L59

If at some point in the future we support a dozen or more services, maybe we just instruct the user to update their ai-revision workflow accordingly for whatever services they're using.

Collaborator Author

Excellent point; I've converted this PR into a draft until I figure out the implications upstream, including the one you raised. I'm wondering if we should relax the requirement that <PROVIDER>_API_KEY exists and has a non-empty value for every provider, and just check that it's valid when we actually use it to query the API.

I don't know how many services we'll end up supporting, but ideally we won't have to make PRs in multiple repos to support changes going forward. Let me think on it; perhaps we could accept a value in a structured format from rootstock for all the AI Editor options, with the definition of that format living in this repo, too.

Collaborator

I can take care of that small rootstock PR. Per our discussion, we'll add:

  • comment above workflow step saying something like "duplicate step as necessary to use different providers"
  • rename "open ai key" var to just "ai key"
  • add provider env var

@falquaddoomi falquaddoomi marked this pull request as draft December 18, 2024 20:56
Collaborator

@d33bs d33bs left a comment

Nice job, wanted to add some comments in case they're helpful along the journey here.

support whichever model providers LangChain supports. That said, we currently support OpenAI and Anthropic models only,
and are working to add support for other model providers.

When using OpenAI models, [our evaluations](https://github.com/pivlab/manubot-ai-editor-evals) show that `gpt-4-turbo`
Collaborator

Slightly outside the bounds of this PR: I wondered if versioning the evals could make sense (perhaps through a DOI per finding, or maybe through the poster that was shared). There could come a time (probably sooner than we think) when GPT-4-Turbo isn't available or relevant.

Collaborator Author

That's a good point; I wonder if we should move the statement about which model was best in evaluation to the https://github.com/pivlab/manubot-ai-editor-evals repo, so that it can be updated without having to keep this repo up to date as well. I suppose @vincerubinetti and @miltondp might have opinions there, since they're the primary contributors on the evals repo.

@falquaddoomi falquaddoomi marked this pull request as ready for review January 15, 2025 20:50
@falquaddoomi
Collaborator Author

We need to figure out how we're going to handle the additional API key environment variables for new providers, since they require updates to rootstock as @vincerubinetti mentioned, and might quickly get unmanageable as the number of providers we support grows.

I'd be in favor of resolving the API key like so:

  • if a provider-specific API key is supplied, e.g. ANTHROPIC_API_KEY, it'll be used with that provider
  • if the provider-specific API key isn't found, a generic one, e.g. PROVIDER_API_KEY, will be checked
  • if that can't be found either and the provider requires a key, it'll throw an error

Happy to hear differing opinions, of course!

@falquaddoomi falquaddoomi marked this pull request as draft January 15, 2025 21:17
@falquaddoomi
Collaborator Author

falquaddoomi commented Jan 29, 2025

Regarding the API key discussion: I've started to pull the logic for validating the API key out of the GPT3CompletionModel constructor, so that instances of the class can be created without an API key. Instead, API key resolution will occur right before a live API is actually used, i.e. in the GPT3CompletionModel.revise_paragraph() method, and then only if the provider requires a key (the local LLM via Ollama provider, for instance, won't require one).

As we discussed, the tool will prioritize provider-specific API keys, falling back to PROVIDER_API_KEY if a provider-specific key isn't found. In cases where the provider requires a key but none exists, we'll throw an error, but it'll occur at inference time rather than at the model's construction.
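
As a minimal sketch of that resolution order (the helper name and signature are hypothetical, not the actual implementation):

```python
import os
from typing import Optional


def resolve_api_key(provider_name: str, key_required: bool) -> Optional[str]:
    """Resolve an API key right before a live API call, not at construction."""
    # 1) prefer a provider-specific key, e.g. ANTHROPIC_API_KEY
    key = os.environ.get(f"{provider_name.upper()}_API_KEY")

    # 2) otherwise fall back to the generic PROVIDER_API_KEY
    if not key:
        key = os.environ.get("PROVIDER_API_KEY")

    # 3) if the provider requires a key and none was found, fail at
    #    inference time rather than when the model object is created
    if not key and key_required:
        raise ValueError(f"no API key found for provider '{provider_name}'")

    return key
```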

Any input on the above is welcome, but for now I'll assume that we're in agreement and continue to work on implementation.

@vincerubinetti
Collaborator

Reminder to me to do the rootstock PR when appropriate. Should be a quick change.

@falquaddoomi falquaddoomi force-pushed the langchain-anthropic-integration branch from e5ef52b to e9a34e5 on February 28, 2025 19:11
@falquaddoomi falquaddoomi marked this pull request as ready for review February 28, 2025 19:15

        return [model.id for model in models.data]

    except openai.APIError as ex:
Collaborator

What are the conditions under which this exception might occur? I couldn't tell just from looking, so I'm asking to make sure I understand and, if necessary, to suggest adding docs to this effect. Mostly I ask because the local JSON reference seems like something that could get hard to manage over time (older models, newer models, renamings, etc. -- we'd be beholden to all data changes upstream). Comment also stands for the Anthropic provider class.

Collaborator Author

The most likely reason that exception would be thrown is if you don't have a valid API key for the provider; I've added some comments to that effect in the exception handler, so thanks for the clarifying question.

Just to explain things a bit, the reason I added the local model list at all is that, for some reason, you have to have a valid API key even to get the list of models from these providers. The check for whether the specified language model is included in the provider's model list occurs in the GPT3CompletionModel constructor, and so is shared by many tests that don't actually query the APIs and therefore don't need valid keys. Since we can't assume we have valid API keys in any tests except the runcost-decorated ones, I came up with this mostly to shore up the tests.

I agree that the baked-in model list isn't ideal, but I can somewhat justify it since the provider model lists change maybe two or three times a year, and they're only used when the provider API can't be contacted (which wouldn't be the case for anyone actually planning to use the providers).

I tried to make it not too onerous to update, too: calling persist_provider_model_engines() will query the providers for their latest models and save the model list file. IMO all that's needed is the list of models at the time of the snapshot, not any other information about which models were added, renamed, etc. We could include this as a step in the release process, too, with the API keys needed to make it work coming from the repo secrets.

Collaborator Author

FYI, all the model list caching has been removed; after thinking on it, it's just another thing to maintain, and it's only (kind of) needed for the tests.

@d33bs d33bs self-requested a review April 16, 2025 19:16
pyproject.toml Outdated
]
packages = [ { include = "manubot_ai_editor", from = "libs" } ]
include = [
"manubot_ai_editor/ref/*.json",
Collaborator

Double checking: does this need to be updated to the new location?

Collaborator Author

Currently the tests aren't included in the package at all, and since the model list cache file is just for testing, I assumed it shouldn't be included either.

It does raise the question of whether we should include the tests, though. I don't write a lot of packages so I'm not sure what the norm is, but perhaps we should do some research and see if that's something we want to add, and if so perhaps only for source builds.

with provider_model_engine_json.open("r") as f:
    provider_model_engines = json.load(f)

@classmethod
Collaborator

This made me wonder: will this register properly to the class, given that it's defined outside of a class?

Collaborator Author

@falquaddoomi falquaddoomi Apr 17, 2025

The short answer is yes: Python functions, regardless of whether they're invoked as regular functions or as methods in a class, close over the environment in which they're declared.

In this case, the environment includes the locals within patch_model_list_cache(). Once cached_model_list_retriever has been defined in that environment, it has access to those locals, including provider_model_engines, regardless of the context in which it's later invoked.

EDIT: Also, you don't have to take my word for it; the tests that use it pass, indicating that provider_model_engines is in scope when the mocked function is invoked.
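
For anyone curious, here's a minimal, self-contained illustration of that closure behavior; the names echo the ones above, but this is illustrative code, not the actual fixture:

```python
def patch_model_list_cache():
    # a local variable in the enclosing function...
    provider_model_engines = {
        "openai": ["gpt-4-turbo"],
        "anthropic": ["claude-3-5-sonnet-latest"],
    }

    def cached_model_list_retriever(provider):
        # ...remains visible here: the nested function closes over the
        # environment in which it was defined
        return provider_model_engines[provider]

    return cached_model_list_retriever


retriever = patch_model_list_cache()
# the enclosing function has already returned, but its local is still in scope
print(retriever("anthropic"))  # ['claude-3-5-sonnet-latest']
```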

@falquaddoomi
Collaborator Author

Since all the tests are passing and I have one approval, I'm going to assume this is ok to merge.

Once this is merged and we've updated PyPI so that it's included when installing manubot[ai-rev], I'll modify manubot/rootstock#522 to remove its explicit installation of this PR's branch, and un-draft it so it can be reviewed and merged, too.

@falquaddoomi falquaddoomi merged commit 4631e30 into main Apr 23, 2025
7 checks passed