- add notes here for the next release
- Add support for the translation endpoint, i.e. Whisper models (#59 - @wtulloch)
  - This also includes internal changes to support non-token-based rate limiting
- Add support for specifying the `dimensions` parameter in embeddings requests for `text-embedding-3` and later models (#55 - @tanya-borisova); a usage sketch follows this list
- Ensure that an API key is always generated if not provided (#56 - @lucashuet93)
- BREAKING CHANGE: Requests for an incompatible model (e.g. chat requests for an embedding model) fail with a 400 error (#58 - @tanya-borisova)
- Terraform deployment option (#60 - @mluker)
- Support for ARM architecture for local Docker builds (#32 - @mluker)
- Numerous fixes and repo improvements: #24, #26, #38, #41, #42, #43, #45, #51 - @martinpeck
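To illustrate the `dimensions` parameter above, here is a minimal sketch using the `openai` Python package against a locally running simulator; the endpoint URL, API key, and deployment name are placeholder assumptions to adapt to your setup.

```python
from openai import AzureOpenAI

# Placeholder values: adjust the endpoint, key, and deployment for your setup.
client = AzureOpenAI(
    azure_endpoint="http://localhost:8000",  # assumed local simulator address
    api_key="simulator-api-key",             # whatever key the simulator is configured with
    api_version="2024-02-01",
)

# Request 256-dimensional embeddings (supported for text-embedding-3 and later).
response = client.embeddings.create(
    model="text-embedding-3-small",  # a deployment name known to the simulator
    input="The quick brown fox",
    dimensions=256,
)
print(len(response.data[0].embedding))  # -> 256
```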
- Migrate to current repo from previous repo
- BREAKING CHANGE: rename `aoai-simulated-api` to `aoai-api-simulator` in code (also for the `aoai_simulated_api` package)
- BREAKING CHANGE: update metric prefix from `aoai-simulated-api.` to `aoai-api-simulator.`
- Return to sliding-window rate limiting. This change moves from the `limits` package to a custom rate-limiting implementation to address performance issues with sliding windows (#51); a simplified sketch of the sliding-window idea follows below
- Update rate-limit handling for tokens based on experimentation (currently a limited set of models - see #52)
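The sliding-window change above is conceptually similar to the following simplified sketch. This is illustrative only, not the simulator's actual implementation (which also has to weight requests by token usage):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests in any rolling `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0):
        self.limit = limit
        self.window = window
        self.timestamps: deque[float] = deque()

    def allow(self) -> bool:
        now = time.monotonic()
        # Evict timestamps that have slid out of the rolling window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False

limiter = SlidingWindowLimiter(limit=3, window=1.0)
print([limiter.allow() for _ in range(5)])  # [True, True, True, False, False]
```

Unlike a fixed window, the rolling check means a burst at the end of one window cannot be immediately followed by another full burst at the start of the next.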
- Extensibility updates (an illustrative sketch follows this list):
  - Focus the core simulator on OpenAI (moved the document intelligence generator to an example extension)
  - API authorization is now part of forwarders/generators to allow extensions to add their own authentication schemes. BREAKING CHANGE: if you have custom forwarders/generators, they need to be updated to handle this (see the examples for implementation details)
  - Enable adding custom rate limiters
  - Move latency calculation to generators. This allows extensions to customise latency values. NOTE: if you have custom generators, they need to be updated to handle this (see the examples for implementation details)
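To make the new extension points concrete, here is a purely hypothetical sketch; the `Request`/`SimResponse` shapes and the generator signature are placeholders, not the simulator's real API (see the examples in the repo for the actual contracts). The idea is that a generator now performs its own authorization and supplies its own latency value:

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    """Stand-in for the simulator's request object (hypothetical shape)."""
    path: str
    headers: dict = field(default_factory=dict)
    body: dict = field(default_factory=dict)

@dataclass
class SimResponse:
    status_code: int
    body: dict
    latency_ms: float  # generators now supply their own latency value

def my_generator(request: Request) -> SimResponse | None:
    """Hypothetical custom generator: handle one route, otherwise defer."""
    if request.path != "/my-extension/echo":
        return None  # not ours; the simulator tries the next generator
    # Authorization now lives in forwarders/generators, so an extension
    # can apply whatever auth scheme it wants.
    if request.headers.get("api-key") != "expected-key":
        return SimResponse(401, {"error": "unauthorized"}, latency_ms=1.0)
    return SimResponse(200, {"echo": request.body}, latency_ms=25.0)

# Example: an authorized call to the extension's route
resp = my_generator(Request("/my-extension/echo", {"api-key": "expected-key"}, {"msg": "hi"}))
print(resp.status_code, resp.body)  # 200 {'echo': {'msg': 'hi'}}
```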
- Add rate-limiting for replayed requests
- Add `ALLOW_UNDEFINED_OPENAI_DEPLOYMENTS` configuration option to control whether the simulator will generate responses for any deployment or only for known deployments (see the sketch below)
- Fix: tokens used by streaming completions were not included in token counts for rate-limits
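As an illustration of the new option: with `ALLOW_UNDEFINED_OPENAI_DEPLOYMENTS` set to false, a request for a deployment that is not in the simulator's configuration should be rejected rather than answered with generated data. A rough sketch, assuming the simulator listens on localhost:8000 (the URL, key, and exact error status are assumptions):

```python
import requests

# Assumes the simulator was started with ALLOW_UNDEFINED_OPENAI_DEPLOYMENTS=False
# and is listening on localhost:8000 (both are assumptions for this sketch).
resp = requests.post(
    "http://localhost:8000/openai/deployments/not-in-config/chat/completions",
    params={"api-version": "2024-02-01"},
    headers={"api-key": "simulator-api-key"},
    json={"messages": [{"role": "user", "content": "hello"}]},
)
# Expect an error status here, since the deployment is undefined;
# with the flag set to True the simulator would generate a response instead.
print(resp.status_code)
```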
- Token usage metrics are now split into prompt and completion tokens using metric dimensions
- BREAKING CHANGE: Token metrics have been renamed from `aoai-simulator.tokens_used` and `aoai-simulator.tokens_requested` to `aoai-simulator.tokens.used` and `aoai-simulator.tokens.requested` for consistency with latency metric names
- Dimension size for embedding deployments can now be specified in config (#39 - @MDUYN)
- Improve error info when no matching handler is found
- Fix tokens-per-minute to requests-per-minute conversion bug
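For context on the conversion above: Azure OpenAI allocates request quota from token quota at a ratio of 6 requests-per-minute per 1000 tokens-per-minute, so the conversion is just:

```python
def tpm_to_rpm(tokens_per_minute: int) -> int:
    # Azure OpenAI allocates 6 requests-per-minute per 1000 tokens-per-minute.
    return (tokens_per_minute // 1000) * 6

print(tpm_to_rpm(10_000))  # 60 RPM for a 10k TPM deployment
```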
- Add option to configure latency for generated responses for OpenAI endpoints
- Add `/++/config` endpoint to get and set configuration values
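A quick sketch of using the config endpoint from Python; the update verb and payload shape shown here are assumptions (check the simulator docs for the exact contract):

```python
import requests

BASE = "http://localhost:8000"                # assumed local simulator address
HEADERS = {"api-key": "simulator-api-key"}    # assumed simulator key

# Read the current configuration.
current = requests.get(f"{BASE}/++/config", headers=HEADERS).json()
print(current)

# Update a value. PATCH with a partial body is an assumption here --
# consult the simulator docs for the supported method and payload shape.
requests.patch(
    f"{BASE}/++/config",
    headers=HEADERS,
    json={"latency": {"open_ai_embeddings": {"mean": 100.0}}},
)
```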
Initial tagged version.

Includes: