Conversation
Tested locally using the llm-proxy benchmark suite:

I have not yet verified streaming support or the CloudWatch logging, but it's generally working with completions. I'll roll this out to AWS and give it a full test later on.
Pull request overview
Implements the first MVP of openai-proxy: a standalone Rack + Puma Ruby service that mints short-lived proxy tokens per project and transparently proxies /v1/* requests to OpenAI, with Redis hot-path helpers, optional response caching, and async CloudWatch usage shipping.
Changes:
- Add core proxy application: routing, management endpoints (projects + token minting), and transparent `/v1/*` forwarding (buffered + streaming).
- Add persistence + security primitives: MySQL repositories for projects/tokens and AES-256-GCM encryption for stored upstream API keys.
- Add hot-path + ops tooling: Redis token cache, Redis HTTP response cache, CloudWatch usage worker/sink, Docker/Compose, CI workflows, RuboCop, and RSpec (unit + integration + optional real-API smoke).
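To make the token life cycle concrete, the short-lived proxy tokens described above could be sketched roughly as follows. This is a hedged illustration, not the PR's actual code: the `pxy-` prefix and the `mint_token`/`validate_token` names are invented for the example, and the real TokenGenerator/TokenValidator may differ in format, storage, and error codes.

```ruby
require "securerandom"

# Illustrative sketch of short-lived proxy tokens; the real TokenGenerator /
# TokenValidator in this PR may differ in format, storage, and error codes.
TOKEN_PREFIX = "pxy-".freeze  # hypothetical prefix, not the PR's actual format

def mint_token(ttl_seconds)
  {
    token: TOKEN_PREFIX + SecureRandom.urlsafe_base64(24),  # high-entropy secret
    expires_at: Time.now + ttl_seconds,                     # short-lived by design
  }
end

def validate_token(record, now: Time.now)
  return :malformed unless record[:token].start_with?(TOKEN_PREFIX)
  return :expired unless record[:expires_at] > now
  :ok
end
```

In the real service the minted record would presumably be persisted to MySQL and warmed into the Redis token cache, so the hot path rarely touches the database.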
Reviewed changes
Copilot reviewed 61 out of 62 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| spec/worker_spec.rb | Unit coverage for observability worker batching/shutdown behavior. |
| spec/usage_queue_spec.rb | Unit coverage for Redis-backed usage queue push/pop_batch. |
| spec/usage_event_builder_spec.rb | Unit coverage for extracting usage from JSON + SSE responses. |
| spec/token_validator_spec.rb | Unit coverage for token validation error cases and cache warming. |
| spec/token_repository_spec.rb | Unit coverage for token persistence and schema bootstrapping. |
| spec/token_generator_spec.rb | Unit coverage for token format/validation. |
| spec/token_cache_spec.rb | Unit coverage for Redis token caching TTL/serialization. |
| spec/support/collecting_usage_queue.rb | Test helper queue implementation for event capture assertions. |
| spec/streaming_body_spec.rb | Unit coverage for StreamingBody chunk yielding and close semantics. |
| spec/spec_helper.rb | RSpec + SimpleCov configuration (including branch coverage). |
| spec/response_cache_spec.rb | Unit coverage for Redis-backed response cache (entry + alias behavior). |
| spec/proxy_spec.rb | Proxy behavior tests: JSON, streaming, caching, and upstream failures. |
| spec/project_repository_spec.rb | Unit coverage for project persistence and encryption-at-rest expectations. |
| spec/project_record_spec.rb | Unit coverage for API key obfuscation helper. |
| spec/project_api_key_cipher_spec.rb | Unit coverage for AES-GCM encrypt/decrypt and plaintext pass-through. |
| spec/openai_proxy_spec.rb | App graph construction test ensuring singleton build + dependency wiring. |
| spec/integration/real_openai_spec.rb | Optional real-OpenAI smoke test for end-to-end proxying. |
| spec/integration/compose_stack_spec.rb | Compose-backed integration test for full proxy flow and caching. |
| spec/config_spec.rb | Unit coverage for env-based config parsing and validation. |
| spec/cloudwatch_log_sink_spec.rb | Unit coverage for CloudWatch sink enablement/stream handling/retries. |
| spec/cache_helpers_spec.rb | Unit coverage for cache-control parsing and cache key stability. |
| spec/application_spec.rb | Unit coverage for Rack app routing/auth/validation/proxy dispatch. |
| Rakefile | Adds default RSpec rake task. |
| openai_proxy.gemspec | Defines gem metadata and runtime dependencies. |
| Makefile | Developer commands for test/coverage/lint/run and syntax checks. |
| lib/openai_proxy/version.rb | Introduces gem version constant. |
| lib/openai_proxy/token_validator.rb | Token validation with cache + repository lookup and error codes. |
| lib/openai_proxy/token_repository.rb | Token persistence, lookup join, and schema bootstrap. |
| lib/openai_proxy/token_record.rb | Token record struct with expiry and cache TTL helpers. |
| lib/openai_proxy/token_generator.rb | Token generation + format validation. |
| lib/openai_proxy/token_cache.rb | Redis token cache serialization/TTL behavior. |
| lib/openai_proxy/streaming_body.rb | Streaming Rack body backed by queue + worker thread. |
| lib/openai_proxy/response_cache.rb | Redis response cache (entry + alias indirection). |
| lib/openai_proxy/proxy.rb | Core upstream forwarding (buffered + streaming), caching, usage capture. |
| lib/openai_proxy/project_repository.rb | Project persistence and API key encryption integration. |
| lib/openai_proxy/project_record.rb | Project record struct with API key obfuscation. |
| lib/openai_proxy/project_api_key_cipher.rb | AES-256-GCM encryption/decryption for stored upstream keys. |
| lib/openai_proxy/observability/worker.rb | Background worker loop draining Redis usage queue to sink. |
| lib/openai_proxy/observability/usage_queue.rb | Redis list-based queue implementation for usage events. |
| lib/openai_proxy/observability/usage_event_builder.rb | Builds usage events from request/response (incl. SSE parsing). |
| lib/openai_proxy/observability/cloudwatch_log_sink.rb | CloudWatch Logs sink implementation for publishing usage events. |
| lib/openai_proxy/log_sanitizer.rb | Redaction helper for logs (Bearer/sk-* patterns). |
| lib/openai_proxy/config.rb | Environment-driven configuration (timeouts, cache, limits, etc.). |
| lib/openai_proxy/cache_helpers.rb | Cache-control parsing, TTL decisions, and stable cache key helpers. |
| lib/openai_proxy/application.rb | Rack application routing for health, management API, and proxying. |
| lib/openai_proxy.rb | Top-level require + dependency graph construction (DB/Redis/worker/proxy). |
| Gemfile.lock | Locks dependency versions for the application/gem. |
| Gemfile | Declares dependencies for runtime and development/test groups. |
| exe/openai_proxy | CLI entrypoint to run the Rack app via Rackup::Server. |
| Dockerfile | Container build for running the proxy under Puma. |
| docker-compose.yml | Local stack (proxy + MySQL + Redis) with health checks and env wiring. |
| docker-compose.integration.yml | Integration override using an upstream echo server + cache enabled. |
| db/schema.sql | MySQL schema for projects and tokens tables. |
| config/puma.rb | Puma runtime configuration (bind/port, threads, workers, preload). |
| config.ru | Rack config to run the application and shutdown resources at exit. |
| .rubocop.yml | RuboCop configuration for Ruby 3.3 + RSpec/Performance cops. |
| .rspec | RSpec defaults (require helper, documentation format). |
| .github/workflows/test.yml | CI: unit (coverage) + compose-backed integration job. |
| .github/workflows/release.yml | CI: stable-tag gated GitHub Release creation. |
| .github/workflows/lint.yml | CI: RuboCop + syntax checks. |
| .github/workflows/docker.yml | CI: docker build/push workflow with tag sanitization logic. |
| .github/scripts/release-tag.sh | Tag classification/version extraction helper for releases. |
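Several of the rows above (usage_event_builder and its spec) concern extracting usage out of SSE responses. Here is a rough sketch of how that kind of extraction can work; the function name is invented, and the PR's UsageEventBuilder may parse differently.

```ruby
require "json"

# Illustrative sketch: scan an SSE body for the final chunk carrying "usage".
# OpenAI streams JSON chunks as "data: {...}" lines and ends with "data: [DONE]".
def extract_usage_from_sse(body)
  usage = nil
  body.each_line do |line|
    line = line.strip
    next unless line.start_with?("data:")
    payload = line.delete_prefix("data:").strip
    next if payload == "[DONE]"          # end-of-stream sentinel, not JSON
    begin
      event = JSON.parse(payload)
    rescue JSON::ParserError
      next                               # tolerate partial/malformed chunks
    end
    usage = event["usage"] if event["usage"]
  end
  usage
end
```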

---
I like the overall direction here — Rack + Puma, Sequel/MySQL for durable state, Redis for hot-path cache, and off-path usage shipping all make sense for the MVP. That said, I think the scope may be a bit too broad for a first production-ready cut. The two areas that make me want to narrow scope are:
A smaller concern, but worth noting:

- The in-process CloudWatch worker is okay for the MVP, but if we expect multiple Puma workers or stricter delivery guarantees, we may eventually want to split it into a separate worker process.

If the goal is to get the narrowest reliable replacement shipped quickly, I'd strongly consider trimming the first version down to:

- transparent `/v1/*` proxying
- project + token management
- MySQL-backed persistence
- Redis-backed token cache
- async usage queueing

…and leave response caching / more advanced operational behavior for a follow-up PR.
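For reference, the batch-draining worker pattern at issue here can be sketched with an in-memory queue. This is a simplified stand-in: the class name is hypothetical, the sink is just any callable, and `Thread::Queue` replaces the Redis-backed queue the PR's Observability::Worker actually drains.

```ruby
# Simplified stand-in for the PR's observability worker: a background thread
# drains a queue in batches and hands them to a sink (any callable). The real
# worker reads from Redis; Thread::Queue here keeps the sketch self-contained.
class UsageWorkerSketch
  def initialize(queue, sink, batch_size: 10, interval: 0.01)
    @queue, @sink = queue, sink
    @batch_size, @interval = batch_size, interval
    @running = true
  end

  def start
    @thread = Thread.new do
      loop do
        batch = drain(@batch_size)
        @sink.call(batch) unless batch.empty?
        break if !@running && @queue.empty?   # graceful shutdown: flush first
        sleep @interval
      end
    end
  end

  def shutdown
    @running = false
    @thread.join
  end

  private

  # Pop up to `limit` events without blocking; ThreadError means "queue empty".
  def drain(limit)
    batch = []
    batch << @queue.pop(true) while batch.size < limit && !@queue.empty?
    batch
  rescue ThreadError
    batch
  end
end
```

The shutdown check (`!@running && @queue.empty?`) is what gives the "flush before exit" behavior the specs for worker batching/shutdown would exercise; a multi-process deployment would move this loop out of the Puma workers entirely.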

---
Just a comment, since I saw the versions in the README: we should be forward-looking and already use Ruby 4 and MySQL 8.4 here.

---
I would split the observability discussion into two parts.
So the simplification I would recommend is:
That gives us a simpler and easier-to-defend MVP:
Separately, for the profiling work:

---
During iteration on this PR, earlier runtime assumptions around Redis were dropped in favor of an in-process cache and an in-memory observability queue. Some commits also reflect narrowing the MVP toward a smaller hot path and a simpler runtime model. That context is still relevant when reading the branch history, but it is easier to keep it in comments than in the main PR description.
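A minimal version of the in-process cache mentioned here might look like the following. The class name and API are invented for illustration; the actual implementation may differ (eviction strategy, whether per-Puma-worker staleness is acceptable, and so on).

```ruby
# Illustrative in-process TTL cache (hypothetical class/API). The injectable
# clock keeps the sketch testable; a real cache would also need an eviction
# strategy beyond lazy expiry-on-read, and per-worker staleness is accepted.
class InProcessTtlCache
  Entry = Struct.new(:value, :expires_at)

  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @store = {}
    @mutex = Mutex.new   # Puma threads share this cache within one worker
    @clock = clock
  end

  def set(key, value, ttl_seconds)
    @mutex.synchronize { @store[key] = Entry.new(value, @clock.call + ttl_seconds) }
  end

  def get(key)
    @mutex.synchronize do
      entry = @store[key]
      next nil unless entry
      if entry.expires_at <= @clock.call
        @store.delete(key)   # lazily evict expired entries on read
        nil
      else
        entry.value
      end
    end
  end
end
```

The trade-off versus Redis is the usual one: no network hop on the hot path, but each process sees its own copy, so token revocation only takes effect after the TTL elapses.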

---
Updated benchmark after moving away from Redis and landing profiling optimizations:
Summary

Implement the first production-shaped `openai-proxy` MVP as a standalone Ruby OpenAI proxy. The scope is intentionally narrow:

- `/v1/*`

Changes

- `/v1/*` proxy behavior
- `llm-proxy benchmark` compatibility via upstream timing headers

Architectural Direction

Testing

- `bundle exec rspec`
- `make lint`
- `make coverage`
- `npm run build` in `sofatutor/.cobain/cdk` for the matching Cobain stack cleanup
- `llm-proxy benchmark` against the compose deployment

Notes

- `main` already contains the initial repo bootstrap; this PR contains the actual implementation.
- The `sofatutor` Cobain stack was updated to match the current `openai-proxy` deployment assumptions.
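As an aside for reviewers of the caching specs: response caching of this kind depends on a stable cache key, i.e. semantically identical requests must hash identically. A hedged sketch of the idea follows; the helper names are hypothetical and the PR's CacheHelpers may normalize requests differently.

```ruby
require "digest"
require "json"

# Hedged sketch of a stable cache key (hypothetical helpers; the PR's
# CacheHelpers may normalize differently). The idea: canonicalize the JSON
# body so key order doesn't change the key, then digest method + path + body.
def stable_cache_key(method, path, body)
  canonical = JSON.generate(deep_sort(JSON.parse(body)))
  "resp:" + Digest::SHA256.hexdigest([method.to_s.upcase, path, canonical].join("\n"))
end

# Recursively sort hash keys so semantically equal bodies hash identically.
def deep_sort(node)
  case node
  when Hash  then node.sort.to_h.transform_values { |v| deep_sort(v) }
  when Array then node.map { |v| deep_sort(v) }
  else node
  end
end
```

With this shape, two POSTs to the same path whose JSON bodies differ only in key order produce the same key, while any change to a parameter value yields a different one.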