
Implement openai-proxy MVP #1

Open
mfittko wants to merge 30 commits into main from feat/initial-openai-proxy

Conversation


@mfittko mfittko commented Mar 31, 2026

Summary

Implement the first production-shaped openai-proxy MVP as a standalone Ruby OpenAI proxy.

The scope is intentionally narrow:

  • transparent OpenAI-compatible proxying on /v1/*
  • project-scoped upstream API keys stored in MySQL with AES-256-GCM encryption
  • short-lived proxy tokens minted through a minimal management API and CLI
  • per-process in-memory token validation cache on the hot path
  • asynchronous usage logging through an in-memory queue with CloudWatch delivery when configured, otherwise JSONL emission to an opt-in file or stdout
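To make the token-minting shape concrete, here is a minimal sketch. The `sk-` prefix and 22-character random part match the token shape visible in the benchmark runs below; the method name, TTL default, and return shape are illustrative, not the actual generator in `lib/openai_proxy/token_generator.rb`.

```ruby
require "securerandom"
require "time"

# Illustrative sketch: mint a short-lived proxy token.
# "sk-" prefix + 16 bytes of URL-safe base64 (22 chars) is an assumption
# inferred from the benchmark tokens; the real generator may differ.
def mint_proxy_token(ttl_seconds: 3600)
  {
    token: "sk-#{SecureRandom.urlsafe_base64(16)}",
    expires_at: Time.now.utc + ttl_seconds
  }
end

mint_proxy_token(ttl_seconds: 900)
# => { token: "sk-…", expires_at: <15 minutes from now> }
```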

Changes

  • add the Rack + Puma service, routing, management endpoints, and transparent /v1/* proxy behavior
  • add the management CLI for project listing, project updates, and token minting against the same management API
  • add MySQL persistence for projects and tokens, plus encrypted upstream API key storage
  • add explicit hot-path profiling and benchmark helpers, including llm-proxy benchmark compatibility via upstream timing headers
  • add buffered and streaming response handling with usage extraction from JSON and SSE-style responses
  • add browser-facing CORS support, including configurable allowed origins and preview-host handling
  • add a per-process in-memory token cache
  • add async observability delivery with CloudWatch as the configured sink and JSONL fallback logging to file or stdout when CloudWatch is not configured
  • add Docker, Compose, CI workflows, RuboCop, and RSpec coverage for application, proxy, CLI, observability, and integration paths
  • add Ruby 4 runtime/tooling support and a multi-stage runtime image
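The encryption-at-rest piece can be sketched as follows. This is a simplified stand-in for `lib/openai_proxy/project_api_key_cipher.rb`: the record layout (IV + auth tag + ciphertext, base64-encoded) and key handling are assumptions, but the AES-256-GCM primitive matches the PR.

```ruby
require "openssl"
require "base64"

# Sketch of AES-256-GCM encryption for a stored upstream API key.
# Layout assumption: base64( 12-byte IV || 16-byte auth tag || ciphertext ).
def encrypt_api_key(plaintext, key)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = key
  iv = cipher.random_iv
  ciphertext = cipher.update(plaintext) + cipher.final
  # Persist IV and auth tag next to the ciphertext so decryption can verify integrity.
  Base64.strict_encode64(iv + cipher.auth_tag + ciphertext)
end

def decrypt_api_key(encoded, key)
  raw = Base64.strict_decode64(encoded)
  iv, tag, ciphertext = raw[0, 12], raw[12, 16], raw[28..]
  decipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  decipher.key = key
  decipher.iv = iv
  decipher.auth_tag = tag
  decipher.update(ciphertext) + decipher.final
end
```

Decryption fails loudly (raises) if the ciphertext or tag was tampered with, which is the point of GCM over plain CBC here.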

Architectural Direction

  • Keep the request path minimal: validate token cheaply, resolve the upstream API key, and forward the request.
  • Avoid external systems on the hot path beyond MySQL fallback and the upstream OpenAI-compatible request.
  • Keep observability off-path, but always on: queue in memory, then ship to CloudWatch or emit structured JSON lines locally.
  • Do not broaden scope into admin UI, provider abstraction, or response caching in this MVP.
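The "validate token cheaply, fall back to MySQL" direction reduces to a small TTL cache on the hot path. This sketch is illustrative (class and method names are not the actual `lib/openai_proxy/token_cache.rb` API); the Mutex matters because Puma serves requests from multiple threads per process.

```ruby
# Minimal per-process token cache with TTL. On a miss or expiry, the block
# (e.g. a MySQL lookup via the token repository) supplies the fresh value.
class InMemoryTokenCache
  Entry = Struct.new(:value, :expires_at)

  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @entries = {}
    @mutex = Mutex.new
    @clock = clock
  end

  def fetch(token, ttl: 60)
    @mutex.synchronize do
      entry = @entries[token]
      return entry.value if entry && entry.expires_at > @clock.call
      @entries.delete(token) # expired: drop before falling through to the block
    end
    value = yield # cold path, e.g. MySQL lookup
    @mutex.synchronize { @entries[token] = Entry.new(value, @clock.call + ttl) }
    value
  end
end
```

Being per-process, the cache needs no invalidation protocol; a short TTL bounds how long a revoked token can keep validating.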

Testing

  • bundle exec rspec
  • make lint
  • make coverage
  • npm run build in sofatutor/.cobain/cdk for the matching Cobain stack cleanup
  • local benchmark validation through llm-proxy benchmark against the compose deployment

Notes

  • main already contains the initial repo bootstrap; this PR contains the actual implementation.
  • The related sofatutor Cobain stack was updated to match the current openai-proxy deployment assumptions.

@mfittko mfittko self-assigned this Mar 31, 2026
@mfittko mfittko requested a review from Ayushi1296 March 31, 2026 18:09

mfittko commented Mar 31, 2026

Tested locally using llm-proxy benchmark suite:

PROXY_TOKEN='sk-JHjWPxjirUU3Hl5UFfQIZw' /Users/manuelfittko/github/llm-proxy/bin/llm-proxy benchmark --base-url http://127.0.0.1:18080 --endpoint /v1/chat/completions --method POST --token-env PROXY_TOKEN --requests 1000 --concurrency 50 --json '{"model":"gpt-4.1-nano","messages":[{"role":"user","content":"Reply with one word: ping"}],"max_tokens":8}'
Requests sent: 1000, completed: 1000, failed: 1
+------------------------------------------------+
| Total requests        | 1000                   |
| Concurrency           | 50                     |
| Duration (s)          | 20.15                  |
| Success               | 999                    |
| Failed                | 1                      |
| Requests/sec          | 49.64                  |
| Avg latency           | 564.714ms              |
| Min latency           | 294.613ms              |
| Max latency           | 4.747s                 |
| p90 latency           | 733.548ms              |
| p90 mean latency      | 486.881ms              |
| Upstream latency avg  | 550.395ms              |
| Upstream latency min  | 292.097ms              |
| Upstream latency max  | 4.744s                 |
| Upstream latency p90  | 663.882ms              |
| Upstream latency p90 mean | 480.739ms              |
| Proxy latency avg     | 14.321ms               |
| Proxy latency min     | 1.281ms                |
| Proxy latency max     | 320.235ms              |
| Proxy latency p90     | 7.140ms                |
| Proxy latency p90 mean | 2.902ms                |
+------------------------------------------------+
| Response code                                  |
| 200                   | 999                    |
| Network error         | 1                      |
+------------------------------------------------+


mfittko commented Mar 31, 2026

I have not yet verified streaming support and a few other paths, but it's generally working with completions. CloudWatch logging is also unverified. I'll roll this out to AWS and then give it a full test later on.


Copilot AI left a comment


Pull request overview

Implements the first MVP of openai-proxy: a standalone Rack + Puma Ruby service that mints short-lived proxy tokens per project and transparently proxies /v1/* requests to OpenAI, with Redis hot-path helpers, optional response caching, and async CloudWatch usage shipping.

Changes:

  • Add core proxy application: routing, management endpoints (projects + token minting), and transparent /v1/* forwarding (buffered + streaming).
  • Add persistence + security primitives: MySQL repositories for projects/tokens and AES-256-GCM encryption for stored upstream API keys.
  • Add hot-path + ops tooling: Redis token cache, Redis HTTP response cache, CloudWatch usage worker/sink, Docker/Compose, CI workflows, RuboCop, and RSpec (unit + integration + optional real-API smoke).

Reviewed changes

Copilot reviewed 61 out of 62 changed files in this pull request and generated 7 comments.

File Description
spec/worker_spec.rb Unit coverage for observability worker batching/shutdown behavior.
spec/usage_queue_spec.rb Unit coverage for Redis-backed usage queue push/pop_batch.
spec/usage_event_builder_spec.rb Unit coverage for extracting usage from JSON + SSE responses.
spec/token_validator_spec.rb Unit coverage for token validation error cases and cache warming.
spec/token_repository_spec.rb Unit coverage for token persistence and schema bootstrapping.
spec/token_generator_spec.rb Unit coverage for token format/validation.
spec/token_cache_spec.rb Unit coverage for Redis token caching TTL/serialization.
spec/support/collecting_usage_queue.rb Test helper queue implementation for event capture assertions.
spec/streaming_body_spec.rb Unit coverage for StreamingBody chunk yielding and close semantics.
spec/spec_helper.rb RSpec + SimpleCov configuration (including branch coverage).
spec/response_cache_spec.rb Unit coverage for Redis-backed response cache (entry + alias behavior).
spec/proxy_spec.rb Proxy behavior tests: JSON, streaming, caching, and upstream failures.
spec/project_repository_spec.rb Unit coverage for project persistence and encryption-at-rest expectations.
spec/project_record_spec.rb Unit coverage for API key obfuscation helper.
spec/project_api_key_cipher_spec.rb Unit coverage for AES-GCM encrypt/decrypt and plaintext pass-through.
spec/openai_proxy_spec.rb App graph construction test ensuring singleton build + dependency wiring.
spec/integration/real_openai_spec.rb Optional real-OpenAI smoke test for end-to-end proxying.
spec/integration/compose_stack_spec.rb Compose-backed integration test for full proxy flow and caching.
spec/config_spec.rb Unit coverage for env-based config parsing and validation.
spec/cloudwatch_log_sink_spec.rb Unit coverage for CloudWatch sink enablement/stream handling/retries.
spec/cache_helpers_spec.rb Unit coverage for cache-control parsing and cache key stability.
spec/application_spec.rb Unit coverage for Rack app routing/auth/validation/proxy dispatch.
Rakefile Adds default RSpec rake task.
openai_proxy.gemspec Defines gem metadata and runtime dependencies.
Makefile Developer commands for test/coverage/lint/run and syntax checks.
lib/openai_proxy/version.rb Introduces gem version constant.
lib/openai_proxy/token_validator.rb Token validation with cache + repository lookup and error codes.
lib/openai_proxy/token_repository.rb Token persistence, lookup join, and schema bootstrap.
lib/openai_proxy/token_record.rb Token record struct with expiry and cache TTL helpers.
lib/openai_proxy/token_generator.rb Token generation + format validation.
lib/openai_proxy/token_cache.rb Redis token cache serialization/TTL behavior.
lib/openai_proxy/streaming_body.rb Streaming Rack body backed by queue + worker thread.
lib/openai_proxy/response_cache.rb Redis response cache (entry + alias indirection).
lib/openai_proxy/proxy.rb Core upstream forwarding (buffered + streaming), caching, usage capture.
lib/openai_proxy/project_repository.rb Project persistence and API key encryption integration.
lib/openai_proxy/project_record.rb Project record struct with API key obfuscation.
lib/openai_proxy/project_api_key_cipher.rb AES-256-GCM encryption/decryption for stored upstream keys.
lib/openai_proxy/observability/worker.rb Background worker loop draining Redis usage queue to sink.
lib/openai_proxy/observability/usage_queue.rb Redis list-based queue implementation for usage events.
lib/openai_proxy/observability/usage_event_builder.rb Builds usage events from request/response (incl. SSE parsing).
lib/openai_proxy/observability/cloudwatch_log_sink.rb CloudWatch Logs sink implementation for publishing usage events.
lib/openai_proxy/log_sanitizer.rb Redaction helper for logs (Bearer/sk-* patterns).
lib/openai_proxy/config.rb Environment-driven configuration (timeouts, cache, limits, etc.).
lib/openai_proxy/cache_helpers.rb Cache-control parsing, TTL decisions, and stable cache key helpers.
lib/openai_proxy/application.rb Rack application routing for health, management API, and proxying.
lib/openai_proxy.rb Top-level require + dependency graph construction (DB/Redis/worker/proxy).
Gemfile.lock Locks dependency versions for the application/gem.
Gemfile Declares dependencies for runtime and development/test groups.
exe/openai_proxy CLI entrypoint to run the Rack app via Rackup::Server.
Dockerfile Container build for running the proxy under Puma.
docker-compose.yml Local stack (proxy + MySQL + Redis) with health checks and env wiring.
docker-compose.integration.yml Integration override using an upstream echo server + cache enabled.
db/schema.sql MySQL schema for projects and tokens tables.
config/puma.rb Puma runtime configuration (bind/port, threads, workers, preload).
config.ru Rack config to run the application and shutdown resources at exit.
.rubocop.yml RuboCop configuration for Ruby 3.3 + RSpec/Performance cops.
.rspec RSpec defaults (require helper, documentation format).
.github/workflows/test.yml CI: unit (coverage) + compose-backed integration job.
.github/workflows/release.yml CI: stable-tag gated GitHub Release creation.
.github/workflows/lint.yml CI: RuboCop + syntax checks.
.github/workflows/docker.yml CI: docker build/push workflow with tag sanitization logic.
.github/scripts/release-tag.sh Tag classification/version extraction helper for releases.


mfittko commented Apr 1, 2026

I like the overall direction here — Rack + Puma, Sequel/MySQL for durable state, Redis for hot-path cache, and off-path usage shipping all make sense for the MVP.

That said, I think the scope may be a bit too broad for a first production-ready cut. The two areas that make me want to narrow scope are:

  1. Response caching - This adds a lot of semantic and operational surface area for an MVP whose main job is transparent proxying. I’d consider deferring response caching and keeping only token caching for now, unless we already know this is required for launch.

  2. Schema management at app boot - Running schema setup from application code is convenient, but it mixes runtime serving with schema lifecycle. I’d prefer an explicit migration/setup step in deploys, with the app assuming the schema already exists.
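An explicit setup step could be as small as the following sketch. The statement splitting and method name are illustrative; `db` is assumed to respond to `run` (e.g. a Sequel database handle), and the naive split on trailing `;` holds for the plain DDL in `db/schema.sql`.

```ruby
# Hypothetical deploy-time schema application, decoupled from app boot.
# Run once per deploy (e.g. from a rake task) before the app starts serving.
def apply_schema!(db, schema_sql)
  schema_sql
    .split(/;\s*(?:\n|\z)/)  # naive split: one statement per trailing ";"
    .map(&:strip)
    .reject(&:empty?)
    .each { |statement| db.run(statement) }
end
```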

A smaller concern, but worth noting:

• the in-process CloudWatch worker is okay for MVP, but if we expect multiple Puma workers or stricter delivery guarantees, we may eventually want to split that into a separate worker process.
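For reference, the in-process worker shape under discussion is roughly the following. Batch size, the STOP sentinel, and the sink's `publish` API are assumptions for illustration, not the actual `lib/openai_proxy/observability/worker.rb` code.

```ruby
# Background thread that drains an in-memory queue in batches and hands
# them to a sink (e.g. CloudWatch). Delivery is best-effort: events queued
# but not yet popped when the process dies are lost, which is the
# at-most-once guarantee being discussed above.
class UsageWorker
  STOP = Object.new # sentinel pushed at shutdown to unblock the pop

  def initialize(queue, sink, batch_size: 100)
    @queue = queue
    @sink = sink
    @batch_size = batch_size
  end

  def start
    @thread = Thread.new do
      batch = []
      loop do
        event = @queue.pop
        break if event.equal?(STOP)
        batch << event
        if batch.size >= @batch_size || @queue.empty?
          @sink.publish(batch)
          batch = []
        end
      end
      @sink.publish(batch) unless batch.empty? # flush on shutdown
    end
  end

  def shutdown
    @queue << STOP
    @thread&.join
  end
end
```

Splitting this into a separate process would mostly change the queue (a durable one instead of an in-memory `Queue`), not this loop.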

If the goal is to get the narrowest reliable replacement shipped quickly, I'd strongly consider trimming the first version down to:

  • transparent /v1/* proxying
  • project + token management
  • MySQL-backed persistence
  • Redis-backed token cache
  • async usage queueing

…and leave response caching / more advanced operational behavior for a follow-up PR.


korny commented Apr 1, 2026

Just a comment, since I saw the versions in the README: We should be forward looking and already use Ruby 4 and MySQL 8.4 here.


mfittko commented Apr 3, 2026

I would split the observability discussion into two parts.

  1. What I think should stay in this MVP PR
  • Keep streamed usage extraction for the SSE variants we actually need to support here.
  • In particular, keep support for:
    • /v1/responses style streams where usage arrives on response.completed
    • chat-completions style streams where a later SSE chunk carries usage
  • That part looks justified, because otherwise streamed requests across different OpenAI-style endpoints will log usage inconsistently.
  2. What I think can move to a follow-up PR
  • The fallback token-count estimation when the upstream does not emit usage at all.
  • The extra response-content reconstruction that exists mainly to support that estimation path.

So the simplification I would recommend is:

  • Keep real upstream streamed usage extraction.
  • Drop the estimated/fallback usage path for now, if we are comfortable with usage being absent in logs when the upstream response does not provide it.

That gives us a simpler and easier-to-defend MVP:

  • streamed usage is handled consistently when upstream provides it
  • we avoid carrying speculative estimation logic in the initial merge
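To make the kept behavior concrete, the streamed extraction reduces to something like this sketch. It is simplified relative to the real parsing in `lib/openai_proxy/observability/usage_event_builder.rb` (which handles more cases); the `usage` and `response.usage` locations match the two SSE variants listed above.

```ruby
require "json"

# Scan SSE data lines and keep the last chunk carrying a "usage" object.
# Covers chat-completions streams (usage on a late chunk) and /v1/responses
# streams (usage nested under "response" on response.completed).
def extract_usage_from_sse(body)
  usage = nil
  body.each_line do |line|
    next unless line.start_with?("data:")
    payload = line.delete_prefix("data:").strip
    next if payload.empty? || payload == "[DONE]"
    chunk = JSON.parse(payload) rescue next # tolerate non-JSON keep-alives
    usage = chunk["usage"] || chunk.dig("response", "usage") || usage
  end
  usage
end
```

Dropping the estimation fallback means this simply returns nil when the upstream never emits usage, which is the trade-off described above.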

Separately, for the profiling work:

  • keeping the x-upstream-request-start and x-upstream-request-stop response headers in lib/openai_proxy/proxy.rb seems fine
  • the part that still feels like follow-up scope is the broader profiler plumbing and benchmark/profiling support layered through the hot path
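The timing-header part is tiny, which is why keeping it seems fine; roughly the following sketch. The header names come from this PR, but the epoch-seconds encoding is an assumption — the benchmark only needs start/stop to separate proxy latency from upstream latency.

```ruby
# Wrap the upstream call and expose its wall-clock start/stop as response
# headers, so an external benchmark can compute proxy overhead as
# (total latency) - (stop - start).
def with_upstream_timing(headers)
  start = Process.clock_gettime(Process::CLOCK_REALTIME)
  response = yield # the upstream OpenAI-compatible request
  stop = Process.clock_gettime(Process::CLOCK_REALTIME)
  headers["x-upstream-request-start"] = format("%.6f", start)
  headers["x-upstream-request-stop"] = format("%.6f", stop)
  response
end
```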


mfittko commented Apr 3, 2026

During iteration on this PR, earlier runtime assumptions around Redis were dropped in favor of an in-process cache and an in-memory observability queue.

Some commits also reflect narrowing the MVP toward a smaller hot path and a simpler runtime model.

That context is still relevant when reading the branch history, but it is easier to keep it in comments than in the main PR description.


mfittko commented Apr 3, 2026

Updated benchmark after moving away from Redis and landing the profiling optimizations:

bin/llm-proxy benchmark --base-url http://127.0.0.1:18080 --endpoint /v1/chat/completions --method POST --token-env PROXY_TOKEN --requests 1000 --concurrency 50 --json '{"model":"gpt-4.1-nano","messages":[{"role":"user","content":"Reply with one word: ping"}],"max_tokens":8}'
Requests sent: 1000, completed: 1000, failed: 0
+------------------------------------------------+
| Total requests        | 1000                   |
| Concurrency           | 50                     |
| Duration (s)          | 11.88                  |
| Success               | 1000                   |
| Failed                | 0                      |
| Requests/sec          | 84.18                  |
| Avg latency           | 485.728ms              |
| Min latency           | 326.550ms              |
| Max latency           | 1.875s                 |
| p90 latency           | 574.351ms              |
| p90 mean latency      | 452.753ms              |
| Upstream latency avg  | 479.429ms              |
| Upstream latency min  | 325.697ms              |
| Upstream latency max  | 1.874s                 |
| Upstream latency p90  | 571.485ms              |
| Upstream latency p90 mean | 448.889ms              |
| Proxy latency avg     | 6.300ms                |
| Proxy latency min     | 532.055µs              |
| Proxy latency max     | 158.905ms              |
| Proxy latency p90     | 5.989ms                |
| Proxy latency p90 mean | 1.986ms                |
+------------------------------------------------+
| Response code                                  |
| 200                   | 1000                   |
+------------------------------------------------+
