[observability] Add CloudWatch dispatcher and propagate token metadata through async events #276

Open
mfittko wants to merge 21 commits into main from feat/cloudwatch-observability

Conversation

mfittko (Contributor) commented Mar 28, 2026

Summary

This PR adds the observability plumbing needed to export richer llm-proxy usage events, centered on three concrete changes:

  • token metadata is persisted and carried through validation and proxy handling
  • async observability events are enriched with project, token, and user attribution
  • the dispatcher stack gains a CloudWatch Logs backend and supporting runtime hardening

What Changed

  • Persist token metadata in token storage and schema paths:
    • add metadata to token models and database accessors
    • update SQLite schema plus MySQL/PostgreSQL migrations
    • marshal and unmarshal token metadata between database and token domain types
  • Propagate token metadata through the runtime path:
    • include metadata in token manager, validator, and token cache flows
    • attach token/project context to request-scoped observability data in the proxy
    • expose token metadata to middleware and async event publication
  • Extend dispatcher payload modeling:
    • replace prompt/completion-only usage fields with canonical input/output/total usage fields
    • carry detailed token usage maps where available
    • normalize user and metadata attribution for downstream backends
  • Add CloudWatch dispatcher support:
    • add the CloudWatch Logs plugin and register it in the dispatcher plugin registry
    • wire dispatcher --service cloudwatch in the CLI
    • support region/log group/log stream configuration from flags and environment
  • Expand event transformation and decoding:
    • improve OpenAI response transformation for /v1/responses, streamed responses, compressed responses, and richer usage extraction
    • preserve model, request, and response context for downstream backends
    • keep Helicone payload generation aligned with the new usage model
  • Harden event-bus and proxy paths touched by observability:
    • make Redis Streams publishing buffered and asynchronous
    • keep request-scoped observability enrichment available across proxy/middleware boundaries
    • add cache helper coverage and related proxy/runtime test adjustments
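
The move from prompt/completion-only fields to canonical input/output/total fields can be sketched as below. This is a minimal illustration, not the code in this PR: the `Usage` type, `NormalizeUsage`, and `firstPresent` are hypothetical names, and the key names (`prompt_tokens`, `completion_tokens`, `input_tokens`, `output_tokens`, `total_tokens`) are assumed to match what upstream responses report.

```go
package main

import "fmt"

// Usage is a hypothetical canonical usage shape with input/output/total fields.
type Usage struct {
	InputTokens  int
	OutputTokens int
	TotalTokens  int
}

// NormalizeUsage maps either legacy (prompt/completion) or canonical
// (input/output) keys onto Usage. Canonical keys win when both are present.
func NormalizeUsage(raw map[string]int) Usage {
	u := Usage{
		InputTokens:  firstPresent(raw, "input_tokens", "prompt_tokens"),
		OutputTokens: firstPresent(raw, "output_tokens", "completion_tokens"),
		TotalTokens:  firstPresent(raw, "total_tokens"),
	}
	// Derive the total when upstream did not report one.
	if u.TotalTokens == 0 {
		u.TotalTokens = u.InputTokens + u.OutputTokens
	}
	return u
}

// firstPresent returns the value of the first key found in m, or 0.
func firstPresent(m map[string]int, keys ...string) int {
	for _, k := range keys {
		if v, ok := m[k]; ok {
			return v
		}
	}
	return 0
}

func main() {
	legacy := map[string]int{"prompt_tokens": 12, "completion_tokens": 30}
	fmt.Printf("%+v\n", NormalizeUsage(legacy))
	// prints {InputTokens:12 OutputTokens:30 TotalTokens:42}
}
```

Normalizing once at the transformer boundary would let each backend (Helicone, CloudWatch) read the same three fields regardless of which API shape produced the event.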

Testing

  • Added focused tests for:
    • token metadata persistence and database round-tripping
    • token cache and validator metadata propagation
    • middleware/proxy observability enrichment
    • CloudWatch plugin behavior and payload sanitization
    • Redis Streams buffered publishing behavior
    • dispatcher and OpenAI transformer coverage for streamed/compressed responses and usage extraction
  • Targeted validation run:
    • go test ./internal/dispatcher ./internal/eventtransformer ./internal/eventbus
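
For readers unfamiliar with the buffered publishing pattern that the Redis Streams tests cover, here is a rough sketch under stated assumptions. `Event`, `BufferedPublisher`, and the `sink` callback are hypothetical stand-ins; the real event bus would perform the Redis Streams write inside the sink, and its policy for a full buffer may differ from the drop-on-full behavior assumed here.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical stand-in for the proxy's event type.
type Event struct{ RequestID string }

// BufferedPublisher queues events on a channel and forwards them from a
// background goroutine, so Publish never blocks the request path.
type BufferedPublisher struct {
	queue chan Event
	wg    sync.WaitGroup
}

// NewBufferedPublisher starts one worker that passes queued events to sink.
// In a real backend, sink would wrap the Redis Streams write.
func NewBufferedPublisher(size int, sink func(Event)) *BufferedPublisher {
	p := &BufferedPublisher{queue: make(chan Event, size)}
	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		for ev := range p.queue {
			sink(ev)
		}
	}()
	return p
}

// Publish enqueues ev and reports false when the buffer is full.
// Dropping on overflow is an assumption of this sketch.
func (p *BufferedPublisher) Publish(ev Event) bool {
	select {
	case p.queue <- ev:
		return true
	default:
		return false
	}
}

// Close stops accepting events and waits for the worker to drain the queue.
func (p *BufferedPublisher) Close() {
	close(p.queue)
	p.wg.Wait()
}

func main() {
	var delivered []string
	p := NewBufferedPublisher(8, func(ev Event) {
		delivered = append(delivered, ev.RequestID)
	})
	p.Publish(Event{RequestID: "req-1"})
	p.Publish(Event{RequestID: "req-2"})
	p.Close()
	fmt.Println(delivered) // prints [req-1 req-2]
}
```

A single worker keeps delivery in publish order; tests can assert on the sink after `Close` returns because the wait group orders the worker's writes before the read.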

Documentation

  • Updated docs/guides/api-configuration.md for the expanded dispatcher configuration surface

Latest Review Notes

Fresh review found two follow-up risks still worth checking before merge:

  • single-event Responses SSE payloads may bypass the streaming merge path, which would drop model and usage extraction for response.completed-only bodies
  • the new fallback token counting still appears incomplete for legacy /v1/completions request/response shapes without upstream usage
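
To make the first risk concrete, the sketch below extracts model and usage from an SSE body by scanning `data:` lines, so a body containing only a `response.completed` event is handled the same way as a multi-event stream. It is illustrative only: `ExtractCompleted` and `completedEvent` are hypothetical names, and the JSON shape (`type`, `response.model`, `response.usage.*`) is an assumption about the upstream payload rather than something taken from this branch.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// completedEvent mirrors only the fields this sketch needs; the shape is
// an assumption about the upstream response.completed payload.
type completedEvent struct {
	Type     string `json:"type"`
	Response struct {
		Model string `json:"model"`
		Usage struct {
			InputTokens  int `json:"input_tokens"`
			OutputTokens int `json:"output_tokens"`
			TotalTokens  int `json:"total_tokens"`
		} `json:"usage"`
	} `json:"response"`
}

// ExtractCompleted scans an SSE body for a response.completed event and
// returns its model and total token count. One event or many, the same
// loop applies.
func ExtractCompleted(body string) (model string, total int, ok bool) {
	sc := bufio.NewScanner(strings.NewReader(body))
	// Allow data lines larger than the scanner's 64 KiB default.
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "data:") {
			continue
		}
		data := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		var ev completedEvent
		if err := json.Unmarshal([]byte(data), &ev); err != nil {
			continue // skip data lines that are not JSON objects
		}
		if ev.Type == "response.completed" {
			return ev.Response.Model, ev.Response.Usage.TotalTokens, true
		}
	}
	return "", 0, false
}

func main() {
	body := "event: response.completed\n" +
		`data: {"type":"response.completed","response":{"model":"example-model","usage":{"input_tokens":3,"output_tokens":4,"total_tokens":7}}}` +
		"\n\n"
	fmt.Println(ExtractCompleted(body)) // prints example-model 7 true
}
```

A regression test for the review note could feed exactly this kind of single-event body through the transformer and assert that model and usage are still populated.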

Copilot AI review requested due to automatic review settings March 28, 2026 10:27

Copilot AI (Contributor) left a comment

Pull request overview

Adds CloudWatch-backed observability plumbing to the Go-based LLM proxy so usage can be attributed to projects, tokens, and users and dispatched to a CloudWatch Logs backend.

Changes:

  • Persist and surface token metadata (DB → token validation → proxy request context → async events).
  • Extend observability event payloads with ProjectID, TokenID, and TokenMetadata, and map user_id into dispatcher payloads.
  • Add a cloudwatch dispatcher backend plugin + CLI wiring.
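
The enrichment step summarized above might look roughly like the following. All types here are hypothetical and trimmed down; in particular, the metadata map is assumed to be `map[string]string`, which may not match the type used in the PR.

```go
package main

import "fmt"

// Event is a hypothetical, trimmed-down event carrying the attribution
// fields named in the summary.
type Event struct {
	RequestID     string
	ProjectID     string
	TokenID       string
	TokenMetadata map[string]string
}

// TokenData is a hypothetical view of a validated token.
type TokenData struct {
	ID        string
	ProjectID string
	Metadata  map[string]string
}

// cloneMetadata copies m so later mutation of the source cannot leak into
// published events. A nil map stays nil.
func cloneMetadata(m map[string]string) map[string]string {
	if m == nil {
		return nil
	}
	out := make(map[string]string, len(m))
	for k, v := range m {
		out[k] = v
	}
	return out
}

// Enrich returns a copy of ev with attribution taken from td.
func Enrich(ev Event, td TokenData) Event {
	ev.ProjectID = td.ProjectID
	ev.TokenID = td.ID
	ev.TokenMetadata = cloneMetadata(td.Metadata)
	return ev
}

func main() {
	td := TokenData{
		ID:        "tok-1",
		ProjectID: "proj-1",
		Metadata:  map[string]string{"team": "search"},
	}
	ev := Enrich(Event{RequestID: "req-1"}, td)
	td.Metadata["team"] = "changed" // does not affect the enriched event
	fmt.Println(ev.ProjectID, ev.TokenID, ev.TokenMetadata["team"])
	// prints proj-1 tok-1 search
}
```

Cloning at the enrichment boundary matters because events are published asynchronously: the cached token record can change after the request returns but before the dispatcher reads the event.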

Reviewed changes

Copilot reviewed 25 out of 26 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| scripts/schema.sql | Adds metadata column to SQLite schema for tokens. |
| internal/token/validate.go | Introduces ValidateTokenData* APIs returning TokenData (incl. metadata). |
| internal/token/manager.go | Clones token metadata on creation to avoid shared map mutation. |
| internal/token/cache.go | Adds cached TokenData validation paths and clones metadata on reads. |
| internal/server/server.go | Accepts token metadata in management token-create request and returns it. |
| internal/server/management_api_test.go | Verifies token create persists metadata. |
| internal/proxy/proxy.go | Enriches observability events with project/token IDs + metadata; adds request-scoped event context. |
| internal/proxy/interfaces.go | Adds TokenDataValidator and new context keys for token record ID/metadata. |
| internal/proxy/proxy_test.go | Tests observability enrichment includes token metadata. |
| internal/proxy/cache_hit_fastpath_test.go | Updates cache-hit tests to use token-data-aware validator mock. |
| internal/middleware/instrumentation.go | Adds EventEnricher hook to enrich events before publish. |
| internal/middleware/instrumentation_test.go | Tests EventEnricher propagation into published events. |
| internal/eventbus/eventbus.go | Extends Event with ProjectID, TokenID, TokenMetadata. |
| internal/dispatcher/transformer.go | Emits token/project metadata into dispatcher payload; extracts user_id, usage/model for /v1/responses. |
| internal/dispatcher/transformer_additional_test.go | Tests usage + token metadata → dispatcher payload mapping. |
| internal/dispatcher/plugins/registry.go | Registers new cloudwatch backend. |
| internal/dispatcher/plugins/cloudwatch.go | Implements CloudWatch Logs backend writing sanitized JSON messages. |
| internal/dispatcher/plugins/cloudwatch_test.go | Tests payload sanitization and stream-creation-on-missing behavior. |
| internal/database/models.go | Adds Metadata field to DB token model. |
| internal/database/token.go | Stores/loads token metadata; marshals/unmarshals map↔JSON. |
| internal/database/token_test.go | Covers CRUD roundtrip including metadata column. |
| internal/database/migrations/sql/postgres/00007_add_token_metadata.sql | Adds Postgres migration for token metadata column. |
| internal/database/migrations/sql/mysql/00001_initial_schema.sql | Adds metadata column to MySQL initial schema (but not an upgrade migration). |
| cmd/proxy/main.go | Adds cloudwatch to dispatcher CLI help and wires env→config keys. |
| go.mod / go.sum | Adds AWS SDK v2 dependencies for CloudWatch Logs. |
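
The map↔JSON round trip mentioned for internal/database/token.go can be illustrated with a short sketch. The helper names `encodeMetadata` and `decodeMetadata` are hypothetical, the metadata type is assumed to be `map[string]string`, and storing empty metadata as NULL is a choice made for this example rather than something the PR states.

```go
package main

import (
	"database/sql"
	"encoding/json"
	"fmt"
)

// encodeMetadata turns a metadata map into a nullable column value.
// Empty or nil metadata is stored as NULL in this sketch.
func encodeMetadata(m map[string]string) (sql.NullString, error) {
	if len(m) == 0 {
		return sql.NullString{}, nil
	}
	b, err := json.Marshal(m)
	if err != nil {
		return sql.NullString{}, err
	}
	return sql.NullString{String: string(b), Valid: true}, nil
}

// decodeMetadata reverses encodeMetadata; NULL or empty becomes a nil map.
func decodeMetadata(col sql.NullString) (map[string]string, error) {
	if !col.Valid || col.String == "" {
		return nil, nil
	}
	var m map[string]string
	if err := json.Unmarshal([]byte(col.String), &m); err != nil {
		return nil, err
	}
	return m, nil
}

func main() {
	col, err := encodeMetadata(map[string]string{"team": "search"})
	if err != nil {
		panic(err)
	}
	fmt.Println(col.String) // prints {"team":"search"}

	back, err := decodeMetadata(col)
	if err != nil {
		panic(err)
	}
	fmt.Println(back["team"]) // prints search
}
```

Using a nullable text column keeps the same helpers usable across SQLite, MySQL, and PostgreSQL, at the cost of not using native JSON column types.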

mfittko self-assigned this Mar 28, 2026

mfittko (Contributor, Author) commented Mar 28, 2026

Full review pass completed.

Summary:

  • Re-checked the token metadata persistence and validation path from management API through DB, validator cache, proxy context, event bus, and dispatcher payload transformation.
  • Re-checked the CloudWatch plugin behavior around stream creation, event ordering, sequence-token handling, and retry behavior.
  • Re-ran local validation with make lint and make test on the current branch state.

Validation:

  • make lint
  • make test
  • PR checks are green, including Build, Lint, Unit, Integration, PostgreSQL, MySQL, Docker, E2E, and Combined Coverage.

Result:

  • No blocking issues found in the current branch state.
  • Prior Copilot review comments are resolved and there are no unresolved review threads.
  • Ready to merge from my side.

mfittko changed the title from "[observability] Add CloudWatch dispatcher and token metadata" to "[production-readiness] Prepare llm-proxy for observability rollout" Mar 30, 2026
mfittko requested a review from Copilot March 30, 2026 20:29

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 38 out of 39 changed files in this pull request and generated 2 comments.

mfittko changed the title from "[production-readiness] Prepare llm-proxy for observability rollout" to "[observability] Add CloudWatch dispatcher and propagate token metadata through async events" Mar 31, 2026
…io transcriptions and add corresponding tests

2 participants