feat(sentinel): add scoped observability links #198
Code Review
This pull request introduces scoped observability for tenant users, enabling access to Prometheus metrics and Grafana dashboards restricted to their own namespaces or owned servers. It adds new API endpoints for observability links and Prometheus query proxying, updates the UI with 'Metrics' and 'Grafana' actions, and provides supporting documentation and tests. A review comment suggests enhancing the Prometheus proxy by logging the response body of failed upstream queries to improve debuggability.
    if resp.StatusCode != http.StatusOK {
        _, _ = io.Copy(io.Discard, io.LimitReader(resp.Body, 1024))
        writeJSON(w, http.StatusBadGateway, map[string]string{"error": "prometheus_query_failed"})
        return
    }
When the upstream Prometheus query fails with a non-200 status code, the response body is discarded without being logged. This can make debugging issues with the Prometheus integration difficult, as the reason for the failure is not captured. It would be beneficial to log the status code and the body of the error response from Prometheus to aid in troubleshooting.
Suggested change:

    - if resp.StatusCode != http.StatusOK {
    -     _, _ = io.Copy(io.Discard, io.LimitReader(resp.Body, 1024))
    -     writeJSON(w, http.StatusBadGateway, map[string]string{"error": "prometheus_query_failed"})
    -     return
    - }
    + if resp.StatusCode != http.StatusOK {
    +     body, _ := io.ReadAll(io.LimitReader(resp.Body, 4096))
    +     log.Printf("observability prometheus query returned status %d namespace=%q server=%q query_id=%q body=%s", resp.StatusCode, target.Namespace, target.Name, query.ID, string(body))
    +     writeJSON(w, http.StatusBadGateway, map[string]string{"error": "prometheus_query_failed"})
    +     return
    + }
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c28bdd297f
    ID:          "request_rate",
    Name:        "Request rate",
    Description: "Five-minute MCP gateway request rate for this server.",
    Query:       "sum(rate(mcp_gateway_requests_total{" + selector + "}[5m]))",
Query metrics that Prometheus actually scrapes
In the bundled deployment this query returns no data for every server: rg finds no producer for mcp_gateway_requests_total (nor the other mcp_gateway_* series), and k8s/11-prometheus.yaml only scrapes mcp-sentinel-api, ingest, processor, and clickhouse, not MCP gateway sidecars. Because the UI now shows a Metrics button whenever these links are present, users get an apparently working scoped link that always opens an empty Prometheus result until the gateway metrics are emitted and scraped, or the allowlist is changed to series that actually exist.
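One way to catch the empty-result case described here, for example in a test or before offering the Metrics action, is to inspect the Prometheus query response for an empty result list. This is only a sketch under assumptions: `hasSamples` and `promResponse` are hypothetical names, not part of this PR, and the JSON shape is the standard `status` / `data.resultType` / `data.result` envelope of the Prometheus HTTP query API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// promResponse mirrors only the parts of a Prometheus query response that
// this check needs: the status and the raw result list.
type promResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string            `json:"resultType"`
		Result     []json.RawMessage `json:"result"`
	} `json:"data"`
}

// hasSamples (hypothetical helper) reports whether a query succeeded and
// returned at least one series. It is meant for vector or matrix results
// such as sum(rate(...)); scalar results are not handled specially.
func hasSamples(body []byte) (bool, error) {
	var r promResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return false, err
	}
	if r.Status != "success" {
		return false, fmt.Errorf("prometheus status %q", r.Status)
	}
	return len(r.Data.Result) > 0, nil
}

func main() {
	// A successful query that matched no series, as described in the comment.
	empty := []byte(`{"status":"success","data":{"resultType":"vector","result":[]}}`)
	ok, err := hasSamples(empty)
	fmt.Println(ok, err) // prints: false <nil>
}
```

A check like this only detects the symptom. The underlying fix is still to emit and scrape the gateway metrics, or to change the allowlist to series that exist.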
Expose tenant Prometheus and platform-scoped Grafana actions by default for readable MCPServer rows, without requiring Grafana env flags for the safe default path. Add gateway sidecar metrics, Prometheus discovery for annotated MCPServer services, and operator service annotations so scoped dashboards have real per-server traffic data. Document the default behavior, update QA skills for tenant UI observability checks, and make service Docker builds honor the target platform for Kind arm64 nodes.
Remove BuildKit-only BUILDPLATFORM usage from service Dockerfiles so CI's legacy operator image build can parse the Dockerfile. Pass TARGETOS and TARGETARCH explicitly from DOCKER_PLATFORM in Makefile.operator to preserve requested-platform builds when DOCKER_BUILDKIT=0 is used.
Summary
- […] namespace/server target before querying Prometheus.
- […] Metrics / configured Grafana actions only when the API says the server is observable for the caller.
- […] PROMETHEUS_API_URL, GRAFANA_SERVER_DASHBOARD_URL, and GRAFANA_SCOPED_USER_ACCESS.

Testing
- go test ./internal/runtimeapi -count=1 from services/api
- go test ./... -count=1 from services/ui
- go test ./... -count=1 from services/api
- go test ./... -count=1 from repo root
- go vet ./... from repo root, services/api, and services/ui

Refs #181