Skip to content

feat(sentinel): add scoped observability links#198

Open
Agent-Hellboy wants to merge 3 commits into
mainfrom
sentinel/scoped_observability
Open

feat(sentinel): add scoped observability links#198
Agent-Hellboy wants to merge 3 commits into
mainfrom
sentinel/scoped_observability

Conversation

@Agent-Hellboy
Copy link
Copy Markdown
Owner

Summary

  • Add scoped observability API links for readable MCP servers and an allowlisted Prometheus query proxy that verifies the live namespace/server target before querying Prometheus.
  • Include observability metadata in runtime server listings and show My Activity Metrics / configured Grafana actions only when the API says the server is observable for the caller.
  • Document the split between raw admin-only Grafana/Prometheus and scoped user observability, plus PROMETHEUS_API_URL, GRAFANA_SERVER_DASHBOARD_URL, and GRAFANA_SCOPED_USER_ACCESS.

Testing

  • go test ./internal/runtimeapi -count=1 from services/api
  • go test ./... -count=1 from services/ui
  • go test ./... -count=1 from services/api
  • go test ./... -count=1 from repo root
  • go vet ./... from repo root, services/api, and services/ui
  • Commit pre-commit hooks: gitleaks, go fmt, staticcheck, go vet, go unit tests, generated file drift

Refs #181

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces scoped observability for tenant users, enabling access to Prometheus metrics and Grafana dashboards restricted to their own namespaces or owned servers. It adds new API endpoints for observability links and Prometheus query proxying, updates the UI with 'Metrics' and 'Grafana' actions, and provides supporting documentation and tests. A review comment suggests enhancing the Prometheus proxy by logging the response body of failed upstream queries to improve debuggability.

Comment on lines +129 to +133
if resp.StatusCode != http.StatusOK {
_, _ = io.Copy(io.Discard, io.LimitReader(resp.Body, 1024))
writeJSON(w, http.StatusBadGateway, map[string]string{"error": "prometheus_query_failed"})
return
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

When the upstream Prometheus query fails with a non-200 status code, the response body is discarded without being logged. This can make debugging issues with the Prometheus integration difficult, as the reason for the failure is not captured. It would be beneficial to log the status code and the body of the error response from Prometheus to aid in troubleshooting.

Suggested change
if resp.StatusCode != http.StatusOK {
_, _ = io.Copy(io.Discard, io.LimitReader(resp.Body, 1024))
writeJSON(w, http.StatusBadGateway, map[string]string{"error": "prometheus_query_failed"})
return
}
if resp.StatusCode != http.StatusOK {
body, _ := io.ReadAll(io.LimitReader(resp.Body, 4096))
log.Printf("observability prometheus query returned status %d namespace=%q server=%q query_id=%q body=%s", resp.StatusCode, target.Namespace, target.Name, query.ID, string(body))
writeJSON(w, http.StatusBadGateway, map[string]string{"error": "prometheus_query_failed"})
return
}

@Agent-Hellboy Agent-Hellboy marked this pull request as ready for review May 19, 2026 05:36
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c28bdd297f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

ID: "request_rate",
Name: "Request rate",
Description: "Five-minute MCP gateway request rate for this server.",
Query: "sum(rate(mcp_gateway_requests_total{" + selector + "}[5m]))",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Query metrics that Prometheus actually scrapes

In the bundled deployment this query returns no data for every server: rg finds no producer for mcp_gateway_requests_total (nor the other mcp_gateway_* series), and k8s/11-prometheus.yaml only scrapes mcp-sentinel-api, ingest, processor, and clickhouse, not MCP gateway sidecars. Because the UI now shows a Metrics button whenever these links are present, users get an apparently working scoped link that always opens an empty Prometheus result until the gateway metrics are emitted and scraped, or the allowlist is changed to series that actually exist.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

Expose tenant Prometheus and platform-scoped Grafana actions by default for readable MCPServer rows, without requiring Grafana env flags for the safe default path.

Add gateway sidecar metrics, Prometheus discovery for annotated MCPServer services, and operator service annotations so scoped dashboards have real per-server traffic data.

Document the default behavior, update QA skills for tenant UI observability checks, and make service Docker builds honor the target platform for Kind arm64 nodes.
Remove BuildKit-only BUILDPLATFORM usage from service Dockerfiles so CI's legacy operator image build can parse the Dockerfile.

Pass TARGETOS and TARGETARCH explicitly from DOCKER_PLATFORM in Makefile.operator to preserve requested-platform builds when DOCKER_BUILDKIT=0 is used.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant