Skip to content

Conversation

@tgrunnagle
Copy link

@tgrunnagle tgrunnagle commented Jan 25, 2026

Overview

This PR adds design documentation for deploying ToolHive's existing pkg/authserver as a standalone Kubernetes service with mTLS authentication.

What's Included

Problem Being Solved

The authserver (pkg/authserver/) is a complete OAuth2/OIDC implementation but exists as an unintegrated library with no deployment model, no proxyrunner integration, and no operator support.

Proposed Solution

  1. New MCPAuthServer CRD - Deploy authserver as a standalone Kubernetes service
  2. mTLS Authentication - Secure communication between proxyrunners and authserver using cert-manager
  3. Token Exchange Endpoint - Allow proxyrunners to retrieve upstream IDP tokens via /internal/token-exchange
  4. MCPServer CRD Updates - Add authServerClientConfig for mTLS client certificate configuration

Key Design Decisions

  • Per-MCPServer client certificates for identity and audit
  • cert-manager integration for automated certificate lifecycle
  • RFC 9728 OAuth Protected Resource Metadata for discovery
  • Token caching in proxyrunner to reduce authserver calls

Related

@tgrunnagle tgrunnagle requested a review from jhrozek January 25, 2026 17:42
Copy link
Contributor

@jhrozek jhrozek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still going through the doc digesting and commenting, here's my first batch of comments.

key: ca.crt

# Allowed client certificate patterns (for access control)
allowedSubjects:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think this is good in the sense that we control who can access, but don't see how we gate which tokens the server can retrieve?

And a problem we need to figure out is - when the client talks to the authserver, the only identity-like information authserver is in the /authorize request. The MCP spec mandates that the client sends the resource and I suggest we take advantage of it.

Then the new authorization server controller could maintain a mapping (based on MCPGroup or label matching or something) between the resource and the MCPServer (name,namespace) pairs which we'll get from the certificate identity:

  {
    "https://github.example.com/": {
      "namespace": "engineering",
      "name": "github-mcp"
    },
    "https://slack.example.com": {
      "namespace": "everyone",
      "name": "slack-mcp"
    }
  }

The authserver storage already has a session ID. This binds the JWT to the upstream token (alice's github token vs bob's github token). We'll then extend the storage to include the owner as well (name,namespace) so when authserver stores the token we'll have something like:

sessions[tsid] = {
  owner = name/namespace
  tokens = { access, refresh, ..}
}

then we issue a JWT with the tsid claim.

On token retrieval, we extract the owner from the client cert and verify:

  1. the owner
  2. the tsid

for vMCP, this means that the vMCP instance is the owner of all the upstream tokens for all its back ends, but I don't know how to separate those..maybe based on claims? Maybe we could have a backend prefix in the claim? OTOH I would like the backends to be quite opaque from the client perspective.

this way we authenticate the clients and authorize access to the tokens on the level of the mcpserver and the user.

The authserver stores **upstream IDP tokens** (access tokens, refresh tokens) and links them to the JWTs it issues via the `tsid` (token session ID) claim. When a proxyrunner receives a client request with an authserver-issued JWT, it may need to:

1. **Retrieve the upstream access token** to pass to backend MCP servers that require the original IDP token
2. **Exchange tokens** (RFC 8693) to get a backend-specific token
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very interesting take. at first I was going to push back but 1) standards > NIH 2) we already have the code and 3) the fields in the exchange request seem to be extensible.

My only remaining reservation is that technically we're not exchanging tokens (there's no mechanism minting tokens on request) but retrieving existing pre-stored tokens.

- move `AuthServerConfig` from `MCPServer` to `MCPExternalAuthConfig`
- use SPIFFE urls for allowed clients
- Support multiple signing keys
- Scope `allowedSubjects` down to MCP server names (optional)
- Support `upstreamIdps` in MCPAuthServer (for future multiple Idp support)
- Complete `MCPAuthServer` go types code snippet
@tgrunnagle tgrunnagle marked this pull request as ready for review January 26, 2026 20:04
@tgrunnagle tgrunnagle requested review from JAORMX and yrobla January 26, 2026 20:04
namespace: toolhive-system
spec:
issuer: "https://mcp-authserver.toolhive-system.svc.cluster.local"
replicas: 2
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aren't we just setting this to 1 all the time? are we looking for scalability at this point? if so, we should consider synchronization, mutexes, etc... around all the spec

```go
// pkg/auth/authserver/cache.go

// TokenCache provides thread-safe caching of exchanged tokens
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is just in-memory cache or are we looking to use another methods?


// UpstreamIdps configures upstream identity providers for user authentication
// The authserver federates authentication to these IDPs
// Currently only a single IDP is supported; multiple IDPs planned for vMCP use case
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now will that be integrated with vmcp code? i do not see details about it...

Following the [MCPServer CRD pattern](cmd/thv-operator/api/v1alpha1/mcpserver_types.go):

```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1
Copy link

@yrobla yrobla Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we miss the observedgeneration, phase, etc... fields here, as we have with other crds?
also, are we thinking in adding validation webhooks?

AuthServerConfig *AuthServerClientConfig `json:"authserver_config,omitempty"`
}

type AuthServerClientConfig struct {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mm runconfig is platform agnostic ... so if we add here authserverconfig, with certificates, isn't that just used for k8s? or how do we configure certs in client?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants