feat: add Anthropic OAuth integration for user-specific API access #34
mgpai22 wants to merge 11 commits into ColeMurray:main
- Base64 encode secrets JSON to safely pass multiline PEM keys
- Use uv run for modal deploy commands
- Add fastapi, PyJWT, and cryptography to modal-infra dependencies
- Update .gitignore for .env*.local files
- Add terraform production .gitignore
- Update terraform lock file hashes

- Add /internal/anthropic-token CRUD endpoints for encrypted token storage
- Add token refresh logic with correct Anthropic v1 OAuth endpoint
- Add ANTHROPIC_CLIENT_ID to Env interface for refresh flow
- Add Anthropic token columns to participants schema
- Export token management utilities from auth module
- Remove redundant X-Internal-Secret check from store handler (HMAC middleware already handles auth)

…fic API access
- Retrieve and auto-refresh OAuth token when spawning sandboxes
- Pass token through control plane -> Modal -> sandbox environment
- Sandbox supervisor sets ANTHROPIC_API_KEY to user's OAuth token
- Write auth.json for OpenCode compatibility
- Fall back to shared API key when no OAuth token is available

- Add OAuth initiation endpoint with PKCE (S256) code challenge
- Add token exchange callback using code-paste flow
- Add settings page with AnthropicConnection component
- Add status and disconnect API routes using controlPlaneFetch
- Use Anthropic's code-display callback URI to avoid redirect registration issues

- Include userId, githubLogin, githubName, githubEmail when creating sessions so the control plane sets the correct owner for OAuth token lookup
- Add settings link to session sidebar

- Add anthropic_client_id and anthropic_client_secret variables
- Pass ANTHROPIC_CLIENT_ID to control plane worker for token refresh
- Add Anthropic OAuth env vars to Vercel web app

- Pass anthropic_oauth_token through web_api.py to SessionConfig
- Write OAuth tokens to auth.json instead of overriding ANTHROPIC_API_KEY, since OAuth tokens (sk-ant-oat01-*) cannot be used as x-api-key headers
- Filter user message parts in bridge SSE to prevent echo on silent failure
- Detect empty assistant message sets and report errors instead of success
- Add robust error message extraction for OpenCode's various error formats
- Add diagnostic logging for API key availability at sandbox startup

Resolve merge conflicts:
- client.ts: take upstream's structured logging
- durable-object.ts: take upstream's lifecycle manager delegation
- bridge.py: keep error extraction + add structured logging
- manager.py: keep both GITHUB_APP_TOKEN and ANTHROPIC_OAUTH_TOKEN

Add OAuth token support through the lifecycle manager chain:
- Add anthropicOAuthToken to CreateSandboxConfig and RestoreConfig
- Add getAnthropicOAuthToken resolver to SandboxLifecycleConfig
- Pass token through ModalSandboxProvider to ModalClient
- Create OAuth token resolver in SessionDO for lifecycle manager

- Add User-Agent header to token refresh request to avoid Cloudflare blocking non-browser requests (error 1010)
- Add proactive token refresh in DO alarm handler so tokens stay fresh between sandbox spawns
- Push refreshed tokens to running sandboxes via WebSocket update_token command so long-running sessions don't lose auth mid-session
- Add update_token command handler in bridge to write refreshed auth.json
Display a badge in the session UI showing whether the agent is using the user's Claude Code Subscription (OAuth) or the shared API key. Persists auth method in sandbox table and broadcasts via WebSocket.
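For readers unfamiliar with the PKCE (S256) code challenge named in the OAuth initiation commit above, here is a minimal Python sketch of how a verifier/challenge pair is typically derived. It is illustrative only: the function names are hypothetical, and the PR's actual implementation lives in the web package and is not shown in this thread.

```python
import base64
import hashlib
import secrets


def make_code_verifier() -> str:
    """Return a random URL-safe verifier (43 characters for 32 random bytes)."""
    return secrets.token_urlsafe(32)


def make_code_challenge(verifier: str) -> str:
    """Derive the S256 challenge: BASE64URL(SHA256(verifier)) without '=' padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


# The verifier stays on the server; only the challenge goes into the authorize URL.
verifier = make_code_verifier()
challenge = make_code_challenge(verifier)
```

The verifier is later sent with the token exchange request, so the authorization server can confirm that the party redeeming the code is the one that started the flow.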
Greptile Summary

This PR adds Anthropic OAuth integration that enables user-specific API access with proactive token refresh. The implementation includes a code-paste OAuth flow, automatic token refresh via lifecycle manager alarms, WebSocket-based token push to sandboxes, and UI indicators showing the authentication method.
```ts
private async proactiveTokenRefresh(): Promise<void> {
  if (!this.env.SESSION_INDEX || !this.env.TOKEN_ENCRYPTION_KEY) return;

  try {
    const ownerResult = this.sql.exec(
      `SELECT user_id FROM participants WHERE role = 'owner' LIMIT 1`
    );
    const owners = ownerResult.toArray() as { user_id: string }[];
    const ownerUserId = owners[0]?.user_id;
    if (!ownerUserId) return;

    const tokenData = (await this.env.SESSION_INDEX.get(
      `anthropic:token:${ownerUserId}`,
      "json"
    )) as {
      accessTokenEncrypted: string;
      refreshTokenEncrypted?: string;
      expiresAt: number;
    } | null;

    if (!tokenData || !tokenData.refreshTokenEncrypted) return;

    // Import inline to avoid circular deps
    const { tokenNeedsRefresh, refreshAnthropicToken } = await import("../auth/anthropic");
    const { decryptToken } = await import("../auth/crypto");

    // Check if token needs refresh (within 5 min of expiry)
    if (!tokenNeedsRefresh(tokenData.expiresAt)) return;

    this.log.info("Proactive token refresh starting", { user_id: ownerUserId });

    const clientId = this.env.ANTHROPIC_CLIENT_ID || "";
    const encKey = this.env.TOKEN_ENCRYPTION_KEY;

    const refreshResult = await refreshAnthropicToken(
      tokenData.refreshTokenEncrypted,
      clientId,
      encKey
    );

    if (!refreshResult.success || !refreshResult.accessToken || !refreshResult.expiresAt) {
      this.log.warn("Proactive token refresh failed", { error: refreshResult.error });
      return;
    }

    // Persist refreshed tokens to KV
    await this.env.SESSION_INDEX.put(
      `anthropic:token:${ownerUserId}`,
      JSON.stringify({
        accessTokenEncrypted: refreshResult.accessToken,
        refreshTokenEncrypted: refreshResult.refreshToken || tokenData.refreshTokenEncrypted,
        expiresAt: refreshResult.expiresAt,
        storedAt: Date.now(),
      })
    );

    this.log.info("Proactive token refresh succeeded", {
      user_id: ownerUserId,
      new_expiry: new Date(refreshResult.expiresAt).toISOString(),
    });

    // If sandbox is running, push the new token to it
    const sandboxWs = this.getSandboxWebSocket();
    if (sandboxWs) {
      const decryptedToken = await decryptToken(refreshResult.accessToken, encKey);
      this.safeSend(sandboxWs, {
        type: "update_token",
        token: decryptedToken,
        expiresAt: refreshResult.expiresAt,
      });
      this.log.info("Pushed refreshed token to sandbox");
    }
  } catch (e) {
    this.log.error("Proactive token refresh error", {
      error: e instanceof Error ? e : String(e),
    });
  }
}
```
If token refresh fails or owner has no OAuth token, the alarm will still reschedule in 30 seconds via scheduleInactivityCheck(). This could cause repeated failed refresh attempts. Consider tracking refresh failures and backing off or skipping refresh attempts for sessions without OAuth tokens.
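One way to picture the suggested backoff is the Python sketch below. It assumes a 1-minute base delay and a 1-hour cap, the figures given in the follow-up commit later in this thread; the class and function names are hypothetical, and the real change is in the TypeScript Durable Object, which is not shown here.

```python
BASE_DELAY_MS = 60_000          # 1 minute after the first failure (assumed base)
MAX_DELAY_MS = 60 * 60_000      # never wait longer than 1 hour (assumed cap)


def next_refresh_delay_ms(consecutive_failures: int) -> int:
    """Delay before the next attempt: doubles per failure, capped at MAX_DELAY_MS."""
    if consecutive_failures <= 0:
        return 0
    exponent = min(consecutive_failures - 1, 16)  # clamp to keep numbers small
    return min(BASE_DELAY_MS * (2 ** exponent), MAX_DELAY_MS)


class RefreshBackoff:
    """Tracks consecutive refresh failures and when the next attempt is allowed."""

    def __init__(self) -> None:
        self.failures = 0
        self.next_attempt_at_ms = 0

    def should_attempt(self, now_ms: int) -> bool:
        return now_ms >= self.next_attempt_at_ms

    def record_failure(self, now_ms: int) -> None:
        self.failures += 1
        self.next_attempt_at_ms = now_ms + next_refresh_delay_ms(self.failures)

    def record_success(self) -> None:
        self.failures = 0
        self.next_attempt_at_ms = 0
```

With this shape, the alarm handler can keep firing on its normal schedule and simply skip the refresh call whenever `should_attempt` returns false.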
Path: packages/control-plane/src/session/durable-object.ts
Line: 686:763
```python
# Check for user's Anthropic OAuth token (for user-specific API access)
# OAuth tokens (sk-ant-oat01-...) cannot be used as x-api-key headers.
# Instead, write auth.json so OpenCode uses its native OAuth auth flow.
# The shared ANTHROPIC_API_KEY remains as a fallback.
anthropic_oauth_token = os.environ.get("ANTHROPIC_OAUTH_TOKEN")
if anthropic_oauth_token:
    print("[supervisor] Using user's Anthropic OAuth token for API access")

    # Write the auth.json file for OpenCode's native OAuth auth
    opencode_data_dir = Path.home() / ".local" / "share" / "opencode"
    opencode_data_dir.mkdir(parents=True, exist_ok=True)
    auth_json_path = opencode_data_dir / "auth.json"
    auth_data = {
        "accessToken": anthropic_oauth_token,
        "expiresAt": int(time.time() * 1000) + 3600000,  # 1 hour from now
    }
    auth_json_path.write_text(json.dumps(auth_data))
    print(f"[supervisor] Wrote OAuth token to {auth_json_path}")
else:
    print("[supervisor] No OAuth token, using shared ANTHROPIC_API_KEY")
```
The hardcoded expiresAt of 1 hour from now doesn't match the actual token expiry from control plane. When tokens are refreshed proactively (5 min before expiry), OpenCode may see the old expiry timestamp. Consider passing expiresAt as an environment variable from the control plane or accepting it in the refresh command payload.
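A minimal sketch of the environment-variable option follows. The variable name `ANTHROPIC_OAUTH_EXPIRES_AT` and both helper names are assumptions made for illustration; only the `auth.json` shape (`accessToken`, `expiresAt` in milliseconds) is taken from the code under review.

```python
import json
from pathlib import Path

ONE_HOUR_MS = 3_600_000


def resolve_expires_at_ms(env: dict[str, str], now_ms: int) -> int:
    """Prefer the expiry passed by the control plane; fall back to now + 1 hour."""
    raw = env.get("ANTHROPIC_OAUTH_EXPIRES_AT")  # hypothetical variable name
    if raw:
        try:
            return int(raw)
        except ValueError:
            pass  # unparsable value: use the fallback below
    return now_ms + ONE_HOUR_MS


def write_auth_json(path: Path, token: str, expires_at_ms: int) -> None:
    """Write auth.json with the same two keys used in the entrypoint above."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"accessToken": token, "expiresAt": expires_at_ms}))
```

In the entrypoint this would be called with `dict(os.environ)` and `int(time.time() * 1000)`, so the one-hour guess is used only when the control plane supplies no usable expiry.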
Path: packages/modal-infra/src/sandbox/entrypoint.py
Line: 238:257
```python
async def _handle_update_token(self, cmd: dict[str, Any]) -> None:
    """Handle update_token command - write refreshed OAuth token to auth.json."""
    token = cmd.get("token")
    expires_at = cmd.get("expiresAt")
    if not token:
        self.log.warn("bridge.update_token_missing")
        return

    try:
        opencode_data_dir = Path.home() / ".local" / "share" / "opencode"
        opencode_data_dir.mkdir(parents=True, exist_ok=True)
        auth_json_path = opencode_data_dir / "auth.json"
        auth_data = {
            "accessToken": token,
            "expiresAt": expires_at or int(time.time() * 1000) + 3600000,
        }
        auth_json_path.write_text(json.dumps(auth_data))
        self.log.info("bridge.token_updated")
    except Exception as e:
        self.log.error("bridge.update_token_error", exc=e)
```
The update_token handler receives expiresAt from the control plane but defaults to 1 hour if missing. Using the actual expiresAt from the refresh response ensures OpenCode's auth.json matches the token's real expiry time.
```suggestion
    try:
        opencode_data_dir = Path.home() / ".local" / "share" / "opencode"
        opencode_data_dir.mkdir(parents=True, exist_ok=True)
        auth_json_path = opencode_data_dir / "auth.json"
        # Use expiresAt from control plane, or fallback to current time + 1 hour
        auth_data = {
            "accessToken": token,
            "expiresAt": expires_at if expires_at else int(time.time() * 1000) + 3600000,
        }
        auth_json_path.write_text(json.dumps(auth_data))
        self.log.info("bridge.token_updated", expires_at=expires_at)
    except Exception as e:
        self.log.error("bridge.update_token_error", exc=e)
```
Path: packages/modal-infra/src/sandbox/bridge.py
Line: 1153:1173
```ts
export function tokenNeedsRefresh(expiresAt: number): boolean {
  return Date.now() >= expiresAt - REFRESH_BUFFER_MS;
}
```
Consider using a more generous refresh buffer (e.g., 15-30 minutes) to account for clock skew, network delays, and token refresh latency. The 5-minute buffer might be tight for distributed systems where the control plane alarm, token refresh API call, and sandbox update all need to complete before expiry.
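To make the effect of the buffer concrete, here is a small Python restatement of the same check with the buffer as a parameter. The 15-minute default is the low end of the reviewer's suggested range; the function signature is a hypothetical adaptation that takes `now_ms` explicitly so the behavior is easy to reason about.

```python
REFRESH_BUFFER_MS = 15 * 60 * 1000  # 15 minutes, assumed from the suggestion above


def token_needs_refresh(
    expires_at_ms: int, now_ms: int, buffer_ms: int = REFRESH_BUFFER_MS
) -> bool:
    """True once the current time is inside the buffer window before expiry."""
    return now_ms >= expires_at_ms - buffer_ms


# A token expiring at t = 60 min, checked at t = 45 min:
expires_at = 60 * 60 * 1000
now = 45 * 60 * 1000
needs_with_15_min = token_needs_refresh(expires_at, now)              # inside window
needs_with_5_min = token_needs_refresh(expires_at, now, 5 * 60_000)   # still outside
```

The wider window gives the alarm, the refresh request, and the push to the sandbox more time to finish before the old token expires.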
Path: packages/control-plane/src/auth/anthropic.ts
Line: 42:44
```ts
// The pasted code is "<authorization_code>#<state>"; split on "#"
const splits = rawCode.split("#");
const code = splits[0];
const state = splits[1] || "";
```
The code splits on # assuming the format is always <code>#<state>. If Anthropic's OAuth response format changes or users paste malformed codes, splits[1] could be undefined. Consider validating the split result and providing clearer error messages for malformed codes.
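The kind of validation being asked for could look like the Python sketch below. It assumes the pasted value has the form `<code>#<state>` with an optional state, as in the snippet above; the function name and the error messages are hypothetical.

```python
def parse_pasted_code(raw: str) -> tuple[str, str]:
    """Split a pasted '<code>#<state>' string, rejecting obviously malformed input."""
    code, sep, state = raw.strip().partition("#")
    if not code:
        raise ValueError(
            "Authorization code is empty; paste the full code shown after approving access."
        )
    if sep and "#" in state:
        raise ValueError("Malformed code: expected at most one '#' separator.")
    return code, state  # state is "" when no '#' was present
```

Raising a specific error for each case lets the callback route return a clear message to the user instead of failing later during the token exchange.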
Path: packages/web/src/app/api/auth/anthropic/callback/route.ts
Line: 60:64
```ts
private createAnthropicOAuthTokenResolver(): (() => Promise<string | undefined>) | undefined {
  if (!this.env.SESSION_INDEX || !this.env.TOKEN_ENCRYPTION_KEY) {
    return undefined;
  }

  const kvRef = this.env.SESSION_INDEX;
  const encKey = this.env.TOKEN_ENCRYPTION_KEY;
  const clientId = this.env.ANTHROPIC_CLIENT_ID || "";
  const sql = this.sql;
  const log = this.log;

  return async (): Promise<string | undefined> => {
    const ownerResult = sql.exec(`SELECT user_id FROM participants WHERE role = 'owner' LIMIT 1`);
    const owners = ownerResult.toArray() as { user_id: string }[];
    const ownerUserId = owners[0]?.user_id;
    if (!ownerUserId) return undefined;

    const tokenData = (await kvRef.get(`anthropic:token:${ownerUserId}`, "json")) as {
      accessTokenEncrypted: string;
      refreshTokenEncrypted?: string;
      expiresAt: number;
    } | null;

    if (!tokenData) return undefined;

    const token =
      (await getValidAnthropicToken(
        tokenData.accessTokenEncrypted,
        tokenData.refreshTokenEncrypted,
        tokenData.expiresAt,
        clientId,
        encKey,
        async (result) => {
          if (result.success && result.accessToken && result.expiresAt) {
            await kvRef.put(
              `anthropic:token:${ownerUserId}`,
              JSON.stringify({
                accessTokenEncrypted: result.accessToken,
                refreshTokenEncrypted: result.refreshToken || tokenData.refreshTokenEncrypted,
                expiresAt: result.expiresAt,
                storedAt: Date.now(),
              })
            );
            log.info("Refreshed Anthropic token", {
              user_id: ownerUserId,
              new_expiry: new Date(result.expiresAt).toISOString(),
            });
          }
        }
      )) || undefined;

    if (token) {
      log.info("Using Anthropic OAuth token", { user_id: ownerUserId });
    } else {
      log.info("Anthropic token expired/refresh failed", { user_id: ownerUserId });
    }

    return token;
  };
}
```
The OAuth token resolver queries the database and KV on every call during sandbox lifecycle operations. Consider caching the resolved token for a short duration (e.g., 1-5 minutes) to reduce database queries, especially since tokens are valid for extended periods and refreshed proactively.
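As a rough illustration of the caching idea, the Python sketch below wraps a resolver in a short time-to-live cache. It is synchronous for brevity, whereas the resolver in the PR is async; the class name is hypothetical, and the 2-minute default is taken from the follow-up commit later in this thread.

```python
import time
from typing import Callable, Optional


class CachedTokenResolver:
    """Caches the result of an expensive resolver call for ttl_seconds."""

    def __init__(
        self,
        resolve: Callable[[], Optional[str]],
        ttl_seconds: float = 120.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._resolve = resolve
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[str] = None
        self._cached_at: Optional[float] = None

    def get(self) -> Optional[str]:
        now = self._clock()
        if self._cached_at is not None and now - self._cached_at < self._ttl:
            return self._value  # still fresh: skip the DB/KV lookups
        self._value = self._resolve()
        self._cached_at = now
        return self._value

    def invalidate(self) -> None:
        """Drop the cached value, e.g. right after a token refresh or disconnect."""
        self._cached_at = None
```

Calling `invalidate()` whenever a refresh writes a new token would keep the cache from handing out a token that has just been replaced.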
Path: packages/control-plane/src/session/durable-object.ts
Line: 281:340
- Increase token refresh buffer from 5 to 15 minutes before expiry
- Validate authorization code is non-empty in OAuth callback
- Improve bridge token update logging with explicit falsy check and expires_at in log output
- Add exponential backoff to proactive token refresh (1m-1hr cap) with noOAuthTokenConfigured fast-path to skip DB/KV lookups entirely
- Pass real expiresAt through full stack (resolver → lifecycle manager → provider → client → Modal sandbox → entrypoint) instead of hardcoding 1hr fallback
- Fix pre-existing bug: client.ts was not serializing anthropicOAuthToken in the createSandbox JSON body
- Cache OAuth token resolver results for 2 minutes to reduce redundant DB/KV queries during spawn
To my knowledge, Anthropic's position on third-party usage of the Claude subscription has not changed. Given this, I am not willing to add a feature that could lead to unsuspecting users having their accounts banned. Happy to revisit if/when the policy changes.
Summary
Changes across packages
- control-plane: OAuth token storage/management in Durable Objects, token refresh in lifecycle manager, new API routes
- modal-infra: Bridge SSE handler for token refresh events, sandbox entrypoint passes OAuth tokens
- web: OAuth callback route, settings page with connect/disconnect, auth badge in session header
- terraform: Anthropic OAuth environment variables for Modal and control plane