A lightweight proxy server that translates Claude Code's Anthropic API calls into NVIDIA NIM, OpenRouter, or LM Studio format. Get 40 free requests/min on NVIDIA NIM, access hundreds of models on OpenRouter, or run fully local with LM Studio.
Features · Quick Start · How It Works · Discord Bot · Configuration
| Feature | Description |
|---|---|
| Zero Cost | 40 req/min free on NVIDIA NIM. Free models on OpenRouter. Fully local with LM Studio |
| Drop-in Replacement | Set 2 env vars — no modifications to Claude Code CLI or VSCode extension needed |
| 3 Providers | NVIDIA NIM, OpenRouter (hundreds of models), LM Studio (local & offline) |
| Thinking Token Support | Parses `<think>` tags and `reasoning_content` into native Claude thinking blocks |
| Heuristic Tool Parser | Models outputting tool calls as text are auto-parsed into structured tool use |
| Request Optimization | 5 categories of trivial API calls intercepted locally — saves quota and latency |
| Discord Bot | Remote autonomous coding with tree-based threading, session persistence, and live progress (Telegram also supported) |
| Smart Rate Limiting | Proactive rolling-window throttle + reactive 429 exponential backoff + optional concurrency cap across all providers |
| Subagent Control | Task tool interception forces `run_in_background=False` — no runaway subagents |
| Extensible | Clean `BaseProvider` and `MessagingPlatform` ABCs — add new providers or platforms easily |
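As a concrete picture of the heuristic tool parser: some models print tool calls as plain text instead of emitting structured tool-call fields. A toy sketch of the idea — the project's real heuristics are more involved, and `parse_text_tool_call` and the stub `id` are illustrative names, not the actual API:

```python
import json
import re

def parse_text_tool_call(text: str) -> dict | None:
    """Toy heuristic: find a JSON object with "name"/"arguments" keys in raw model text."""
    m = re.search(r'\{.*"name".*"arguments".*\}', text, flags=re.DOTALL)
    if m is None:
        return None
    call = json.loads(m.group(0))
    # Re-emit as an Anthropic-style structured tool_use block.
    return {"type": "tool_use", "id": "toolu_stub",
            "name": call["name"], "input": call["arguments"]}

print(parse_text_tool_call('Calling: {"name": "read_file", "arguments": {"path": "a.py"}}'))
# {'type': 'tool_use', 'id': 'toolu_stub', 'name': 'read_file', 'input': {'path': 'a.py'}}
```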
- Get an API key (or use LM Studio locally):
- NVIDIA NIM: build.nvidia.com/settings/api-keys
- OpenRouter: openrouter.ai/keys
- LM Studio: No API key needed — runs fully locally
- Install Claude Code
- Install uv
```
git clone https://github.com/Alishahryar1/free-claude-code.git
cd free-claude-code
cp .env.example .env
```

Choose your provider and edit `.env`:
NVIDIA NIM (recommended — 40 req/min free)

```
PROVIDER_TYPE=nvidia_nim
NVIDIA_NIM_API_KEY=nvapi-your-key-here
MODEL=stepfun-ai/step-3.5-flash
```

OpenRouter (hundreds of models)
```
PROVIDER_TYPE=open_router
OPENROUTER_API_KEY=sk-or-your-key-here
MODEL=stepfun/step-3.5-flash:free
```

LM Studio (fully local, no API key)
```
PROVIDER_TYPE=lmstudio
MODEL=lmstudio-community/qwen2.5-7b-instruct
```

Terminal 1 — Start the proxy server:
```
uv run uvicorn server:app --host 0.0.0.0 --port 8082
```

Terminal 2 — Run Claude Code:
```
ANTHROPIC_AUTH_TOKEN=freecc ANTHROPIC_BASE_URL=http://localhost:8082 claude
```

That's it! Claude Code now uses your configured provider for free.
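To sanity-check the proxy without launching Claude Code, you can post an Anthropic-format request to it directly. A minimal sketch, assuming the proxy serves the standard `/v1/messages` route, accepts the `freecc` token as a Bearer credential, and remaps the Anthropic model name to your configured `MODEL` — all proxy-side behavior, so treat this as illustrative:

```python
# Hypothetical smoke test: stream one Anthropic-format request through the proxy.
import requests

resp = requests.post(
    "http://localhost:8082/v1/messages",
    headers={
        "authorization": "Bearer freecc",      # same token Claude Code sends
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    },
    json={
        "model": "claude-sonnet-4-20250514",   # assumed to be remapped to MODEL from .env
        "max_tokens": 64,
        "messages": [{"role": "user", "content": "Reply with one word: ok"}],
        "stream": True,                        # the proxy streams SSE back
    },
    stream=True,
)
for line in resp.iter_lines():
    if line:
        print(line.decode())
```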
Multi-Model Support (Model Picker)
`claude-pick` is an interactive model selector that lets you choose any model from your active provider each time you launch Claude — no need to edit `MODEL` in `.env` every time you want to switch.
(Demo recording of the model picker in action.)
1. Install fzf (highly recommended for the interactive picker):

```
brew install fzf  # macOS/Linux
```

2. Add the alias to `~/.zshrc` or `~/.bashrc`:
```
# Use the absolute path to your cloned repo
alias claude-pick="/absolute/path/to/free-claude-code/claude-pick"
```

Then reload your shell (`source ~/.zshrc` or `source ~/.bashrc`) and run `claude-pick` to pick a model and launch Claude.
To skip the picker and always launch with a fixed model, use a dedicated alias:

```
alias claude-kimi='ANTHROPIC_BASE_URL="http://localhost:8082" ANTHROPIC_AUTH_TOKEN="freecc:moonshotai/kimi-k2.5" claude'
```

VSCode Extension Setup
- Start the proxy server (same as above).
- Open Settings (`Ctrl + ,`) and search for `claude-code.environmentVariables`.
- Click Edit in settings.json and add:
"claude-code.environmentVariables": [
{ "name": "ANTHROPIC_BASE_URL", "value": "http://localhost:8082" },
{ "name": "ANTHROPIC_AUTH_TOKEN", "value": "freecc" }
]- Reload extensions.
- If you see the login screen ("How do you want to log in?"): Click Anthropic Console, then authorize. The extension will start working. You may be redirected to buy credits in the browser — ignore that; the extension already works.
To switch back to Anthropic models, comment out the added block and reload extensions.
```
┌─────────────────┐        ┌──────────────────────┐        ┌──────────────────┐
│   Claude Code   │───────>│   Free Claude Code   │───────>│   LLM Provider   │
│  CLI / VSCode   │<───────│    Proxy (:8082)     │<───────│  NIM / OR / LMS  │
└─────────────────┘        └──────────────────────┘        └──────────────────┘
   Anthropic API                       │              OpenAI-compatible
   format (SSE)               ┌────────┴───────┐          format (SSE)
                              │ Optimizations  │
                              ├────────────────┤
                              │ Quota probes   │
                              │ Title gen skip │
                              │ Prefix detect  │
                              │ Suggestion skip│
                              │ Filepath mock  │
                              └────────────────┘
```
- Transparent proxy — Claude Code sends standard Anthropic API requests to the proxy server
- Request optimization — 5 categories of trivial requests (quota probes, title generation, prefix detection, suggestions, filepath extraction) are intercepted and responded to instantly without using API quota
- Format translation — Real requests are translated from Anthropic format to the provider's OpenAI-compatible format and streamed back
- Thinking tokens — `<think>` tags and `reasoning_content` fields are converted into native Claude thinking blocks so Claude Code renders them correctly
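To make the thinking-token step concrete, here is a simplified, non-streaming sketch of the idea (the project's actual parser works on streaming deltas; this is illustrative only):

```python
import re

def to_claude_blocks(text: str) -> list[dict]:
    """Toy converter: split '<think>...</think>answer' into Anthropic-style blocks."""
    blocks = []
    match = re.match(r"\s*<think>(.*?)</think>(.*)", text, flags=re.DOTALL)
    if match:
        # Reasoning becomes a native thinking block instead of leaking into the answer.
        blocks.append({"type": "thinking", "thinking": match.group(1).strip()})
        text = match.group(2)
    if text.strip():
        blocks.append({"type": "text", "text": text.strip()})
    return blocks

print(to_claude_blocks("<think>User wants a greeting.</think>Hello!"))
# [{'type': 'thinking', 'thinking': 'User wants a greeting.'}, {'type': 'text', 'text': 'Hello!'}]
```

Providers that return `reasoning_content` instead of inline tags map onto the same thinking block, just without the regex step.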
| Provider | Cost | Rate Limit | Models | Best For |
|---|---|---|---|---|
| NVIDIA NIM | Free | 40 req/min | Kimi K2, GLM5, Devstral, MiniMax | Daily driver — generous free tier |
| OpenRouter | Free / Paid | Varies | 200+ (GPT-4o, Claude, Step, etc.) | Model variety, fallback options |
| LM Studio | Free (local) | Unlimited | Any GGUF model | Privacy, offline use, no rate limits |
Switch providers by changing `PROVIDER_TYPE` in `.env`:

| Provider | `PROVIDER_TYPE` | API Key Variable | Base URL |
|---|---|---|---|
| NVIDIA NIM | `nvidia_nim` | `NVIDIA_NIM_API_KEY` | `integrate.api.nvidia.com/v1` |
| OpenRouter | `open_router` | `OPENROUTER_API_KEY` | `openrouter.ai/api/v1` |
| LM Studio | `lmstudio` | (none) | `localhost:1234/v1` |
OpenRouter gives access to hundreds of models (StepFun, OpenAI, Anthropic, etc.) through a single API. Set `MODEL` to any OpenRouter model ID.
LM Studio runs locally — start the server in LM Studio's Developer tab or via `lms server start`, load a model, and set `MODEL` to the model identifier.
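If you are unsure of the exact identifier, LM Studio's local server is OpenAI-compatible, so you can list loaded models with a single request (a small sketch assuming the default `http://localhost:1234` address):

```python
# List model identifiers from LM Studio's OpenAI-compatible /v1/models endpoint.
import requests

data = requests.get("http://localhost:1234/v1/models").json()
for model in data["data"]:
    print(model["id"])  # copy one of these into MODEL in .env
```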
Control Claude Code remotely from Discord. Send tasks, watch live progress, and manage multiple concurrent sessions. Discord is the default messaging platform; Telegram is also supported.
Capabilities:
- Tree-based message threading — reply to messages to fork conversations
- Session persistence across server restarts
- Live streaming of thinking tokens, tool calls, and results
- Unlimited concurrent Claude CLI sessions (provider concurrency controlled by `PROVIDER_MAX_CONCURRENCY`)
- Voice notes — send voice messages; they are transcribed to text and processed like regular prompts (see Voice Notes)
- Commands: `/stop` (cancel tasks; reply to a message to stop only that task), `/clear` (standalone: reset all sessions; reply to a message to clear that branch downwards), `/stats`
1. Create a Discord bot — go to the Discord Developer Portal, create an application, add a bot, and copy the token. Enable Message Content Intent under Bot settings.

2. Edit `.env`:

   ```
   MESSAGING_PLATFORM=discord
   DISCORD_BOT_TOKEN=your_discord_bot_token
   ALLOWED_DISCORD_CHANNELS=123456789,987654321
   ```

   Enable Developer Mode in Discord (Settings → Advanced), then right-click a channel and "Copy ID" to get channel IDs. Comma-separate multiple channels. If empty, no channels are allowed.
3. Configure the workspace (where Claude will operate):

   ```
   CLAUDE_WORKSPACE=./agent_workspace
   ALLOWED_DIR=C:/Users/yourname/projects
   ```

4. Start the server:

   ```
   uv run uvicorn server:app --host 0.0.0.0 --port 8082
   ```

5. Invite the bot to your server (OAuth2 → URL Generator; scopes: `bot`; permissions: Read Messages, Send Messages, Manage Messages, Read Message History).

Send a message in an allowed channel with a task. Claude responds with thinking tokens, tool calls as they execute, and the final result. Reply to messages to cancel tasks or clear branches (see Commands above).
To use Telegram instead, set `MESSAGING_PLATFORM=telegram` and configure:

```
TELEGRAM_BOT_TOKEN=123456789:ABCdefGHIjklMNOpqrSTUvwxYZ
ALLOWED_TELEGRAM_USER_ID=your_telegram_user_id
```

Get a token from @BotFather; find your user ID via @userinfobot.
Send voice messages on Telegram or Discord; they are transcribed to text and processed as regular prompts. Uses Hugging Face transformers Whisper — free, no API key, works offline, CUDA 13 compatible. No ffmpeg required (audio loaded via librosa).
Install the optional voice extra:

```
uv sync --extra voice
```

Configuration:
| Variable | Description | Default |
|---|---|---|
| `VOICE_NOTE_ENABLED` | Enable voice note handling | `true` |
| `WHISPER_MODEL` | Hugging Face model ID or short name (`tiny`, `base`, `small`, `medium`, `large-v2`, `large-v3`, `large-v3-turbo`) | `base` |
| `WHISPER_DEVICE` | `cpu` or `cuda` | `cpu` |
| `HF_TOKEN` | Hugging Face token for faster model downloads (optional) | — |
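For intuition, the transcription path is roughly the following sketch — assuming `WHISPER_MODEL=base` resolves to the `openai/whisper-base` checkpoint; the function name is illustrative, not the project's API:

```python
import librosa
from transformers import pipeline

# Whisper via Hugging Face transformers; librosa decodes the audio, so no ffmpeg is needed.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-base")

def transcribe_voice_note(path: str) -> str:
    audio, sr = librosa.load(path, sr=16000)  # Whisper expects 16 kHz mono
    return asr({"raw": audio, "sampling_rate": sr})["text"]
```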
NVIDIA NIM
Full list in nvidia_nim_models.json.
Popular models:
- `qwen/qwen3.5-397b-a17b`
- `z-ai/glm5`
- `stepfun-ai/step-3.5-flash`
- `moonshotai/kimi-k2.5`
- `minimaxai/minimax-m2.1`
Browse: build.nvidia.com
Update model list:
curl "https://integrate.api.nvidia.com/v1/models" > nvidia_nim_models.jsonOpenRouter
Hundreds of models from StepFun, OpenAI, Anthropic, Google, and more.
Popular models:
- `stepfun/step-3.5-flash:free`
- `deepseek/deepseek-r1-0528:free`
- `openai/gpt-oss-120b:free`
Browse: openrouter.ai/models
Browse free models: https://openrouter.ai/collections/free-models
LM Studio
Run models locally with LM Studio. Load a model in the Chat or Developer tab, then set MODEL to its identifier.
Examples (native tool-use support):
- `lmstudio-community/qwen2.5-7b-instruct`
- `lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF`
- `bartowski/Ministral-8B-Instruct-2410-GGUF`
Browse: model.lmstudio.ai
| Variable | Description | Default |
|---|---|---|
| `PROVIDER_TYPE` | Provider: `nvidia_nim`, `open_router`, or `lmstudio` | `nvidia_nim` |
| `MODEL` | Model to use for all requests | `stepfun-ai/step-3.5-flash` |
| `NVIDIA_NIM_API_KEY` | NVIDIA API key (NIM provider) | required |
| `OPENROUTER_API_KEY` | OpenRouter API key (OpenRouter provider) | required |
| `LM_STUDIO_BASE_URL` | LM Studio server URL | `http://localhost:1234/v1` |
| `PROVIDER_RATE_LIMIT` | LLM API requests per window | `40` |
| `PROVIDER_RATE_WINDOW` | Rate limit window (seconds) | `60` |
| `PROVIDER_MAX_CONCURRENCY` | Max simultaneous open provider streams | `5` |
| `HTTP_READ_TIMEOUT` | Read timeout for provider API requests (seconds) | `300` |
| `HTTP_WRITE_TIMEOUT` | Write timeout for provider API requests (seconds) | `10` |
| `HTTP_CONNECT_TIMEOUT` | Connect timeout for provider API requests (seconds) | `2` |
| `FAST_PREFIX_DETECTION` | Enable fast prefix detection | `true` |
| `ENABLE_NETWORK_PROBE_MOCK` | Enable network probe mock | `true` |
| `ENABLE_TITLE_GENERATION_SKIP` | Skip title generation | `true` |
| `ENABLE_SUGGESTION_MODE_SKIP` | Skip suggestion mode | `true` |
| `ENABLE_FILEPATH_EXTRACTION_MOCK` | Enable filepath extraction mock | `true` |
| `MESSAGING_PLATFORM` | Messaging platform: `discord` or `telegram` | `discord` |
| `DISCORD_BOT_TOKEN` | Discord bot token | `""` |
| `ALLOWED_DISCORD_CHANNELS` | Comma-separated channel IDs (empty = none allowed) | `""` |
| `TELEGRAM_BOT_TOKEN` | Telegram bot token | `""` |
| `ALLOWED_TELEGRAM_USER_ID` | Allowed Telegram user ID | `""` |
| `VOICE_NOTE_ENABLED` | Enable voice note handling | `true` |
| `WHISPER_MODEL` | Local Whisper model size | `base` |
| `WHISPER_DEVICE` | `cpu` or `cuda` | `cpu` |
| `MESSAGING_RATE_LIMIT` | Messaging messages per window | `1` |
| `MESSAGING_RATE_WINDOW` | Messaging window (seconds) | `1` |
| `CLAUDE_WORKSPACE` | Directory for agent workspace | `./agent_workspace` |
| `ALLOWED_DIR` | Allowed directories for agent | `""` |
See .env.example for all supported parameters.
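For intuition about the rate-limit settings, `PROVIDER_RATE_LIMIT` and `PROVIDER_RATE_WINDOW` drive a proactive rolling-window throttle along these lines (a conceptual sketch; the project's implementation details may differ):

```python
import asyncio
import time
from collections import deque

class RollingWindowLimiter:
    """Allow at most `limit` acquisitions per `window` seconds."""

    def __init__(self, limit: int = 40, window: float = 60.0):
        self.limit, self.window = limit, window
        self.stamps: deque[float] = deque()  # timestamps of recent requests

    async def acquire(self) -> None:
        while True:
            now = time.monotonic()
            while self.stamps and now - self.stamps[0] >= self.window:
                self.stamps.popleft()  # forget requests older than the window
            if len(self.stamps) < self.limit:
                self.stamps.append(now)
                return
            # Wait until the oldest request ages out of the window.
            await asyncio.sleep(self.window - (now - self.stamps[0]))
```

The reactive layer (exponential backoff on 429s) and the `PROVIDER_MAX_CONCURRENCY` cap sit on top of this proactive throttle.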
```
free-claude-code/
├── server.py      # Entry point
├── api/           # FastAPI routes, request detection, optimization handlers
├── providers/     # BaseProvider, OpenAICompatibleProvider, NIM, OpenRouter, LM Studio
│   └── common/    # Shared utils (SSE builder, message converter, parsers, error mapping)
├── messaging/     # MessagingPlatform ABC + Discord/Telegram bots, session management
├── config/        # Settings, NIM config, logging
├── cli/           # CLI session and process management
├── utils/         # Text utilities
└── tests/         # Pytest test suite
```
```
uv run ruff format   # Format code
uv run ruff check    # Code style checking
uv run ty check      # Type checking
uv run pytest        # Run tests
```

For OpenAI-compatible APIs (Groq, Together AI, etc.), extend `OpenAICompatibleProvider`:
```python
from providers.openai_compat import OpenAICompatibleProvider
from providers.base import ProviderConfig

class MyProvider(OpenAICompatibleProvider):
    def __init__(self, config: ProviderConfig):
        super().__init__(config, provider_name="MYPROVIDER",
                         base_url="https://api.example.com/v1", api_key=config.api_key)

    def _build_request_body(self, request):
        # Build and return the OpenAI-compatible request body for your API here
        ...
```

For fully custom APIs, extend `BaseProvider` directly:
```python
from providers.base import BaseProvider, ProviderConfig

class MyProvider(BaseProvider):
    async def stream_response(self, request, input_tokens=0, *, request_id=None):
        # Yield Anthropic SSE format events
        ...
```

Extend `MessagingPlatform` in `messaging/` to add Slack or other platforms:
```python
from messaging.base import MessagingPlatform

class MyPlatform(MessagingPlatform):
    async def start(self):
        # Initialize connection
        ...

    async def stop(self):
        # Cleanup
        ...

    async def send_message(self, chat_id, text, reply_to=None, parse_mode=None, message_thread_id=None):
        # Send a message
        ...

    async def edit_message(self, chat_id, message_id, text, parse_mode=None):
        # Edit an existing message
        ...

    def on_message(self, handler):
        # Register callback for incoming messages
        ...
```

Contributions are welcome! Here are some ways to help:
- Report bugs or suggest features via Issues
- Add new LLM providers (Groq, Together AI, etc.)
- Add new messaging platforms (Slack, etc.)
- Improve test coverage
```
# Fork the repo, then:
git checkout -b my-feature
# Make your changes
uv run ruff format && uv run ruff check && uv run ty check && uv run pytest
# Open a pull request
```

This project is licensed under the MIT License — see the LICENSE file for details.
Built with FastAPI, OpenAI Python SDK, discord.py, and python-telegram-bot.
