
Commit eab55fe

docs: added more info to load balancing & passthrough endpoints
1 parent ff09685 commit eab55fe

2 files changed: +71 −38 lines changed


docs/my-website/docs/pass_through/intro.md

Lines changed: 40 additions & 0 deletions
@@ -11,3 +11,43 @@ These endpoints are useful for 2 scenarios:

## How is your request handled?

The request is passed through to the provider's endpoint. The response is then passed back to the client. **No translation is done.**

### Request Forwarding Process

1. **Request Reception**: LiteLLM receives your request at `/provider/endpoint`
2. **Authentication**: Your LiteLLM API key is validated and mapped to the provider's API key
3. **Request Transformation**: Request is reformatted for the target provider's API
4. **Forwarding**: Request is sent to the actual provider endpoint
5. **Response Handling**: Provider response is returned directly to you
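
A minimal sketch of the flow above from the client side, assuming the proxy runs at `http://0.0.0.0:4000`, an Anthropic passthrough route is mounted at `/anthropic`, and `sk-1234` is your LiteLLM key (the route prefix, auth header, and payload shape depend on the provider):

```python
import requests

LITELLM_PROXY = "http://0.0.0.0:4000"   # placeholder proxy URL
LITELLM_API_KEY = "sk-1234"             # your LiteLLM key, NOT the provider's key

# Everything after the provider prefix (/anthropic) is forwarded to the
# provider's native endpoint (/v1/messages); the proxy swaps in the provider key.
resp = requests.post(
    f"{LITELLM_PROXY}/anthropic/v1/messages",
    headers={
        "x-api-key": LITELLM_API_KEY,          # auth header used by Anthropic's API
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    },
    json={
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Hello via passthrough"}],
    },
)
print(resp.status_code)
print(resp.json())
```
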
### Authentication Flow

```mermaid
graph LR
    A[Client Request] --> B[LiteLLM Proxy]
    B --> C[Validate LiteLLM API Key]
    C --> D[Map to Provider API Key]
    D --> E[Forward to Provider]
    E --> F[Return Response]
```

**Key Points:**

- Use your **LiteLLM API key** in requests, not the provider's key
- LiteLLM handles the provider authentication internally
- Same authentication works across all passthrough endpoints
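
Because the proxy maps your key to the provider's credentials internally, a provider's own SDK can usually be pointed at the passthrough route with only your LiteLLM key. A hedged sketch with the `anthropic` Python SDK (the base URL and `/anthropic` route prefix are assumptions):

```python
from anthropic import Anthropic

# The LiteLLM key stands in for the Anthropic key; the proxy maps it internally.
client = Anthropic(
    api_key="sk-1234",                          # LiteLLM key (placeholder)
    base_url="http://0.0.0.0:4000/anthropic",   # proxy URL + assumed route prefix
)

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=128,
    messages=[{"role": "user", "content": "Hello via the Anthropic SDK"}],
)
print(message.content)
```
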
### Error Handling

**Provider Errors**: Forwarded directly to you with original error codes and messages

**LiteLLM Errors**:

- `401`: Invalid LiteLLM API key
- `404`: Provider or endpoint not supported
- `500`: Internal routing/forwarding errors
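
A sketch of telling the two error classes apart by status code, reusing the hypothetical `/anthropic` route from above:

```python
import requests

resp = requests.post(
    "http://0.0.0.0:4000/anthropic/v1/messages",   # hypothetical route, as above
    headers={
        "x-api-key": "sk-1234",                    # LiteLLM key (placeholder)
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    },
    json={
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 64,
        "messages": [{"role": "user", "content": "ping"}],
    },
)

if resp.status_code == 401:
    print("Invalid LiteLLM API key")                 # rejected by the proxy
elif resp.status_code == 404:
    print("Provider or endpoint not supported")      # unknown passthrough route
elif resp.status_code >= 500:
    # proxy-side routing/forwarding error, or a provider 5xx passed through
    print("Server error:", resp.status_code, resp.text)
elif resp.status_code >= 400:
    # anything else is typically the provider's own error, forwarded as-is
    print("Provider error:", resp.status_code, resp.text)
else:
    print(resp.json())
```
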
### Benefits

- **Unified Authentication**: One API key for all providers
- **Centralized Logging**: All requests logged through LiteLLM
- **Cost Tracking**: Usage tracked across all endpoints
- **Access Control**: Same permissions apply to passthrough endpoints

docs/my-website/docs/proxy/load_balancing.md

Lines changed: 31 additions & 38 deletions
@@ -13,6 +13,23 @@ For more details on routing strategies / params, see [Routing](../routing.md)

:::

## How Load Balancing Works

LiteLLM automatically distributes requests across multiple deployments of the same model using its built-in router. The proxy routes traffic to optimize performance and reliability.

The `simple-shuffle` routing strategy is used by default.
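
The distribution is handled by the litellm `Router`; a minimal sketch of the same idea in the Python SDK, with two placeholder Azure deployments registered under one model group (names, keys, and endpoints are illustrative):

```python
from litellm import Router

# Two deployments registered under the same model group name ("gpt-3.5-turbo").
model_list = [
    {
        "model_name": "gpt-3.5-turbo",                 # model group
        "litellm_params": {
            "model": "azure/gpt-35-turbo-eu",          # deployment 1 (placeholder)
            "api_base": "https://my-eu-endpoint.openai.azure.com/",
            "api_key": "azure-key-1",
        },
    },
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {
            "model": "azure/gpt-35-turbo-us",          # deployment 2 (placeholder)
            "api_base": "https://my-us-endpoint.openai.azure.com/",
            "api_key": "azure-key-2",
        },
    },
]

# routing_strategy defaults to "simple-shuffle"
router = Router(model_list=model_list)

response = router.completion(
    model="gpt-3.5-turbo",   # call the group; the router picks a deployment
    messages=[{"role": "user", "content": "hello"}],
)
print(response)
```
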
### Routing Strategies

| Strategy | Description | When to Use |
|----------|-------------|-------------|
| **simple-shuffle** (recommended) | Randomly distributes requests | General purpose, good for even load distribution |
| **least-busy** | Routes to the deployment with the fewest active requests | High-concurrency scenarios |
| **usage-based-routing** (not recommended for performance) | Routes to the deployment with the lowest current usage (RPM/TPM) | When you want to respect rate limits evenly |
| **latency-based-routing** | Routes to the fastest-responding deployment | Latency-critical applications |
| **cost-based-routing** | Routes to the lowest-cost deployment | Cost-sensitive applications |
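
To use a different strategy, set it on the router. A hedged sketch via the Python `Router` (on the proxy, the same choice is typically set as `routing_strategy` under `router_settings` in the config):

```python
from litellm import Router

router = Router(
    model_list=[  # two placeholder deployments of one model group, as above
        {"model_name": "gpt-3.5-turbo",
         "litellm_params": {"model": "azure/gpt-35-turbo-eu",
                            "api_base": "https://my-eu-endpoint.openai.azure.com/",
                            "api_key": "azure-key-1"}},
        {"model_name": "gpt-3.5-turbo",
         "litellm_params": {"model": "azure/gpt-35-turbo-us",
                            "api_base": "https://my-us-endpoint.openai.azure.com/",
                            "api_key": "azure-key-2"}},
    ],
    routing_strategy="least-busy",  # overrides the "simple-shuffle" default
)
```
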
## Quick Start - Load Balancing

#### Step 1 - Set deployments on config

@@ -106,49 +123,13 @@ curl --location 'http://0.0.0.0:4000/chat/completions' \
]
}'
```
-</TabItem>
-<TabItem value="langchain" label="Langchain">
-
-```python
-from langchain.chat_models import ChatOpenAI
-from langchain.prompts.chat import (
-    ChatPromptTemplate,
-    HumanMessagePromptTemplate,
-    SystemMessagePromptTemplate,
-)
-from langchain.schema import HumanMessage, SystemMessage
-import os
-
-os.environ["OPENAI_API_KEY"] = "anything"
-
-chat = ChatOpenAI(
-    openai_api_base="http://0.0.0.0:4000",
-    model="gpt-3.5-turbo",
-)
-
-messages = [
-    SystemMessage(
-        content="You are a helpful assistant that im using to make a test request to."
-    ),
-    HumanMessage(
-        content="test from litellm. tell me why it's amazing in 1 sentence"
-    ),
-]
-response = chat(messages)
-
-print(response)
-```
-
-</TabItem>
-
-</Tabs>

### Test - Loadbalancing

In this request, the following will occur:
1. A rate limit exception will be raised
-2. LiteLLM proxy will retry the request on the model group (default is 3).
+2. LiteLLM proxy will retry the request on the model group (default retries are 3).

```bash
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
@@ -256,4 +237,16 @@ model_group_alias: Optional[Dict[str, Union[str, RouterModelGroupAliasItem]]] =
class RouterModelGroupAliasItem(TypedDict):
    model: str
    hidden: bool  # if 'True', don't return on `/v1/models`, `/v1/model/info`, `/v1/model_group/info`
```

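
For context, a hedged sketch of passing an alias item of this shape to the Python `Router` (the alias, group name, and key are illustrative):

```python
from litellm import Router

router = Router(
    model_list=[
        {"model_name": "gpt-3.5-turbo",
         "litellm_params": {"model": "gpt-3.5-turbo", "api_key": "sk-openai-key"}},  # placeholder
    ],
    # Requests for "gpt-4" are served by the "gpt-3.5-turbo" group; with
    # hidden=True the alias is not returned on `/v1/models`, `/v1/model/info`,
    # or `/v1/model_group/info`.
    model_group_alias={
        "gpt-4": {"model": "gpt-3.5-turbo", "hidden": True},
    },
)
```
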
### When You'll See Load Balancing in Action

**Immediate Effects:**

- Different deployments serve subsequent requests (visible in logs)
- Better response times during high traffic

**Observable Benefits:**

- **Higher throughput**: More requests handled simultaneously across deployments
- **Improved reliability**: If one deployment fails, traffic automatically routes to healthy ones
- **Better resource utilization**: Load spread evenly across all available deployments
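
One way to observe this, sketched with the OpenAI Python client pointed at a local proxy (URL and key are placeholders); which deployment served each request shows up in the proxy logs:

```python
import openai

# Placeholders: your proxy URL and LiteLLM key
client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

# Fire a handful of identical requests at the model group; with multiple
# deployments configured, the proxy spreads them out (check the proxy logs
# to see which deployment handled each one).
for i in range(5):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"request {i}"}],
    )
    print(i, response.choices[0].message.content)
```
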
