5 changes: 5 additions & 0 deletions docs/my-website/docs/observability/callbacks.md
@@ -4,9 +4,14 @@

liteLLM provides `input_callback`, `success_callback` and `failure_callback`, making it easy for you to send data to a particular provider depending on the status of your responses.

:::tip
**New to LiteLLM Callbacks?** Check out our comprehensive [Callback Management Guide](./callback_management.md) to understand when to use different callback hooks like `async_log_success_event` vs `async_post_call_success_hook`.
:::
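
For example, here is a minimal sketch of wiring each status to a provider — the exact provider names depend on which integrations you use (see the list below):

```python
import litellm

# send inputs to one provider, successes and failures to others
litellm.input_callback = ["sentry"]       # e.g. record every outgoing request
litellm.success_callback = ["langfuse"]   # log successful responses
litellm.failure_callback = ["sentry"]     # report failed calls

response = litellm.completion(model="gpt-3.5-turbo",
                              messages=[{"role": "user", "content": "Hello"}])
```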

liteLLM supports:

- [Custom Callback Functions](https://docs.litellm.ai/docs/observability/custom_callback)
- [Callback Management Guide](./callback_management.md) - **Comprehensive guide for choosing the right hooks**
- [Lunary](https://lunary.ai/docs)
- [Langfuse](https://langfuse.com/docs)
- [LangSmith](https://www.langchain.com/langsmith)
293 changes: 65 additions & 228 deletions docs/my-website/docs/observability/custom_callback.md
@@ -4,7 +4,6 @@
**For PROXY** [Go Here](../proxy/logging.md#custom-callback-class-async)
:::


## Callback Class
You can create a custom callback class to precisely log events as they occur in litellm.

@@ -57,6 +56,17 @@ async def completion():
asyncio.run(completion())
```

## Common Hooks

- `async_log_success_event` - Log successful API calls
- `async_log_failure_event` - Log failed API calls
- `log_pre_api_call` - Log before API call
- `log_post_api_call` - Log after API call

**Proxy-only hooks** (only work with LiteLLM Proxy):
- `async_post_call_success_hook` - Access user data + modify responses
- `async_pre_call_hook` - Modify requests before sending
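
Here's a minimal sketch of a class-based handler that implements the logging hooks listed above (the hook names and signatures follow `CustomLogger`; the print statements are just placeholders):

```python
import litellm
from litellm.integrations.custom_logger import CustomLogger

class MyHandler(CustomLogger):
    def log_pre_api_call(self, model, messages, kwargs):
        print(f"About to call {model}")

    def log_post_api_call(self, kwargs, response_obj, start_time, end_time):
        print("Raw API call finished")

    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        print("Success - cost:", kwargs.get("response_cost"))

    async def async_log_failure_event(self, kwargs, response_obj, start_time, end_time):
        print("Failure:", kwargs.get("exception"))

litellm.callbacks = [MyHandler()]
```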

## Callback Functions
If you just want to log on a specific event (e.g. on input), you can use callback functions.

@@ -174,260 +184,87 @@ async def test_chat_openai():
asyncio.run(test_chat_openai())
```

:::info

We're actively trying to expand this to other event types. [Tell us if you need this!](https://github.com/BerriAI/litellm/issues/1007)
:::
## What's Available in kwargs?

Notice that we pass a `kwargs` argument to the custom callback:
```python
def custom_callback(
    kwargs,                 # kwargs to completion
    completion_response,    # response from completion
    start_time, end_time    # start/end time
):
    # Your custom code here
    print("LITELLM: in custom callback function")
    print("kwargs", kwargs)
    print("completion_response", completion_response)
    print("start_time", start_time)
    print("end_time", end_time)
```

This is a dictionary containing all the model-call details (the params we receive, the values we send to the http endpoint, the response we receive, stacktrace in case of errors, etc.).

This is all logged in the [model_call_details via our Logger](https://github.com/BerriAI/litellm/blob/fc757dc1b47d2eb9d0ea47d6ad224955b705059d/litellm/utils.py#L246).

Here's exactly what you can expect in the kwargs dictionary:
```shell
### DEFAULT PARAMS ###
"model": self.model,
"messages": self.messages,
"optional_params": self.optional_params, # model-specific params passed in
"litellm_params": self.litellm_params, # litellm-specific params passed in (e.g. metadata passed to completion call)
"start_time": self.start_time, # datetime object of when call was started

### PRE-API CALL PARAMS ### (check via kwargs["log_event_type"]="pre_api_call")
"input": input, # the exact prompt sent to the LLM API
"api_key": api_key, # the api key used for that LLM API
"additional_args": additional_args, # any additional details for that API call (e.g. contains optional params sent)

### POST-API CALL PARAMS ### (check via kwargs["log_event_type"]="post_api_call")
"original_response": original_response, # the original http response received (saved via response.text)

### ON-SUCCESS PARAMS ### (check via kwargs["log_event_type"]="successful_api_call")
"complete_streaming_response": complete_streaming_response, # the complete streamed response (only set if `completion(..stream=True)`)
"end_time": end_time, # datetime object of when call was completed

### ON-FAILURE PARAMS ### (check via kwargs["log_event_type"]="failed_api_call")
"exception": exception, # the Exception raised
"traceback_exception": traceback_exception, # the traceback generated via `traceback.format_exc()`
"end_time": end_time, # datetime object of when call was completed
```
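
For example, a failure callback (a minimal sketch) can pull the failure-specific fields listed above:

```python
import litellm

def log_failure(kwargs, completion_response, start_time, end_time):
    # these keys are only present on failed calls (log_event_type == "failed_api_call")
    print("Exception:", kwargs.get("exception"))
    print("Traceback:", kwargs.get("traceback_exception"))
    print("Failed at:", kwargs.get("end_time"))

litellm.failure_callback = [log_failure]
```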


### Cache hits

Cache hits are logged in success events as `kwargs["cache_hit"]`.

Here's an example of accessing it:

```python
import asyncio, os, time
import litellm
from litellm.integrations.custom_logger import CustomLogger
from litellm import completion, acompletion, Cache

class MyCustomHandler(CustomLogger):
    def __init__(self):
        self.errors = []
        self.states = []  # track which hooks fired

    def log_pre_api_call(self, model, messages, kwargs):
        self.states.append("pre_api_call")

    def log_post_api_call(self, kwargs, response_obj, start_time, end_time):
        self.states.append("post_api_call")

    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        print("On Success")
        print(f"Value of Cache hit: {kwargs['cache_hit']}")
        self.states.append("async_success")

async def test_async_completion_azure_caching():
    customHandler_caching = MyCustomHandler()
    litellm.cache = Cache(type="redis", host=os.environ['REDIS_HOST'], port=os.environ['REDIS_PORT'], password=os.environ['REDIS_PASSWORD'])
    litellm.callbacks = [customHandler_caching]
    unique_time = time.time()
    response1 = await litellm.acompletion(model="azure/chatgpt-v-2",
                            messages=[{
                                "role": "user",
                                "content": f"Hi 👋 - i'm async azure {unique_time}"
                            }],
                            caching=True)
    await asyncio.sleep(1)
    print(f"customHandler_caching.states pre-cache hit: {customHandler_caching.states}")
    response2 = await litellm.acompletion(model="azure/chatgpt-v-2",
                            messages=[{
                                "role": "user",
                                "content": f"Hi 👋 - i'm async azure {unique_time}"
                            }],
                            caching=True)
    await asyncio.sleep(1)  # success callbacks are done in parallel
    print(f"customHandler_caching.states post-cache hit: {customHandler_caching.states}")
    assert len(customHandler_caching.errors) == 0
    assert len(customHandler_caching.states) == 4  # pre, post, success, success

asyncio.run(test_async_completion_azure_caching())
```

### Get complete streaming response

LiteLLM will pass you the complete streaming response in the final streaming chunk as part of the kwargs for your custom callback function.

```python
import litellm
from litellm import completion

messages = [{"role": "user", "content": "Hello, how are you?"}]

# litellm.set_verbose = False
def custom_callback(
    kwargs,                 # kwargs to completion
    completion_response,    # response from completion
    start_time, end_time    # start/end time
):
    # print(f"streaming response: {completion_response}")
    if "complete_streaming_response" in kwargs:
        print(f"Complete Streaming Response: {kwargs['complete_streaming_response']}")

# Assign the custom callback function
litellm.success_callback = [custom_callback]

response = completion(model="claude-instant-1", messages=messages, stream=True)
for idx, chunk in enumerate(response):
    pass
```


### Log additional metadata

LiteLLM accepts a metadata dictionary in the completion call. You can pass additional metadata into your completion call via `completion(..., metadata={"key": "value"})`.

Since this is a [litellm-specific param](https://github.com/BerriAI/litellm/blob/b6a015404eed8a0fa701e98f4581604629300ee3/litellm/main.py#L235), it's accessible via `kwargs["litellm_params"]["metadata"]`.

```python
from litellm import completion
import os, litellm

## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-api-key"

messages = [{"content": "Hello, how are you?", "role": "user"}]

def custom_callback(kwargs, completion_response, start_time, end_time):
    # Access common data
    model = kwargs.get("model")
    input_messages = kwargs.get("messages", [])
    cost = kwargs.get("response_cost", 0)
    cache_hit = kwargs.get("cache_hit", False)

    # Access the metadata you passed in
    metadata = kwargs.get("litellm_params", {}).get("metadata", {})
    print(metadata)

# Assign the custom callback function
litellm.success_callback = [custom_callback]

response = litellm.completion(model="gpt-3.5-turbo", messages=messages, metadata={"hello": "world"})
```

**Key fields in kwargs:**
- `model` - The model name
- `messages` - Input messages
- `response_cost` - Calculated cost
- `cache_hit` - Whether the response was served from cache
- `litellm_params.metadata` - Your custom metadata

## Practical Examples

### Track API Costs (Streaming + Non-Streaming)

By default, the response cost is accessible in the logging object via `kwargs["response_cost"]` on success (sync + async):

```python
# Step 1. Write your custom callback function
def track_cost_callback(kwargs, completion_response, start_time, end_time):
    try:
        response_cost = kwargs["response_cost"]  # litellm calculates response cost for you
        print(f"Request cost: ${response_cost}")
    except Exception:
        pass

# Step 2. Assign the custom callback function
litellm.success_callback = [track_cost_callback]

# Step 3. Make the litellm.completion call
response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}]
)

print(response)
```

### Log Transformed Inputs to LLMs

```python
def get_transformed_inputs(kwargs):
    params_to_model = kwargs["additional_args"]["complete_input_dict"]
    print("params to model", params_to_model)

litellm.input_callback = [get_transformed_inputs]

response = completion(model="claude-2", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])
```

#### Output
```shell
params to model {'model': 'claude-2', 'prompt': "\n\nHuman: Hi 👋 - i'm openai\n\nAssistant: ", 'max_tokens_to_sample': 256}
```

### Send to an External Service (e.g. Mixpanel)

```python
import mixpanel
import requests
import litellm
from litellm import completion

def custom_callback(kwargs, completion_response, start_time, end_time):
    # Track the raw response in Mixpanel
    mixpanel.track("LLM Response", {"llm_response": completion_response})

def send_to_analytics(kwargs, completion_response, start_time, end_time):
    # Or send a summary to your own analytics endpoint
    data = {
        "model": kwargs.get("model"),
        "cost": kwargs.get("response_cost", 0),
        "duration": (end_time - start_time).total_seconds()
    }
    requests.post("https://your-analytics.com/api", json=data)

# Assign the custom callback functions
litellm.success_callback = [custom_callback, send_to_analytics]

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}]
)

print(response)
```

## Common Issues

### Callback Not Called
Make sure you:
1. Register callbacks correctly: `litellm.callbacks = [MyHandler()]`
2. Use the right hook names (check spelling)
3. Don't use proxy-only hooks in library mode
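
For example, a quick sanity check of both registration styles (the handler names here are illustrative):

```python
import litellm
from litellm.integrations.custom_logger import CustomLogger

class MyHandler(CustomLogger):  # class-based handlers go in litellm.callbacks
    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        print("success (class hook)")

def my_success_callback(kwargs, completion_response, start_time, end_time):  # plain functions go in litellm.success_callback
    print("success (function callback)")

litellm.callbacks = [MyHandler()]
litellm.success_callback = [my_success_callback]
```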

### Performance Issues
- Use async hooks for I/O operations
- Don't block in callback functions
- Handle exceptions properly:

```python
from litellm.integrations.custom_logger import CustomLogger

class SafeHandler(CustomLogger):
    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        try:
            await external_service(response_obj)  # your own async logging/analytics call
        except Exception as e:
            print(f"Callback error: {e}")  # Log but don't break the flow
```

4 changes: 4 additions & 0 deletions docs/my-website/docs/proxy/call_hooks.md
@@ -6,6 +6,10 @@ import Image from '@theme/IdealImage';
- Reject data before making llm api calls / before returning the response
- Enforce 'user' param for all openai endpoint calls

:::tip
**Understanding Callback Hooks?** Check out our [Callback Management Guide](../observability/callback_management.md) to understand the differences between proxy-specific hooks like `async_pre_call_hook` and general logging hooks like `async_log_success_event`.
:::

See a complete example with our [parallel request rate limiter](https://github.com/BerriAI/litellm/blob/main/litellm/proxy/hooks/parallel_request_limiter.py)
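
Below is a minimal sketch of such a hook — assuming the proxy-style `CustomLogger` signature shown in the Quick Start below — that enforces the `user` param and rejects non-conforming completion requests:

```python
from fastapi import HTTPException
from litellm.integrations.custom_logger import CustomLogger

class MyCustomHandler(CustomLogger):
    async def async_pre_call_hook(self, user_api_key_dict, cache, data, call_type):
        # reject completion calls that don't include a 'user' param
        if call_type == "completion" and "user" not in data:
            raise HTTPException(status_code=400, detail={"error": "'user' param is required"})
        return data  # returning (possibly modified) data forwards the request

proxy_handler_instance = MyCustomHandler()  # referenced from the proxy config's callbacks setting
```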

## Quick Start