5 changes: 5 additions & 0 deletions docs/my-website/docs/observability/callbacks.md
@@ -4,9 +4,14 @@

liteLLM provides `input_callback`, `success_callback` and `failure_callback`, making it easy for you to send data to a particular provider depending on the status of your responses.

:::tip
**New to LiteLLM Callbacks?** Check out our comprehensive [Callback Management Guide](./callback_management.md) to understand when to use different callback hooks like `async_log_success_event` vs `async_post_call_success_hook`.
:::
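
For example, here is a minimal sketch of wiring each status to a provider — the exact provider names depend on which integrations you use (see the list below):

```python
import litellm

# send inputs to one provider, successes and failures to others
litellm.input_callback = ["sentry"]       # e.g. record every outgoing request
litellm.success_callback = ["langfuse"]   # log successful responses
litellm.failure_callback = ["sentry"]     # report failed calls

response = litellm.completion(model="gpt-3.5-turbo",
                              messages=[{"role": "user", "content": "Hello"}])
```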

liteLLM supports:

- [Custom Callback Functions](https://docs.litellm.ai/docs/observability/custom_callback)
- [Callback Management Guide](./callback_management.md) - **Comprehensive guide for choosing the right hooks**
- [Lunary](https://lunary.ai/docs)
- [Langfuse](https://langfuse.com/docs)
- [LangSmith](https://www.langchain.com/langsmith)
293 changes: 65 additions & 228 deletions docs/my-website/docs/observability/custom_callback.md
@@ -4,7 +4,6 @@
**For PROXY** [Go Here](../proxy/logging.md#custom-callback-class-async)
:::


## Callback Class
You can create a custom callback class to precisely log events as they occur in litellm.

@@ -57,6 +56,17 @@ async def completion():
asyncio.run(completion())
```

## Common Hooks

- `async_log_success_event` - Log successful API calls
- `async_log_failure_event` - Log failed API calls
- `log_pre_api_call` - Log before API call
- `log_post_api_call` - Log after API call

**Proxy-only hooks** (only work with LiteLLM Proxy):
- `async_post_call_success_hook` - Access user data + modify responses
- `async_pre_call_hook` - Modify requests before sending
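
Here's a minimal sketch of a class-based handler that implements the logging hooks listed above (the hook names and signatures follow `CustomLogger`; the print statements are just placeholders):

```python
import litellm
from litellm.integrations.custom_logger import CustomLogger

class MyHandler(CustomLogger):
    def log_pre_api_call(self, model, messages, kwargs):
        print(f"About to call {model}")

    def log_post_api_call(self, kwargs, response_obj, start_time, end_time):
        print("Raw API call finished")

    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        print("Success - cost:", kwargs.get("response_cost"))

    async def async_log_failure_event(self, kwargs, response_obj, start_time, end_time):
        print("Failure:", kwargs.get("exception"))

litellm.callbacks = [MyHandler()]
```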

## Callback Functions
If you just want to log on a specific event (e.g. on input), you can use callback functions.

@@ -174,260 +184,87 @@ async def test_chat_openai():
asyncio.run(test_chat_openai())
```

:::info

We're actively trying to expand this to other event types. [Tell us if you need this!](https://github.com/BerriAI/litellm/issues/1007)
:::
## What's Available in kwargs?

Notice that we pass a `kwargs` argument to the custom callback:
```python
def custom_callback(
    kwargs,                 # kwargs to completion
    completion_response,    # response from completion
    start_time, end_time    # start/end time
):
    # Your custom code here
    print("LITELLM: in custom callback function")
    print("kwargs", kwargs)
    print("completion_response", completion_response)
    print("start_time", start_time)
    print("end_time", end_time)
```

This is a dictionary containing all the model-call details (the params we receive, the values we send to the http endpoint, the response we receive, stacktrace in case of errors, etc.).

This is all logged in the [model_call_details via our Logger](https://github.com/BerriAI/litellm/blob/fc757dc1b47d2eb9d0ea47d6ad224955b705059d/litellm/utils.py#L246).

Here's exactly what you can expect in the kwargs dictionary:
```shell
### DEFAULT PARAMS ###
"model": self.model,
"messages": self.messages,
"optional_params": self.optional_params, # model-specific params passed in
"litellm_params": self.litellm_params, # litellm-specific params passed in (e.g. metadata passed to completion call)
"start_time": self.start_time, # datetime object of when call was started

### PRE-API CALL PARAMS ### (check via kwargs["log_event_type"]="pre_api_call")
"input": input, # the exact prompt sent to the LLM API
"api_key": api_key, # the api key used for that LLM API
"additional_args": additional_args, # any additional details for that API call (e.g. contains optional params sent)

### POST-API CALL PARAMS ### (check via kwargs["log_event_type"]="post_api_call")
"original_response": original_response, # the original http response received (saved via response.text)

### ON-SUCCESS PARAMS ### (check via kwargs["log_event_type"]="successful_api_call")
"complete_streaming_response": complete_streaming_response, # the complete streamed response (only set if `completion(..stream=True)`)
"end_time": end_time, # datetime object of when call was completed

### ON-FAILURE PARAMS ### (check via kwargs["log_event_type"]="failed_api_call")
"exception": exception, # the Exception raised
"traceback_exception": traceback_exception, # the traceback generated via `traceback.format_exc()`
"end_time": end_time, # datetime object of when call was completed
```
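
For example, a failure callback (a minimal sketch) can pull the failure-specific fields listed above:

```python
import litellm

def log_failure(kwargs, completion_response, start_time, end_time):
    # these keys are only present on failed calls (log_event_type == "failed_api_call")
    print("Exception:", kwargs.get("exception"))
    print("Traceback:", kwargs.get("traceback_exception"))
    print("Failed at:", kwargs.get("end_time"))

litellm.failure_callback = [log_failure]
```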


### Cache hits

Cache hits are logged in success events as `kwargs["cache_hit"]`.

Here's an example of accessing it:

```python
import asyncio, os, time
import litellm
from litellm.integrations.custom_logger import CustomLogger
from litellm import completion, acompletion, Cache

class MyCustomHandler(CustomLogger):
    def __init__(self):
        self.errors = []
        self.states = []  # track which hooks fired

    def log_pre_api_call(self, model, messages, kwargs):
        self.states.append("pre_api_call")

    def log_post_api_call(self, kwargs, response_obj, start_time, end_time):
        self.states.append("post_api_call")

    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        print("On Success")
        print(f"Value of Cache hit: {kwargs['cache_hit']}")
        self.states.append("async_success")

async def test_async_completion_azure_caching():
    customHandler_caching = MyCustomHandler()
    litellm.cache = Cache(type="redis", host=os.environ['REDIS_HOST'], port=os.environ['REDIS_PORT'], password=os.environ['REDIS_PASSWORD'])
    litellm.callbacks = [customHandler_caching]
    unique_time = time.time()
    response1 = await litellm.acompletion(model="azure/chatgpt-v-2",
                            messages=[{
                                "role": "user",
                                "content": f"Hi 👋 - i'm async azure {unique_time}"
                            }],
                            caching=True)
    await asyncio.sleep(1)
    print(f"customHandler_caching.states pre-cache hit: {customHandler_caching.states}")
    response2 = await litellm.acompletion(model="azure/chatgpt-v-2",
                            messages=[{
                                "role": "user",
                                "content": f"Hi 👋 - i'm async azure {unique_time}"
                            }],
                            caching=True)
    await asyncio.sleep(1)  # success callbacks are done in parallel
    print(f"customHandler_caching.states post-cache hit: {customHandler_caching.states}")
    assert len(customHandler_caching.errors) == 0
    assert len(customHandler_caching.states) == 4  # pre, post, success, success

asyncio.run(test_async_completion_azure_caching())
```

### Get complete streaming response

LiteLLM will pass you the complete streaming response in the final streaming chunk as part of the kwargs for your custom callback function.

```python
import litellm
from litellm import completion

messages = [{"role": "user", "content": "Hello, how are you?"}]

# litellm.set_verbose = False
def custom_callback(
    kwargs,                 # kwargs to completion
    completion_response,    # response from completion
    start_time, end_time    # start/end time
):
    # print(f"streaming response: {completion_response}")
    if "complete_streaming_response" in kwargs:
        print(f"Complete Streaming Response: {kwargs['complete_streaming_response']}")

# Assign the custom callback function
litellm.success_callback = [custom_callback]

response = completion(model="claude-instant-1", messages=messages, stream=True)
for idx, chunk in enumerate(response):
    pass
```


### Log additional metadata

LiteLLM accepts a metadata dictionary in the completion call. You can pass additional metadata into your completion call via `completion(..., metadata={"key": "value"})`.

Since this is a [litellm-specific param](https://github.com/BerriAI/litellm/blob/b6a015404eed8a0fa701e98f4581604629300ee3/litellm/main.py#L235), it's accessible via `kwargs["litellm_params"]["metadata"]`.

```python
from litellm import completion
import os, litellm

## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-api-key"

messages = [{"content": "Hello, how are you?", "role": "user"}]

def custom_callback(kwargs, completion_response, start_time, end_time):
    # Access common data
    model = kwargs.get("model")
    input_messages = kwargs.get("messages", [])
    cost = kwargs.get("response_cost", 0)
    cache_hit = kwargs.get("cache_hit", False)

    # Access the metadata you passed in
    metadata = kwargs.get("litellm_params", {}).get("metadata", {})
    print(metadata)

# Assign the custom callback function
litellm.success_callback = [custom_callback]

response = litellm.completion(model="gpt-3.5-turbo", messages=messages, metadata={"hello": "world"})
```

**Key fields in kwargs:**
- `model` - The model name
- `messages` - Input messages
- `response_cost` - Calculated cost
- `cache_hit` - Whether the response was served from cache
- `litellm_params.metadata` - Your custom metadata

## Practical Examples

### Track API Costs (Streaming + Non-Streaming)

By default, the response cost is accessible in the logging object via `kwargs["response_cost"]` on success (sync + async):

```python
# Step 1. Write your custom callback function
def track_cost_callback(kwargs, completion_response, start_time, end_time):
    try:
        response_cost = kwargs["response_cost"]  # litellm calculates response cost for you
        print(f"Request cost: ${response_cost}")
    except Exception:
        pass

# Step 2. Assign the custom callback function
litellm.success_callback = [track_cost_callback]

# Step 3. Make the litellm.completion call
response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}]
)

print(response)
```

### Log Transformed Inputs to LLMs

```python
def get_transformed_inputs(kwargs):
    params_to_model = kwargs["additional_args"]["complete_input_dict"]
    print("params to model", params_to_model)

litellm.input_callback = [get_transformed_inputs]

response = completion(model="claude-2", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])
```

#### Output
```shell
params to model {'model': 'claude-2', 'prompt': "\n\nHuman: Hi 👋 - i'm openai\n\nAssistant: ", 'max_tokens_to_sample': 256}
```

### Send to an External Service (e.g. Mixpanel)

```python
import mixpanel
import requests
import litellm
from litellm import completion

def custom_callback(kwargs, completion_response, start_time, end_time):
    # Track the raw response in Mixpanel
    mixpanel.track("LLM Response", {"llm_response": completion_response})

def send_to_analytics(kwargs, completion_response, start_time, end_time):
    # Or send a summary to your own analytics endpoint
    data = {
        "model": kwargs.get("model"),
        "cost": kwargs.get("response_cost", 0),
        "duration": (end_time - start_time).total_seconds()
    }
    requests.post("https://your-analytics.com/api", json=data)

# Assign the custom callback functions
litellm.success_callback = [custom_callback, send_to_analytics]

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}]
)

print(response)
```

## Common Issues

### Callback Not Called
Make sure you:
1. Register callbacks correctly: `litellm.callbacks = [MyHandler()]`
2. Use the right hook names (check spelling)
3. Don't use proxy-only hooks in library mode
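
For example, a quick sanity check of both registration styles (the handler names here are illustrative):

```python
import litellm
from litellm.integrations.custom_logger import CustomLogger

class MyHandler(CustomLogger):  # class-based handlers go in litellm.callbacks
    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        print("success (class hook)")

def my_success_callback(kwargs, completion_response, start_time, end_time):  # plain functions go in litellm.success_callback
    print("success (function callback)")

litellm.callbacks = [MyHandler()]
litellm.success_callback = [my_success_callback]
```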

### Performance Issues
- Use async hooks for I/O operations
- Don't block in callback functions
- Handle exceptions properly:

```python
from litellm.integrations.custom_logger import CustomLogger

class SafeHandler(CustomLogger):
    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        try:
            await external_service(response_obj)  # your own async logging/analytics call
        except Exception as e:
            print(f"Callback error: {e}")  # Log but don't break the flow
```

4 changes: 4 additions & 0 deletions docs/my-website/docs/proxy/call_hooks.md
@@ -6,6 +6,10 @@ import Image from '@theme/IdealImage';
- Reject data before making llm api calls / before returning the response
- Enforce 'user' param for all openai endpoint calls

:::tip
**Understanding Callback Hooks?** Check out our [Callback Management Guide](../observability/callback_management.md) to understand the differences between proxy-specific hooks like `async_pre_call_hook` and general logging hooks like `async_log_success_event`.
:::

See a complete example with our [parallel request rate limiter](https://github.com/BerriAI/litellm/blob/main/litellm/proxy/hooks/parallel_request_limiter.py)
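
Below is a minimal sketch of such a hook — assuming the proxy-style `CustomLogger` signature shown in the Quick Start below — that enforces the `user` param and rejects non-conforming completion requests:

```python
from fastapi import HTTPException
from litellm.integrations.custom_logger import CustomLogger

class MyCustomHandler(CustomLogger):
    async def async_pre_call_hook(self, user_api_key_dict, cache, data, call_type):
        # reject completion calls that don't include a 'user' param
        if call_type == "completion" and "user" not in data:
            raise HTTPException(status_code=400, detail={"error": "'user' param is required"})
        return data  # returning (possibly modified) data forwards the request

proxy_handler_instance = MyCustomHandler()  # referenced from the proxy config's callbacks setting
```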

## Quick Start