feat(llm-observability): add langchain integration #159
Conversation
Branch force-pushed: 5d04873 → 2d6513e, then 3f21b1f → d1fa606.
Twixes left a comment:
A few comments, but overall looking solid. Great test coverage.
```yaml
      with:
        fetch-depth: 1

      - name: Set up Python 3.8
```
Why the Python version change? Just want to be sure we aren't silently dropping older Python versions. 3.8 should be fine to drop since it's EoL, but we should be explicit about that if it happens. (To be honest, I'm also surprised we don't use a matrix of Python versions for this Actions job.)
langchain-community requires Python >=3.9, so CI was failing on 3.8. It's listed under the test requirements, so no extra package will be installed for people not using the AI integration.
posthog/ai/openai/openai_async.py
Outdated
```python
try:
    import openai
except ImportError:
    raise ModuleNotFoundError("Please install the OpenAI SDK to use this feature: 'pip install openai'")


import openai.resources
```
Good catch. For maximum readability would be great to put both openai imports under that try
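The grouped-import pattern the reviewer suggests can be sketched generically. `import_sdk` is a hypothetical helper written for illustration (it is not part of the PostHog SDK); the real change would simply move `import openai.resources` inside the existing `try` block:

```python
import importlib


def import_sdk(*module_names):
    """Import every module an optional integration needs under one guard,
    so a missing dependency produces a single clear error.
    (Hypothetical helper illustrating the suggested pattern.)"""
    try:
        return [importlib.import_module(name) for name in module_names]
    except ImportError:
        raise ModuleNotFoundError(
            f"Please install the SDK providing {module_names[0]!r} to use this feature"
        )


# With the real SDK this would be: import_sdk("openai", "openai.resources")
```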
posthog/ai/openai/openai.py
Outdated
```python
try:
    import openai
except ImportError:
    raise ModuleNotFoundError("Please install the OpenAI SDK to use this feature: 'pip install openai'")


import openai.resources
```
Same as with the other openai imports
posthog/ai/langchain/callbacks.py
Outdated
```python
RunStorage = Dict[UUID, RunMetadata]


class PosthogCallbackHandler(BaseCallbackHandler):
```
The Posthog capitalization really bugs me 😅 But I also see it's what we're already using in the Python SDK. Maybe to avoid the PostHog vs. Posthog inconsistency, we can export just CallbackHandler – same as Langfuse (from langfuse.callback import CallbackHandler)?
Good call. I followed the Client example, but let's rename it.
posthog/ai/langchain/callbacks.py
Outdated
```python
from posthog.ai.utils import get_model_params
from posthog.client import Client


PosthogProperties = Dict[str, Any]
```
This alias isn't super useful; I think plain Dict[str, Any] in function signatures might actually be a bit more obvious for SDK users.
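Spelled out, the suggestion looks like this (the constructor shape here is illustrative, not the PR's exact signature):

```python
from typing import Any, Dict, Optional


class CallbackHandler:
    # Dict[str, Any] written directly in the signature instead of a
    # PosthogProperties alias, so SDK users see the expected shape at a glance.
    def __init__(self, properties: Optional[Dict[str, Any]] = None):
        self._properties = properties or {}


handler = CallbackHandler(properties={"team": "growth"})
```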
posthog/ai/langchain/callbacks.py
Outdated
```python
        self._properties = properties
        self._runs = {}
        self._parent_tree = {}
        self.log = logging.getLogger("posthog")
```
Looks like this can be just at the top level of the file, given we're going to be reusing the same logger instance every time
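A sketch of the suggested move, assuming the handler shape from the diff above (class body abbreviated):

```python
import logging

# Module-level logger: logging.getLogger caches loggers by name, so every
# handler instance shares this one object anyway; no need to look it up
# per __init__.
log = logging.getLogger("posthog")


class CallbackHandler:
    def __init__(self):
        self._runs = {}
        self._parent_tree = {}
        # no per-instance self.log; use the module-level `log` instead

    def _on_error(self, message):
        log.warning(message)
```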
```python
    "$ai_model": run.get("model"),
    "$ai_model_parameters": run.get("model_params"),
    "$ai_input": run.get("messages"),
    "$ai_output": {"choices": output},
```
100% agree, @k11kirky any objections?
```python
    "$ai_output_tokens": output_tokens,
    "$ai_latency": latency,
    "$ai_trace_id": trace_id,
    "$ai_posthog_properties": self._properties,
```
Why not unpack instead?
```diff
-    "$ai_posthog_properties": self._properties,
+    **self._properties,
```
I think it makes sense to unpack them, since then neither customers nor we need to unpack nested JSON values. I had followed the existing data model. @k11kirky what do you think?
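The difference in the captured event (property names from the diff above; values illustrative):

```python
props = {"team": "growth", "experiment": "v2"}

# Nested: querying `team` later means digging into a JSON blob.
nested = {"$ai_trace_id": "trace-1", "$ai_posthog_properties": props}

# Unpacked: custom properties sit next to the $ai_* ones as top-level keys.
flat = {"$ai_trace_id": "trace-1", **props}
```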
```python
output = [_extract_raw_esponse(generation) for generation in generation_result]

event_properties = {
    "$ai_provider": run.get("provider"),
```
We're missing $ai_request_url – is this metadata we have access to here?
I haven't seen that one. Let me check; we should have this information.
base_url is retrievable. However, the actual endpoint URL is tricky. I think the base API URL matters most, so we can use it for the MVP; otherwise we should postpone this for LangChain.
posthog/ai/langchain/callbacks.py
Outdated
```python
def _get_http_status(error: BaseException) -> int:
    # OpenAI: https://github.com/anthropics/anthropic-sdk-python/blob/main/src/anthropic/_exceptions.py
```
Looks like the wrong link
```python
    "$ai_posthog_properties": self._properties,
}
self._client.capture(
    distinct_id=self._distinct_id or trace_id,
```
So this will result in lots of persons, as we discussed. Does error tracking do something useful here that we can reuse?
They explicitly mark the event as personless:

```python
if self._distinct_id is None:
    event_properties["$process_person_profile"] = False
```

Working on that now.
Problem
Implements a callback handler for LangChain, which can be used as a global or local instance.
Global instance:
Local instance:
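The two wiring styles can be modeled in miniature. Every class below is a hypothetical stand-in written for illustration; in the PR, the real handler is PosthogCallbackHandler and the chain is a LangChain runnable:

```python
class Handler:
    """Stand-in for PosthogCallbackHandler: records callback events."""
    def __init__(self):
        self.events = []

    def on_llm_end(self, output):
        self.events.append(output)


class Chain:
    """Stand-in for a LangChain chain that fires callbacks."""
    def __init__(self, callbacks=None):
        self.callbacks = list(callbacks or [])  # global: set at construction

    def invoke(self, text, callbacks=None):
        result = text.upper()  # pretend LLM call
        for cb in self.callbacks + list(callbacks or []):
            cb.on_llm_end(result)  # local: passed per invocation
        return result


global_handler = Handler()
chain = Chain(callbacks=[global_handler])  # global instance: every invoke captured
chain.invoke("hello")

local_handler = Handler()
Chain().invoke("hello", callbacks=[local_handler])  # local instance: this call only
```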
Changes
The OpenAI wrapper is exported as `from posthog.ai.openai import OpenAI` and the handler as `from posthog.ai.langchain import PosthogCallbackHandler`.