feat(llm-observability): add langchain integration #159
Conversation
Branch force-pushed: 5d04873 → 2d6513e, then 3f21b1f → d1fa606.
Twixes left a comment:
A few comments, but overall looking solid. Great test coverage.
```yaml
      with:
        fetch-depth: 1

      - name: Set up Python 3.8
```
Why the Python version change? Just want to be sure we aren't silently dropping older Python versions. 3.8 should be fine to drop since it's EoL, but we should be explicit about that if it happens. (To be honest, I'm also surprised we don't use a matrix of Python versions for this Actions job.)
langchain-community requires Python >=3.9, so CI was failing on 3.8. It's listed under the test requirements, so no extra package will be installed for people not using the AI integration.
posthog/ai/openai/openai_async.py
Outdated
```python
try:
    import openai
except ImportError:
    raise ModuleNotFoundError("Please install the OpenAI SDK to use this feature: 'pip install openai'")


import openai.resources
```
Good catch. For maximum readability would be great to put both openai imports under that try
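The grouped-import pattern the reviewer suggests can be sketched generically. `import_sdk` is a hypothetical helper written for illustration (it is not part of the PostHog SDK); the real change would simply move `import openai.resources` inside the existing `try` block:

```python
import importlib


def import_sdk(*module_names):
    """Import every module an optional integration needs under one guard,
    so a missing dependency produces a single clear error.
    (Hypothetical helper illustrating the suggested pattern.)"""
    try:
        return [importlib.import_module(name) for name in module_names]
    except ImportError:
        raise ModuleNotFoundError(
            f"Please install the SDK providing {module_names[0]!r} to use this feature"
        )


# With the real SDK this would be: import_sdk("openai", "openai.resources")
```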
posthog/ai/openai/openai.py
Outdated
```python
try:
    import openai
except ImportError:
    raise ModuleNotFoundError("Please install the OpenAI SDK to use this feature: 'pip install openai'")


import openai.resources
```
Same as with the other openai imports
posthog/ai/langchain/callbacks.py
Outdated
```python
RunStorage = Dict[UUID, RunMetadata]


class PosthogCallbackHandler(BaseCallbackHandler):
```
The Posthog capitalization really bugs me 😅 But I also see it's what we're already using in the Python SDK. Maybe to avoid the PostHog vs. Posthog inconsistency, we can export just CallbackHandler – same as Langfuse (from langfuse.callback import CallbackHandler)?
Good call. I followed the Client example, but let's rename it.
posthog/ai/langchain/callbacks.py
Outdated
```python
from posthog.ai.utils import get_model_params
from posthog.client import Client


PosthogProperties = Dict[str, Any]
```
This alias isn't super useful; I think plain Dict[str, Any] in function signatures might actually be a bit more obvious for SDK users.
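Spelled out, the suggestion looks like this (the constructor shape here is illustrative, not the PR's exact signature):

```python
from typing import Any, Dict, Optional


class CallbackHandler:
    # Dict[str, Any] written directly in the signature instead of a
    # PosthogProperties alias, so SDK users see the expected shape at a glance.
    def __init__(self, properties: Optional[Dict[str, Any]] = None):
        self._properties = properties or {}


handler = CallbackHandler(properties={"team": "growth"})
```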
posthog/ai/langchain/callbacks.py
Outdated
```python
        self._properties = properties
        self._runs = {}
        self._parent_tree = {}
        self.log = logging.getLogger("posthog")
```
Looks like this can be just at the top level of the file, given we're going to be reusing the same logger instance every time
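A sketch of the suggested move, assuming the handler shape from the diff above (class body abbreviated):

```python
import logging

# Module-level logger: logging.getLogger caches loggers by name, so every
# handler instance shares this one object anyway; no need to look it up
# per __init__.
log = logging.getLogger("posthog")


class CallbackHandler:
    def __init__(self):
        self._runs = {}
        self._parent_tree = {}
        # no per-instance self.log; use the module-level `log` instead

    def _on_error(self, message):
        log.warning(message)
```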
```python
    "$ai_model": run.get("model"),
    "$ai_model_parameters": run.get("model_params"),
    "$ai_input": run.get("messages"),
    "$ai_output": {"choices": output},
```
100% agree, @k11kirky any objections?
```python
    "$ai_output_tokens": output_tokens,
    "$ai_latency": latency,
    "$ai_trace_id": trace_id,
    "$ai_posthog_properties": self._properties,
```
Why not unpack instead?
```diff
-    "$ai_posthog_properties": self._properties,
+    **self._properties,
```
I think it makes sense to unpack them, since then neither customers nor we need to unpack nested JSON values. I had followed the existing data model. @k11kirky what do you think?
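The difference in the captured event (property names from the diff above; values illustrative):

```python
props = {"team": "growth", "experiment": "v2"}

# Nested: querying `team` later means digging into a JSON blob.
nested = {"$ai_trace_id": "trace-1", "$ai_posthog_properties": props}

# Unpacked: custom properties sit next to the $ai_* ones as top-level keys.
flat = {"$ai_trace_id": "trace-1", **props}
```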
```python
output = [_extract_raw_esponse(generation) for generation in generation_result]

event_properties = {
    "$ai_provider": run.get("provider"),
```
We're missing $ai_request_url – is this metadata we have access to here?
I haven't seen that one. Let me check; we should have this information.
base_url is retrievable. However, the actual endpoint URL is tricky. I think the base API URL matters most, so we can use it for the MVP; otherwise we should postpone this for LangChain.
posthog/ai/langchain/callbacks.py
Outdated
```python
def _get_http_status(error: BaseException) -> int:
    # OpenAI: https://github.com/anthropics/anthropic-sdk-python/blob/main/src/anthropic/_exceptions.py
```
Looks like the wrong link
```python
    "$ai_posthog_properties": self._properties,
}
self._client.capture(
    distinct_id=self._distinct_id or trace_id,
```
So this will result in lots of persons, as we discussed. Does error tracking do something useful here that we can reuse?
They explicitly mark the event as personless:

```python
if self._distinct_id is None:
    event_properties["$process_person_profile"] = False
```

Working on that now.
Problem
Implements a callback handler for LangChain, which can be used as a global or local instance.
Global instance:
Local instance:
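The two wiring styles can be modeled in miniature. Every class below is a hypothetical stand-in written for illustration; in the PR, the real handler is PosthogCallbackHandler and the chain is a LangChain runnable:

```python
class Handler:
    """Stand-in for PosthogCallbackHandler: records callback events."""
    def __init__(self):
        self.events = []

    def on_llm_end(self, output):
        self.events.append(output)


class Chain:
    """Stand-in for a LangChain chain that fires callbacks."""
    def __init__(self, callbacks=None):
        self.callbacks = list(callbacks or [])  # global: set at construction

    def invoke(self, text, callbacks=None):
        result = text.upper()  # pretend LLM call
        for cb in self.callbacks + list(callbacks or []):
            cb.on_llm_end(result)  # local: passed per invocation
        return result


global_handler = Handler()
chain = Chain(callbacks=[global_handler])  # global instance: every invoke captured
chain.invoke("hello")

local_handler = Handler()
Chain().invoke("hello", callbacks=[local_handler])  # local instance: this call only
```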
Changes
The OpenAI wrapper is exported as `from posthog.ai.openai import OpenAI` and the handler as `from posthog.ai.langchain import PosthogCallbackHandler`.