
(Community): Adding Structured Support for ChatPerplexity #29361

Open · wants to merge 20 commits into master
Conversation

@keenborder786 (Contributor) commented Jan 23, 2025

vercel bot commented Jan 23, 2025

1 Skipped Deployment: langchain ⬜️ Ignored, updated Feb 7, 2025 9:00pm (UTC)

@keenborder786 keenborder786 marked this pull request as ready for review January 25, 2025 19:21
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. community Related to langchain-community labels Jan 25, 2025
@keenborder786 (Contributor, Author) commented:

@ccurme

@chain
def _oai_structured_outputs_parser(ai_msg: AIMessage) -> PydanticBaseModel:
    if ai_msg.additional_kwargs.get("parsed"):
        return ai_msg.additional_kwargs["parsed"]
Collaborator commented:

Is a BaseModel instance getting populated under "parsed" in .additional_kwargs?

@keenborder786 (Contributor, Author) commented:

@ccurme please see now. I have double-checked and tested it against the Perplexity docs as well.

@keenborder786 (Contributor, Author) commented:

@ccurme looking all good, please review

@keenborder786 (Contributor, Author) commented:

@ccurme

1 similar comment
@keenborder786 (Contributor, Author) commented:

@ccurme

@ccurme (Collaborator) left a comment:

I enabled standard tests for perplexity to pick up tests for structured output. It's currently failing; we expect to handle TypedDict, Pydantic, and JSON schema.

More importantly, this doesn't appear to work for any input type. Let me know if I'm doing something wrong.

from langchain_community.chat_models import ChatPerplexity
from pydantic import BaseModel, Field

class Joke(BaseModel):
    """Joke to tell user."""

    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

llm = ChatPerplexity(model="sonar").with_structured_output(Joke)
result = llm.invoke("Tell me a joke about cats.")

BadRequestError: Error code: 400 - {'error': {'message': '["At body -> response_format -> ResponseFormatText -> type: Input should be 'text'", "At body -> response_format -> ResponseFormatJSONSchema -> type: Input should be 'json_schema'", "At body -> response_format -> ResponseFormatJSONSchema -> json_schema: Field required", "At body -> response_format -> ResponseFormatRegex -> type: Input should be 'regex'", "At body -> response_format -> ResponseFormatRegex -> regex: Field required"]', 'type': 'bad_request', 'code': 400}}

@keenborder786 (Contributor, Author) commented:

okay @ccurme

@keenborder786 (Contributor, Author) commented:

@ccurme I have ensured that we are handling TypedDict, Pydantic, and JSON Schema. To clarify, currently, Perplexity only supports JSON Schema for structured output. Additionally, I have accounted for both Pydantic V1 and Pydantic V2 when converting schemas to JSON.
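Since Perplexity only accepts JSON Schema, the other input types have to be normalized to JSON Schema first. A minimal sketch of what such normalization could look like, covering Pydantic v1 and v2 (illustrative only; `to_json_schema` is a hypothetical helper, not the PR's actual code, and TypedDicts would need an additional conversion step, e.g. via pydantic's `TypeAdapter`):

```python
def to_json_schema(schema) -> dict:
    """Normalize a schema input to a plain JSON Schema dict (hypothetical helper)."""
    if isinstance(schema, dict):
        # Assume a dict is already JSON Schema.
        return schema
    if hasattr(schema, "model_json_schema"):
        # Pydantic v2 models expose model_json_schema().
        return schema.model_json_schema()
    if hasattr(schema, "schema"):
        # Pydantic v1 models expose schema().
        return schema.schema()
    raise TypeError(f"Unsupported schema type: {type(schema)!r}")
```

Dispatching on the methods the object actually exposes avoids importing both Pydantic major versions just for an `isinstance` check.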

@keenborder786 (Contributor, Author) commented:

@ccurme

@ccurme (Collaborator) commented Feb 4, 2025:

Thanks for the update. I'm still getting the same error though

from langchain_community.chat_models import ChatPerplexity
from pydantic import BaseModel, Field

class Joke(BaseModel):
    """Joke to tell user."""

    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

llm = ChatPerplexity(model="sonar").with_structured_output(Joke)
result = llm.invoke("Tell me a joke about cats.")

Are you able to reproduce the issue?

@keenborder786 (Contributor, Author) commented:

@ccurme no

@keenborder786 (Contributor, Author) commented:

What is the exact error you are facing?

@ccurme (Collaborator) commented Feb 4, 2025:

What is the exact error you are facing?

BadRequestError: Error code: 400 - {'error': {'message': '["At body -> response_format -> ResponseFormatText -> type: Input should be 'text'", "At body -> response_format -> ResponseFormatJSONSchema -> type: Input should be 'json_schema'", "At body -> response_format -> ResponseFormatJSONSchema -> json_schema: Field required", "At body -> response_format -> ResponseFormatRegex -> type: Input should be 'regex'", "At body -> response_format -> ResponseFormatRegex -> regex: Field required"]', 'type': 'bad_request', 'code': 400}}

Here is the relevant key according to the docs:

"response_format": {
    "type": "json_schema",
    "json_schema": {"schema": AnswerFormat.model_json_schema()},
},
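Restating the documented envelope as a tiny helper makes the shape of the fix clearer. This is a sketch (`wrap_for_perplexity` is a hypothetical name, not library API); the 400 error above is consistent with the bare schema being sent as `response_format` instead of this tagged wrapper:

```python
def wrap_for_perplexity(json_schema: dict) -> dict:
    """Wrap a raw JSON Schema dict in the tagged envelope shown in the docs."""
    return {"type": "json_schema", "json_schema": {"schema": json_schema}}

# e.g. wrap_for_perplexity(AnswerFormat.model_json_schema()) produces the
# documented {"type": "json_schema", "json_schema": {"schema": ...}} shape.
```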

else:
    response_format = schema.schema()  # type: ignore[union-attr]
    llm = self.bind(response_format=response_format)
    output_parser = JsonOutputParser()
Collaborator commented:

If a Pydantic object is passed in for the schema, we should return a Pydantic object.

Contributor Author replied:

@ccurme I accidentally forgot to add the schema. It has been fixed now and I have tested it with a testing account as well.

@keenborder786 (Contributor, Author) commented:

@ccurme

@ccurme (Collaborator) left a comment:

Thanks @keenborder786, there's still a bit of work to do on this one.

Here are the test cases to get passing:

from langchain_community.chat_models import ChatPerplexity
from pydantic import BaseModel, Field


query = "Tell me a joke about cats. Output a json object."
llm = ChatPerplexity(model="sonar")


# Pydantic
class Joke(BaseModel):
    """Joke to tell user."""

    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

structured_llm = llm.with_structured_output(Joke)
result = structured_llm.invoke(query)
assert isinstance(result, Joke)

## Streaming
for chunk in structured_llm.stream(query):
    assert isinstance(chunk, Joke)

# JSON schema

structured_llm = llm.with_structured_output(Joke.model_json_schema())
result = structured_llm.invoke(query)
assert isinstance(result, dict)
assert isinstance(result["setup"], str)
assert isinstance(result["punchline"], str)


for chunk in structured_llm.stream(query):
    assert isinstance(chunk, dict)

assert isinstance(chunk["setup"], str)
assert isinstance(chunk["punchline"], str)

# TypedDict

from typing_extensions import Annotated, TypedDict

class JokeDict(TypedDict):
    """Joke to tell user."""

    setup: Annotated[str, ..., "question to set up a joke"]
    punchline: Annotated[str, ..., "answer to resolve the joke"]


structured_llm = llm.with_structured_output(JokeDict)
result = structured_llm.invoke(query)
assert isinstance(result, dict)
assert isinstance(result["setup"], str)
assert isinstance(result["punchline"], str)


for chunk in structured_llm.stream(query):
    assert isinstance(chunk, dict)

assert isinstance(chunk["setup"], str)
assert isinstance(chunk["punchline"], str)

These are essentially our standard tests for structured output. We cannot use them out of the box because Perplexity's feature is different in that it appears as though you need to specifically prompt it to return a JSON object.
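Since the model apparently needs that explicit instruction (note the "Output a json object." suffix on the test query above), one way to adapt the standard tests would be a small prompt wrapper. A sketch under that assumption (`ensure_json_instruction` is a hypothetical name):

```python
def ensure_json_instruction(query: str) -> str:
    """Append an explicit JSON instruction if the prompt lacks one (sketch)."""
    instruction = "Output a json object."
    if instruction.lower() not in query.lower():
        query = f"{query} {instruction}"
    return query
```

The containment check keeps the wrapper idempotent, so queries that already ask for JSON pass through unchanged.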
