Add demos for structured outputs #27

Merged · 4 commits · Mar 20, 2025
Changes from 3 commits
10 changes: 10 additions & 0 deletions README.md
@@ -44,6 +44,16 @@ Then run the scripts (in order of increasing complexity):
* [`rag_documents_flow.py`](./rag_documents_flow.py): A RAG flow that retrieves matching results from the local JSON file created by `rag_documents_ingestion.py`.
* [`rag_documents_hybrid.py`](./rag_documents_hybrid.py): A RAG flow that implements a hybrid retrieval with both vector and keyword search, merging with Reciprocal Rank Fusion (RRF), and semantic re-ranking with a cross-encoder model.

## Structured outputs with OpenAI

These scripts demonstrate how to use the OpenAI API to generate structured responses using Pydantic data models:

* [`structured_outputs_basic.py`](./structured_outputs_basic.py): Basic example extracting simple event information using a Pydantic model.
* [`structured_outputs_description.py`](./structured_outputs_description.py): Uses additional descriptions in Pydantic model fields to clarify to the model how to format the response.
* [`structured_outputs_enum.py`](./structured_outputs_enum.py): Uses enumerations (Enums) to restrict possible values in structured responses.
* [`structured_outputs_function_calling.py`](./structured_outputs_function_calling.py): Demonstrates how to use functions defined with Pydantic for automatic function calling based on user queries.
* [`structured_outputs_nested.py`](./structured_outputs_nested.py): Uses nested Pydantic models to handle more complex structured responses, such as events with participants having multiple attributes.
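Under the hood, the scripts above all rely on the same mechanism: the SDK derives a JSON schema from the Pydantic model and sends it with the request, so field descriptions travel to the model. A minimal offline sketch of that derivation (assuming Pydantic v2; no API call involved):

```python
from pydantic import BaseModel, Field


class CalendarEvent(BaseModel):
    name: str
    date: str = Field(..., description="A date in the format YYYY-MM-DD")
    participants: list[str]


# The SDK derives a JSON schema like this one and attaches it to the request.
schema = CalendarEvent.model_json_schema()
print(schema["properties"]["date"]["description"])
```

This is why tightening the Pydantic model (descriptions, enums, nesting) directly tightens what the model is allowed to return.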

## Setting up the environment

If you open this up in a Dev Container or GitHub Codespaces, everything will be set up for you.
12 changes: 12 additions & 0 deletions spanish/README.md
@@ -43,6 +43,18 @@ Luego ejecuta los scripts (en orden de complejidad creciente):
* [`rag_documents_flow.py`](./rag_documents_flow.py): Un flujo RAG que recupera resultados coincidentes del archivo JSON local creado por `rag_documents_ingestion.py`.
* [`rag_documents_hybrid.py`](./rag_documents_hybrid.py): Un flujo RAG que implementa una recuperación híbrida con búsqueda vectorial y por palabras clave, fusionando con Reciprocal Rank Fusion (RRF), y reclasificación semántica con un modelo cross-encoder.

## Salidas estructuradas con OpenAI

Estos scripts muestran cómo usar la API de OpenAI para generar respuestas estructuradas usando modelos de datos con Pydantic:

* [`structured_outputs_basic.py`](./structured_outputs_basic.py): Ejemplo básico que extrae información sencilla de un evento usando un modelo Pydantic.
* [`structured_outputs_description.py`](./structured_outputs_description.py): Usa descripciones adicionales en los campos del modelo Pydantic para aclararle al modelo cómo formatear la respuesta.
* [`structured_outputs_enum.py`](./structured_outputs_enum.py): Usa enumeraciones (Enums) para restringir los valores posibles en la respuesta estructurada.
* [`structured_outputs_function_calling.py`](./structured_outputs_function_calling.py): Muestra cómo usar funciones definidas con Pydantic para que el modelo las llame automáticamente según la consulta del usuario.
* [`structured_outputs_nested.py`](./structured_outputs_nested.py): Usa modelos anidados con Pydantic para manejar respuestas estructuradas más complejas, como eventos con participantes que tienen múltiples atributos.

## Configuración del entorno

Si abres esto en un Dev Container o GitHub Codespaces, todo estará configurado para ti.
58 changes: 58 additions & 0 deletions spanish/structured_outputs_basic.py
@@ -0,0 +1,58 @@
import os

import azure.identity
import openai
import rich
from dotenv import load_dotenv
from pydantic import BaseModel

# Set up the OpenAI client for the Azure, OpenAI.com, Ollama, or GitHub Models API
load_dotenv(override=True)
API_HOST = os.getenv("API_HOST", "github")

if API_HOST == "azure":
token_provider = azure.identity.get_bearer_token_provider(
azure.identity.DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)
client = openai.AzureOpenAI(
api_version=os.environ["AZURE_OPENAI_VERSION"],
azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
azure_ad_token_provider=token_provider,
)
MODEL_NAME = os.environ["AZURE_OPENAI_DEPLOYMENT"]

elif API_HOST == "ollama":
client = openai.OpenAI(base_url=os.environ["OLLAMA_ENDPOINT"], api_key="nokeyneeded")
MODEL_NAME = os.environ["OLLAMA_MODEL"]

elif API_HOST == "github":
client = openai.OpenAI(base_url="https://models.inference.ai.azure.com", api_key=os.environ["GITHUB_TOKEN"])
MODEL_NAME = os.getenv("GITHUB_MODEL", "gpt-4o")

else:
client = openai.OpenAI(api_key=os.environ["OPENAI_KEY"])
MODEL_NAME = os.environ["OPENAI_MODEL"]


class CalendarEvent(BaseModel):
name: str
date: str
participants: list[str]


completion = client.beta.chat.completions.parse(
model=MODEL_NAME,
messages=[
{"role": "system", "content": "Extrae la info del evento."},
{"role": "user", "content": "Alice y Bob van a ir a una feria de ciencias el viernes."},
],
response_format=CalendarEvent,
)


message = completion.choices[0].message
if message.refusal:
rich.print(message.refusal)
else:
event = message.parsed
rich.print(event)
60 changes: 60 additions & 0 deletions spanish/structured_outputs_description.py
@@ -0,0 +1,60 @@
import os

import azure.identity
import openai
import rich
from dotenv import load_dotenv
from pydantic import BaseModel, Field

# Set up the OpenAI client for the Azure, OpenAI.com, Ollama, or GitHub Models API
load_dotenv(override=True)
API_HOST = os.getenv("API_HOST", "github")

if API_HOST == "azure":
token_provider = azure.identity.get_bearer_token_provider(
azure.identity.DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)
client = openai.AzureOpenAI(
api_version=os.environ["AZURE_OPENAI_VERSION"],
azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
azure_ad_token_provider=token_provider,
)
MODEL_NAME = os.environ["AZURE_OPENAI_DEPLOYMENT"]

elif API_HOST == "ollama":
client = openai.OpenAI(base_url=os.environ["OLLAMA_ENDPOINT"], api_key="nokeyneeded")
MODEL_NAME = os.environ["OLLAMA_MODEL"]

elif API_HOST == "github":
client = openai.OpenAI(base_url="https://models.inference.ai.azure.com", api_key=os.environ["GITHUB_TOKEN"])
MODEL_NAME = os.getenv("GITHUB_MODEL", "gpt-4o")

else:
client = openai.OpenAI(api_key=os.environ["OPENAI_KEY"])
MODEL_NAME = os.environ["OPENAI_MODEL"]


class CalendarEvent(BaseModel):
name: str
date: str = Field(..., description="A date in the format YYYY-MM-DD")
participants: list[str]


completion = client.beta.chat.completions.parse(
model=MODEL_NAME,
messages=[
{
"role": "system",
"content": "Extrae la info del evento. Si no dice el año, asumí que es este año (2025).",
},
{"role": "user", "content": "Alice y Bob van a ir a una feria de ciencias el 1 de abril."},
],
response_format=CalendarEvent,
)

message = completion.choices[0].message
if message.refusal:
rich.print(message.refusal)
else:
event = message.parsed
rich.print(event)
69 changes: 69 additions & 0 deletions spanish/structured_outputs_enum.py
@@ -0,0 +1,69 @@
import os
from enum import Enum

import azure.identity
import openai
import rich
from dotenv import load_dotenv
from pydantic import BaseModel

# Set up the OpenAI client for the Azure, OpenAI.com, Ollama, or GitHub Models API
load_dotenv(override=True)
API_HOST = os.getenv("API_HOST", "github")

if API_HOST == "azure":
token_provider = azure.identity.get_bearer_token_provider(
azure.identity.DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)
client = openai.AzureOpenAI(
api_version=os.environ["AZURE_OPENAI_VERSION"],
azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
azure_ad_token_provider=token_provider,
)
MODEL_NAME = os.environ["AZURE_OPENAI_DEPLOYMENT"]

elif API_HOST == "ollama":
client = openai.OpenAI(base_url=os.environ["OLLAMA_ENDPOINT"], api_key="nokeyneeded")
MODEL_NAME = os.environ["OLLAMA_MODEL"]

elif API_HOST == "github":
client = openai.OpenAI(base_url="https://models.inference.ai.azure.com", api_key=os.environ["GITHUB_TOKEN"])
MODEL_NAME = os.getenv("GITHUB_MODEL", "gpt-4o")

else:
client = openai.OpenAI(api_key=os.environ["OPENAI_KEY"])
MODEL_NAME = os.environ["OPENAI_MODEL"]


class DayOfWeek(str, Enum):
DOMINGO = "Domingo"
LUNES = "Lunes"
MARTES = "Martes"
MIÉRCOLES = "Miércoles"
JUEVES = "Jueves"
VIERNES = "Viernes"
SÁBADO = "Sábado"


class CalendarEvent(BaseModel):
name: str
date: DayOfWeek
participants: list[str]


completion = client.beta.chat.completions.parse(
model=MODEL_NAME,
messages=[
{"role": "system", "content": "Extrae la info del evento."},
{"role": "user", "content": "Alice y Bob van a ir a una feria de ciencias el viernes."},
],
response_format=CalendarEvent,
)


message = completion.choices[0].message
if message.refusal:
rich.print(message.refusal)
else:
event = message.parsed
rich.print(event)
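The Enum restriction in the script above is also enforced client-side: when the SDK validates the parsed response, a value outside the Enum raises a `ValidationError`. A trimmed offline sketch (assuming Pydantic v2, using a hypothetical two-member enum rather than the full week):

```python
from enum import Enum

from pydantic import BaseModel, ValidationError


class DayOfWeek(str, Enum):
    VIERNES = "Viernes"
    SABADO = "Sábado"


class CalendarEvent(BaseModel):
    name: str
    date: DayOfWeek


# A value listed in the Enum is coerced to the matching member.
event = CalendarEvent(name="Feria de ciencias", date="Viernes")

# A value outside the Enum is rejected at validation time.
try:
    CalendarEvent(name="Feria de ciencias", date="Friday")
    rejected = False
except ValidationError:
    rejected = True
print(rejected)
```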
50 changes: 50 additions & 0 deletions spanish/structured_outputs_function_calling.py
@@ -0,0 +1,50 @@
import os

import azure.identity
import openai
import rich
from dotenv import load_dotenv
from pydantic import BaseModel

# Set up the OpenAI client for the Azure, OpenAI.com, Ollama, or GitHub Models API
load_dotenv(override=True)
API_HOST = os.getenv("API_HOST", "github")

if API_HOST == "azure":
token_provider = azure.identity.get_bearer_token_provider(
azure.identity.DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)
client = openai.AzureOpenAI(
api_version=os.environ["AZURE_OPENAI_VERSION"],
azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
azure_ad_token_provider=token_provider,
)
MODEL_NAME = os.environ["AZURE_OPENAI_DEPLOYMENT"]

elif API_HOST == "ollama":
client = openai.OpenAI(base_url=os.environ["OLLAMA_ENDPOINT"], api_key="nokeyneeded")
MODEL_NAME = os.environ["OLLAMA_MODEL"]

elif API_HOST == "github":
client = openai.OpenAI(base_url="https://models.inference.ai.azure.com", api_key=os.environ["GITHUB_TOKEN"])
MODEL_NAME = os.getenv("GITHUB_MODEL", "gpt-4o")

else:
client = openai.OpenAI(api_key=os.environ["OPENAI_KEY"])
MODEL_NAME = os.environ["OPENAI_MODEL"]


class GetDeliveryDate(BaseModel):
order_id: str


response = client.chat.completions.create(
model=MODEL_NAME,
messages=[
{"role": "system", "content": "Eres un bot de atención al cliente. Usá las herramientas para ayudar al usuario."},
{"role": "user", "content": "Hola, ¿me puedes decir cuándo llegará mi pedido #12345?"},
],
tools=[openai.pydantic_function_tool(GetDeliveryDate)],
)

rich.print(response.choices[0].message.tool_calls[0].function)
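The script above stops at printing the tool call; in a full flow, the JSON arguments string would be validated back into the Pydantic model before dispatching the function. A hedged sketch with a simulated payload (in the real script, the string comes from `tool_calls[0].function.arguments`):

```python
from pydantic import BaseModel


class GetDeliveryDate(BaseModel):
    order_id: str


# Simulated payload standing in for tool_calls[0].function.arguments.
raw_arguments = '{"order_id": "12345"}'
args = GetDeliveryDate.model_validate_json(raw_arguments)
print(args.order_id)
```

Validating through the same model that defined the tool guarantees the arguments match the schema the model was shown.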
63 changes: 63 additions & 0 deletions spanish/structured_outputs_nested.py
@@ -0,0 +1,63 @@
import os

import azure.identity
import openai
import rich
from dotenv import load_dotenv
from pydantic import BaseModel

# Set up the OpenAI client for the Azure, OpenAI.com, Ollama, or GitHub Models API
load_dotenv(override=True)
API_HOST = os.getenv("API_HOST", "github")

if API_HOST == "azure":
token_provider = azure.identity.get_bearer_token_provider(
azure.identity.DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)
client = openai.AzureOpenAI(
api_version=os.environ["AZURE_OPENAI_VERSION"],
azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
azure_ad_token_provider=token_provider,
)
MODEL_NAME = os.environ["AZURE_OPENAI_DEPLOYMENT"]

elif API_HOST == "ollama":
client = openai.OpenAI(base_url=os.environ["OLLAMA_ENDPOINT"], api_key="nokeyneeded")
MODEL_NAME = os.environ["OLLAMA_MODEL"]

elif API_HOST == "github":
client = openai.OpenAI(base_url="https://models.inference.ai.azure.com", api_key=os.environ["GITHUB_TOKEN"])
MODEL_NAME = os.getenv("GITHUB_MODEL", "gpt-4o")

else:
client = openai.OpenAI(api_key=os.environ["OPENAI_KEY"])
MODEL_NAME = os.environ["OPENAI_MODEL"]


class Participant(BaseModel):
name: str
job_title: str


class CalendarEvent(BaseModel):
name: str
date: str
participants: list[Participant]


completion = client.beta.chat.completions.parse(
model=MODEL_NAME,
messages=[
{"role": "system", "content": "Extrae la info del evento."},
{"role": "user", "content": "Alice, que es carpintera, y Bob, que es plomero, van a ir a una feria de ciencias el viernes."},
],
response_format=CalendarEvent,
)


message = completion.choices[0].message
if message.refusal:
rich.print(message.refusal)
else:
event = message.parsed
rich.print(event)
58 changes: 58 additions & 0 deletions structured_outputs_basic.py
@@ -0,0 +1,58 @@
import os

import azure.identity
import openai
import rich
from dotenv import load_dotenv
from pydantic import BaseModel

# Set up the OpenAI client for the Azure, OpenAI.com, Ollama, or GitHub Models API
load_dotenv(override=True)
API_HOST = os.getenv("API_HOST", "github")

if API_HOST == "azure":
token_provider = azure.identity.get_bearer_token_provider(
azure.identity.DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)
client = openai.AzureOpenAI(
api_version=os.environ["AZURE_OPENAI_VERSION"],
azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
azure_ad_token_provider=token_provider,
)
MODEL_NAME = os.environ["AZURE_OPENAI_DEPLOYMENT"]

elif API_HOST == "ollama":
client = openai.OpenAI(base_url=os.environ["OLLAMA_ENDPOINT"], api_key="nokeyneeded")
MODEL_NAME = os.environ["OLLAMA_MODEL"]

elif API_HOST == "github":
client = openai.OpenAI(base_url="https://models.inference.ai.azure.com", api_key=os.environ["GITHUB_TOKEN"])
MODEL_NAME = os.getenv("GITHUB_MODEL", "gpt-4o")

else:
client = openai.OpenAI(api_key=os.environ["OPENAI_KEY"])
MODEL_NAME = os.environ["OPENAI_MODEL"]


class CalendarEvent(BaseModel):
name: str
date: str
participants: list[str]


completion = client.beta.chat.completions.parse(
model=MODEL_NAME,
messages=[
{"role": "system", "content": "Extract the event information."},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
],
response_format=CalendarEvent,
)


message = completion.choices[0].message
if message.refusal:
rich.print(message.refusal)
else:
event = message.parsed
rich.print(event)