
Commit b4939c0

Authored Mar 10, 2025

Merge pull request #17 from pamelafox/ragdemos

Add way more RAG demos

2 parents 2584b63 + e70ba13 · commit b4939c0

13 files changed: +225688 −2 lines
 

README.md (+30 −2)

This repository contains a collection of Python scripts that demonstrate how to use the OpenAI API to generate chat completions.

## OpenAI package

These scripts use the OpenAI package to demonstrate how to use the OpenAI API.
In increasing order of complexity, the scripts are:

1. [`chat.py`](./chat.py): A simple script that demonstrates how to use the OpenAI API to generate chat completions.

Plus these scripts to demonstrate additional features:

* [`chat_safety.py`](./chat_safety.py): The simple script with exception handling for Azure AI Content Safety filter errors.
* [`chat_async.py`](./chat_async.py): Uses the async clients to make asynchronous calls, including an example of sending off multiple requests at once using `asyncio.gather`.

## Popular LLM libraries

These scripts use popular LLM libraries to demonstrate how to use the OpenAI API with them:

* [`chat_langchain.py`](./chat_langchain.py): Uses the Langchain package to generate chat completions. [Learn more from Langchain docs](https://python.langchain.com/docs/get_started/quickstart)
* [`chat_llamaindex.py`](./chat_llamaindex.py): Uses the LlamaIndex package to generate chat completions. [Learn more from LlamaIndex docs](https://docs.llamaindex.ai/en/stable/)
* [`chat_pydanticai.py`](./chat_pydanticai.py): Uses the PydanticAI package to generate chat completions. [Learn more from PydanticAI docs](https://ai.pydantic.dev/)

## Retrieval-Augmented Generation (RAG)

These scripts demonstrate how to use the OpenAI API for Retrieval-Augmented Generation (RAG) tasks, where the model retrieves relevant information from a source and uses it to generate a response. A minimal sketch of this shared pattern follows the list below.

First install the RAG dependencies:

```bash
python -m pip install -r requirements-rag.txt
```

Then run the scripts (in order of increasing complexity):

* [`rag_csv.py`](./rag_csv.py): Retrieves matching results from a CSV file and uses them to answer the user's question.
* [`rag_multiturn.py`](./rag_multiturn.py): The same idea, but with a back-and-forth chat interface using `input()`, which keeps track of past messages and sends them with each chat completion call.
* [`rag_queryrewrite.py`](./rag_queryrewrite.py): Adds a query rewriting step to the RAG process, where the user's question is rewritten to improve the retrieval results.
* [`rag_documents_ingestion.py`](./rag_documents_ingestion.py): Ingests PDFs by using pymupdf to convert them to markdown, then using Langchain to split them into chunks, then using OpenAI to embed the chunks, and finally storing them in a local JSON file.
* [`rag_documents_flow.py`](./rag_documents_flow.py): A RAG flow that retrieves matching results from the local JSON file created by `rag_documents_ingestion.py`.
* [`rag_documents_hybrid.py`](./rag_documents_hybrid.py): A RAG flow that implements hybrid retrieval with both vector and keyword search, merging the results with Reciprocal Rank Fusion (RRF), and semantically re-ranking them with a cross-encoder model.
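All of these scripts share the same retrieve-then-generate loop. Here is a minimal sketch of that pattern, assuming the OpenAI.com host for brevity (the real scripts also support Azure OpenAI, Ollama, and GitHub Models); `retrieve` is a hypothetical stand-in for the lunr keyword search or vector search that each script implements:

```python
import os

import openai

client = openai.OpenAI(api_key=os.environ["OPENAI_KEY"])
MODEL_NAME = os.environ["OPENAI_MODEL"]


def retrieve(query: str) -> list[str]:
    # Hypothetical stand-in for the real retrieval step (lunr keyword search,
    # vector similarity search, or both): return snippets matching the query.
    corpus = ["example snippet one", "example snippet two"]
    return [text for text in corpus if any(word in text.lower() for word in query.lower().split())]


user_question = "how fast is the prius v?"
sources = "\n".join(retrieve(user_question))

# Generate an answer grounded in the retrieved snippets.
response = client.chat.completions.create(
    model=MODEL_NAME,
    temperature=0.3,
    messages=[
        {"role": "system", "content": "Answer the question using only the provided sources."},
        {"role": "user", "content": f"{user_question}\nSources: {sources}"},
    ],
)
print(response.choices[0].message.content)
```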

## Setting up the environment

Data files added (binary, not shown):

* data/Aphideater_hoverfly.pdf (255 KB)
* data/California_carpenter_bee.pdf (316 KB)
* data/Centris_pallida.pdf (970 KB)
* data/Western_honey_bee.pdf (1010 KB)

rag_csv.py (new file, +73)
```python
import csv
import os

import azure.identity
import openai
from dotenv import load_dotenv
from lunr import lunr

# Set up the OpenAI client to use the Azure OpenAI, OpenAI.com, Ollama, or GitHub Models API
load_dotenv(override=True)
API_HOST = os.getenv("API_HOST")

if API_HOST == "azure":
    token_provider = azure.identity.get_bearer_token_provider(
        azure.identity.DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
    )
    client = openai.AzureOpenAI(
        api_version=os.environ["AZURE_OPENAI_VERSION"],
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        azure_ad_token_provider=token_provider,
    )
    MODEL_NAME = os.environ["AZURE_OPENAI_DEPLOYMENT"]

elif API_HOST == "ollama":
    client = openai.OpenAI(base_url=os.environ["OLLAMA_ENDPOINT"], api_key="nokeyneeded")
    MODEL_NAME = os.environ["OLLAMA_MODEL"]

elif API_HOST == "github":
    client = openai.OpenAI(base_url="https://models.inference.ai.azure.com", api_key=os.environ["GITHUB_TOKEN"])
    MODEL_NAME = os.environ["GITHUB_MODEL"]

else:
    client = openai.OpenAI(api_key=os.environ["OPENAI_KEY"])
    MODEL_NAME = os.environ["OPENAI_MODEL"]

# Index the data from the CSV (lunr builds a keyword index; ids are 1-based row numbers)
with open("hybrid.csv") as file:
    reader = csv.reader(file)
    rows = list(reader)
documents = [{"id": (i + 1), "body": " ".join(row)} for i, row in enumerate(rows[1:])]
index = lunr(ref="id", fields=["body"], documents=documents)

# Get the user question
user_question = "how fast is the prius v?"

# Search the index for the user question (lunr returns refs as strings, hence the int() cast)
results = index.search(user_question)
matching_rows = [rows[int(result["ref"])] for result in results]

# Format as a markdown table, since language models understand markdown
matches_table = " | ".join(rows[0]) + "\n" + " | ".join(" --- " for _ in range(len(rows[0]))) + "\n"
matches_table += "\n".join(" | ".join(row) for row in matching_rows)

print("Found matches:")
print(matches_table)

# Now we can use the matches to generate a response
SYSTEM_MESSAGE = """
You are a helpful assistant that answers questions about cars based off a hybrid car data set.
You must use the data set to answer the questions, you should not provide any info that is not in the provided sources.
"""

response = client.chat.completions.create(
    model=MODEL_NAME,
    temperature=0.3,
    messages=[
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": f"{user_question}\nSources: {matches_table}"},
    ],
)

print(f"\nResponse from {API_HOST}: \n")
print(response.choices[0].message.content)
```
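For reference, here is how the markdown table is assembled, run on a hypothetical two-column data set (illustrative header and row only; the real columns come from `hybrid.csv`):

```python
# Illustrative only: hypothetical rows standing in for the contents of hybrid.csv.
rows = [["model", "year"], ["Prius v", "2012"]]
matching_rows = rows[1:]

# Same table-building code as the script above.
matches_table = " | ".join(rows[0]) + "\n" + " | ".join(" --- " for _ in range(len(rows[0]))) + "\n"
matches_table += "\n".join(" | ".join(row) for row in matching_rows)
print(matches_table)
# model | year
#  ---  |  --- 
# Prius v | 2012
```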

rag_documents_flow.py (new file, +70)
```python
import json
import os

import azure.identity
import openai
from dotenv import load_dotenv
from lunr import lunr

# Set up the OpenAI client to use the Azure OpenAI, OpenAI.com, Ollama, or GitHub Models API
load_dotenv(override=True)
API_HOST = os.getenv("API_HOST")

if API_HOST == "azure":
    token_provider = azure.identity.get_bearer_token_provider(
        azure.identity.DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
    )
    client = openai.AzureOpenAI(
        api_version=os.environ["AZURE_OPENAI_VERSION"],
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        azure_ad_token_provider=token_provider,
    )
    MODEL_NAME = os.environ["AZURE_OPENAI_DEPLOYMENT"]

elif API_HOST == "ollama":
    client = openai.OpenAI(base_url=os.environ["OLLAMA_ENDPOINT"], api_key="nokeyneeded")
    MODEL_NAME = os.environ["OLLAMA_MODEL"]

elif API_HOST == "github":
    client = openai.OpenAI(base_url="https://models.inference.ai.azure.com", api_key=os.environ["GITHUB_TOKEN"])
    MODEL_NAME = os.environ["GITHUB_MODEL"]

else:
    client = openai.OpenAI(api_key=os.environ["OPENAI_KEY"])
    MODEL_NAME = os.environ["OPENAI_MODEL"]

# Index the data from the JSON - each object has id, text, and embedding
with open("rag_ingested_chunks.json") as file:
    documents = json.load(file)
documents_by_id = {doc["id"]: doc for doc in documents}
index = lunr(ref="id", fields=["text"], documents=documents)

# Get the user question
user_question = "where do digger bees live?"

# Search the index for the user question
results = index.search(user_question)
retrieved_documents = [documents_by_id[result["ref"]] for result in results]
print(f"Retrieved {len(retrieved_documents)} matching documents, only sending the first 5.")
context = "\n".join([f"{doc['id']}: {doc['text']}" for doc in retrieved_documents[0:5]])

# Now we can use the matches to generate a response
SYSTEM_MESSAGE = """
You are a helpful assistant that answers questions about bees and other insects.
You must use the data set to answer the questions,
you should not provide any info that is not in the provided sources.
Cite the sources you used to answer the question inside square brackets.
The sources are in the format: <id>: <text>.
"""

response = client.chat.completions.create(
    model=MODEL_NAME,
    temperature=0.3,
    messages=[
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": f"{user_question}\nSources: {context}"},
    ],
)

print(f"\nResponse from {MODEL_NAME} on {API_HOST}: \n")
print(response.choices[0].message.content)
```
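The `rag_ingested_chunks.json` file read above is produced by `rag_documents_ingestion.py` (shown later in this commit): each entry has an `id` (source filename plus a 1-based chunk number), the chunk `text`, and its `embedding`. A quick way to inspect the expected shape:

```python
import json

# Peek at the ingested chunks written by rag_documents_ingestion.py.
with open("rag_ingested_chunks.json") as file:
    documents = json.load(file)

doc = documents[0]
print(doc["id"])              # e.g. "California_carpenter_bee.pdf-1" (filename + chunk number)
print(doc["text"][:100])      # the chunk's markdown text (chunks are roughly 500 tokens)
print(len(doc["embedding"]))  # 1536 floats for text-embedding-3-small
```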

rag_documents_hybrid.py (new file, +143)
```python
# pip install sentence-transformers
import json
import os

import azure.identity
import openai
from dotenv import load_dotenv
from lunr import lunr
from sentence_transformers import CrossEncoder

# Set up the OpenAI client to use the Azure OpenAI, OpenAI.com, Ollama, or GitHub Models API
load_dotenv(override=True)
API_HOST = os.getenv("API_HOST")

if API_HOST == "azure":
    token_provider = azure.identity.get_bearer_token_provider(
        azure.identity.DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
    )
    client = openai.AzureOpenAI(
        api_version=os.environ["AZURE_OPENAI_VERSION"],
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        azure_ad_token_provider=token_provider,
    )
    MODEL_NAME = os.environ["AZURE_OPENAI_DEPLOYMENT"]

elif API_HOST == "ollama":
    client = openai.OpenAI(base_url=os.environ["OLLAMA_ENDPOINT"], api_key="nokeyneeded")
    MODEL_NAME = os.environ["OLLAMA_MODEL"]

elif API_HOST == "github":
    client = openai.OpenAI(base_url="https://models.inference.ai.azure.com", api_key=os.environ["GITHUB_TOKEN"])
    MODEL_NAME = os.environ["GITHUB_MODEL"]

else:
    client = openai.OpenAI(api_key=os.environ["OPENAI_KEY"])
    MODEL_NAME = os.environ["OPENAI_MODEL"]

# Index the data from the JSON - each object has id, text, and embedding
with open("rag_ingested_chunks.json") as file:
    documents = json.load(file)
documents_by_id = {doc["id"]: doc for doc in documents}
index = lunr(ref="id", fields=["text"], documents=documents)


def full_text_search(query, limit):
    """
    Perform a full-text search on the indexed documents.
    """
    results = index.search(query)
    retrieved_documents = [documents_by_id[result["ref"]] for result in results[:limit]]
    return retrieved_documents


def vector_search(query, limit):
    """
    Perform a vector search on the indexed documents
    using a simple cosine similarity function.
    """

    def cosine_similarity(a, b):
        return sum(x * y for x, y in zip(a, b)) / ((sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5))

    query_embedding = client.embeddings.create(model="text-embedding-3-small", input=query).data[0].embedding
    similarities = []
    for doc in documents:
        doc_embedding = doc["embedding"]
        similarity = cosine_similarity(query_embedding, doc_embedding)
        similarities.append((doc, similarity))
    similarities.sort(key=lambda x: x[1], reverse=True)

    retrieved_documents = [doc for doc, _ in similarities[:limit]]
    return retrieved_documents


def reciprocal_rank_fusion(text_results, vector_results, alpha=0.5):
    """
    Perform Reciprocal Rank Fusion on the results from text and vector searches.
    """
    text_ids = {doc["id"] for doc in text_results}
    vector_ids = {doc["id"] for doc in vector_results}

    combined_results = []
    for doc in text_results:
        if doc["id"] in vector_ids:
            combined_results.append((doc, alpha))
        else:
            combined_results.append((doc, 1 - alpha))
    for doc in vector_results:
        if doc["id"] not in text_ids:
            combined_results.append((doc, alpha))
    combined_results.sort(key=lambda x: x[1], reverse=True)
    return [doc for doc, _ in combined_results]


def rerank(query, retrieved_documents):
    """
    Rerank the results using a cross-encoder model.
    """
    encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scores = encoder.predict([(query, doc["text"]) for doc in retrieved_documents])
    # Sort by score only: dicts are not comparable, so a plain tuple sort would fail on tied scores
    scored_documents = [doc for _, doc in sorted(zip(scores, retrieved_documents), key=lambda pair: pair[0], reverse=True)]
    return scored_documents


def hybrid_search(query, limit):
    """
    Perform a hybrid search using both full-text and vector search.
    """
    text_results = full_text_search(query, limit * 2)
    vector_results = vector_search(query, limit * 2)
    combined_results = reciprocal_rank_fusion(text_results, vector_results)
    combined_results = rerank(query, combined_results)
    return combined_results[:limit]


# Get the user question
user_question = "cute gray fuzzsters"

# Search the index for the user question
retrieved_documents = hybrid_search(user_question, limit=5)
print(f"Retrieved {len(retrieved_documents)} matching documents.")
context = "\n".join([f"{doc['id']}: {doc['text']}" for doc in retrieved_documents[0:5]])

# Now we can use the matches to generate a response
SYSTEM_MESSAGE = """
You are a helpful assistant that answers questions about bees and other insects.
You must use the data set to answer the questions,
you should not provide any info that is not in the provided sources.
Cite the sources you used to answer the question inside square brackets.
The sources are in the format: <id>: <text>.
"""

response = client.chat.completions.create(
    model=MODEL_NAME,
    temperature=0.3,
    messages=[
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": f"{user_question}\nSources: {context}"},
    ],
)

print(f"\nResponse from {MODEL_NAME} on {API_HOST}: \n")
print(response.choices[0].message.content)
```
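One note on the merge step: despite its name, `reciprocal_rank_fusion` above assigns fixed alpha-based weights rather than using rank positions (and with the default `alpha=0.5`, every document receives the same weight, so the stable sort keeps the input order). If you want the textbook RRF behavior, which scores each document as the sum of 1/(k + rank) across the result lists with the conventional k = 60, a sketch:

```python
def classic_rrf(result_lists, k=60):
    """Classic Reciprocal Rank Fusion: each document scores 1 / (k + rank)
    per result list it appears in (rank is 1-based), and scores are summed."""
    scores = {}
    docs_by_id = {}
    for results in result_lists:
        for rank, doc in enumerate(results, start=1):
            docs_by_id[doc["id"]] = doc
            scores[doc["id"]] = scores.get(doc["id"], 0.0) + 1.0 / (k + rank)
    ranked_ids = sorted(scores, key=scores.get, reverse=True)
    return [docs_by_id[doc_id] for doc_id in ranked_ids]


# Drop-in usage inside hybrid_search:
# combined_results = classic_rrf([text_results, vector_results])
```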

rag_documents_ingestion.py (new file, +61)
```python
import json
import os
import pathlib

import azure.identity
import openai
import pymupdf4llm
from dotenv import load_dotenv
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Set up the OpenAI client to use the Azure OpenAI, OpenAI.com, Ollama, or GitHub Models API
load_dotenv(override=True)
API_HOST = os.getenv("API_HOST")

if API_HOST == "azure":
    token_provider = azure.identity.get_bearer_token_provider(
        azure.identity.DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
    )
    client = openai.AzureOpenAI(
        api_version=os.environ["AZURE_OPENAI_VERSION"],
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        azure_ad_token_provider=token_provider,
    )
    MODEL_NAME = os.environ["AZURE_OPENAI_DEPLOYMENT"]

elif API_HOST == "ollama":
    client = openai.OpenAI(base_url=os.environ["OLLAMA_ENDPOINT"], api_key="nokeyneeded")
    MODEL_NAME = os.environ["OLLAMA_MODEL"]

elif API_HOST == "github":
    client = openai.OpenAI(base_url="https://models.inference.ai.azure.com", api_key=os.environ["GITHUB_TOKEN"])
    MODEL_NAME = os.environ["GITHUB_MODEL"]

else:
    client = openai.OpenAI(api_key=os.environ["OPENAI_KEY"])
    MODEL_NAME = os.environ["OPENAI_MODEL"]

data_dir = pathlib.Path(os.path.dirname(__file__)) / "data"
filenames = ["California_carpenter_bee.pdf", "Centris_pallida.pdf", "Western_honey_bee.pdf", "Aphideater_hoverfly.pdf"]
all_chunks = []
for filename in filenames:
    # Extract text from the PDF file as markdown
    md_text = pymupdf4llm.to_markdown(data_dir / filename)

    # Split the text into smaller chunks (~500 tokens each, measured with the gpt-4o tokenizer)
    text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
        model_name="gpt-4o", chunk_size=500, chunk_overlap=0
    )
    texts = text_splitter.create_documents([md_text])
    file_chunks = [{"id": f"{filename}-{(i + 1)}", "text": text.page_content} for i, text in enumerate(texts)]

    # Generate embeddings using the OpenAI SDK for each text chunk
    for file_chunk in file_chunks:
        file_chunk["embedding"] = (
            client.embeddings.create(model="text-embedding-3-small", input=file_chunk["text"]).data[0].embedding
        )
    all_chunks.extend(file_chunks)

# Save the documents with embeddings to a JSON file
with open("rag_ingested_chunks.json", "w") as f:
    json.dump(all_chunks, f, indent=4)
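```

Ingestion is a one-time step (rerun it only when the PDFs change); the document RAG scripts then read its JSON output:

```bash
python rag_documents_ingestion.py   # writes rag_ingested_chunks.json
python rag_documents_flow.py        # keyword (lunr) retrieval over the ingested chunks
python rag_documents_hybrid.py      # hybrid retrieval: keyword + vector, RRF merge, cross-encoder re-rank
```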
