Commit 3f117db

Improving example text: embedding
1 parent 9e2c249 commit 3f117db

2 files changed

+97
-0
lines changed

examples/ExampleEmbeddings.m

Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,97 @@
%% Information Retrieval Using OpenAI Document Embedding
% This example shows how to find documents to answer queries using the "text-embedding-3-small"
% document embedding model. Embeddings are used to represent documents and queries
% in a high-dimensional space, allowing for the efficient retrieval of relevant
% information based on semantic similarity.
%
% The example consists of four steps:
%%
% * Download and preprocess text from several MATLAB documentation pages.
% * Embed the query and the document corpus using the "text-embedding-3-small"
% embedding model.
% * Find the documentation page most relevant to the query using cosine
% similarity scores.
% * Generate an answer to the query based on the most relevant documentation
% page.
%%
% This process is sometimes referred to as Retrieval-Augmented Generation (RAG),
% similar to the application found in the example <./ExampleRetrievalAugmentedGeneration.mlx
% ExampleRetrievalAugmentedGeneration.mlx>.
%
% This example requires Text Analytics Toolbox™.
%
% To run this example, you need a valid API key from a paid OpenAI API account.

loadenv(".env")
addpath('..')
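%%
% The following lines are an added sketch, not part of the original example. They
% check that an API key is available before any request is sent. The sketch assumes
% that the .env file defines a variable named OPENAI_API_KEY; adjust the name if
% your setup differs.

hasKey = ~isempty(getenv("OPENAI_API_KEY"));   % getenv returns '' when the variable is not set
if ~hasKey
    warning("OPENAI_API_KEY is not set. Calls to the OpenAI API are expected to fail.");
end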
%% Embed Query Document
% Convert the query into a numerical vector using the extractOpenAIEmbeddings
% function. Specify the model as "text-embedding-3-small".

query = "What is the best way to store data made up of rows and columns?";
[qEmb, ~] = extractOpenAIEmbeddings(query, ModelName="text-embedding-3-small");
qEmb(1:5)
%% Download and Embed Source Text
35+
% In this example, we will scrape content from several MATLAB documentation
36+
% pages.
37+
%
38+
% This requires the following steps:
39+
%%
40+
% # Start with a list of websites. This examples uses pages from MATLAB documentation.
41+
% # Extract the context of the pags using |extractHTMLText|.
42+
% # Embed the websites using |extractOpenAIEmbeddings|.
43+
44+
metadata = ["https://www.mathworks.com/help/matlab/numeric-types.html";
45+
"https://www.mathworks.com/help/matlab/characters-and-strings.html";
46+
"https://www.mathworks.com/help/matlab/date-and-time-operations.html";
47+
"https://www.mathworks.com/help/matlab/categorical-arrays.html";
48+
"https://www.mathworks.com/help/matlab/tables.html"];
49+
id = (1:numel(metadata))';
50+
document = strings(numel(metadata),1);
51+
embedding = [];
52+
for ii = id'
53+
page = webread(metadata(ii));
54+
tree = htmlTree(page);
55+
subtree = findElement(tree,"body");
56+
document(ii) = extractHTMLText(subtree, ExtractionMethod="article");
57+
try
58+
[emb, ~] = extractOpenAIEmbeddings(document(ii),ModelName="text-embedding-3-small");
59+
embedding = [embedding; emb];
60+
catch
61+
end
62+
end
63+
vectorTable = table(id,document,metadata,embedding);
64+
%% Generate Answer to Query
65+
% Define the system prompt in |openAIChat| to answer questions based on context.
66+
67+
chat = openAIChat("You are a helpful MATLAB assistant. You will get a context for each question");
68+
%%
69+
% Calculate the cosine similarity scores between the query and each of the documentation
70+
% page using the |cosineSimilarity| function.
71+
72+
s = cosineSimilarity(vectorTable.embedding,qEmb);
73+
%%
74+
% Use the most similar documentation content to feed extra context into the
75+
% prompt for generation.
76+
77+
[~,idx] = max(s);
78+
context = vectorTable.document(idx);
79+
prompt = "Context: " ...
80+
+ context + newline + "Answer the following question: " + query;
81+
wrapText(prompt)
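%%
% As an added sketch (not part of the original example), one possible variation is to
% keep the |k| best-scoring pages instead of only the single best one. The scores below
% are made up; with the variables of this example, |toyS| would be replaced by |s|.

toyS = [0.2; 0.9; 0.5];                 % made-up similarity scores, one per page
k = 2;                                  % hypothetical number of pages to keep
[~,order] = sort(toyS,"descend");       % page indices ordered from best to worst score
topIdx = order(1:min(k,numel(order)))   % indices of the k best pages, here 2 and 3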
%%
% Pass the question and the context for generation to get a contextualized answer.

response = generate(chat, prompt);
wrapText(response)
%% Helper Function
88+
% Helper function to wrap text for easier reading in the live script.
89+
90+
function wrappedText = wrapText(text)
91+
wrappedText = splitSentences(text);
92+
wrappedText = join(wrappedText,newline);
93+
end
94+
%%
95+
% _Copyright 2024 The MathWorks, Inc._
96+
%
97+
%

examples/ExampleEmbeddings.mlx

-174 Bytes
Binary file not shown.
