Commit 2f77f65

committed: fixed spellings
1 parent c7b1e67 commit 2f77f65

File tree

3 files changed: +42 -36 lines changed


projects/ml/llm/notebooks/llms.clj

Lines changed: 15 additions & 14 deletions
@@ -3,20 +3,19 @@
    [org.httpkit.client :as hk-client]
    [cheshire.core :as json]))
 
-;; # using Large Language Models from Clojure
-;; LLMs often come as APIs, as they require computing power (GPUs), which most users do not have
-;; localy.
-;; OpenAI offers their models behind an (paid) API for example. In the following we will see three
-;;diferent ways to use the GPT-4 model from OpenAI
+;; # Using Large Language Models from Clojure
+;; LLMs often come as APIs, as they require computing power (GPUs), which most users do not have
+;; locally.
+;; OpenAI, for example, offers its models behind a (paid) API. In the following we will see three
+;; different ways to use the GPT-4 model from OpenAI.
 
-;; get the openai API key either from environemnt or a specific file
+;; Get the OpenAI API key, either from the environment or from a specific file
 (def open-ai-key
   (or (System/getenv "OPEN_AI_KEY")
       (slurp "open_ai_secret.txt")))
 
-(or "hello" (slurp "aa"))
 
 ;; ## Use OpenAI API directly
 ;; OpenAI offers a rather simple API, text-in text-out for "chatting" with GPT
@@ -42,9 +41,9 @@
     (json/decode keyword))
 
 ;; ## Use Bosquet
-;; [bosquet](https://github.com/zmedelis/bosquet) abstracts some of the concepts of LLMs
+;; [Bosquet](https://github.com/zmedelis/bosquet) abstracts some of the concepts of LLMs
 ;; behind a higher-level API. It also has notions of "memory" and "tools",
-;; and has feature we find for exampl in python "LangChain"
+;; and has other features we find, for example, in Python's "LangChain".
 
 ;; Bosquet wants the API key in a config file
 (spit "secrets.edn"
@@ -54,19 +53,21 @@
 
 (require '[bosquet.llm.generator :refer [generate llm]])
 
+;; Call GPT from Bosquet
+
 (generate
  [[:user "What is Clojure"]
   [:assistant (llm :openai
                    :llm/model-params {:model :gpt-4})]])
 
 
-;# use langchain4j
-;; We can use LLMs as well via a Java Interop and teh library
+;; ## Use langchain4j
+;; We can use LLMs as well via Java interop and the library
 ;; [langchain4j](https://github.com/langchain4j/langchain4j), which aims
-;; to be a copy of the pythin langcahin, and offers support or
-;; build blcoks for several consept arround LLMs (model, vecstorstores, document loaders)
-;; We see it used in te following chapters
+;; to be a copy of the Python library LangChain. It offers support for, and
+;; building blocks for, several concepts around LLMs (models, vector stores, document loaders, etc.).
+;; We see it used in the following chapters.
 
 (import '[dev.langchain4j.model.openai OpenAiChatModel OpenAiChatModelName])

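The "use OpenAI API directly" route in llms.clj boils down to a single POST against the chat-completions endpoint, decoded with the same `(json/decode keyword)` step visible in the diff. A minimal sketch using the notebook's http-kit and cheshire aliases; the `chat-request` and `ask-gpt` helper names are ours, and the model name is assumed to be `"gpt-4"` as in the notebook:

```clojure
(require '[org.httpkit.client :as hk-client]
         '[cheshire.core :as json])

(defn chat-request
  "Build the body map for a single-turn chat with `model`."
  [model user-text]
  {:model model
   :messages [{:role "user" :content user-text}]})

(defn ask-gpt
  "POST one user message to OpenAI's chat-completions endpoint
   and return the assistant's reply text."
  [api-key user-text]
  (-> @(hk-client/post "https://api.openai.com/v1/chat/completions"
                       {:headers {"Authorization" (str "Bearer " api-key)
                                  "Content-Type"  "application/json"}
                        :body    (json/encode (chat-request "gpt-4" user-text))})
      :body
      (json/decode keyword)
      (get-in [:choices 0 :message :content])))

(comment
  ;; needs a valid key, e.g. the notebook's open-ai-key
  (ask-gpt (System/getenv "OPEN_AI_KEY") "What is Clojure?"))
```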
projects/ml/llm/notebooks/rag.clj

Lines changed: 17 additions & 12 deletions
@@ -12,8 +12,8 @@
 
 
 ;; # Simple RAG (Retrieval-Augmented Generation) System
-;; This is a Clojure / langchain4j adaption of
-;; https://github.com/NirDiamant/RAG_Techniques/blob/main/all_rag_techniques/simple_rag.ipynb
+;; This is a Clojure / langchain4j adaptation of
+;; [simple_rag](https://github.com/NirDiamant/RAG_Techniques/blob/main/all_rag_techniques/simple_rag.ipynb)
 
 ;; ## Overview
 ;; This code implements a basic Retrieval-Augmented Generation (RAG) system for processing and
@@ -61,12 +61,16 @@
 ;; Flexibility: Easy to adjust parameters like chunk size and number of retrieved results.
 
 ;; ## Conclusion
-;;This simple RAG system provides a solid foundation for building more complex information retrieval and question-answering systems. By encoding document content into a searchable vector store, it enables efficient retrieval of relevant information in response to queries. This approach is particularly useful for applications requiring quick access to specific information within
+;;This simple RAG system provides a solid foundation for building more complex information retrieval and question-answering systems.
+;;
+;;By encoding document content into a searchable vector store, it enables efficient retrieval of relevant information in response to queries.
+;;
+;;This approach is particularly useful for applications requiring quick access to specific information within
 ;;large documents or document collections.
 
 ;; # Implementation
 
-;; helper to replace abs by space
+;; A helper to replace tabs with spaces:
 (defn replace-t-with-space [list-of-documents]
   (map
    (fn [text-segment]
@@ -76,25 +80,26 @@
    list-of-documents))
 
 
-;; convert PDF to text document
+;; Convert the PDF to a text document:
 (def document (.parse (ApachePdfBoxDocumentParser.) (io/input-stream "Understanding_Climate_Change.pdf")))
 
-;; split document into chunks of max 1000 chars and overlaping of 200
+;; Split the document into chunks of max 1000 chars, with an overlap of 200:
 (def texts
   (.split
    (DocumentSplitters/recursive 1000 200)
   document))
-;; clean textx
+;; Clean the texts:
 (def cleaned-texts (replace-t-with-space texts))
 
+;; Create embeddings for the cleaned texts:
 (def embedding-model (AllMiniLmL6V2EmbeddingModel.))
 (def embedding-store (InMemoryEmbeddingStore.))
 
-;; create embedding for clean texts
 
 (def embeddings
   (.embedAll embedding-model cleaned-texts))
 
-;; add embeddings to vector store
+;; Add all embeddings to the vector store:
 (run!
  (fn [[text-segment embedding]]
    (.add embedding-store embedding text-segment))
@@ -103,15 +108,15 @@
   cleaned-texts
   (.content embeddings)))
 
-;; encode retriever
+;; Embed the query text for the retriever:
 (def retriever
   (.content (.embed embedding-model
                     "What is the main cause of climate change?")))
 
-;; find top 5 relevant texts
+;; Find the top 5 relevant texts:
 (def relevant (.findRelevant embedding-store retriever 5))
 
-;; put 5 results in table
+;; Put the 5 results in a table:
 (tc/dataset
  (map
   (fn [a-relevant]

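The `(DocumentSplitters/recursive 1000 200)` call in rag.clj takes a maximum chunk size and an overlap, both in characters. A naive stand-in in plain Clojure (our own `chunk-with-overlap`; the real langchain4j splitter additionally respects word and paragraph boundaries) illustrates what the two parameters mean:

```clojure
(defn chunk-with-overlap
  "Split `text` into windows of at most `size` characters,
   consecutive windows sharing `overlap` characters."
  [text size overlap]
  (let [step (- size overlap)]
    (map #(subs text % (min (count text) (+ % size)))
         (range 0 (count text) step))))

(chunk-with-overlap "abcdefghijklmnopqrstuvwxyz" 10 4)
;; => ("abcdefghij" "ghijklmnop" "mnopqrstuv" "stuvwxyz" "yz")
```

Note how each chunk repeats the last 4 characters of the previous one, so that a sentence cut at a chunk boundary still appears intact in one of the two chunks.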
projects/ml/llm/notebooks/vectorstore.clj

Lines changed: 10 additions & 10 deletions
@@ -7,10 +7,10 @@
    [dev.langchain4j.store.embedding.inmemory InMemoryEmbeddingStore]))
 
 ;; # Use a vectorstore from langchain4j
-;; In thios example we will create embeddings for some
-;; phansaty food items, and find teh closest one to a query.
+;; In this example we will create embeddings for some
+;; fantasy food items, and find the closest one to a query sentence.
 
-;; Create the data, so a list of 1000 food descriptions
+;; First we create the data: a list of 1000 food descriptions.
 
 
 (def food-items
@@ -47,7 +47,7 @@
  "that brings comfort and joy" "which excites the palate"
  "that enchants every taste bud" "which is pure indulgence"])
 
-;; Generate 1000 unique descriptions as a dataset
+;; Generate 1000 unique descriptions as a dataset:
 (def food-descriptions
   (tc/dataset {:food-description
                (->>
@@ -64,24 +64,24 @@
      (take 1000))}))
 
 ;; Now we create the embedding store, which is able to calculate vector distances
-;; (fast)
+;; (fast).
 (def embedding-store (InMemoryEmbeddingStore.))
-;; Create an instance of the embedding model, which can calculate an emebdiing for a piece of text
+;; Create an instance of the embedding model, which can calculate an embedding for a piece of text.
 (def embedding-model (AllMiniLmL6V2EmbeddingModel.))
 
-;; And we embbed all food description
+;; And we embed all food descriptions:
 (run!
  #(let [segment (TextSegment/from %)
         embedding (.content (.embed embedding-model %))]
     (.add embedding-store embedding segment))
  (:food-description food-descriptions))
 
 
-;; Embed the query text
+;; Now we embed the query text:
 (def query-embedding (.content (.embed embedding-model "Which spicy food can you offer ?")))
 
-;; Find the 5 most relevant embedding which are sematically the closest to the query.
-;; Its using a certain vector distance (cosine) between the embedding vectors of query and texts)
+;; And finally we find the 5 most relevant embeddings, which are semantically the closest to the query.
+;; It uses a vector distance (cosine) between the embedding vectors of the query and the texts.
 (def relevant (.findRelevant embedding-store query-embedding 5))
 
 (tc/dataset
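The relevance ranking described in vectorstore.clj is based on cosine similarity between embedding vectors. A toy illustration in plain Clojure (our own helpers, not langchain4j code) of the distance the store computes:

```clojure
(defn dot [a b] (reduce + (map * a b)))

(defn norm [a] (Math/sqrt (dot a a)))

(defn cosine-sim
  "Cosine similarity of two equal-length vectors:
   1.0 for the same direction, 0.0 for orthogonal vectors."
  [a b]
  (/ (dot a b) (* (norm a) (norm b))))

(cosine-sim [1.0 0.0] [2.0 0.0]) ;; => 1.0
(cosine-sim [1.0 0.0] [0.0 1.0]) ;; => 0.0
```

`findRelevant` effectively computes this score between the query embedding and every stored embedding, then returns the top-k matches.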
