Commit 8a42598

committed
fixed spelling
1 parent f1c8515 commit 8a42598

File tree

6 files changed: +49 −15 lines changed

projects/ml/llm/.devcontainer/devcontainer.json

Lines changed: 2 additions & 1 deletion

@@ -17,7 +17,8 @@
     "vscode": {
       "extensions": [
         "betterthantomorrow.calva",
-        "vscjava.vscode-java-pack"
+        "vscjava.vscode-java-pack",
+        "streetsidesoftware.code-spell-checker"
       ]
     }
 }
projects/ml/llm/notebooks/index.clj

Lines changed: 29 additions & 0 deletions

@@ -0,0 +1,29 @@
+^:kindly/hide-code
+(ns index)
+
+;; # LLMs and Clojure
+
+;; LLMs (Large Language Models) are a class of predictive models which can create "content"
+;; in various forms, primarily original "text" content.
+;;
+;; They are ultimately based on completing "text" a user gives them. The quality of these completions
+;; has lately become so good that we interpret them as artificial intelligence, as they imitate with very high quality what a human might generate.
+;;
+;; These models come "pre-trained", so they have learned a probability distribution of word sequences, which enables them to predict the next word
+;; based on any sequence of words.
+;;
+;; The inner workings of LLMs also include the concept of embeddings, which means representing text as high-dimensional vectors,
+;; where mathematical vector distance is correlated with semantic similarity.
+;;
+;; In their most popular form they are presented to users as "chat bots" with which a user can have
+;; a coherent conversation with questions and answers.
+;;
+;; This being a "conversation" is an illusion on the technical level. The model itself is stateless; it uses previous parts of the conversation
+;; as input for its prediction, which creates the illusion of coherence.
+;;
+;; The following chapters show three examples of using LLMs from Clojure:
+;;
+;; - a simple chat completion
+;; - using a vector store and embeddings to perform a semantic search
+;; - a showcase of a simple RAG (Retrieval-Augmented Generation) use case
+
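The claim above, that vector distance between embeddings correlates with semantic similarity, can be sketched in plain Clojure with a toy cosine similarity. The three-dimensional vectors and the food names here are invented for illustration; real models such as AllMiniLmL6V2 produce 384-dimensional vectors.

```clojure
;; Toy cosine similarity between (hypothetical) embedding vectors.
(defn dot [a b] (reduce + (map * a b)))
(defn norm [a] (Math/sqrt (dot a a)))
(defn cosine-similarity [a b]
  (/ (dot a b) (* (norm a) (norm b))))

;; Made-up 3-dimensional "embeddings" for illustration only:
(def spicy-curry  [0.9 0.1 0.2])
(def hot-chili    [0.8 0.2 0.1])
(def vanilla-cake [0.1 0.9 0.7])

(cosine-similarity spicy-curry hot-chili)    ;; => ~0.99, semantically close
(cosine-similarity spicy-curry vanilla-cake) ;; => ~0.30, semantically distant
```

A semantic search, as shown in the later chapters, is essentially this comparison done against many stored vectors at once.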
projects/ml/llm/notebooks/llms.clj

Lines changed: 6 additions & 6 deletions

@@ -3,13 +3,13 @@
    [org.httpkit.client :as hk-client]
    [cheshire.core :as json]))

-;;# using Large Language Models from Clojure
+;;# Using Large Language Models from Clojure
 ;;LLMs often come as APIs, as they require computing power (GPUs), which most users do not have
-;;localy.
+;;locally.
 ;;OpenAI offers their models behind a (paid) API, for example. In the following we will see three
 ;;different ways to use the GPT-4 model from OpenAI

-;; Get the openai API key either from environemnt or a specific file
+;; Get the openai API key either from environment or a specific file
 (def open-ai-key
   (or (System/getenv "OPEN_AI_KEY")
       (slurp "open_ai_secret.txt")
@@ -20,7 +20,7 @@
 ;## Use OpenAI API directly
 ;; OpenAI offers a rather simple API, text-in text-out, for "chatting" with GPT
 ;;
-;; The following shows how to ask a simple question, and getting the answer using an http libray,
+;; The following shows how to ask a simple question and get the answer using an http library,
 ;; [http-kit](https://github.com/http-kit/http-kit). The API is based on JSON, so it is easy to use
 ;; from Clojure

@@ -43,7 +43,7 @@
 ; ## Use Bosquet
 ; [Bosquet](https://github.com/zmedelis/bosquet) abstracts some of the concepts of LLMs
 ; into a higher-level API. It has further notions of "memory" and "tools"
-; and has other features we find for exampl in python "LangChain"
+; and has other features we find, for example, in python's "LangChain"

 ;; Bosquet wants the API key in a config file
 (spit "secrets.edn"
@@ -66,7 +66,7 @@
 ;; We can use LLMs as well via Java interop and the library
 ;; [langchain4j](https://github.com/langchain4j/langchain4j) which aims
 ;; to be a copy of the python library langchain, and offers support for
-;; building blocks for several concepts arround LLMs (model, vectorstores, document loaders, etc.)
+;; building blocks for several concepts around LLMs (model, vector stores, document loaders, etc.)
 ;; We see it used in the following chapters

 (import '[dev.langchain4j.model.openai OpenAiChatModel OpenAiChatModelName])
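The direct-API approach from the first hunk above can be sketched as follows. The request shape follows OpenAI's chat completions JSON API; the helper name `ask` and the exact response navigation are illustrative assumptions, and running it needs a valid key in `OPEN_AI_KEY` plus network access.

```clojure
(ns llm-sketch
  (:require [org.httpkit.client :as hk-client]
            [cheshire.core :as json]))

;; Hypothetical helper: POST a single user message to OpenAI's chat
;; completions endpoint and extract the answer text from the JSON response.
(defn ask [question]
  (-> @(hk-client/post
        "https://api.openai.com/v1/chat/completions"
        {:headers {"Authorization" (str "Bearer " (System/getenv "OPEN_AI_KEY"))
                   "Content-Type"  "application/json"}
         :body    (json/generate-string
                   {:model    "gpt-4"
                    :messages [{:role "user" :content question}]})})
      :body
      (json/parse-string true)
      :choices first :message :content))

;; (ask "What is Clojure?")
```

Dereferencing the http-kit promise with `@` blocks until the response arrives; the `true` argument to `parse-string` keywordizes the JSON keys so the result can be navigated with plain keywords.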

projects/ml/llm/notebooks/rag.clj

Lines changed: 1 addition & 1 deletion

@@ -83,7 +83,7 @@
 ;; Convert PDF to text document:
 (def document (.parse (ApachePdfBoxDocumentParser.) (io/input-stream "Understanding_Climate_Change.pdf")))

-;; Split document into chunks of max 1000 chars and overlaping of 200:
+;; Split document into chunks of max 1000 chars and an overlap of 200:
 (def texts
   (.split
    (DocumentSplitters/recursive 1000 200)
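What "chunks of max 1000 chars with an overlap of 200" means can be illustrated with a toy splitter in plain Clojure, scaled down to tiny sizes. This is only a sketch of the overlap idea; the real `DocumentSplitters/recursive` is smarter and prefers paragraph and sentence boundaries over raw character offsets.

```clojure
;; Naive character-based chunking with overlap: each chunk starts
;; (size - overlap) characters after the previous one, so consecutive
;; chunks share `overlap` characters of context.
(defn chunk-with-overlap [s size overlap]
  (let [step (- size overlap)]
    (->> (range 0 (count s) step)
         (mapv #(subs s % (min (count s) (+ % size)))))))

(chunk-with-overlap "abcdefghij" 4 2)
;; => ["abcd" "cdef" "efgh" "ghij" "ij"]
```

The overlap is what lets a retrieval step later find a passage even when the relevant sentence straddles a chunk boundary.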

projects/ml/llm/notebooks/render.clj

Lines changed: 2 additions & 1 deletion

@@ -4,7 +4,8 @@
 (clay/make! {:format [:quarto :html]
              :show false
              :base-source-path "notebooks"
-             :source-path ["llms.clj"
+             :source-path ["index.clj"
+                           "llms.clj"
                            "vectorstore.clj"
                            "rag.clj"
                            ]

projects/ml/llm/notebooks/vectorstore.clj

Lines changed: 9 additions & 6 deletions

@@ -6,11 +6,12 @@
    [dev.langchain4j.model.embedding.onnx.allminilml6v2 AllMiniLmL6V2EmbeddingModel]
    [dev.langchain4j.store.embedding.inmemory InMemoryEmbeddingStore]))

-;; # Use a vectorstore from langchain4j
+;; # Use a vector store from langchain4j
 ;; In this example we will create embeddings for some
 ;; fantasy food items, and find the closest one to a query sentence.

-;; Firt wereate the data, so a list of 1000 food descriptions
+;; ## Create dummy data
+;; First we create the data, a list of 1000 food descriptions


 (def food-items
@@ -63,24 +64,26 @@
   shuffle
   (take 1000))}))

+;; ## Add food descriptions to the vector store
 ;; Now we create the embedding store, which is able to calculate vector distances
 ;; (fast).
 (def embedding-store (InMemoryEmbeddingStore.))
-;; Create an instance of the embedding model, which can calculate an emebedding for a piece of text.
+;; Create an instance of the embedding model, which can calculate an embedding for a piece of text.
 (def embedding-model (AllMiniLmL6V2EmbeddingModel.))

-;; And we embbed all food descriptions:
+;; And we embed all food descriptions:
 (run!
  #(let [segment (TextSegment/from %)
         embedding (.content (.embed embedding-model %))]
    (.add embedding-store embedding segment))
  (:food-description food-descriptions))


-;; Now we embedd the query text:
+;; Now we embed the query text:
 (def query-embedding (.content (.embed embedding-model "Which spicy food can you offer ?")))

-;; And finally we find the 5 most relevant embedding which are sematically the closest to the query.
+;; ## Query the vector store
+;; And finally we find the 5 most relevant embeddings, which are semantically the closest to the query.
 ;; It's using a certain vector distance (cosine) between the embedding vectors of query and texts.
 (def relevant (.findRelevant embedding-store query-embedding 5))
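The matches returned by `.findRelevant` above are langchain4j `EmbeddingMatch` objects, each carrying a relevance score and the stored `TextSegment`. A small sketch of inspecting them, assuming the `relevant` binding from the notebook (method names per langchain4j's Java API):

```clojure
;; Turn each EmbeddingMatch into a plain Clojure map for inspection:
;; .score    - relevance (higher is closer to the query)
;; .embedded - the TextSegment stored alongside the embedding
;; .text     - the original food-description string
(map (fn [match]
       {:score (.score match)
        :text  (.text (.embedded match))})
     relevant)
```

Sorting by `:score` descending gives the same ranking the store returns, since `findRelevant` already orders matches by relevance.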
