This repository contains the testing code (and probably the final code) we used to extract embeddings from video transcripts for [Bizarro-Devin](https://github.com/CodingTrain/Bizarro-Devin/). Each file in the repository has its own purpose:
- [embeddings-transformers.js](/embeddings-transformers.js) generates embeddings from the transcripts in the `transcripts` directory.
- [semantic-retrieval.js](/semantic-retrieval.js) retrieves results from the embeddings based on a query.
- [semantic-retrieval-benchmark.js](/semantic-retrieval-benchmark.js) benchmarks retrieval. In my own tests, a single retrieval took about 180 ms.
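
As a rough illustration of how a per-retrieval figure like this can be measured, here is a minimal timing sketch. It is not the contents of `semantic-retrieval-benchmark.js`; the `benchmark` helper and its `runs`/`warmup` options are hypothetical, and it assumes a Node.js version with the global `performance` object (Node 16 or later).

```javascript
// Hypothetical timing helper: calls `fn` repeatedly and reports wall-clock
// statistics in milliseconds. Not taken from semantic-retrieval-benchmark.js.
async function benchmark(fn, { runs = 20, warmup = 1 } = {}) {
  if (!Number.isInteger(runs) || runs < 1) {
    throw new RangeError('runs must be a positive integer');
  }
  // Untimed warm-up calls, so one-time setup (e.g. loading a model) is not counted.
  for (let i = 0; i < warmup; i++) {
    await fn();
  }
  const durations = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await fn(); // works for both synchronous and async retrieval functions
    durations.push(performance.now() - start);
  }
  const meanMs = durations.reduce((sum, d) => sum + d, 0) / runs;
  return {
    runs,
    meanMs,
    minMs: Math.min(...durations),
    maxMs: Math.max(...durations),
  };
}
```

For example, `await benchmark(() => retrieve('how do I draw a circle?'))`, where `retrieve` is a hypothetical stand-in for whatever function performs a single retrieval.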
## How to use
### Generating embeddings
1. Make sure you've installed all dependencies by running `npm install`
2. Create a directory called `transcripts` and put all JSON transcript files in it, one file per video transcript. Each transcript file should use the following format:
```json
{
  "text": "full transcript text",
  "chunks": [
    {
      "timestamp": [0.48, 7.04],
      "text": "..."
    }
  ]
}
```
However, the `chunks` array is currently not used, so it can be left out.

3. Create an `embeddings` directory; the embeddings for each transcript are written there.
4. Run `node embeddings-transformers.js` to generate the embeddings.
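
Since only `text` is required, a quick sanity check can catch malformed transcript files before the script runs. The following is a hypothetical helper, not part of this repository: `validateTranscript` takes an already-parsed object (for example, the result of `JSON.parse(fs.readFileSync(path, 'utf8'))`) and returns a list of problems, which is empty when the object matches the format above.

```javascript
// Hypothetical checker for the transcript format described above; not part
// of this repository. Returns an array of problem descriptions, empty if the
// object looks valid. Only `text` is required, since `chunks` is unused.
function validateTranscript(transcript) {
  if (transcript === null || typeof transcript !== 'object' || Array.isArray(transcript)) {
    return ['transcript must be a JSON object'];
  }
  const problems = [];
  if (typeof transcript.text !== 'string' || transcript.text.trim() === '') {
    problems.push('"text" must be a non-empty string');
  }
  if (transcript.chunks !== undefined) {
    if (!Array.isArray(transcript.chunks)) {
      problems.push('"chunks" must be an array when present');
    } else {
      transcript.chunks.forEach((chunk, i) => {
        // Each chunk is expected to look like { "timestamp": [start, end], "text": "..." }.
        if (chunk === null || typeof chunk !== 'object') {
          problems.push(`chunks[${i}] must be an object`);
          return;
        }
        if (!Array.isArray(chunk.timestamp) || chunk.timestamp.length !== 2) {
          problems.push(`chunks[${i}].timestamp must be a two-element array`);
        }
        if (typeof chunk.text !== 'string') {
          problems.push(`chunks[${i}].text must be a string`);
        }
      });
    }
  }
  return problems;
}
```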

The per-transcript embeddings should now be in the `embeddings` directory, and an `embeddings.json` file should be present in the current working directory. `embeddings.json` combines all the embeddings generated from the transcripts.
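
The exact layout of `embeddings.json` is not documented here. As a hypothetical sketch, assuming one `{ name, embedding }` entry per transcript, merging the per-transcript results could look like the following. One detail is worth handling whatever the real layout is: if the vectors come from a library as typed arrays such as `Float32Array`, `JSON.stringify` serializes them as objects with numeric keys rather than as arrays, so they should be converted to plain arrays before writing.

```javascript
// Hypothetical sketch: merge per-transcript embeddings into one array that
// could be written to embeddings.json. The { name, embedding } entry shape
// is an assumption, not the repository's documented format.
function combineEmbeddings(entries) {
  const combined = [];
  let dimension = null;
  for (const { name, embedding } of entries) {
    // JSON.stringify turns typed arrays (e.g. Float32Array) into objects with
    // numeric keys, so copy each vector into a plain array first.
    const vector = Array.from(embedding);
    if (dimension === null) {
      dimension = vector.length;
    } else if (vector.length !== dimension) {
      // Vectors of different lengths cannot be compared with cosine similarity.
      throw new Error(`${name}: expected ${dimension} dimensions, got ${vector.length}`);
    }
    combined.push({ name, embedding: vector });
  }
  return combined;
}
```

The result could then be written with `fs.writeFileSync('embeddings.json', JSON.stringify(combined))` using Node's built-in `fs` module.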
### Semantic retrieval from embeddings
1. Make sure you've installed all dependencies by running `npm install`
2. Make sure the embeddings you want to retrieve from are in an `embeddings.json` file. This file already exists if you generated the embeddings as described in the [generating embeddings](#generating-embeddings) section.
3. Open up the `semantic-retrieval.js` file and edit your query on line `25`.
4. Save the file and run `node semantic-retrieval.js` to retrieve the top 5 results from the embeddings.
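
Top-k retrieval over embeddings is commonly done by ranking entries by cosine similarity to the query embedding. The sketch below illustrates that technique; it is not necessarily how `semantic-retrieval.js` scores results. It assumes a hypothetical `{ name, embedding }` entry layout and that the query has already been embedded with the same model used for the transcripts. The `topK` and `cosineSimilarity` names are illustrative.

```javascript
// Hypothetical sketch of top-k retrieval by cosine similarity; it may differ
// from what semantic-retrieval.js actually does. Entries are assumed to be
// { name, embedding: number[] }, embedded with the same model as the query.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every entry against the query and keep the k best (highest first).
function topK(queryEmbedding, entries, k = 5) {
  return entries
    .map((entry) => ({ name: entry.name, score: cosineSimilarity(queryEmbedding, entry.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

If the stored vectors are L2-normalized (for example, when produced by a transformers.js feature-extraction pipeline called with `normalize: true`), cosine similarity equals the plain dot product, so the norm computation could be skipped. A linear scan like this is simple but scores every entry on each query.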