Commit 37ad3ab

Update README.md
1 parent ead048e commit 37ad3ab

1 file changed: +61 -0 lines changed

README.md

@@ -1 +1,62 @@
# Document-similarity-using-doc2vec-and-gensim
Python implementation of document similarity checking using Doc2Vec.
## File structure
```bash
Document-similarity-using-doc2vec-and-gensim/
├── data/
│   ├── 20news-bydate.tar.gz
│   ├── 20news-bydate-test
│   └── 20news-bydate-train
├── models/
│   ├── doc2vec_model.bin
│   ├── doc2vec_model.model
│   ├── doc2vec_vector.txt
│   └── doc2vec_model.bin.dv.vectors.npy
├── dataset_preprocess.py
├── inference.py
├── README.md
├── requirements.txt
└── train.py
```
- **data/train_data.txt**: Training data file
- **models/doc2vec_model.bin**: Trained Doc2Vec model file
- **models/doc2vec_model.bin.dv.vectors.npy**: Document vectors file for the trained model
- **README.md**: Project documentation file
- **requirements.txt**: Required Python packages
- **dataset_preprocess.py**: Script to preprocess the dataset
- **inference.py**: Script to check similarity between two documents
- **train.py**: Script to train the Doc2Vec model
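
For orientation, the extracted 20 Newsgroups folders under data/ hold one sub-directory per newsgroup, with one plain-text file per post. The sketch below walks that layout. The function name `iter_documents` and the choice of latin-1 decoding are assumptions made for illustration; they are not taken from dataset_preprocess.py.

```python
from pathlib import Path


def iter_documents(root):
    """Yield (category, text) for every file found at root/<category>/<file>."""
    for category_dir in sorted(Path(root).iterdir()):
        if not category_dir.is_dir():
            continue  # skip stray files next to the category folders
        for doc_path in sorted(category_dir.iterdir()):
            if doc_path.is_file():
                # latin-1 maps every byte to a character, so decoding never fails.
                yield category_dir.name, doc_path.read_text(encoding="latin-1")
```

Something like `for category, text in iter_documents("data/20news-bydate-train")` would then stream the training posts one at a time.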


## Installation
The pinned dependencies are listed in requirements.txt:
```bash
gensim==4.2.0
nltk==3.5
numpy==1.23.2
pandas==1.2.0
scikit_learn==0.23.2
```
Install them with pip:

```bash
pip install -r requirements.txt
```

## Training the Doc2Vec model
```bash
python train.py
```
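
The contents of train.py are not shown here, so the following is only a minimal sketch of what Doc2Vec training with gensim 4.x typically looks like. The helper names `tokenize` and `train_doc2vec`, the regex tokenizer, and the hyperparameter values are all assumptions chosen for illustration.

```python
import re
from pathlib import Path


def tokenize(text):
    """Lowercase the text and keep alphanumeric runs (a stand-in for real preprocessing)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train_doc2vec(texts, model_path="models/doc2vec_model.bin",
                  vector_size=50, epochs=20):
    """Train a Doc2Vec model on an iterable of raw strings and save it to model_path."""
    # Imported here so that tokenize() stays usable without gensim installed.
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    # Each document needs a unique tag; its position in the corpus is used here.
    corpus = [TaggedDocument(words=tokenize(text), tags=[i])
              for i, text in enumerate(texts)]

    # Placeholder hyperparameters, not the values used by train.py.
    model = Doc2Vec(vector_size=vector_size, min_count=1, epochs=epochs)
    model.build_vocab(corpus)
    model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)

    Path(model_path).parent.mkdir(parents=True, exist_ok=True)
    model.save(model_path)
    return model
```

When a model is large, gensim stores its big arrays in separate .npy files next to the main file, which would be consistent with the doc2vec_model.bin.dv.vectors.npy entry in the file structure.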

## Inference
Check the similarity between two documents:
```bash
python inference.py
```
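
As a rough illustration of the comparison step, the sketch below infers a vector for each tokenized document and scores the pair with cosine similarity. `infer_vector` is gensim's Doc2Vec method; the function names and the pure-Python cosine are assumptions, and inference.py may well use a library routine instead.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)


def document_similarity(model, tokens_a, tokens_b):
    """Score two tokenized documents with a trained Doc2Vec-style model."""
    vec_a = model.infer_vector(tokens_a)
    vec_b = model.infer_vector(tokens_b)
    return cosine_similarity(vec_a, vec_b)
```

Because vector inference in gensim is a randomized process, two calls on the same tokens generally return slightly different vectors, so identical documents can score somewhat below 1.0.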

<!--
Test data 1: ['bird', 'is', 'beautiful']
Test data 2: ['bird', 'is', 'beautiful']
Cosine similarity: [[0.9203991]]
-->
