This code implements relatively fast cosine-similarity computation and kNN classification for large matrices, so I never have to worry about it again. It outperforms Gensim's vector-comparison function by making clever use of einsums to speed up computation, and the computations can be batched to reduce memory usage. Vectors are classified using a kNN majority-vote approach.
The main algorithm is implemented in `src/knn.py`.
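The approach described above can be sketched as follows. This is a minimal illustration, not the actual implementation in `src/knn.py`: the function names, shapes, and the choice of `k` and batch size are assumptions for the example.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a (m, d) and b (n, d)."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    # einsum computes all pairwise dot products without explicit transposes
    return np.einsum("id,jd->ij", a_norm, b_norm)

def knn_classify(queries, train, labels, k=5, batch_size=1024):
    """Majority-vote kNN; queries are processed in batches to bound memory."""
    preds = []
    for start in range(0, len(queries), batch_size):
        sims = cosine_similarity(queries[start:start + batch_size], train)
        # unordered indices of the k most similar training vectors per query
        top_k = np.argpartition(-sims, k - 1, axis=1)[:, :k]
        for row in top_k:
            values, counts = np.unique(labels[row], return_counts=True)
            preds.append(values[np.argmax(counts)])
    return np.array(preds)
```

Batching only the query side keeps the peak intermediate at `batch_size × n` similarities instead of `m × n`, which is what makes this workable for large matrices.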
You can test locally in a Docker container with an ES index, if you have credentials for an Elasticsearch cluster:

- Port-forward the Elasticsearch cluster to port `9200`.
- Set the Elasticsearch environment variables in `src/local.env`: `ES_USERNAME` and `ES_PASSWORD` if the cluster requires authentication, and `ES_INDEX` with the name of the index that you want to process. The index should contain documents with a field named `full_text`.
- Run `make run`. This will train the model and add a `similar_docs` field to the documents in the index. Note that this Make command limits CPU and memory usage; you can adjust this with the variables set in the `Makefile`.
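For reference, `src/local.env` might look like the following; the values are placeholders, and only the three variable names above are taken from this README:

```
ES_USERNAME=elastic
ES_PASSWORD=changeme
ES_INDEX=my-documents
```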