The program runs in Docker. Node.js is responsible for adding, deleting, and updating the person descriptions, while Python finds the best match for a search using bag of words and TF-IDF. Two servers are launched: one for Python and another for Node.js.
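To illustrate the matching idea, here is a minimal sketch of bag-of-words TF-IDF ranking with cosine similarity, using only the Python standard library. It is not the project's actual code: the function names (`tokenize`, `build_index`, `cosine`, `best_match`), the smoothed IDF formula, and the sample descriptions are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for the real preprocessing)."""
    return text.lower().split()


def build_index(descriptions):
    """Return (idf, vectors): IDF weights and one TF-IDF vector per description."""
    docs = [Counter(tokenize(d)) for d in descriptions]  # bag of words per document
    n_docs = len(docs)
    # Number of documents that contain each term (each term counted once per document).
    doc_freq = Counter(term for doc in docs for term in doc)
    # Smoothed IDF (an assumed variant): terms found in every document keep a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(query, descriptions):
    """Return (index, score) of the description most similar to the query."""
    if not descriptions:
        raise ValueError("descriptions must not be empty")
    idf, vectors = build_index(descriptions)
    q_counts = Counter(tokenize(query))
    # Query terms that never appear in any description are ignored.
    q_vec = {t: tf * idf[t] for t, tf in q_counts.items() if t in idf}
    scores = [cosine(q_vec, v) for v in vectors]
    best = max(range(len(descriptions)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical sample data, for illustration only.
descriptions = [
    "tall man with a red hat",
    "short woman with blue glasses",
    "old man with a grey beard",
]
index, score = best_match("woman blue glasses", descriptions)
```

Rebuilding the index on every query keeps the sketch short. A real service would probably build it once and update it when descriptions are added, deleted, or changed.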
Search Request
graph LR;
id1(Search Request)-->id2(Node.js Server)-->id3(Python Server)-->id4(Search Function)-->id5(Return Response);
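The Python end of this flow could look roughly like the sketch below, which uses only the standard library's `http.server`. Everything specific here is an assumption for illustration: the `/search` path, the JSON body with a `query` field, port 5000, and the placeholder `search` function. The project's real endpoint and payload format may differ.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def search(query):
    """Placeholder for the real search function (hypothetical return shape)."""
    return {"query": query, "match": None}


def handle_search(raw_body):
    """Decode a JSON request body, run the search, and return (status, response bytes)."""
    try:
        payload = json.loads(raw_body)
        query = payload["query"]
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, a missing field, or a non-object body.
        error = {"error": "expected a JSON object with a 'query' field"}
        return 400, json.dumps(error).encode()
    return 200, json.dumps(search(query)).encode()


class SearchHandler(BaseHTTPRequestHandler):
    """Receives the request forwarded by the Node.js server."""

    def do_POST(self):
        if self.path != "/search":  # assumed route name
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, body = handle_search(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=5000):
    """Start the server; the port number is an assumption."""
    HTTPServer(("0.0.0.0", port), SearchHandler).serve_forever()
```

Calling `serve()` starts the listener. Keeping `handle_search` separate from the HTTP handler means the request logic can be exercised without opening a socket.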
Process Text
graph LR;
id1(Convert text to lowercase)-->id2(Remove special chars)-->id3(Remove stop words)-->id4(Spell check)-->id5(Lemmatize)-->id6(Replace synonyms)-->id7(Stem);
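The seven steps can be sketched as a single Python function that applies them in the same order. This is only an illustration: the word lists and lookup tables below are tiny made-up English samples, and `naive_stem` is a toy suffix stripper. The real project would rely on full language resources, a proper stemmer, and the lemmatization API instead of a dictionary.

```python
import re

# Tiny illustrative resources (assumptions); real ones would be language-specific and much larger.
STOP_WORDS = {"the", "a", "an", "with", "and", "is"}
SPELLING = {"galsses": "glasses"}  # misspelling -> correction
LEMMAS = {"glasses": "glass", "wearing": "wear", "spectacles": "spectacle"}
SYNONYMS = {"spectacle": "glass"}  # synonym -> canonical term
SUFFIXES = ("ing", "ed", "s")


def naive_stem(word):
    """Strip one common suffix; a toy stand-in for a real stemmer."""
    if word.endswith("ss"):  # leave words such as "glass" untouched
        return word
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def process_text(text):
    """Apply the pipeline steps in order and return the resulting tokens."""
    text = text.lower()                                        # 1. lowercase
    # 2. remove special chars: keeps letters (accented ones too), digits, and underscores
    text = re.sub(r"[^\w\s]", " ", text)
    tokens = [t for t in text.split() if t not in STOP_WORDS]  # 3. remove stop words
    tokens = [SPELLING.get(t, t) for t in tokens]              # 4. spell check
    tokens = [LEMMAS.get(t, t) for t in tokens]                # 5. lemmatize
    tokens = [SYNONYMS.get(t, t) for t in tokens]              # 6. replace synonyms
    tokens = [naive_stem(t) for t in tokens]                   # 7. stem
    return tokens


tokens = process_text("The man is wearing GALSSES!")
```

The order matters: spelling is corrected before lemmatization so the lemma lookup sees a valid word, and synonyms are replaced after lemmatization so the synonym table only needs base forms.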
The project was developed using Docker. To build and run it, use the following commands:
Build and run the project:
npm run build
Run the project:
npm run start
Shut down the project:
npm run down
For lemmatization, we used a non-public API that we found by web scraping a Spanish linguistics page; this is the API. The main problem is that the API is limited to n requests per hour. This could be solved by using a proxy, but that was not implemented in this project.
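Independently of a proxy, the number of calls can be reduced on the client side. The sketch below shows one possible mitigation that is not part of the project: caching lemmas that were already fetched and tracking an hourly request budget. The class `LemmaClient`, the exception `RateLimitExceeded`, and the injected `fetch` function are all hypothetical. Since the API is not public, the actual HTTP call is left to the caller.

```python
import time
from collections import deque


class RateLimitExceeded(Exception):
    """Raised when the request budget for the current window is used up."""


class LemmaClient:
    """Caches lemmas and refuses to exceed max_requests per time window."""

    def __init__(self, fetch, max_requests, window_seconds=3600.0, clock=time.monotonic):
        self._fetch = fetch            # hypothetical callable: word -> lemma (does the API call)
        self._max = max_requests       # the "n requests" limit
        self._window = window_seconds  # one hour by default
        self._clock = clock            # injectable so the logic can be tested without waiting
        self._cache = {}
        self._sent = deque()           # timestamps of requests inside the current window

    def lemmatize(self, word):
        # Cached words cost no request at all.
        if word in self._cache:
            return self._cache[word]
        now = self._clock()
        # Forget requests that have left the sliding window.
        while self._sent and now - self._sent[0] >= self._window:
            self._sent.popleft()
        if len(self._sent) >= self._max:
            raise RateLimitExceeded(f"budget of {self._max} requests per window is used up")
        self._sent.append(now)
        lemma = self._fetch(word)
        self._cache[word] = lemma
        return lemma
```

A caller would pass its own request function as `fetch`, and could catch `RateLimitExceeded` to fall back to the unlemmatized word until the window frees up.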