A search engine that finds individuals based on descriptive text. Using MongoDB, Docker, and bag of words, it enables users to input attributes and retrieve matching profiles through advanced text analysis.

moraxh/ProfileMatcher-A-Semantic-Search-Engine

ProfileMatcher: A semantic search engine

How it works

The program runs in Docker and launches two servers: one for Node.js and another for Python. The Node.js server is responsible for adding, deleting, and updating person descriptions. The Python server finds the best match using bag of words and TF-IDF.
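The sketch below shows bag-of-words matching with TF-IDF weighting. It assumes descriptions are already tokenized and that profiles are ranked by cosine similarity. The README does not say which similarity measure the project uses. The function names are hypothetical, and the smoothed IDF formula, `log((1 + n) / (1 + df)) + 1`, is an illustrative choice that mirrors scikit-learn's default. None of this is the project's own code.

```python
import math
from collections import Counter


def build_idf(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log((1 + n) / (1 + df)) + 1 for term, df in doc_freq.items()}


def tfidf_vector(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Bag of words (raw term counts) weighted by IDF; unknown terms are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    return {term: count * idf[term] for term, count in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_profiles(query: list[str], profiles: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Return (profile_id, score) pairs, best match first."""
    idf = build_idf(list(profiles.values()))
    query_vec = tfidf_vector(query, idf)
    scores = [(pid, cosine(query_vec, tfidf_vector(tokens, idf))) for pid, tokens in profiles.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical, already-processed profile descriptions.
    profiles = {
        "ana": ["mujer", "alta", "cabello", "rubio"],
        "luis": ["hombre", "bajo", "cabello", "negro"],
    }
    print(rank_profiles(["cabello", "rubio"], profiles)[0][0])  # ana
```

Because "cabello" appears in every profile, its IDF is minimal. The rarer term "rubio" therefore decides the ranking.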

Workflow

Search Request

graph LR;
    id1(Search Request)-->id2(Nodejs Server)-->id3(Python Server)-->id4(Search Function)-->id5(Return Response);
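The Python end of this flow could be sketched with the standard library's `http.server`. The sketch assumes the Node.js server forwards the query as a JSON `POST` to a `/search` endpoint. The route, the default port 5000, the `{"query": ...}` payload shape, and the `PROFILES` sample data are all hypothetical. The word-overlap `search` function is only a stand-in for the project's bag-of-words/TF-IDF search function.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical sample data; in the project, descriptions are managed by the Node.js server.
PROFILES = {
    "ana": "mujer alta de cabello rubio",
    "luis": "hombre bajo de cabello negro",
}


def search(query: str) -> list[dict]:
    """Stand-in scorer: counts words shared between the query and each profile."""
    query_words = set(query.lower().split())
    scored = [
        {"name": name, "score": len(query_words & set(text.lower().split()))}
        for name, text in PROFILES.items()
    ]
    return sorted(scored, key=lambda r: r["score"], reverse=True)


class SearchHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/search":  # hypothetical route
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps({"results": search(payload.get("query", ""))}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port: int = 5000) -> None:
    """Serve search requests until interrupted (port is an assumption)."""
    HTTPServer(("0.0.0.0", port), SearchHandler).serve_forever()
```

Calling `run()` starts the server. The Node.js side would then send its requests to `http://<python-host>:5000/search`.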

Process Text

graph LR;
  id1(txt to lower)-->id2(Remove special chars)-->id3(Remove stop words)-->id4(Spell check)-->id5(Lemmatize)-->id6(Replace synonyms)-->id7(Stem);
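The pipeline above could be chained roughly as follows. This is a minimal sketch, and several pieces are assumptions. The tiny Spanish stop-word list and the synonym table are hypothetical samples. `spell_check` and `lemmatize` are placeholders that return tokens unchanged, since the project lemmatizes through an external API. `stem` is a toy suffix stripper standing in for a real stemmer.

```python
import re

# Hypothetical, deliberately tiny Spanish stop-word list.
STOP_WORDS = {"el", "la", "los", "las", "de", "y", "con", "un", "una", "es"}
# Hypothetical synonym table mapping variants onto one canonical term.
SYNONYMS = {"pelo": "cabello", "rubia": "rubio"}


def to_lower(text: str) -> str:
    return text.lower()


def remove_special_chars(text: str) -> str:
    # Replace anything that is not a word character or whitespace with a space.
    return re.sub(r"[^\w\s]", " ", text)


def remove_stop_words(tokens: list[str]) -> list[str]:
    return [t for t in tokens if t not in STOP_WORDS]


def spell_check(tokens: list[str]) -> list[str]:
    return tokens  # placeholder: no correction in this sketch


def lemmatize(tokens: list[str]) -> list[str]:
    return tokens  # placeholder: the project calls a lemmatization API here


def replace_synonyms(tokens: list[str]) -> list[str]:
    return [SYNONYMS.get(t, t) for t in tokens]


def stem(tokens: list[str]) -> list[str]:
    # Toy stemmer: strip a plural "es"/"s" ending from words long enough to keep a root.
    out = []
    for t in tokens:
        for suffix in ("es", "s"):
            if len(t) > len(suffix) + 2 and t.endswith(suffix):
                t = t[: -len(suffix)]
                break
        out.append(t)
    return out


def process_text(text: str) -> list[str]:
    """Apply the steps in the same order as the diagram."""
    tokens = remove_special_chars(to_lower(text)).split()
    for step in (remove_stop_words, spell_check, lemmatize, replace_synonyms, stem):
        tokens = step(tokens)
    return tokens


if __name__ == "__main__":
    print(process_text("La mujer tiene el Pelo rubio, y ojos azules!"))
    # ['mujer', 'tiene', 'cabello', 'rubio', 'ojo', 'azul']
```

Synonym replacement runs before stemming so that the synonym table can key on whole words rather than stems.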

How to use

The project was developed using Docker. To build and run it, use the following commands:

Build & Run the project:

npm run build

Run the project:

npm run start

Stop the project:

npm run down

Limitations

For lemmatization, we used a non-public API obtained by web scraping a Spanish linguistics website (this is the API). The main problem is that the API is limited to n requests per hour. This could be solved by using a proxy, but that was not implemented in this project.
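The proxy workaround mentioned above could look roughly like the sketch below. It rotates requests round-robin across several proxies using `urllib.request.ProxyHandler`. It also caches lemmas so that repeated words do not spend requests. The cache is an extra mitigation, not something the project describes. The README does not document the API, so several details are hypothetical: the endpoint URL, the `word` query parameter, the plain-text response format, and the proxy addresses.

```python
import itertools
import urllib.parse
import urllib.request
from typing import Callable, Optional


class RotatingLemmaClient:
    """Hypothetical client for a rate-limited lemmatization endpoint."""

    def __init__(self, base_url: str, proxies: list[str],
                 fetch: Optional[Callable[[str, str], str]] = None) -> None:
        if not proxies:
            raise ValueError("at least one proxy URL is required")
        self.base_url = base_url
        self._proxies = itertools.cycle(proxies)  # round-robin over proxies
        self._cache: dict[str, str] = {}          # word -> lemma
        self._fetch = fetch or self._fetch_via_proxy

    @staticmethod
    def _fetch_via_proxy(url: str, proxy: str) -> str:
        # Route both HTTP and HTTPS traffic through the chosen proxy.
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        opener = urllib.request.build_opener(handler)
        with opener.open(url, timeout=10) as response:
            # Assumes the endpoint answers with the lemma as plain text.
            return response.read().decode("utf-8").strip()

    def lemma(self, word: str) -> str:
        if word in self._cache:  # cached words cost no request
            return self._cache[word]
        url = f"{self.base_url}?{urllib.parse.urlencode({'word': word})}"
        result = self._fetch(url, next(self._proxies))
        self._cache[word] = result
        return result
```

With N proxies, each proxy handles roughly 1/N of the requests, which spreads the hourly quota. The injectable `fetch` parameter also makes the client testable without network access.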
