The objective of the project was to build a Search Engine from the first 300 pages of the Goodreads website.
The repository contains the following files, together with the pickle files in which our data structures are saved and the folders of the downloaded books:
- DataCollection&DataStructure(Point1).ipynb:
> contains Point 1 (not run, because downloading all the books takes a long time) and the definitions of the structures used from Point 2 onward, such as the inverted index, the vocabularies, and so on.
-
> you can find the solutions from Point 2 of the assignment.
- field_functions.py:
> contains the definitions of the functions and classes used in the notebooks.
- Search_engines.py:
> contains the functions that implement the search engines of Points 2 and 3.
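
To give an idea of the kind of structures mentioned above (a vocabulary and an inverted index stored in pickle files), here is a minimal sketch. It is not the code from this repository: the function names (`tokenize`, `build_index`, `conjunctive_search`, `save_pickle`, `load_pickle`), the simplified tokenization, the sample documents and the file name `inverted_index.pkl` are all assumptions made for illustration, and the conjunctive (AND) query is just one simple way a search over such an index could work.

```python
import pickle
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (a simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Build a vocabulary (term -> term_id) and an inverted index (term_id -> sorted doc ids)."""
    vocabulary = {}
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            # Assign the next free integer id the first time a term is seen.
            term_id = vocabulary.setdefault(token, len(vocabulary))
            postings[term_id].add(doc_id)
    inverted_index = {term_id: sorted(ids) for term_id, ids in postings.items()}
    return vocabulary, inverted_index


def conjunctive_search(query, vocabulary, inverted_index):
    """Return the sorted ids of the documents that contain every term of the query."""
    result = None
    for token in tokenize(query):
        term_id = vocabulary.get(token)
        if term_id is None:
            return []  # a term that is not in the vocabulary matches no document
        docs = set(inverted_index[term_id])
        result = docs if result is None else result & docs
    return sorted(result) if result else []


def save_pickle(obj, path):
    """Store a Python object in a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_pickle(path):
    """Load a Python object back from a pickle file."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Hypothetical plot snippets standing in for the downloaded books.
    books = {
        1: "A wizard goes to school",
        2: "A school for young detectives",
        3: "The wizard and the detectives",
    }
    vocabulary, inverted_index = build_index(books)
    save_pickle(inverted_index, "inverted_index.pkl")  # hypothetical file name
    reloaded = load_pickle("inverted_index.pkl")
    print(conjunctive_search("wizard school", vocabulary, reloaded))  # prints [1]
```

Saving the index once and reloading it avoids rebuilding it from the downloaded books every time a notebook is opened, which is presumably why the pickle files are kept in the repository.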