A full-text search engine implemented as part of HUJI's Web Information Retrieval course.
The engine currently supports one specific dataset: Amazon product review data (taken from here), stored in a line-oriented data format (see the `.txt` files under `datasets` for an example).
The main classes of this library are:
- `webdata.IndexWriter`, for constructing the index given a dataset file
- `webdata.IndexReader`, for querying the index
- `webdata.ReviewSearch`, for performing various text search operations
- Click here for an explanation and visualization of the index structure, as well as a theoretical runtime analysis of index operations.
- Click here for various benchmarks of index construction and querying.
- Click here for an explanation of a custom product ranking function I've implemented for product search.
Most of the classes and methods are also documented; see below for how to generate the Javadocs.
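
For readers unfamiliar with the write-then-query split that `IndexWriter` and `IndexReader` reflect, here is a minimal, self-contained sketch of a generic in-memory inverted index. It is not this library's implementation or API: `ToyInvertedIndex`, `addReview` and `reviewsWithToken` are hypothetical names invented for illustration, the tokenization rule (lowercase alphanumeric runs) is an assumption, and the sketch assumes reviews are added in increasing id order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of the webdata package.
class ToyInvertedIndex {
    // token -> ascending list of ids of the reviews that contain it
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    // "Write" side: tokenize a review and record its id under each distinct token.
    // Assumes reviews are added in increasing id order.
    public void addReview(int reviewId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue; // split() may yield an empty leading token
            }
            List<Integer> ids = postings.computeIfAbsent(token, t -> new ArrayList<>());
            // Ids arrive in order, so checking the last entry is enough to skip duplicates.
            if (ids.isEmpty() || ids.get(ids.size() - 1) != reviewId) {
                ids.add(reviewId);
            }
        }
    }

    // "Read" side: look up the ids of all reviews containing the given token.
    public List<Integer> reviewsWithToken(String token) {
        return Collections.unmodifiableList(
                postings.getOrDefault(token.toLowerCase(Locale.ROOT), Collections.emptyList()));
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addReview(1, "Great coffee, great price");
        index.addReview(2, "The coffee arrived stale");
        System.out.println(index.reviewsWithToken("coffee")); // [1, 2]
        System.out.println(index.reviewsWithToken("great"));  // [1]
        System.out.println(index.reviewsWithToken("tea"));    // []
    }
}
```

A real on-disk index has to deal with compression, dictionary layout and memory limits during construction, none of which this toy attempts.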
Requires Java 11+ and Maven.
- Type `mvn package` to compile, test and package this library, and generate docs. The resulting jars will be located at `target`. Documentation can be found at `target/apidocs/index.html`. (Skip testing by adding `-Dmaven.test.skip=true`.)