This is a project for CPSC5330 Big Data Analytics at Seattle University. Author: Hao Li
This program uses Hadoop MapReduce to compute a TF-IDF score for each term in the target documents.
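
As an illustration of what the TF-IDF step produces, here is a minimal in-memory sketch in Python. It is not the project's MapReduce code. The function names (`tokenize`, `compute_tfidf`), the whitespace tokenizer, and the exact weighting (term count divided by document length, multiplied by the natural log of N/df) are assumptions, since the weighting variant used by the project is not stated here.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; real input may need more cleanup.
    return text.lower().split()


def compute_tfidf(docs):
    """Return {(term, doc_id): score} for docs given as {doc_id: text}."""
    # Per-document term counts (roughly what a first map/reduce pass would emit).
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}

    # Document frequency: the number of documents containing each term.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    n_docs = len(docs)
    scores = {}
    for doc_id, counts in term_counts.items():
        total_terms = sum(counts.values())
        for term, count in counts.items():
            tf = count / total_terms                  # normalized term frequency
            idf = math.log(n_docs / doc_freq[term])   # natural-log inverse document frequency
            scores[(term, doc_id)] = tf * idf
    return scores


if __name__ == "__main__":
    sample = {"d1": "big data big", "d2": "data analytics"}
    for key, value in sorted(compute_tfidf(sample).items()):
        print(key, round(value, 4))
```

With this weighting, a term that appears in every document (such as "data" in the sample) scores 0, so it does not help separate documents.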
The program then prompts the user for a query and uses the TF-IDF table to return the 5 documents most relevant to that query.
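
The lookup step can be sketched in Python as follows. This is a hypothetical illustration rather than the project's HiveQL. It assumes the TF-IDF data is available as a mapping from (term, document) pairs to scores, and that a document's relevance is the sum of the scores of the query terms it contains. The name `top_documents` and the tie-breaking rule are made up for the example.

```python
from collections import defaultdict


def top_documents(query, tfidf, k=5):
    """Return up to k (doc_id, score) pairs, highest score first.

    tfidf maps (term, doc_id) -> score. Assumption: relevance is the
    sum of the TF-IDF scores of the query terms found in the document.
    """
    query_terms = set(query.lower().split())
    totals = defaultdict(float)
    for (term, doc_id), score in tfidf.items():
        if term in query_terms:
            totals[doc_id] += score
    # Sort by descending score; break ties by doc_id so the order is stable.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


if __name__ == "__main__":
    table = {
        ("hadoop", "a"): 0.5,
        ("hive", "a"): 0.25,
        ("hadoop", "b"): 0.25,
        ("spark", "c"): 0.9,
    }
    print(top_documents("Hadoop Hive", table))
```

Documents that contain none of the query terms are left out, so fewer than 5 results may be returned.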
Currently, the HiveQL queries still need to be revised to produce more accurate results.