GitHub - umangmehta12/netFlix_PrizeDataset: Exploring the NetFlix Prize Dataset as final project for course CS6240 on Parallel Data Processing in MapReduce

# Exploring NetFlix Prize Dataset

Our objective is to perform three main analyses in this project.

Identify the top 5 movies of each year

a. Analysis task: For the years ranging from 1890 to 2005, we aim to identify the top 5 movies of each year based on the average rating of each movie. This is an interesting task because it shows which movies released in each year are rated best. Years closer to the 21st century, which have more movie releases, should reveal some interesting competition, while the late 19th and early 20th century years should reveal classics that are still considered worthy by an audience that rated them from 1998 to 2005.

b. Main task: Using Hive, we join the data sets, perform the analysis task, and compare its performance against a plain Java MapReduce implementation.

c. Helper task: Build HBase tables to perform the Hive operations.
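
The analysis in this task can be illustrated with a small in-memory sketch. It is not the project's Hive or Java MapReduce code; it only mirrors the same steps (aggregate ratings per movie, join with the movie metadata, keep the best 5 per year) in plain Python. The function name `top_movies_per_year` and the tuple layouts `(movie_id, release_year, title)` and `(movie_id, rating)` are assumptions made for illustration.

```python
from collections import defaultdict
import heapq


def top_movies_per_year(movies, ratings, k=5):
    """Return {release_year: [(average_rating, title), ...]} with at most k entries per year.

    movies:  iterable of (movie_id, release_year, title)  (assumed layout)
    ratings: iterable of (movie_id, rating)               (assumed layout)
    """
    # "Map + combine" step: accumulate the rating sum and count per movie.
    totals = defaultdict(lambda: [0, 0])
    for movie_id, rating in ratings:
        totals[movie_id][0] += rating
        totals[movie_id][1] += 1

    # Join step: attach release year and title, grouping averages by year.
    by_year = defaultdict(list)
    for movie_id, year, title in movies:
        if movie_id in totals:  # skip movies that have no ratings
            rating_sum, count = totals[movie_id]
            by_year[year].append((rating_sum / count, title))

    # "Reduce" step: keep the k highest averages for each year.
    return {year: heapq.nlargest(k, scored) for year, scored in by_year.items()}


# Tiny made-up sample, only to show the call shape.
sample_movies = [(1, 1999, "A"), (2, 1999, "B"), (3, 2000, "C")]
sample_ratings = [(1, 5), (1, 3), (2, 5), (3, 2)]
best = top_movies_per_year(sample_movies, sample_ratings, k=1)
```

With `k=1` the sample keeps only the single best-rated title per year; the project's task corresponds to the default `k=5`.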

Create a recommendation system

a. Analysis task: We aim to create a recommendation system based on clusters of movies, where movies are clustered by their average rating. This lets the audience select movies to view from different rating levels. Some of us do enjoy watching a “1 star” movie just to critique it.

b. Main task: Using K-means clustering, we aim to create multiple clusters concurrently in a MapReduce program.
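
As a rough illustration of the clustering idea, the sketch below runs one-dimensional K-means over per-movie average ratings. It is a sequential, in-memory approximation and not the project's MapReduce program: the assignment loop plays the role of the map phase and the centroid update plays the role of the reduce phase. The function name `kmeans_1d`, the input dictionary, and the initial centroids are all hypothetical.

```python
def kmeans_1d(avg_ratings, initial_centroids, max_iterations=10):
    """Cluster movies by average rating.

    avg_ratings:       {movie_id: average_rating}  (assumed layout)
    initial_centroids: starting centroid values, one per cluster
    Returns (centroids, clusters), where clusters maps a centroid index
    to the list of movie ids assigned to it.
    """
    centroids = list(initial_centroids)
    clusters = {}
    for _ in range(max_iterations):
        # "Map": assign every movie to the index of its nearest centroid.
        clusters = {i: [] for i in range(len(centroids))}
        for movie_id, avg in avg_ratings.items():
            nearest = min(range(len(centroids)), key=lambda i: abs(avg - centroids[i]))
            clusters[nearest].append(movie_id)

        # "Reduce": each centroid becomes the mean of its members.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            sum(avg_ratings[m] for m in members) / len(members) if members else centroids[i]
            for i, members in clusters.items()
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, clusters


# Made-up averages, only to show the call shape.
sample = {"m1": 1.0, "m2": 1.5, "m3": 4.0, "m4": 4.5, "m5": 5.0}
centroids, clusters = kmeans_1d(sample, [1.0, 5.0])
```

In a MapReduce setting each iteration would typically be one job, with the current centroids shared with every mapper; that part is not shown here.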

Gauge audience response in release year

a. Analysis task: For the release year of each movie, we gauge the audience response the movie received, irrespective of the ratings given. This shows the “opening strength” of a movie, independent of how users reviewed it over the years.

b. Main task: Comparative performance analysis of HBase and Pig Latin

c. Helper task: Design HBase tables and MapReduce program for the same
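
One way to read "audience response in the release year" is the number of ratings a movie received during its release year, ignoring the rating values. The sketch below counts exactly that in plain Python. It is not the project's HBase or Pig Latin code; the function name `release_year_response` and the tuple layouts, including rating dates formatted as `YYYY-MM-DD`, are assumptions made for illustration.

```python
from collections import Counter


def release_year_response(movies, ratings):
    """Count, per movie, the ratings submitted during the movie's release year.

    movies:  iterable of (movie_id, release_year, title)          (assumed layout)
    ratings: iterable of (movie_id, rating, "YYYY-MM-DD" string)  (assumed layout)
    """
    release_year = {movie_id: year for movie_id, year, _title in movies}
    counts = Counter()
    for movie_id, _rating, date in ratings:
        # The rating value is deliberately ignored; only the volume matters.
        if int(date[:4]) == release_year.get(movie_id):
            counts[movie_id] += 1
    return counts


# Made-up rows, only to show the call shape.
sample_movies = [(1, 2004, "A"), (2, 2003, "B")]
sample_ratings = [
    (1, 5, "2004-03-01"),
    (1, 1, "2004-07-15"),
    (1, 4, "2005-01-02"),
    (2, 3, "2005-06-30"),
]
response = release_year_response(sample_movies, sample_ratings)
```

Because the ratings were collected from 1998 to 2005, only movies released in that window can have a non-zero count under this definition.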

Folders and files:

Task 1
Task 2
Task 3
weka_implementation
Final Presentation.pptx
Project Final Report.pdf
README.md
