Latest commit

fefd5c5 · Nov 2, 2022

Music Rating

(Partially Complete)

I have three main objectives for this project. First, I want to build a model (classical machine learning or a neural network) that accurately predicts Amazon music ratings from the text of user reviews. Second, I want to choose the best model, using the mean squared error (MSE) as the performance metric. Third, in the next stage, I will implement three types of recommender systems: a popularity model, item-based collaborative filtering, and matrix factorization.
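As a reminder of what the metric measures, here is a minimal sketch of MSE in plain Python. The function name and the ratings below are made up for illustration and are not taken from the project's data or code.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between true and predicted ratings."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one rating")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical star ratings and model predictions (illustrative values only).
true_ratings = [5, 4, 1, 5]
predicted = [4.5, 4.0, 2.0, 5.0]

example_mse = mean_squared_error(true_ratings, predicted)
print(example_mse)
```

Because the errors are squared, a single prediction that is off by a full star contributes four times as much as one that is off by half a star.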

For current work:

The most popular music category is “Pop”, with about 50% of all reviews. “Alternative Rock” is the second most popular with 28.5%, while “Dance & Electronic” is the least popular with only 6.3%.

I implemented word frequency (WF) and TF-IDF vectorization separately for all models. For the multilayer perceptron (MLP) model, I also tried the word2vec approach, but the vectorizer’s output was not compatible with the model. All models performed better with the TF-IDF approach. The likely reason is that WF only generates vectors containing the raw count of occurrences of each word, while TF-IDF also weights each word by how informative it is across the whole set of reviews. The MLP model achieved the lowest MSE.
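The difference between the two vectorizations can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the project's code: the reviews are made up, the tokenizer is a simple whitespace split, and the IDF is the textbook `log(N / df)` variant (library implementations often add smoothing and normalization).

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer, assumed here for illustration: lowercase and split on whitespace.
    return text.lower().split()


def word_frequency(docs):
    """Raw occurrence counts per document (the WF representation)."""
    return [Counter(tokenize(doc)) for doc in docs]


def tf_idf(docs):
    """Counts reweighted by inverse document frequency: tf * log(N / df)."""
    counts = word_frequency(docs)
    n_docs = len(docs)
    doc_freq = Counter()
    for doc_counts in counts:
        doc_freq.update(doc_counts.keys())  # each word counted once per document
    return [
        {word: tf * math.log(n_docs / doc_freq[word]) for word, tf in doc_counts.items()}
        for doc_counts in counts
    ]


# Hypothetical reviews, made up for this example.
reviews = ["great album great songs", "boring album", "album great voice"]

wf = word_frequency(reviews)
tfidf = tf_idf(reviews)

print(dict(wf[0]))
print(tfidf[0])
```

In the first review, WF ranks "great" highest simply because it occurs twice. TF-IDF gives "album" a weight of zero because it appears in every review, and ranks the rarer word "songs" above "great".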

After comparing the MSE results of all the models, the three that performed best were the LightGBM (Light Gradient Boosting Machine) model (0.587), the XGBoost model (0.5597), and the MLP model (0.4955). These three models gave good results, most likely because they suit our data, which is sparse and has an imbalanced class distribution. For example, class 5 has more than 100,000 samples while class 1 has fewer than 20,000.

LightGBM grows trees leaf-wise instead of level-wise, so the loss can be reduced more quickly with the same number of leaves.

MLPRegressor performed well because multilayer perceptron networks can fit nonlinear data more flexibly than the tree-based models did here. As a feedforward artificial neural network, the MLP generalized better on this task, extracting patterns from the complex input data more effectively than the other models. With the help of an adaptive learning rate and nonlinear activation functions, the MLP converged to a low-loss solution more quickly and consistently. The multilayer architecture also helped the model process the large dataset.
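To make the reported scores easier to interpret, the short sketch below picks the lowest MSE and converts each score to a root mean squared error (RMSE), which is expressed in rating units. The MSE values are the ones reported above; the dictionary and variable names are illustrative, and reading the result as "stars" assumes the ratings run from 1 to 5.

```python
import math

# MSE values as reported in the comparison above.
reported_mse = {"LightGBM": 0.587, "XGBoost": 0.5597, "MLP": 0.4955}

# Lower MSE is better, so the best model is the one with the minimum score.
best_model = min(reported_mse, key=reported_mse.get)

# RMSE is the square root of MSE and is in the same units as the ratings.
rmse = {name: math.sqrt(mse) for name, mse in reported_mse.items()}

print(best_model)
for name, value in sorted(rmse.items(), key=lambda item: item[1]):
    print(f"{name}: RMSE = {value:.2f}")
```

Read this way, the MLP's typical error is about 0.7 of a rating point, compared with roughly 0.75 for XGBoost and 0.77 for LightGBM.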