Sentiment analysis of the reviews of various properties listed on Airbnb in Boston.
Data collected from data.world:
https://data.world/taise/boston-reviews-2019/workspace/file?filename=reviews%202020.csv
First, the data was imported into R.
This is our original dataset.
There are 4993 unique properties listed in the dataset. We arranged them by number of reviews and chose to analyse the one with the most reviews.
The property with listing id = 66288 has the most reviews (622), so we analyse the sentiment of the people who reviewed this property.
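The selection step above can be illustrated with a small sketch. It is written in Python with only the standard library, not the R code used for the analysis, and the column names `listing_id` and `comments` plus the toy rows are assumptions made for illustration.

```python
import csv
import io
from collections import Counter

# Toy stand-in for the reviews CSV. The column names listing_id and
# comments are assumptions, and these rows are made up.
SAMPLE_CSV = """listing_id,comments
101,Great place
202,Lovely host
101,Very clean and quiet
303,Too noisy at night
101,Would stay again
202,Nice location
"""

def reviews_per_listing(csv_text):
    """Count how many review rows each listing_id has."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["listing_id"] for row in reader)

counts = reviews_per_listing(SAMPLE_CSV)
# most_common(1) returns the (listing_id, review_count) pair with the highest count.
top_id, top_count = counts.most_common(1)[0]
print(len(counts), "unique listings; most reviewed:", top_id, "with", top_count, "reviews")
```

On the real file, the same counting would be expected to report 4993 listings and listing 66288 with 622 reviews, as stated above.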
We extract the comments column because we need only the review text. We then clean the data and create a final corpus in which all text is lowercase and punctuation, stopwords, and extra whitespace have been removed.
Our final corpus looks like this.
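As a rough illustration of the cleaning steps described above (lowercasing, then removing punctuation, stopwords, and extra whitespace), here is a minimal Python sketch. It is not the R code used for the analysis, and the short stopword list is an assumption; a real run would use a full English stopword list.

```python
import string

# Tiny illustrative stopword list (an assumption, not the list used in the analysis).
STOPWORDS = {"the", "a", "an", "and", "was", "is", "in", "to", "of", "we", "it", "very"}

def clean_review(text):
    """Lowercase, strip punctuation, drop stopwords, and collapse whitespace."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    # split() with no arguments also discards repeated and leading/trailing whitespace.
    tokens = [tok for tok in no_punct.split() if tok not in STOPWORDS]
    return " ".join(tokens)

reviews = ["The apartment was GREAT,   and very clean!", "We loved it."]
corpus = [clean_review(r) for r in reviews]
print(corpus)
```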
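Now we create a term-document matrix, in which each row is a term, each column is a review, and each cell counts how often that term occurs in that review.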
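To make the structure concrete, the following Python sketch builds a small term-document matrix by hand. It is an illustration under simplifying assumptions (whitespace tokenization of already cleaned text, toy documents), not the R code used for the analysis.

```python
from collections import Counter

def term_document_matrix(docs):
    """Return (terms, matrix) where matrix[i][j] counts terms[i] in docs[j]."""
    doc_counts = [Counter(doc.split()) for doc in docs]
    terms = sorted(set().union(*doc_counts)) if doc_counts else []
    # A Counter returns 0 for missing keys, so absent terms become zero cells.
    matrix = [[counts[term] for counts in doc_counts] for term in terms]
    return terms, matrix

docs = ["great clean apartment", "great location", "clean quiet clean"]
terms, tdm = term_document_matrix(docs)
for term, row in zip(terms, tdm):
    print(f"{term:<10} {row} total={sum(row)}")
```

The row totals are the overall term frequencies, which is the quantity a frequency plot of the matrix would show.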
We then plot this matrix to get a general view of our data.
We can see some positive words in bold, which suggests that the sentiment might be positive.
Some words, such as "Sean", "Apartment", "Place" and "Night", appear many times but carry no sentiment, so we remove them from the term-document matrix and create a new one.
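The effect of dropping these frequent but neutral words can be sketched as follows. This is a Python illustration with toy documents, not the R code used for the analysis; the four excluded words come from the text above and are lowercased on the assumption that the corpus has already been cleaned.

```python
from collections import Counter

# Frequent but sentiment-neutral words named in the text, lowercased to
# match a cleaned corpus.
NEUTRAL_WORDS = {"sean", "apartment", "place", "night"}

def term_frequencies(docs, exclude=frozenset()):
    """Total count of each term across all documents, skipping excluded terms."""
    freqs = Counter()
    for doc in docs:
        freqs.update(tok for tok in doc.split() if tok not in exclude)
    return freqs

# Toy cleaned reviews (made up for illustration).
docs = ["sean great host", "apartment clean great", "place quiet night"]
before = term_frequencies(docs)
after = term_frequencies(docs, exclude=NEUTRAL_WORDS)
print(sorted(set(before) - set(after)))  # terms that were dropped
print(after.most_common(3))
```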
Next, we create a word cloud from the final term-document matrix to get a visual idea of the sentiments in our data.
Now we score the reviews against the NRC Emotion Lexicon to find the sentiment score.
The first 10 comments have positive scores, which means their sentiment is positive. Let's look at the overall sentiment of all reviews.
The overall score is positive and large, which means that the sentiment across all the reviews for this property is largely positive. We'll plot it and see the result of our analysis.
We can see that the scores for most of the negative emotions, such as anger, disgust, and sadness, are quite low, whereas positive emotions such as joy and trust have high scores.
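The lexicon-based scoring described above can be sketched in a few lines of Python. This is only an illustration: the mini-lexicon below is hypothetical and merely mimics the shape of the NRC Emotion Lexicon (words mapped to emotion and polarity categories), and defining the net score as positive hits minus negative hits is an assumption rather than the exact formula used in the R analysis.

```python
from collections import Counter

# Hypothetical mini-lexicon in the spirit of the NRC Emotion Lexicon.
# These entries are illustrative assumptions, not the real lexicon.
TOY_LEXICON = {
    "great": {"joy", "trust", "positive"},
    "clean": {"joy", "trust", "positive"},
    "friendly": {"joy", "trust", "positive"},
    "noisy": {"anger", "negative"},
    "dirty": {"disgust", "negative"},
}

def score_review(text, lexicon=TOY_LEXICON):
    """Count lexicon category hits for one cleaned review."""
    counts = Counter()
    for token in text.split():
        counts.update(lexicon.get(token, ()))
    return counts

def net_score(counts):
    """Net sentiment as positive hits minus negative hits (an assumed definition)."""
    return counts["positive"] - counts["negative"]

reviews = ["great clean friendly host", "clean but noisy street", "great location"]
per_review = [score_review(r) for r in reviews]
# Adding the per-review counters gives the totals per emotion across all reviews.
overall = sum(per_review, Counter())
print([net_score(c) for c in per_review])
print(dict(overall), "net:", net_score(overall))
```

The per-emotion totals in `overall` correspond to the bars one would plot to compare emotions such as joy and trust against anger, disgust, and sadness.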
We found that the sentiment for the property with listing_id = '66288' is largely positive.
Next, we'll do the same analysis for 10 more properties (the top 10 by number of reviews) and visualize our results in Tableau.
Tableau Visualization of properties with most reviews (Top 10)