Welcome to the public repository for the DBL Data Challenge project! This README serves as a guide to help you understand our project, its purpose, and the insights it provides into airline data. Our team, Group 16, was responsible for analyzing data from the social media platform Twitter for our client, Lufthansa. We compared Lufthansa's Twitter data with that of their competitor, American Airlines. The goal of our analysis was to gain insight into customer sentiment, preferences, and the overall performance of the two airlines on social media. This project was developed as part of the Data Science program at the Eindhoven University of Technology.
Technologies used:
- Python
- MongoDB
Before you proceed, ensure that you have the following installed on your local machine:
- Git: a version control system for tracking changes in computer files and coordinating work on those files among multiple people.
- Python: the language this project is built with; ensure you have version 3.x installed.
- pip: a package installer for Python. It is usually installed alongside Python.
To get started with our project, we have provided an installation guide that outlines the required dependencies and the steps for setting up the environment. Following these instructions will help ensure a smooth setup process and avoid potential compatibility issues.
- Clone the repository to your machine with the `git clone` command; use `git pull` to fetch the latest updates.
- Set up a local environment with the `venv` module: `python3 -m venv env`. Activate it with `.\env\Scripts\activate` on Windows or `source env/bin/activate` on macOS.
- Download the airline files that have been filtered in MongoDB (there is no code for this step; the queries used in MongoDB Compass are provided).
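For a Unix-like shell, the setup steps above can be sketched as follows. The repository URL is not given in this README, so substitute your own; the Windows activation command differs as noted.

```shell
# Minimal setup sketch for macOS/Linux.
# First clone the repository: git clone <repository-url>   (URL not shown in this README)
python3 -m venv env        # create a virtual environment in ./env
. env/bin/activate         # activate it for the current shell (Windows: .\env\Scripts\activate)
python --version           # should report Python 3.x
```

After activation, any `pip install` and `python` invocations run inside the isolated environment rather than against the system interpreter.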
* MongoDB Queries - one CSV file for each airline
* DATA CLEANING.py
* JSON load.py
* Plots-Extras.py
* extra_task.py
* data_cleaning.py
* json_load.py
* Download the airline files before running these tasks
* cleaning-csv.py (airline cleaning and prep for conversation extraction)
* conversation-extract.py (use the path of each cleaned airline file and the corresponding airline ID in the functions)
* MeanSentiment.py
* TextBlob-testing.py
* Vader-testing.py
* extra 1 pres 2.py
* Response Time Sentiment ExtraSprint2.py
* sentiment analysis.py
* statistics_convo.py
* extras 1 pres 2.py
* sentiment analysis on conversations_S3_t1.py
* sprint 3_task2.py
* one sided convo extra.py
* reply words polina demo.py
* sentiment flight related tweets.py
* sentiment over reply count.py
* First Response Sentiment by Client.py (Preps conversation file for response time vs sentiment graphs)
* Sent vs Response Time Lufthansa First Reply.py
* Covid Times, response and sentiment.py
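Several of the scripts above depend on the conversation-extraction step, which pairs customer tweets with airline replies. As a rough illustration of how such grouping can work, here is a minimal, self-contained sketch; the field names (`id`, `in_reply_to_status_id`) follow Twitter's API, but the actual logic in `conversation-extract.py` may differ.

```python
from collections import defaultdict

def extract_conversations(tweets):
    """Group tweets into threads by following in_reply_to links.

    `tweets` is a list of dicts with at least the Twitter API fields
    `id` and `in_reply_to_status_id`. Returns a dict mapping each root
    tweet's id to its thread, ordered root-first.
    """
    by_id = {t["id"]: t for t in tweets}
    children = defaultdict(list)
    roots = []
    for t in tweets:
        parent = t.get("in_reply_to_status_id")
        if parent in by_id:
            children[parent].append(t)
        else:
            # Reply target missing from the dataset (or no reply at all):
            # treat this tweet as the start of a conversation.
            roots.append(t)

    def walk(tweet):
        thread = [tweet]
        for child in children[tweet["id"]]:
            thread.extend(walk(child))
        return thread

    return {root["id"]: walk(root) for root in roots}

# Tiny demo: one customer tweet and one airline reply form one thread.
demo = [
    {"id": 1, "in_reply_to_status_id": None, "text": "Delayed again @lufthansa"},
    {"id": 2, "in_reply_to_status_id": 1, "text": "Sorry to hear that!"},
]
convos = extract_conversations(demo)
```

Running the demo yields a single conversation rooted at tweet 1 containing both tweets, which is the shape the downstream sentiment and response-time scripts would consume.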