Natural Language Processing (NLP) on Lufthansa Tweets to measure user satisfaction

ummtushar/NLP-Tweets

DBL Data Challenge

About the project

Welcome to the public repository for the DBL Data Challenge project! This README explains what the project is, what it is for, and how it can provide insights into airline data. Our team, Group 16, analyzed data from the social media platform Twitter for our client, Lufthansa, and compared Lufthansa's Twitter data with that of a competitor, American Airlines. The goal of the analysis was to gain insight into customer sentiment, customer preferences, and the overall performance of the two airlines on social media. This project was developed as part of the Data Science program at Eindhoven University of Technology.

Built with

  • Python
  • MongoDB

Prerequisites

Before you proceed, make sure the following are installed on your local machine:

  1. Git: a version control system for tracking changes in files and coordinating work on those files among multiple people.
  2. Python: a popular programming language. This project is built with Python, so make sure you have version 3.x installed.
  3. pip: a package installer for Python. It is usually installed alongside Python.

Getting started

The installation guide below lists the steps for setting up the environment. Following them in order should give a smooth setup and help avoid compatibility issues.

  1. Clone the repository to your machine with the git clone command. Use git pull to fetch the latest updates.
  2. Set up a local environment with the venv module: python3 -m venv env. Activate it with .\env\Scripts\activate on Windows or source env/bin/activate on macOS or Linux.
  3. Download the airline files that were filtered in MongoDB. There is no code for this step; the queries that were used in MongoDB Compass are provided.
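
Once the airline files from step 3 are downloaded, a quick way to check that an export is readable is to load it with Python's standard csv module. The following is a minimal sketch, not code from this repository: the function name load_tweets and the example file name are hypothetical, and the column names depend on how the export was made in MongoDB Compass.

```python
import csv
from pathlib import Path


def load_tweets(csv_path):
    """Read a CSV export into a list of dicts, one dict per tweet.

    Hypothetical helper: column names come from the CSV header row,
    so they depend on how the file was exported.
    """
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"Export not found: {path}. Download it first (step 3).")
    # newline="" lets the csv module handle line breaks inside quoted tweet text.
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

For example, load_tweets("lufthansa.csv") (an assumed file name) would return one dictionary per row, keyed by the header names in that file.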

Roadmap of Python files for the sprints

Sprint 1 - extraction and cleaning of data using MongoDB/MySQLite and fundamental analysis

* MongoDB Queries - one CSV file for each airline
* DATA CLEANING.py
* JSON load.py
* Plots-Extras.py
* extra_task.py
* data_cleaning.py
* json_load.py

Sprint 2 - the refinement of data and basic sentiment analysis

* Download the airline files before running these tasks
* cleaning-csv.py (Airline Cleaning and Prep for conversation Extraction)
* conversation-extract.py (Use the path of each cleaned airline file and the corresponding airline ID in the functions)
* MeanSentiment.py
* TextBlob-testing.py
* Vader-testing.py
* extra 1 pres 2.py
* Response Time Sentiment ExtraSprint2.py
* sentiment analysis.py
* statistics_convo.py
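
To illustrate what conversation extraction with an airline ID can look like, here is a minimal sketch. It is not the logic of conversation-extract.py: the function name extract_conversations and the field names id, user_id, in_reply_to_status_id, and created_at are assumptions about the cleaned data, and created_at is assumed to be sortable (for example an ISO 8601 string).

```python
from collections import defaultdict


def extract_conversations(tweets, airline_id):
    """Group tweets into reply threads and keep those the airline took part in.

    Hypothetical sketch: each tweet is a dict with the assumed keys
    id, user_id, in_reply_to_status_id and created_at.
    """
    by_id = {t["id"]: t for t in tweets}

    def root_of(tweet):
        # Walk up the reply chain until the parent is not in the data set.
        seen = set()
        while tweet["in_reply_to_status_id"] in by_id and tweet["id"] not in seen:
            seen.add(tweet["id"])  # guards against malformed reply cycles
            tweet = by_id[tweet["in_reply_to_status_id"]]
        return tweet["id"]

    threads = defaultdict(list)
    for tweet in tweets:
        threads[root_of(tweet)].append(tweet)

    # A conversation needs at least two tweets, one of them from the airline.
    return [
        sorted(thread, key=lambda t: t["created_at"])
        for thread in threads.values()
        if len(thread) > 1 and any(t["user_id"] == airline_id for t in thread)
    ]
```

Passing the airline ID as a parameter means the same function can be run once per airline file.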

Sprint 3 - sentiment analysis of conversations

* extras 1 pres 2.py
* sentiment analysis on conversations_S3_t1.py
* sprint 3_task2.py

Sprint 4 - deep sentiment analysis of conversations

* one sided convo extra.py
* reply words polina demo.py
* sentiment flight related tweets.py
* sentiment over reply count.py
* First Response Sentiment by Client.py (Preps conversation file for response time vs sentiment graphs)
* Sent vs Response Time Lufthansa First Reply.py
* Covid Times, response and sentiment.py
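
As an illustration of the response-time side of the response time vs sentiment analysis, the sketch below computes how long an airline took to send its first reply in a conversation. It is not taken from the files above: the function name first_response_minutes and the fields user_id and created_at are assumptions, the conversation is assumed to be sorted by time, and timestamps are assumed to be ISO 8601 strings.

```python
from datetime import datetime


def first_response_minutes(conversation, airline_id):
    """Minutes between the opening tweet and the airline's first reply.

    Hypothetical sketch: conversation is a time-sorted list of dicts with
    the assumed keys user_id and created_at (ISO 8601 string).
    Returns None if the airline never replied.
    """
    opened_at = datetime.fromisoformat(conversation[0]["created_at"])
    for tweet in conversation[1:]:
        if tweet["user_id"] == airline_id:
            replied_at = datetime.fromisoformat(tweet["created_at"])
            return (replied_at - opened_at).total_seconds() / 60
    return None
```

The resulting value can then be paired with a sentiment score for the same conversation to plot one against the other.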
