This repository contains the code for my project, which predicts the listing prices of Airbnb properties in London from both textual and tabular data. The goal was to build a machine learning pipeline that estimates prices accurately by comparing several model families and combining the strongest ones in a stacked ensemble. The data was sourced from a Kaggle competition.
The project follows a structured approach:
- Exploratory Data Analysis (EDA): A thorough examination of the dataset to identify features with strong predictive power and understand patterns in the data.
- Data Cleaning and Preprocessing: Includes handling missing values, encoding categorical variables, and preparing text and tabular features for modeling.
- Model Building and Evaluation: Various models were tested and evaluated to identify the best-performing ones.
  - For Tabular Data:
    - Linear Regression
    - Random Forest
    - XGBoost
    - Deep Neural Network
  - For Text Data:
    - LSTM
    - GRU
    - Bidirectional GRU
    - BERT
- Stacked Ensemble Model: The final solution employs a stacked ensemble consisting of:
  - XGBoost for tabular data.
  - BERT for textual data.
  - A DNN to merge the predictions from XGBoost and BERT into a final output.
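
The stacking idea can be illustrated with a minimal, dependency-free sketch. It is not the code in this repository: the numbers are made up, the function names `fit_linear_meta` and `predict_meta` are hypothetical, and a plain least-squares linear combiner stands in for the DNN so the example runs with the standard library only. It assumes each base model (XGBoost on tabular features, BERT on text) has already produced price predictions for a held-out set.

```python
def fit_linear_meta(tabular_preds, text_preds, targets):
    """Fit price ~ w_tab * tabular + w_text * text + bias by least squares.

    A linear stand-in for the DNN meta-learner (assumption for illustration).
    """
    rows = [(a, b, 1.0) for a, b in zip(tabular_preds, text_preds)]
    n = 3
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(n)]
    aug = [xtx[i] + [xty[i]] for i in range(n)]

    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]

    # Back substitution
    weights = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * weights[j] for j in range(i + 1, n))
        weights[i] = (aug[i][n] - known) / aug[i][i]
    return weights


def predict_meta(weights, tabular_pred, text_pred):
    """Combine the two base-model predictions into one final price."""
    w_tab, w_text, bias = weights
    return w_tab * tabular_pred + w_text * text_pred + bias


# Made-up held-out predictions from the two base models, plus true prices.
tabular_preds = [100.0, 150.0, 80.0, 200.0, 120.0]   # hypothetical XGBoost outputs
text_preds = [110.0, 130.0, 90.0, 210.0, 100.0]      # hypothetical BERT outputs
prices = [109.0, 147.0, 89.0, 209.0, 117.0]

weights = fit_linear_meta(tabular_preds, text_preds, prices)
final_price = predict_meta(weights, 140.0, 160.0)
```

The point of the second stage is that the combiner is trained on base-model predictions, not on the raw features. Those predictions should come from data the base models did not see during training, otherwise the combiner learns to trust overfit outputs.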