The Cost of Popularity: Analyzing Tourism’s Effect on Medellín’s Housing Market (2011–2021)

This repository contains data engineering and data science projects and exercises using open data sources as part of the Methods of Advanced Data Engineering (MADE) course, taught by the FAU Chair for Open-Source Software (OSS) in the Winter 24/25 semester. This repo has been forked from the made-template repository.

Image credit: www.medellinadvisors.com

Overview

Tourism is a vital driver of economic activity, with significant implications for local housing markets. Medellín, Colombia, has seen a remarkable rise in tourism over the past decade, so understanding the relationship between tourism influx and housing market dynamics is essential. This study investigates how tourism, both domestic and inbound, affects housing sales and rental prices. By analyzing data from 2011 to 2021, this research aims to uncover trends and correlations that provide insight into the economic and social effects of tourism on Medellín’s housing sector.

Key Questions

To what extent do domestic and inbound tourism trends correlate with fluctuations in housing market prices in Medellín?
What is the impact of tourism (domestic and inbound) on housing sales and rental prices?

Data Sources

Dataset	Description	License
Tourism Data	Monthly passenger arrivals in Medellín, segmented by nationality and origin.	CC BY-SA 4.0
Housing Market Data	Monthly sales and rental offers, including property type, condition, price, and geospatial details.	CC BY-SA 4.0

For a more detailed explanation of the sources used, including metadata and corresponding links, please refer to the Data Sources section in the project plan. This document provides additional insights into the datasets, their scope, and how they were utilized for the analysis.

Acknowledgment

This project uses data sourced from the following organizations:

MEData
Data used under the CC BY-SA 4.0 license.
Medellín Real Estate Observatory (OIME)
Data used under the CC BY-SA 4.0 license.

Changes and adaptations have been made to the original data for use in this project. The derived works are also licensed under the CC BY-SA 4.0 license.

Modifications to Original Data

The following changes were made to the original datasets:

Merged datasets from multiple years (2007–2022).
Truncated data to the period 2011–2021.
Renamed columns for consistency.
Performed data cleaning, including standardizing formats, filling missing values where possible, and filtering data by property type (Apartment and House).
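
The steps listed above can be sketched roughly as follows. This is a minimal, dependency-free illustration, not the project's actual pipeline.py: the raw column names (`Tipo`, `Fecha`, `Precio`), the renamed columns, and the sample records are all hypothetical, and filling missing values is left out.

```python
from datetime import date

# Hypothetical mapping from raw column names to consistent ones.
COLUMN_RENAMES = {"Tipo": "property_type", "Fecha": "month", "Precio": "price"}
# Property types kept by the filter step.
KEEP_TYPES = {"Apartment", "House"}


def clean_records(yearly_batches, start=date(2011, 1, 1), end=date(2021, 12, 31)):
    """Merge per-year batches, rename columns, truncate to the study
    period, and keep only apartments and houses."""
    cleaned = []
    for batch in yearly_batches:  # merge: walk through every yearly dataset
        for raw in batch:
            # Rename known columns; unknown columns keep their original name.
            row = {COLUMN_RENAMES.get(key, key): value for key, value in raw.items()}
            if not (start <= row["month"] <= end):
                continue  # outside 2011-2021
            if row["property_type"] not in KEEP_TYPES:
                continue  # not an apartment or house
            cleaned.append(row)
    return cleaned


# Made-up sample data: one record before the study period, one valid
# apartment, and one property type that the filter drops.
batch_2010 = [{"Tipo": "House", "Fecha": date(2010, 6, 1), "Precio": 100.0}]
batch_2015 = [
    {"Tipo": "Apartment", "Fecha": date(2015, 3, 1), "Precio": 250.0},
    {"Tipo": "Office", "Fecha": date(2015, 3, 1), "Precio": 400.0},
]
result = clean_records([batch_2010, batch_2015])
print(result)
```

Only the 2015 apartment record survives in this example, since the 2010 record falls outside the period and the office record fails the property-type filter.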

Note:

This work is purely non-commercial and serves only as a semester project at FAU to implement an ETL pipeline and derive insights from the data.

Highlights

Comprehensive ETL (Extract, Transform, Load) Data Pipeline:
- Compiled and cleaned data from various sources into a unified SQLite database (Housing_Tourism_Data.sqlite), containing detailed information on tourism and housing metrics.
- Pipeline Code - Code for data extraction, transformation, and loading.
- Data Report - Detailed description of the data cleaning and preparation process.
In-Depth Analysis:
- Conducted exploratory time series analysis, correlation analysis, and Granger causality tests.
- Visualized trends and relationships through charts, scatter plots, and lagged correlations.
- Analysis Code - Jupyter Notebook containing the analysis process and code.
Key Findings:
- Moderate correlations between foreign tourist arrivals and sales prices, with lagged effects suggesting delayed impacts.
- Limited evidence of causality between tourism and housing prices.
- Analysis Report - Summary of analysis, results, limitations, and conclusions of the study.
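
To illustrate what a lagged correlation measures, here is a small sketch using only the standard library. It is not taken from the analysis notebook: the function names and the synthetic series are assumptions made for illustration, and it does not implement a Granger causality test, which is a separate statistical procedure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlations(arrivals, prices, max_lag):
    """Correlate arrivals in month t with prices in month t + lag,
    for every lag from 0 to max_lag."""
    correlations = {}
    for lag in range(max_lag + 1):
        xs = arrivals[: len(arrivals) - lag]  # drop the last `lag` months
        ys = prices[lag:]                     # drop the first `lag` months
        correlations[lag] = pearson(xs, ys)
    return correlations


# Synthetic data: prices follow arrivals with a two-month delay.
arrivals = [10, 12, 9, 15, 20, 18, 25, 22, 30, 28]
prices = [120.0, 124.0] + [2 * a + 100 for a in arrivals[:-2]]

by_lag = lagged_correlations(arrivals, prices, max_lag=3)
for lag, r in by_lag.items():
    print(f"lag {lag}: r = {r:.3f}")
```

In this synthetic example the correlation peaks at a lag of two months, because the price series was built as a delayed linear function of arrivals. Real monthly data would be noisier, and a high lagged correlation alone does not establish causality.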

File Structure

MADE_Project_23272477/
├── /data                                  # Processed data directory
│   └── Housing_Tourism_Data.sqlite        # Unified SQLite database
├── /project                              
│   ├── pipeline.py                        # Data pipeline script (ETL)
│   ├── data-report.pdf                    # Data sources, ETL process, pipeline.py output
│   ├── analysis-code.py                   # Analysis and visualization code
│   ├── analysis-report.pdf                # Analysis, visualizations and insights
│   └── tests.py                           # Executes data pipeline and validates output
├── LICENSE.md                             # Project license details
└── README.md                              # Project documentation

Run Pipeline Locally

Before you begin, make sure you have Python 3 installed and VSCode set up with the Jupyter extension.

Clone the project

  git clone https://github.com/mateorual/MADE_Project_23272477.git

Go to the project directory

  cd MADE_Project_23272477

Install dependencies

  pip install -r requirements.txt

Run the bash script project/pipeline.sh

  bash project/pipeline.sh

This starts a virtual environment and then creates a SQLite database named Housing_Tourism_Data.sqlite from the data sources in the /data directory.
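
Once the pipeline has finished, a quick sanity check is to list the tables in the database and count their rows. The sketch below uses Python's built-in sqlite3 module. The helper `summarize_database` is hypothetical and not part of this repository, and the demo builds a throwaway database with made-up table names, because the real table names are not listed here.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def summarize_database(path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    with closing(sqlite3.connect(path)) as conn:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # Table names come from sqlite_master itself, so quoting them is enough.
        return {
            table: conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
            for table in tables
        }


# Demo on a temporary database with hypothetical tables.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
with closing(sqlite3.connect(demo_path)) as conn:
    conn.execute("CREATE TABLE housing (month TEXT, price REAL)")
    conn.execute("CREATE TABLE tourism (month TEXT, arrivals INTEGER)")
    conn.executemany(
        "INSERT INTO housing VALUES (?, ?)",
        [("2011-01", 1.0), ("2011-02", 2.0)],
    )
    conn.commit()

summary = summarize_database(demo_path)
print(summary)
```

To inspect the real output, call `summarize_database("data/Housing_Tourism_Data.sqlite")` from the project root, assuming the pipeline wrote the file to that location.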

Running Tests

To run tests, run the following command

  bash project/tests.sh

Author

💻 Mateo Ruiz Alvarez ⭐ contact me: [email protected]

[Optional] 🌟 Give a Star! 😄

If you like this project, please ⭐ it to show your support!
Give the Jayvee project a GitHub star. ⭐

Exercises

During the semester you will need to complete exercises using Jayvee. You must place your submission in the exercises folder in your repository and name them according to their number from one to five: exercise<number from 1-5>.jv.

In regular intervals, exercises will be given as homework to complete during the semester. Details and deadlines will be discussed in the lecture, also see the course schedule.

Exercise Feedback

We provide automated exercise feedback using a GitHub action (that is defined in .github/workflows/exercise-feedback.yml).

To view your exercise feedback, navigate to Actions → Exercise Feedback in your repository.

The exercise feedback is executed whenever you change files in the exercise folder and push your local changes to the repository on GitHub. To see the feedback, open the latest GitHub Action run, then open the exercise-feedback job and the Exercise Feedback step. You should see command line output like this:

Found exercises/exercise1.jv, executing model...
Found output file airports.sqlite, grading...
Grading Exercise 1
	Overall points 17 of 17
	---
	By category:
		Shape: 4 of 4
		Types: 13 of 13
