IAmJuniorB/D497-Data-Wrangling-Final

Real-World Data Wrangling with Python

Project Overview

This project demonstrates the application of real-world data wrangling techniques using Python. The goal was to analyze the relationship between global temperature anomalies and CO₂ emissions over time. By gathering, assessing, cleaning, and combining datasets from different sources, I aimed to uncover trends and correlations that provide insight into climate change.


Problem Statement

The research question for this project is:

"How do global temperature anomalies relate to CO₂ emissions over time?"

To answer this question:

  • Dataset 1: Global temperature anomalies were programmatically downloaded from NASA's GISTEMP dataset.
  • Dataset 2: CO₂ emissions data were manually downloaded from Our World in Data.

The analysis involved cleaning and combining these datasets to explore trends in global warming and greenhouse gas emissions.


Datasets

Dataset 1: Global Temperature Anomalies

  • Type: CSV File
  • Source: NASA GISTEMP
  • Method: Programmatically downloaded.
  • Variables:
    • Year: The year of the observation.
    • Temperature Anomaly: The deviation in global temperature from the baseline average (°C). GISTEMP uses 1951 to 1980 as its base period.
    • Jan, Feb, etc.: Monthly temperature anomalies, one column per month.

Dataset 2: CO₂ Emissions by Country

  • Type: CSV File
  • Source: Our World in Data
  • Method: Manually downloaded.
  • Variables:
    • Year: The year of the observation.
    • Entity: The country or region.
    • Annual CO₂ emissions: Total CO₂ emissions in metric tons.

Steps in the Project

1. Gather Data

Two datasets were gathered using different methods:

  1. NASA's GISTEMP dataset was programmatically downloaded using Python.
  2. Our World in Data's CO₂ emissions dataset was manually downloaded as a CSV file.

Both datasets were loaded into a Jupyter Notebook for analysis.
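A minimal sketch of the programmatic download, assuming the global-mean table is served at the URL below and that its first line is a title row rather than the column header. The URL constant and the helpers `parse_gistemp_csv` and `download_gistemp` are illustrative names, not taken from the notebook. The sample rows are made up so the parser can be tried offline.

```python
import io
import urllib.request

import pandas as pd

# Assumed location of the GISTEMP global-mean CSV; confirm it on the NASA GISTEMP site.
GISTEMP_URL = "https://data.giss.nasa.gov/gistemp/tabledata_v4/GLB.Ts+dSST.csv"


def parse_gistemp_csv(text: str) -> pd.DataFrame:
    """Parse GISTEMP-style CSV text, skipping the title line above the header."""
    return pd.read_csv(io.StringIO(text), skiprows=1)


def download_gistemp(url: str = GISTEMP_URL) -> pd.DataFrame:
    """Fetch the CSV over HTTP and hand the text to the parser."""
    with urllib.request.urlopen(url, timeout=30) as response:
        text = response.read().decode("utf-8")
    return parse_gistemp_csv(text)


# Made-up rows in the same layout (title line, header, data with *** placeholders).
sample = (
    "Land-Ocean: Global Means\n"
    "Year,Jan,Feb,J-D\n"
    "1900,-0.10,-0.20,-0.15\n"
    "2000,0.30,***,***\n"
)
temperature_raw = parse_gistemp_csv(sample)
print(temperature_raw.head())
```

Calling `download_gistemp()` would replace the offline sample with the live file; keeping the parser separate makes it possible to check the layout handling without network access.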


2. Assess Data

The datasets were assessed visually (df.head()) and programmatically (df.info(), df.isnull().sum()), revealing:

  • Quality Issues:
    1. Invalid placeholder values (***) in the temperature dataset.
    2. Missing values in the Code column of the CO₂ dataset.
  • Tidiness Issues:
    1. Wide format in the temperature dataset (monthly columns needed reshaping).
    2. Redundant summary columns (J-D, D-N, etc.) in the temperature dataset.
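The assessment calls named above can be sketched on two toy frames. The rows and values here are made up for illustration; only the column names (Year, Jan, Feb, J-D, Entity, Code, Annual CO₂ emissions) and the `***` placeholder come from this README.

```python
import pandas as pd

# Toy stand-ins for the two raw datasets (values are illustrative only).
temperature = pd.DataFrame({
    "Year": [2000, 2001],
    "Jan": ["0.10", "0.20"],
    "Feb": ["0.30", "***"],
    "J-D": ["0.20", "***"],
})
co2 = pd.DataFrame({
    "Entity": ["Africa", "France"],
    "Code": [None, "FRA"],
    "Year": [2000, 2000],
    "Annual CO₂ emissions": [100.0, 10.0],
})

# Visual assessment: look at the first rows.
print(temperature.head())

# Programmatic assessment: dtypes and non-null counts, then nulls per column.
temperature.info()
missing_per_column = co2.isnull().sum()
print(missing_per_column)

# The *** placeholder is text, not a null, so isnull() misses it; count it directly.
placeholder_counts = (temperature == "***").sum()
print(placeholder_counts)
```

Counting the placeholder separately matters because a column containing `***` is read as text, so `df.isnull().sum()` alone would report it as complete.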

3. Clean Data

The identified issues were cleaned as follows:

  1. Replaced invalid placeholder values (***) with NaN in the temperature dataset.
  2. Dropped the unnecessary Code column from the CO₂ dataset.
  3. Reshaped the temperature dataset from wide to long format, creating a single column for months.
  4. Removed redundant summary columns (J-D, D-N, etc.) from the temperature dataset.
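The four cleaning steps can be sketched as follows. This is an illustration on made-up rows, not the notebook's code: the output column names `Month` and `Anomaly` are assumptions, and the summary columns are dropped before reshaping so they do not end up as month labels.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the raw datasets (values are illustrative only).
temperature = pd.DataFrame({
    "Year": [2000, 2001],
    "Jan": ["0.10", "0.20"],
    "Feb": ["0.30", "***"],
    "J-D": ["0.20", "***"],
    "D-N": ["0.25", "***"],
})
co2 = pd.DataFrame({
    "Entity": ["Africa", "France"],
    "Code": [None, "FRA"],
    "Year": [2000, 2000],
    "Annual CO₂ emissions": [100.0, 10.0],
})

# Step 1: replace the *** placeholder with NaN.
temperature = temperature.replace("***", np.nan)

# Step 2: drop the Code column from the CO₂ data.
co2_clean = co2.drop(columns=["Code"])

# Step 4 (done before reshaping): remove whichever summary columns are present.
summary_columns = ["J-D", "D-N", "DJF", "MAM", "JJA", "SON"]
present = [column for column in summary_columns if column in temperature.columns]
temperature = temperature.drop(columns=present)

# Step 3: wide to long, one row per (Year, Month).
temperature_long = temperature.melt(
    id_vars="Year", var_name="Month", value_name="Anomaly"
)
# The month columns were read as text because of ***; convert them to numbers.
temperature_long["Anomaly"] = pd.to_numeric(temperature_long["Anomaly"])
print(temperature_long)
```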

The datasets were then combined on the common column Year.
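One way to combine on Year is sketched below. The README only states that the join key is Year; averaging the monthly anomalies per year and keeping a single `World` entity are assumptions added here so that each year appears once on both sides. Joining the long monthly table directly against every country would repeat rows many times over. All values are made up.

```python
import pandas as pd

# Toy cleaned inputs (values are illustrative only).
temperature_long = pd.DataFrame({
    "Year": [2000, 2000, 2001, 2001],
    "Month": ["Jan", "Feb", "Jan", "Feb"],
    "Anomaly": [0.2, 0.4, 0.3, 0.5],
})
co2_clean = pd.DataFrame({
    "Entity": ["World", "France", "World", "France"],
    "Year": [2000, 2000, 2001, 2001],
    "Annual CO₂ emissions": [100.0, 10.0, 110.0, 11.0],
})

# Assumption: collapse months to one mean anomaly per year.
annual_temperature = temperature_long.groupby("Year", as_index=False)["Anomaly"].mean()

# Assumption: use the aggregate "World" rows rather than individual countries.
world_co2 = co2_clean[co2_clean["Entity"] == "World"]

# validate="one_to_one" raises if either side has duplicate years.
combined = annual_temperature.merge(
    world_co2, on="Year", how="inner", validate="one_to_one"
)
print(combined)
```

The `validate` argument is a cheap guard: if a year appears twice on either side, the merge fails loudly instead of silently multiplying rows.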


4. Update Data Store

Both raw and cleaned datasets were saved locally:

  • Raw datasets:
    • raw_temperature_data.csv
    • raw_co2_emissions.csv
  • Cleaned datasets:
    • cleaned_temperature_data.csv
    • cleaned_co2_emissions.csv
    • combined_cleaned_data.csv (combined cleaned dataset)
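Saving the files can be sketched with a small helper. `save_datasets` is a hypothetical name, and the example writes to a temporary directory rather than the repository folder; only the file name comes from the list above.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_datasets(datasets: dict, out_dir: Path) -> list:
    """Write each DataFrame to out_dir under its file name and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, frame in datasets.items():
        path = out_dir / filename
        # index=False keeps the row index out of the file, so reloading gives the same columns.
        frame.to_csv(path, index=False)
        written.append(path)
    return written


# Toy frame standing in for a cleaned dataset (values are illustrative only).
toy = pd.DataFrame({"Year": [2000, 2001], "Anomaly": [0.3, 0.4]})
paths = save_datasets(
    {"cleaned_temperature_data.csv": toy}, Path(tempfile.mkdtemp())
)
print(paths[0])
```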

5. Answer Research Question

Two visualizations were created to answer the research question:

Visualization 1: Global Temperature Anomalies Over Time

[Figure: Visualization 1]
Insight: This line plot shows a clear upward trend in global temperature anomalies since the mid-20th century, consistent with warming attributed to human activities.
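A line plot of this kind can be sketched with Matplotlib. The helper `plot_annual_series` and the column name `Anomaly` are assumptions for illustration, and the data points are made up; the notebook's own plotting code may differ.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_annual_series(frame, value_column, ylabel, title):
    """Draw one annual series against Year and return the Figure."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(frame["Year"], frame[value_column], marker="o")
    ax.set_xlabel("Year")
    ax.set_ylabel(ylabel)
    ax.set_title(title)
    ax.grid(True, alpha=0.3)
    return fig


# Toy annual series (values are illustrative only).
annual_temperature = pd.DataFrame({
    "Year": [2000, 2001, 2002],
    "Anomaly": [0.1, 0.2, 0.3],
})
fig = plot_annual_series(
    annual_temperature,
    "Anomaly",
    "Temperature anomaly (°C)",
    "Global Temperature Anomalies Over Time",
)
```

The same helper could be reused for the emissions plot by passing the CO₂ column and a different label and title.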

Visualization 2: Global CO₂ Emissions Over Time

[Figure: Visualization 2]
Insight: This line plot shows global CO₂ emissions rising steadily since the Industrial Revolution, in step with the observed rise in global temperatures.

Optional Visualization: Correlation Between Temperature Anomalies and CO₂ Emissions

[Figure: Optional Visualization]
Insight: The scatter plot shows a strong positive correlation between temperature anomalies and CO₂ emissions, which is consistent with greenhouse gas emissions being a key driver of global warming.
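The strength of the relationship in the scatter plot can be put into a single number with the Pearson correlation coefficient. The sketch below assumes a combined frame with one row per year and columns named `Anomaly` and `Annual CO₂ emissions`; the values are made up and deliberately simple.

```python
import pandas as pd

# Toy combined dataset, one row per year (values are illustrative only).
combined = pd.DataFrame({
    "Year": [2000, 2001, 2002, 2003],
    "Anomaly": [0.1, 0.2, 0.3, 0.4],
    "Annual CO₂ emissions": [10.0, 20.0, 30.0, 40.0],
})

# Series.corr uses the Pearson method by default; the result lies between -1 and 1.
pearson_r = combined["Anomaly"].corr(combined["Annual CO₂ emissions"])
print(f"Pearson r = {pearson_r:.3f}")
```

On the real combined data, rows with missing values in either column are excluded pairwise by `Series.corr`, so the coefficient reflects only the years present in both datasets.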


Reflection

If given more time, I would:

  1. Investigate additional greenhouse gases (e.g., methane or nitrous oxide) to understand their impact on climate change.
  2. Explore regional trends in temperature anomalies and emissions to identify specific countries contributing most to global warming.
  3. Incorporate other factors like deforestation or renewable energy adoption rates for a more comprehensive analysis.

How to Run This Project

  1. Clone this repository or download it as a .zip file.
  2. Extract all files into a working directory.
  3. Open the Jupyter Notebook (data_wrangling_project_three.ipynb) using JupyterLab or Jupyter Notebook.
  4. Run all cells sequentially to reproduce the results.

Files Included

  • data_wrangling_project_three.ipynb: The main Jupyter Notebook containing all code and analysis.
  • Raw Datasets:
    • raw_temperature_data.csv
    • raw_co2_emissions.csv
  • Cleaned Datasets:
    • cleaned_temperature_data.csv
    • cleaned_co2_emissions.csv
    • combined_cleaned_data.csv

Technologies Used

  • Python
  • Pandas
  • Matplotlib
  • Jupyter Notebook

Feel free to reach out if you have any questions about this project!

