Project: Data Engineering Capstone Project

Overview

The purpose of the data engineering capstone project is to give you a chance to combine what you've learned throughout the program. This project will be an important part of your portfolio that will help you achieve your data engineering-related career goals.

In this project, you can either complete the project provided for you or define the scope and data for a project of your own design. Either way, you'll be expected to go through the same steps outlined below.

Udacity Provided Project

In the Udacity-provided project, you'll work with four datasets. The main dataset contains data on immigration to the United States; the supplementary datasets cover airport codes, U.S. city demographics, and temperature. You're also welcome to enrich the project with additional data if you'd like to set your project apart.

Running this code

Prerequisites

Python 3.6 or above
Git set up and configured for SSH
Docker (if running locally)

Running locally

Clone the repository by running git clone git@github.com:seetdev/dend-capstone.git
Go into the cloned folder
Create folders named model_data, raw_sas_data, sas_data and staging_data
Build the Docker image by running docker build --tag udacity-dend/pyspark-notebook .
Start the Docker container by running docker run --rm -d -p 8888:8888 -e JUPYTER_ENABLE_LAB=yes -v $PWD:/home/jovyan/work --name spark udacity-dend/pyspark-notebook
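
The folder-creation step above can be done with a single command. This is a minimal sketch, assuming it is run from the root of the cloned repository; the folder names come from the step list, and the use of mkdir -p is a suggestion rather than something the project prescribes.

```shell
# Assumption: the current directory is the root of the cloned repository.
# -p creates each missing folder and does not fail if a folder already exists,
# so the command is safe to re-run.
mkdir -p model_data raw_sas_data sas_data staging_data

# List the four folders to confirm they are in place.
ls -d model_data raw_sas_data sas_data staging_data
```

Because the docker run command mounts the current directory at /home/jovyan/work, folders created this way should also be visible inside the container under that path.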

Folders and files

.gitignore
Capstone Project Template.ipynb
Dockerfile
I94_SAS_Labels_Description.csv
I94_SAS_Labels_Descriptions.SAS
LICENSE
README.md
airline.csv
capstone-dend.drawio
capstone-dend.png
country_code.csv
immigration_data_sample.csv
non_immigrant_class_of_admission.csv
ports.csv
us-cities-demographics.csv
