ETL Movie Ratings

About the Project

This project was built with SQL, SQLAlchemy, Postgres, and Python. The test dataset is freely available at https://github.com/prust/wikipedia-movie-data

Installation

Clone the git repo from

https://github.com/roeggealissa/Movies_ETL.git

Enter your postgres password in config.py

db_password = 'YOUR PASSWORD HERE'

Usage

This project is a basic demonstration of using SQL and Python to extract, transform, and load (ETL) data, along with data quality assurance. The data here is about movies, but the approach applies wherever ETL is needed. The test dataset contains a large amount of natural language, so regex is used to standardize the data.

ETL_regex.png in the repository shows an example of the regex used to clean up the "release date" column. The regex needed will depend on the specific files that are to be extracted, transformed, and loaded. Each ETL_clean_*.ipynb notebook can be used as a basis for understanding which data should be transformed and which can be left as is.
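To illustrate the kind of cleanup described here, the following is a minimal sketch of regex-based date standardization. It is not the regex from the notebooks: the patterns, the function name standardize_release_date, and the sample strings are all assumptions made for this example.

```python
import re
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Hypothetical patterns; the notebooks may use different ones.
FULL_DATE = re.compile(rf"({'|'.join(MONTHS)})\s+(\d{{1,2}}),\s+(\d{{4}})")
ISO_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
YEAR_ONLY = re.compile(r"\b(\d{4})\b")


def standardize_release_date(raw):
    """Return 'YYYY-MM-DD' (or 'YYYY' if only a year is found), else None."""
    try:
        match = FULL_DATE.search(raw)
        if match:
            # Form: "June 18, 1993"
            month = MONTHS.index(match.group(1)) + 1
            return date(int(match.group(3)), month, int(match.group(2))).isoformat()
        match = ISO_DATE.search(raw)
        if match:
            # Form: "1993-06-18"; date() validates the day and month
            year, month, day = (int(part) for part in match.groups())
            return date(year, month, day).isoformat()
    except ValueError:
        # Impossible calendar dates such as "February 30, 2000"
        return None
    match = YEAR_ONLY.search(raw)
    return match.group(1) if match else None


for sample in ["June 18, 1993 (United States)", "1993-06-18", "Released in 1993", "unknown"]:
    print(repr(sample), "->", standardize_release_date(sample))
```

Trying the most specific pattern first and falling back to looser ones keeps partially specified values (a year alone) instead of discarding them. Values that match nothing come back as None so they can be reviewed during validation.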

Roadmap

Extract data to ipython notebook
Validate data
- Apply regex to columns with strings
Create database with SQLAlchemy
Upload database to Postgres
- Check database in Postgres
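The last two roadmap steps could be sketched roughly as follows, assuming pandas, SQLAlchemy, and a Postgres driver such as psycopg2 are installed. The helper names build_db_url and load_table, the database name movie_data, and the default user, host, and port are assumptions for illustration, not values taken from the notebooks.

```python
from urllib.parse import quote


def build_db_url(password, user="postgres", host="localhost",
                 port=5432, database="movie_data"):
    """Build a Postgres connection URL (the defaults are hypothetical)."""
    # Percent-encode the password so characters like "@" or "/" do not break the URL
    return f"postgresql://{user}:{quote(password, safe='')}@{host}:{port}/{database}"


def load_table(df, table_name, db_url):
    """Write a pandas DataFrame to Postgres and return the row count found there."""
    # Imported here so build_db_url works even without SQLAlchemy installed
    from sqlalchemy import create_engine, text

    engine = create_engine(db_url)
    # Replace any existing table of the same name with the DataFrame's contents
    df.to_sql(name=table_name, con=engine, if_exists="replace", index=False)
    # Simple check: count the rows that actually landed in the database
    with engine.connect() as conn:
        return conn.execute(text(f'SELECT COUNT(*) FROM "{table_name}"')).scalar()
```

In a notebook this might be used as `from config import db_password`, then `load_table(movies_df, "movies", build_db_url(db_password))`, where movies_df stands for a cleaned DataFrame. Comparing the returned count with `len(movies_df)` is one quick way to check the upload.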

Contact

Alissa Roegge - [email protected]

https://github.com/roeggealissa/Movies_ETL

