GitHub - agredyaev/de-project-final

Data Loading Pipeline

This project provides a data loading pipeline that moves data from a PostgreSQL database into a Vertica database. It includes classes for loading data into the staging layer and for populating the common data mart (CDM) layer. The pipeline reads the queries it executes from SQL files.
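As a rough illustration of the "class plus SQL file" pattern described above, the sketch below reads a statement from a file and runs it on a DB-API connection. The class name `SqlFileLoader`, the file name, and the table are hypothetical and not taken from the repository. `sqlite3` stands in for the real PostgreSQL and Vertica drivers so the example runs anywhere.

```python
import sqlite3
import tempfile
from pathlib import Path


class SqlFileLoader:
    """Hypothetical helper: runs the statement stored in a .sql file."""

    def __init__(self, connection, sql_dir):
        self.connection = connection  # any DB-API 2.0 connection
        self.sql_dir = Path(sql_dir)  # folder that holds the .sql files

    def run(self, file_name, params=()):
        # Read the query text from disk, execute it, and commit.
        query = (self.sql_dir / file_name).read_text(encoding="utf-8")
        cursor = self.connection.cursor()
        cursor.execute(query, params)
        self.connection.commit()
        return cursor.rowcount


def demo():
    # sqlite3 is only a stand-in; its "?" placeholder style differs
    # from the "%s" style used by typical PostgreSQL drivers.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stg_transactions (id INTEGER, amount REAL)")
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "load_transactions.sql").write_text(
            "INSERT INTO stg_transactions (id, amount) VALUES (?, ?)",
            encoding="utf-8",
        )
        loader = SqlFileLoader(conn, tmp)
        inserted = loader.run("load_transactions.sql", (1, 99.5))
    rows = conn.execute("SELECT id, amount FROM stg_transactions").fetchall()
    conn.close()
    return inserted, rows
```

Keeping the SQL in files rather than in Python strings lets the queries be reviewed and changed without touching the loader code.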

The project is designed to run in the cloud, where the entire environment is set up and configured.

Technologies

Project Overview

The data engineering project consists of the following components:

Staging Layer: The staging layer is responsible for extracting data from the source systems and performing initial transformations. It includes tasks for loading transaction data and currency data from the source PostgreSQL database into the staging layer.
Common Data Mart (CDM): The common data mart is a centralized repository for storing pre-aggregated and transformed data. It includes tasks for loading global metrics data into the CDM layer.
Data Pipeline: The data pipeline is implemented using Apache Airflow, an open-source platform for orchestrating workflows. The DAGs (Directed Acyclic Graphs) in Airflow define the sequence of tasks and dependencies for data extraction, transformation, and loading.
SQL Templates: SQL templates are used for generating dynamic SQL queries. These templates can be customized to include specific date ranges, filters, and transformations as per the project requirements.
Data Storage: The data is stored in the Vertica database, which provides a scalable and high-performance analytics platform for data analysis and reporting.
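The SQL templates mentioned above could be rendered along these lines. This is a minimal sketch using the standard library's `string.Template`; the template text, the table `transactions`, and the column `transaction_dt` are assumptions made for illustration and may not match the files in `src/sql/`.

```python
from datetime import date, timedelta
from string import Template

# Hypothetical template: selects one day of transactions.
TRANSACTIONS_TEMPLATE = Template(
    "SELECT * FROM transactions "
    "WHERE transaction_dt >= '$date_from' AND transaction_dt < '$date_to'"
)


def render_daily_query(template, day):
    """Fill the template with a half-open one-day range [day, day + 1)."""
    # Accepting only date objects keeps arbitrary strings out of the SQL text.
    if not isinstance(day, date):
        raise TypeError("day must be a datetime.date")
    return template.substitute(
        date_from=day.isoformat(),
        date_to=(day + timedelta(days=1)).isoformat(),
    )


query = render_daily_query(TRANSACTIONS_TEMPLATE, date(2022, 10, 1))
```

A half-open range avoids loading the same row twice when consecutive days are processed. Where the driver supports it, binding the dates as query parameters is generally safer than substituting them into the text.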

Project Structure

The project structure is organized as follows:

img/: Images for documentation and architecture.
src/: Contains the source code and modules for the project.
- src/sql/: Includes the SQL templates for data extraction and transformation.
- src/utils/: Contains utility functions and common modules used in the project.
- src/dags/: Includes the DAG (Directed Acyclic Graph) files that define the data pipelines.
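To make the idea of task dependencies in the DAG files more concrete, the sketch below orders three tasks with the standard library's `graphlib` (Python 3.9+). It is not Airflow code, and the task names are hypothetical; it only shows how declaring "the CDM load runs after both staging loads" determines a valid execution order.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key maps to the tasks it depends on.
DEPENDENCIES = {
    "load_transactions": set(),
    "load_currencies": set(),
    "load_global_metrics": {"load_transactions", "load_currencies"},
}


def execution_order(dependencies):
    """Return one valid order in which the tasks can run."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(DEPENDENCIES)
```

The two staging tasks have no dependency on each other, so either may come first; the CDM task always comes last.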

Architecture

Output mart

NEYBYANDEXRU__DWH.global_metrics
