Welcome to My AWS Data Pipeline and Analytics Project!

> Hey there! I am Shubham Dalvi


「 I am a data engineer with a passion for big data, distributed computing, cloud solutions, and data visualization 」



About Me


✌️   Enjoy solving data problems

❤️   Passionate about big data technologies, cloud platforms, and data visualizations

📧   Reach me: [email protected]


AWS to snowflake

Skills and Technologies

Python PySpark Pandas Matplotlib Jupyter AWS Git VSCode


Project Overview

This project implements an AWS-based ETL pipeline that extracts raw data from a source API, transforms it, and loads it into a data warehouse for analysis. It combines AWS Lambda, Amazon S3, AWS Glue, and Snowflake so that each stage of the pipeline can run and scale independently.

The architecture is designed to handle raw data ingestion, schema inference, transformation, and storage while enabling advanced analytics through platforms like Power BI and Amazon Athena.


Technologies Used

  • AWS S3: For storing raw and processed data.
  • AWS Glue: For orchestrating ETL workflows and schema inference.
  • AWS Lambda: For event-driven processing and transformations.
  • Amazon Athena: For querying processed data on demand.
  • Snowflake: As the data warehouse for structured storage and analytics.
  • Power BI: For interactive visualizations and business intelligence.
  • Python: For custom processing and scripting.

Skills Demonstrated

  • Cloud Integration: Leveraging AWS services to build scalable data solutions.
  • ETL Automation: Automating data ingestion and transformation workflows using AWS Glue and Lambda.
  • Data Engineering: Implementing data pipelines for structured and unstructured datasets.
  • Data Visualization: Creating dashboards and insights using Power BI.
  • Schema Management: Using Glue Data Catalog and Crawlers for schema inference.
  • Analytics Enablement: Querying datasets through Amazon Athena and Snowflake.

AWS Architecture

The architecture consists of the following components:

  1. Extract:

    • Spotify API/Source: Raw data fetched using Python scripts.
    • AWS S3 (Raw): Stores the raw data in a dedicated S3 bucket.
    • AWS Lambda: Automates data ingestion and triggers subsequent processes.
    • Amazon CloudWatch/EventBridge: Monitors the pipeline and triggers ETL workflows.
  2. Transform:

    • AWS S3 (Transformed): Stores intermediate and transformed data.
    • AWS Glue: Performs schema inference and data cataloging.
  3. Load:

    • AWS Glue Catalog: Maintains metadata and schema for querying.
    • Snowflake: Acts as the central data warehouse for structured storage and analysis.
  4. Analytics:

    • Power BI: Enables interactive dashboards and reporting.
    • Amazon Athena: Provides on-demand SQL querying of transformed data.
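
To make the Extract stage concrete, here is a minimal sketch of how a Lambda function might write one raw API response to the raw S3 bucket. The bucket name `my-raw-bucket`, the `raw_data/to_process` prefix, and the date-partitioned key layout are assumptions for illustration and are not taken from this project; the only AWS call used is boto3's `put_object`.

```python
import json
from datetime import datetime, timezone


def build_raw_key(prefix: str, fetched_at: datetime) -> str:
    """Return a date-partitioned object key for one raw API response."""
    stamp = fetched_at.strftime("%Y%m%dT%H%M%SZ")
    return f"{prefix}/{fetched_at:%Y/%m/%d}/spotify_raw_{stamp}.json"


def store_raw(s3_client, bucket: str, payload: dict, fetched_at: datetime) -> str:
    """Serialize the payload, write it to the raw bucket, and return the key."""
    # "raw_data/to_process" is a hypothetical prefix, not the project's own.
    key = build_raw_key("raw_data/to_process", fetched_at)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(payload).encode("utf-8"),
    )
    return key


def lambda_handler(event, context):
    """Entry point for the extract function (hypothetical wiring)."""
    import boto3  # imported lazily so the helpers above can be tested without AWS

    payload = {"items": []}  # placeholder: the real source API call goes here
    key = store_raw(
        boto3.client("s3"),
        "my-raw-bucket",  # hypothetical bucket name
        payload,
        datetime.now(timezone.utc),
    )
    return {"statusCode": 200, "key": key}
```

Passing the S3 client in as an argument keeps the key-building and upload logic testable with a stub object, without AWS credentials.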

Data Flow

  1. Data Ingestion:

    • Raw data is fetched and uploaded to an AWS S3 bucket.
    • AWS Lambda triggers preprocessing workflows.
  2. Transformation and Schema Management:

    • AWS Glue performs schema inference and data cleaning.
    • Transformed data is stored back into S3.
  3. Storage and Analytics:

    • Final data is loaded into Snowflake for analysis.
    • Power BI is used to create insights and visualizations.
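
The transformation step in this flow can be sketched as a function that flattens a nested API response into flat rows and serializes them as CSV for storage in S3. The input shape below follows the Spotify Web API's playlist-items response (`items[].track`), which is an assumption about the source data; the output column names are likewise hypothetical.

```python
from __future__ import annotations

import csv
import io

# Hypothetical output columns for the transformed dataset.
FIELDS = [
    "track_id", "track_name", "album_name", "artist_names",
    "duration_ms", "popularity", "added_at",
]


def flatten_tracks(payload: dict) -> list[dict]:
    """Turn a nested playlist-items payload into one flat row per track."""
    rows = []
    for item in payload.get("items", []):
        track = item.get("track") or {}
        if not track.get("id"):
            continue  # skip entries that carry no track ID
        rows.append({
            "track_id": track["id"],
            "track_name": track.get("name"),
            "album_name": (track.get("album") or {}).get("name"),
            "artist_names": "; ".join(
                artist.get("name", "") for artist in track.get("artists", [])
            ),
            "duration_ms": track.get("duration_ms"),
            "popularity": track.get("popularity"),
            "added_at": item.get("added_at"),
        })
    return rows


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The resulting CSV text could then be written to the transformed bucket, where a Glue Crawler can infer its schema.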

Usage Instructions

  1. Set up AWS resources:

    • Create S3 buckets for raw and processed data.
    • Configure AWS Glue jobs and Crawlers.
    • Deploy Lambda functions for data ingestion.
  2. Integrate with Snowflake:

    • Configure Snowflake as the data warehouse.
    • Set up ETL workflows to load data into Snowflake.
  3. Visualize data:

    • Use Power BI to connect to Snowflake or Athena.
    • Build interactive dashboards for analysis.
  4. Monitor pipeline:

    • Use Amazon CloudWatch/EventBridge to monitor pipeline activity and performance.
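
For step 2, one common way to load transformed files into Snowflake is a `COPY INTO` statement that reads from an external stage pointing at the S3 bucket. The sketch below only builds the SQL text; the table name, stage name, and path are hypothetical, and it assumes the transformed files are CSV with a header row.

```python
import re

# Accepts NAME, SCHEMA.NAME, or DB.SCHEMA.NAME made of unquoted identifiers.
_IDENTIFIER = re.compile(
    r"^[A-Za-z_][A-Za-z0-9_$]*(\.[A-Za-z_][A-Za-z0-9_$]*){0,2}$"
)
_PATH = re.compile(r"^[A-Za-z0-9_\-/.]*$")


def build_copy_statement(table: str, stage: str, path: str = "") -> str:
    """Build a Snowflake COPY INTO statement for CSV files in a stage."""
    for name in (table, stage):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if not _PATH.match(path):
        raise ValueError(f"unsafe stage path: {path!r}")

    location = f"@{stage}/{path}" if path else f"@{stage}"
    return (
        f"COPY INTO {table}\n"
        f"FROM {location}\n"
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 "
        "FIELD_OPTIONALLY_ENCLOSED_BY = '\"')"
    )


# Hypothetical names, for illustration only.
print(build_copy_statement("SPOTIFY_DB.PUBLIC.TRACKS", "SPOTIFY_STAGE",
                           "transformed_data/tracks/"))
```

The generated statement can be run from a Snowflake worksheet or passed to a cursor's `execute` method in the Snowflake Python connector. Validating identifiers before interpolation avoids building SQL from unchecked input.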

Lambda functions

  1. Spotify-API-Data-Extract
  2. Spotify-API-Data-Transformation
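
If the transformation function is invoked by S3 event notifications when a new raw file lands (an assumption; the trigger is not documented here), its first task is to work out which object arrived. A minimal sketch of that parsing step, using the standard S3 event record layout:

```python
from __future__ import annotations

from urllib.parse import unquote_plus


def objects_from_s3_event(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs for every record in an S3 event notification."""
    pairs = []
    for record in event.get("Records", []):
        s3_info = record.get("s3", {})
        bucket = s3_info.get("bucket", {}).get("name")
        raw_key = s3_info.get("object", {}).get("key")
        if bucket and raw_key:
            # Keys arrive URL-encoded (spaces become '+'), so decode before use.
            pairs.append((bucket, unquote_plus(raw_key)))
    return pairs
```

Decoding the key matters because an object name containing spaces or special characters would otherwise fail to resolve when the function reads it back from S3.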

S3 Buckets


[Screenshots of the S3 buckets]

Snowflake Data Warehouse


[Screenshot of the Snowflake data warehouse]


Feel free to contribute or reach out if you have any suggestions or improvements!