DivineSamOfficial/InsightFlow-Secure-Data-Transformation-Visualization

InsightFlow: Secure Data Transformation & Visualization Project

Overview

InsightFlow is an ETL (Extract, Transform, Load) pipeline that ingests user data from the RandomUser API, cleans, masks, and encrypts it in MAGE AI, and loads the result into MongoDB. The stored data is then visualized in Tableau, so the analysis works on masked and encrypted fields rather than on raw personal data.

Table of Contents

  1. Project Architecture
  2. Features
  3. Technologies Used
  4. Data Flow
  5. Data Security & Masking
  6. Scheduling & Automation

Project Architecture

Architecture Diagram

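The pipeline is split into separate MAGE AI blocks for ingestion, transformation, and loading. Masking and encryption are applied before the data reaches MongoDB, so the stored data and the Tableau visualizations never contain raw sensitive fields.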

Features

  • Data Ingestion:

    • Fetches user data from the RandomUser API.
    • Utilizes MAGE AI for efficient data loading.
  • Data Transformation:

    • Block 1: Removal of nulls and duplicate records.
    • Block 2: Data type transformations and adjustments.
    • Block 3: Data masking of sensitive information, including phone numbers, emails, and addresses, to protect user privacy.
    • Block 4: Encryption of user passwords and usernames, with templates provided for future decryption.
    • Block 5: Securely loads the transformed data into MongoDB.
  • Data Visualization:

    • Tool: Tableau.
    • Connection: MongoDB BI Connector.
    • Outcome: Visualized insights from the transformed user data.
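
The ingestion step and Blocks 1 and 2 above could be sketched roughly as follows. This is a minimal illustration in plain Python, not the project's actual block code: the function names, the selected fields, and the choice of `email` as the duplicate key are all assumptions, and the real pipeline runs these steps inside MAGE AI blocks.

```python
import json
from datetime import datetime
from urllib.request import urlopen

API_URL = "https://randomuser.me/api/"  # public RandomUser endpoint


def fetch_users(count=10):
    """Download `count` random users and return the raw 'results' list."""
    with urlopen(f"{API_URL}?results={count}", timeout=30) as response:
        return json.load(response)["results"]


def flatten_user(raw):
    """Pick a flat subset of fields from one nested API record (assumed subset)."""
    return {
        "first_name": raw["name"]["first"],
        "last_name": raw["name"]["last"],
        "email": raw["email"],
        "phone": raw["phone"],
        "dob": raw["dob"]["date"],
        "age": raw["dob"]["age"],
    }


def clean_users(rows):
    """Block 1 sketch: drop rows containing nulls, then drop duplicate emails."""
    seen, cleaned = set(), []
    for row in rows:
        if any(value is None or value == "" for value in row.values()):
            continue  # incomplete record
        if row["email"] in seen:
            continue  # duplicate record
        seen.add(row["email"])
        cleaned.append(row)
    return cleaned


def convert_types(row):
    """Block 2 sketch: parse the ISO date string and force age to int."""
    converted = dict(row)
    # The API returns dates such as "1993-07-20T09:44:18.674Z".
    converted["dob"] = datetime.fromisoformat(row["dob"].replace("Z", "+00:00"))
    converted["age"] = int(row["age"])
    return converted
```

A typical call chain would be `[convert_types(r) for r in clean_users(map(flatten_user, fetch_users(50)))]`. Only `fetch_users` touches the network, so the remaining functions can be tried on hand-written records.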

Technologies Used

  • MAGE AI for data ingestion, transformation, scheduling, and automation.
  • Python for scripting and data processing.
  • MongoDB for data storage.
  • MongoDB BI Connector for connecting MongoDB to Tableau.
  • Tableau for data visualization.

Data Flow

[RandomUser API]
       |
       v
[MAGE AI: Ingestion & Transformation]
       |
       v
[Local MongoDB]
       |
       v
[Tableau: Visualization]
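
The hop from MAGE AI to MongoDB in the diagram above might look like the sketch below. It assumes the target behaves like a pymongo `Collection` (only `update_one` with `upsert=True` is used) and that each record carries a stable identifier. The `user_id` key and the `InMemoryCollection` stand-in are hypothetical and exist only so the sketch can run without a database.

```python
def load_users(collection, users, key="user_id"):
    """Upsert each user so that re-running the pipeline does not duplicate records.

    `collection` is expected to behave like a pymongo Collection.
    `key` is a hypothetical unique field; the real pipeline may use another.
    """
    written = 0
    for user in users:
        collection.update_one({key: user[key]}, {"$set": user}, upsert=True)
        written += 1
    return written


class InMemoryCollection:
    """Tiny stand-in for a pymongo Collection, for trying the sketch offline."""

    def __init__(self):
        self.docs = {}

    def update_one(self, query, update, upsert=False):
        doc_key = tuple(sorted(query.items()))
        if doc_key in self.docs or upsert:
            # Create the document if missing, then apply the $set fields.
            self.docs.setdefault(doc_key, {}).update(update["$set"])
```

Against a real local instance, and assuming pymongo is installed, `collection` would be something like `MongoClient("mongodb://localhost:27017")["insightflow"]["users"]`, where the database and collection names are placeholders. Upserting keeps a daily re-run idempotent: existing records are updated in place instead of being inserted again.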

Data Security & Masking

Data masking protects sensitive user information throughout the ETL process. Here's how it's handled:

  • Phone Numbers & Emails:

    • Masked to obscure personal details while maintaining data integrity.
    • Example: +91 99XXXXXXXX for phone numbers; d********* as the masked local part of an email address.
  • Addresses:

    • Partial masking of street information to prevent full disclosure of residential addresses.
  • Encryption:

    • User passwords and usernames are encrypted using secure algorithms, ensuring they remain protected at rest.
    • Decryption templates are provided for future use cases where access to original data might be necessary.

This approach supports compliance with data privacy regulations while still allowing meaningful analysis and visualization.
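
The masking rules described above can be illustrated with a few small helpers. This is a sketch under stated assumptions rather than the project's own masking block: the function names, the number of visible digits, and the exact address rule (hide the house number, keep the first letter of each word) are guesses chosen to reproduce the phone format shown in the example.

```python
def mask_phone(phone, visible_digits=2):
    """Keep an optional '+<country code> ' prefix and the first few digits."""
    country, national = "", phone
    if phone.startswith("+") and " " in phone:
        country, national = phone.split(" ", 1)
        country += " "
    shown, masked = 0, []
    for ch in national:
        if ch.isdigit():
            shown += 1
            masked.append(ch if shown <= visible_digits else "X")
        else:
            masked.append(ch)  # keep separators such as '-' or '('
    return country + "".join(masked)


def mask_email(email):
    """Keep the first character of the local part and the full domain."""
    local, at, domain = email.partition("@")
    if not at or not local:
        return "*" * len(email)  # not a usable address: mask everything
    return local[0] + "*" * (len(local) - 1) + "@" + domain


def mask_street(street):
    """Hide house numbers and all but the first letter of each word."""
    words = []
    for word in street.split():
        if word.isdigit():
            words.append("X" * len(word))
        else:
            words.append(word[0] + "*" * (len(word) - 1))
    return " ".join(words)
```

For example, `mask_phone("+91 9988776655")` returns `+91 99XXXXXXXX`. Each helper preserves the length and separators of its input, so the masked values keep the same shape as the originals when stored.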

Scheduling & Automation

The ETL pipeline is scheduled to run daily using MAGE AI's built-in scheduling and triggering features. Each run executes the ingestion, transformation, and load steps without manual intervention, so the data in MongoDB is refreshed once a day.
