# DataStreamX: Real-time data streaming powered by Airflow, Kafka, and Spark

This project showcases a real-time data processing pipeline that combines modern technologies for efficient data ingestion, streaming, processing, and storage:

- **Ingestion.** Data enters the pipeline through an API, and incoming records are first persisted in a PostgreSQL database (see the sketches after this list).
- **Orchestration.** Apache Airflow orchestrates the entire workflow, managing the streaming of data from the API into Apache Kafka.
- **Streaming.** Kafka serves as the central hub for streaming data, distributing it to its consumers, with ZooKeeper handling Kafka's configuration management and broker coordination. A Schema Registry manages the data schemas across Kafka topics to keep producers and consumers compatible, and a Control Center provides monitoring and management of the Kafka cluster.
- **Processing.** Apache Spark consumes the stream and processes it in real time across multiple worker nodes, giving distributed, scalable computation.
- **Storage.** Processed data is written to Cassandra, a distributed NoSQL database that provides high availability and easy access to the results for further analysis.
- **Deployment.** The entire architecture is containerized with Docker, which simplifies deployment and scaling for large-scale, real-time workloads.
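A possible shape for the ingestion step, using `psycopg2` to land raw API payloads in PostgreSQL before they are streamed onward. The table name, connection settings, and sample payload here are illustrative assumptions, not taken from the repository:

```python
# Minimal sketch: persist a raw API payload into PostgreSQL.
# Hostname, database, credentials, and table are assumed values.
import json

import psycopg2

conn = psycopg2.connect(
    host="postgres", dbname="pipeline", user="airflow", password="airflow"
)
with conn, conn.cursor() as cur:
    # Keep the raw payload as JSONB so downstream steps can reshape it freely.
    cur.execute(
        "CREATE TABLE IF NOT EXISTS raw_events (id SERIAL PRIMARY KEY, payload JSONB)"
    )
    cur.execute(
        "INSERT INTO raw_events (payload) VALUES (%s)",
        (json.dumps({"id": "1", "name": "Ada", "email": "ada@example.com"}),),
    )
conn.close()
```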
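The Airflow leg could be a small DAG whose task fetches from the API and publishes each record to Kafka with `kafka-python`. The DAG id, API URL, broker address, and topic name below are hypothetical stand-ins:

```python
# Sketch of an Airflow DAG driving the API -> Kafka leg of the pipeline.
import json
from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator
from kafka import KafkaProducer


def stream_api_to_kafka():
    """Fetch a record from the source API and publish it to Kafka."""
    producer = KafkaProducer(
        bootstrap_servers=["broker:29092"],  # assumed broker address
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    response = requests.get("https://api.example.com/data")  # placeholder API
    response.raise_for_status()
    producer.send("ingest_topic", response.json())  # assumed topic name
    producer.flush()


with DAG(
    dag_id="api_to_kafka_stream",  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="stream_api_to_kafka",
        python_callable=stream_api_to_kafka,
    )
```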
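To keep producers and consumers compatible, a schema for the topic's value can be registered with the Schema Registry. A minimal sketch with the `confluent-kafka` client, assuming the registry runs at `schema-registry:8081` and the subject follows the usual `<topic>-value` naming; the schema fields are illustrative:

```python
# Register an Avro schema for the topic's value with the Schema Registry.
import json

from confluent_kafka.schema_registry import Schema, SchemaRegistryClient

client = SchemaRegistryClient({"url": "http://schema-registry:8081"})  # assumed URL

avro_schema = Schema(
    schema_str=json.dumps({
        "type": "record",
        "name": "Record",
        "fields": [
            {"name": "id", "type": "string"},
            {"name": "name", "type": "string"},
            {"name": "email", "type": "string"},
        ],
    }),
    schema_type="AVRO",
)

# Subjects conventionally follow the "<topic>-value" naming strategy.
schema_id = client.register_schema("ingest_topic-value", avro_schema)
print(f"Registered schema id: {schema_id}")
```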
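On the processing side, a hedged sketch of a Spark Structured Streaming job that reads the Kafka topic, parses the JSON payload, and appends each micro-batch to Cassandra. It assumes the `spark-sql-kafka` and `spark-cassandra-connector` packages are on the classpath; the message schema, hostnames, and keyspace/table names are illustrative:

```python
# Sketch: Kafka -> Spark Structured Streaming -> Cassandra.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

spark = (
    SparkSession.builder.appName("KafkaToCassandra")
    .config("spark.cassandra.connection.host", "cassandra")  # assumed hostname
    .getOrCreate()
)

# Expected shape of the JSON messages; adjust to the real payload.
schema = StructType([
    StructField("id", StringType()),
    StructField("name", StringType()),
    StructField("email", StringType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:29092")  # assumed broker address
    .option("subscribe", "ingest_topic")                # assumed topic name
    .option("startingOffsets", "earliest")
    .load()
)

parsed = (
    raw.selectExpr("CAST(value AS STRING) AS json")
    .select(from_json(col("json"), schema).alias("data"))
    .select("data.*")
)

def write_batch(batch_df, batch_id):
    # Reuse the connector's batch writer for each micro-batch.
    (
        batch_df.write.format("org.apache.spark.sql.cassandra")
        .mode("append")
        .options(keyspace="pipeline", table="records")  # assumed names
        .save()
    )

query = (
    parsed.writeStream
    .foreachBatch(write_batch)
    .option("checkpointLocation", "/tmp/checkpoints")   # assumed path
    .start()
)
query.awaitTermination()
```

`foreachBatch` is used here because it reuses the connector's batch writer, which tends to be the most portable way to sink a stream into Cassandra across connector versions.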
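Finally, the Cassandra keyspace and table the Spark job writes to might be created with the DataStax Python driver; the replication settings and column layout are assumptions matching the sketches above:

```python
# One-time setup of the Cassandra keyspace and table.
from cassandra.cluster import Cluster

cluster = Cluster(["cassandra"])  # assumed container hostname
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS pipeline
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS pipeline.records (
        id TEXT PRIMARY KEY,
        name TEXT,
        email TEXT
    )
""")
cluster.shutdown()
```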