
Using Kafka and Spark to realize real-time Bitcoin data streaming and analysis.





Real-time Data-streaming & Analysis Project

This project uses kafka-python, a Python client for the Apache Kafka distributed stream processing system, to stream the Bitcoin price, and Apache Spark to analyze the price in real time.

Install Spark

  1. Go to the official Spark website, download Spark, and extract it into a new empty folder, C:\Spark.

  2. Create a folder C:\Hadoop and copy the bin folder for your specific Hadoop version from this repository.

  3. Download Java if not yet installed.

  4. Set the following system variables: SPARK_HOME as C:\Spark\spark-3.5.0-bin-hadoop3, HADOOP_HOME as C:\Hadoop, JAVA_HOME as C:\Program Files\Java\jre-1.8, and SPARK_LOCAL_IP as 127.0.0.1.

  5. Add %SPARK_HOME%\bin and %HADOOP_HOME%\bin to the system Path variable.

  6. Open a Command Prompt window in administrator mode and execute C:\Spark\spark-3.5.1-bin-hadoop3\bin\spark-shell (use the same versioned folder name as in SPARK_HOME).

  7. If the Spark logo appears, the installation was successful.
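If the shell does not start, a quick check of the variables from steps 4 and 5 can help narrow things down. The following is a minimal sketch, not part of the project: the helper name `missing_vars` is hypothetical, and it assumes the Hadoop variable is named HADOOP_HOME, as step 5 implies.

```python
import os

# Variable names from steps 4 and 5 above (HADOOP_HOME is assumed from step 5).
REQUIRED_VARS = ["SPARK_HOME", "HADOOP_HOME", "JAVA_HOME", "SPARK_LOCAL_IP"]


def missing_vars(env, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars(os.environ)
    if missing:
        print("Missing or empty variables:", ", ".join(missing))
    else:
        print("All required variables are set.")
```

Run it in a new Command Prompt window, since windows opened before the variables were changed do not see the new values.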

Test Spark

  1. Open a web browser and navigate to http://localhost:4040/. The Apache Spark shell web UI will appear.
  2. To test Spark, execute the following in the Spark shell:
    • val data = List("Test")
    • var t = sc.parallelize(data), which returns t: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:24.
    • t.collect(), which returns res0: Array[String] = Array(Test).

Install Kafka

  1. Go to the Kafka website and click on Binary downloads.
  2. Extract the zipped file into a newly created empty folder, C:\Kafka.
  3. Go to the config folder and edit:
    • the ZooKeeper configuration file: change dataDir to dataDir=C:/Kafka/kafka_2.13-3.7.0/zookeeper-data.
    • the Kafka server configuration file: change log.dirs to log.dirs=C:/Kafka/kafka_2.13-3.7.0/kafka-logs.
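The two property changes can also be made programmatically. The following is a minimal sketch, assuming the configuration files are plain `key=value` text; `set_property` is a hypothetical helper, not something shipped with Kafka or this project.

```python
def set_property(text, key, value):
    """Return text with the line for key replaced by key=value.

    Comment lines (starting with #) are left untouched. If the key is not
    present, the setting is appended at the end.
    """
    lines = text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue
        name, sep, _ = stripped.partition("=")
        if sep and name.strip() == key:
            lines[i] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "# data directory\ndataDir=/tmp/zookeeper\nclientPort=2181\n"
    print(set_property(sample, "dataDir", "C:/Kafka/kafka_2.13-3.7.0/zookeeper-data"))
```

The paths use forward slashes because a backslash is an escape character in Java properties files.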

Use Kafka

Open three separate Command Prompt windows, all in the C:\Kafka\kafka_2.13-3.7.0 folder.

  1. Start ZooKeeper: .\bin\windows\zookeeper-server-start.bat .\config\
  2. Once ZooKeeper has fully started, start the Kafka server: .\bin\windows\kafka-server-start.bat .\config\
  3. Execute .\bin\windows\kafka-topics.bat --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic bitcoin-prices to create a topic. It will return the message Created topic bitcoin-prices.
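Before creating the topic or running the Python code, it can be useful to confirm that both services are accepting connections. The following is a minimal sketch using only the standard library. Port 9092 comes from the command above; port 2181 for ZooKeeper is an assumption based on the default clientPort, and `port_open` is a hypothetical helper.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 2181 is assumed (default ZooKeeper clientPort); 9092 is the broker port used above.
    for name, port in [("ZooKeeper", 2181), ("Kafka", 9092)]:
        state = "reachable" if port_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

An open port only shows that something is listening; it does not prove the broker is fully initialized.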

Setup Python

  1. Use Anaconda to create a new environment for this project: conda create -n kafka-project python=3.9.
  2. After installing all the dependencies in requirement.txt, add an environment variable PYSPARK_PYTHON, set to the path of your desired Python interpreter, to your system environment.
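To find the value for PYSPARK_PYTHON and confirm the environment is ready, something like the following can be run inside the activated environment. This is a minimal sketch: it assumes the dependencies include kafka-python (imported as `kafka`) and PySpark (imported as `pyspark`), which the contents of requirement.txt may or may not match, and `missing_packages` is a hypothetical helper.

```python
import importlib.util
import sys


def missing_packages(modules):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Path of the interpreter running this script; a candidate for PYSPARK_PYTHON.
    print("Interpreter:", sys.executable)
    # Assumed import names for kafka-python and PySpark.
    missing = missing_packages(["kafka", "pyspark"])
    print("Missing packages:", ", ".join(missing) if missing else "none")
```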

Run Code

  1. Run Spark (see step 6 in the Install Spark section).
  2. Run the Kafka server (see step 2 in the Use Kafka section).
  3. Execute the producer script; it will scrape and stream the Bitcoin price in real time.
  4. Execute the consumer script; it will receive the data and process it.
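The streaming step in 3 could be sketched roughly as follows with kafka-python. This is an illustration under stated assumptions, not the project's actual script: the JSON message layout, the `encode_price` and `stream_prices` helpers, and the `fetch_price` callback (standing in for whatever scraping the project does) are all hypothetical. Only the topic name and broker address come from the steps above.

```python
import json
import time

TOPIC = "bitcoin-prices"      # topic created in the Use Kafka section
BOOTSTRAP = "localhost:9092"  # broker address used in the Use Kafka section


def encode_price(price, timestamp):
    """Serialize one price reading as UTF-8 JSON bytes (assumed message layout)."""
    return json.dumps({"timestamp": timestamp, "price": price}).encode("utf-8")


def stream_prices(fetch_price, interval_s=1.0, max_messages=None):
    """Send fetch_price() readings to Kafka until max_messages is reached.

    fetch_price is a caller-supplied function returning the current price.
    """
    from kafka import KafkaProducer  # imported here so the helpers above work without Kafka

    producer = KafkaProducer(bootstrap_servers=BOOTSTRAP)
    sent = 0
    try:
        while max_messages is None or sent < max_messages:
            producer.send(TOPIC, encode_price(fetch_price(), time.time()))
            sent += 1
            time.sleep(interval_s)
    finally:
        producer.flush()
        producer.close()
```

For a dry run against a running broker, `stream_prices(lambda: 65000.0, max_messages=3)` sends three fixed readings.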

When both scripts are running, the console shows output like the following:

|              window|       mean_price|
|{2024-03-22 01:47...|65759.99907678064|
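The window and mean_price columns suggest that prices are grouped into time windows and averaged per window. The following plain-Python sketch illustrates that idea with a tumbling window. It is not the project's Spark code: the 60-second window length and the `windowed_mean` helper are assumptions made for illustration.

```python
from collections import defaultdict


def windowed_mean(records, window_s=60):
    """Group (epoch_seconds, price) records into tumbling windows and average them.

    Returns a dict mapping each window's start time to its mean price.
    """
    totals = defaultdict(lambda: [0.0, 0])  # window start -> [sum, count]
    for ts, price in records:
        start = int(ts // window_s) * window_s
        totals[start][0] += price
        totals[start][1] += 1
    return {start: total / count for start, (total, count) in sorted(totals.items())}


if __name__ == "__main__":
    readings = [(0, 10.0), (30, 20.0), (60, 40.0)]
    print(windowed_mean(readings))  # {0: 15.0, 60: 40.0}
```

Each reading falls into exactly one window, so the first two readings are averaged together and the third starts a new window.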


