This repository was archived by the owner on Sep 29, 2025. It is now read-only.
Jörn Franke edited this page Nov 12, 2017
You need to download and verify the Ethereum blockchain data using a client. Geth (use the most recent version!) was used for this project. Once you have installed it, simply start it and let it download the whole blockchain (this can take several hours). You need at least 40-50 GB of free space, and more depending on the current size of the blockchain.
Once you have downloaded the blockchain, you need to export it in binary RLP format. For performance reasons it is recommended not to export all blocks into a single file, but to split the data into files of 200,000 blocks each (roughly 128 MB). This lets you process them in parallel on Hadoop, Flink, Hive, and Spark.
You can export it by executing the following command after you have downloaded the full blockchain:
geth export eth0-200000.bin 0 200000
Afterwards you can export the following blocks analogously (blocks 200001 to 400000, and so on).
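Exporting many chunks by hand is tedious, so the step above can be scripted. The following is a minimal sketch that only prints the geth export commands for each 200,000-block chunk (it does not run geth itself); the upper bound 600000 is an example value, and you should pass your node's current chain height instead.

```shell
# Sketch: generate one "geth export" command per 200,000-block chunk,
# following the wiki's 0-200000, 200001-400000 numbering pattern.
gen_export_cmds() {
  last=$1        # highest block to export (example; use your chain height)
  step=200000
  start=0
  end=$step
  while [ "$start" -lt "$last" ]; do
    echo "geth export eth${start}-${end}.bin ${start} ${end}"
    start=$((end + 1))
    end=$((end + step))
  done
}

# Example invocation with a hypothetical chain height of 600000:
gen_export_cmds 600000
```

You can pipe the output into `sh` once you have checked that the generated commands look right.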
You can put it on your HDFS cluster by executing the following commands:
hadoop fs -mkdir -p /user/ethereum/input
hadoop fs -put *.bin /user/ethereum/input
After the files have been copied, you are ready to analyze them.