Skip to content
This repository was archived by the owner on Sep 29, 2025. It is now read-only.

Using Hive to analyze Ethereum Blockchain data

Jörn Franke edited this page Nov 12, 2017 · 3 revisions

This is a Hive SQL application demonstrating some of the capabilities of the hadoopcryptoledger library. It enables transparent access to Ethereum Blockchain data via Hive SQL. It can be linked with any other table in any other format in Hive, such as ORC or Parquet. It has successfully been tested with the Cloudera Quickstart VM 5.5 and HDP 2.5, but other Hadoop distributions should work equally well.

Getting blockchain data

See here how to fetch Ethereum blockchain data. After it has been copied you are ready to use the example.

Deploying the Hive Serde

Instructions to deploy the Hive Serde for the Ethereum Blockchain.

Example SQL Commands

These are some examples. We assumed that you deployed the Hive Serde and created in the database 'blockchains' the table 'EthereumBlockChain':

CREATE EXTERNAL TABLE EthereumBlockchain ROW FORMAT SERDE 'org.zuinnote.hadoop.ethereum.hive.serde.EthereumBlockSerde' STORED AS INPUTFORMAT 'org.zuinnote.hadoop.ethereum.format.mapred.EthereumBlockFileInputFormat' OUTPUTFORMAT 'org.apache.hadoop.mapred.lib.NullOutputFormat' LOCATION '/user/ethereum/input' TBLPROPERTIES("hadoopcryptoledger.ethereumblockinputformat.useDirectBuffer"="false");

The following example counts the number of blocks:

select count(*) from EthereumBlockchain;

The following example counts the number of transactions:

select count(*) from EthereumBlockchain LATERAL VIEW explode(ethereumTransactions) exploded_transactions;

The following example calculates the total output of all Ethereum transactions:

select SUM(exptran.value) FROM (select * from EthereumBlockchain LATERAL VIEW explode(ethereumTransactions) exploded_transactions as exptran) ttable;  

Example SQL Commands using Ethereum UDFs

The following examples illustrate the Ethereum UDFs that enable you derive additional data from the EthereumBlockchain data (e.g. sendAddress or transaction hash) You need to deploy the Hive UDF beforehand.

The following example extracts the sendaddress (from) of an Ethereum transaction:

SELECT hex(hclEthereumGetSendAddress(ethereumTransactions[0])) FROM EthereumBlockchain LIMIT 1;

Example output:

39424BD28A2223DA3E14BF793CF7F8208EE9980A

You can search for this hash in Ethereum block explorers, such as https://etherscan.io/

This example calculates the transaction hash of an Ethereum transaction

SELECT hex(hclEthereumGetTransactionHash(ethereumTransactions[0])) FROM EthereumBlockchain LIMIT 1;  

Example output:

E27E9288E29CC8EB78F9F768D89BF1CD4B68B715A38B95D46D778618CB104D58

You can search for this hash in Ethereum block explorers, such as https://etherscan.io/

Find more examples here: https://github.com/ZuInnoTe/hadoopcryptoledger/tree/master/examples/hive-ethereum

and

https://github.com/ZuInnoTe/hadoopcryptoledger/tree/master/examples/hive-ethereum-udf

More Information

Understanding the structure of Ethereum data:

Ethereum Yellow paper: http://yellowpaper.io/

Clone this wiki locally