Skip to content
This repository was archived by the owner on Sep 29, 2025. It is now read-only.

Use the HadoopCryptoLedger UDF to ease processing of Bitcoin specific data in Hive

Jörn Franke edited this page Sep 21, 2017 · 4 revisions

We assume you have followed the instructions on Using Hive to analyze Bitcoin Blockchain data and deployed the Hive UDF.

Example SQL Commands using the HadoopCryptoLedger UDF

We assumed that you created in the database 'blockchains' the table 'BitcoinBlockChain' using the Hive Serde and make the UDF as temporary functions available. If you make them as permanent functions available you need to add the prefix "hcl." to the function

The following example display all outputs of transactions to determine their destinations (depending on the amount of available Blockchain data this might return a long list!)

SELECT hclBitcoinScriptPattern(expout.txoutscript) FROM (select * from BitcoinBlockchain LATERAL VIEW explode(transactions) exploded_transactions as exptran) transaction_table LATERAL VIEW explode (exptran.listofoutputs) exploded_outputs as expout;

Please see here what are the outputs of the function.

The following example displays the transaction hash of a given transaction (the so-called txid). You can use the transaction hashes to determine which output has been used as input in another transaction and ultimately to build a transaction graph to trace the route of Bitcoins in the blockchain

SELECT hclBitcoinTransactionHash(transactions[0]) FROM BitcoinBlockChain LIMIT 1;

This transaction hash can be joined with the field prevTransactionHash within the input of a BitcoinTransaction to build the transaction graph describing transactions from one Bitcoin address to another. For full graph-analytics we recommend to use a dedicated graph solution as described in this example. If you want to search for the hash in popular Bitcoin explorers (e.g. https://blockchain.info), you need to reverse the order of bytes in the array (see examples in the Hive examples of this library).

The following example displays the transaction hash including Segwit data (the so-called wtxid):

SELECT hclBitcoinTransactionHashSegwit(transactions[0]) FROM BitcoinBlockChain LIMIT 1;

Find more examples here: https://github.com/ZuInnoTe/hadoopcryptoledger/blob/master/examples/hive-bitcoin/hive-bitcoin-udf.sql

More Information

Some standard Script patterns to evaluate the output script of a Bitcoin transaction: https://en.bitcoin.it/wiki/Script#Scripts

Understanding the destination of Bitcoin addresses and why they cannot be compared with transactions occuring in the current financial system: https://en.bitcoin.it/wiki/From_address

I will write soon more about this in a blog on analysing the Bitcoin Ecosystem using the HadoopCryptoLedger library.

Understanding the structure of Bitcoin data:

Blocks: https://en.bitcoin.it/wiki/Block

Transactions: https://en.bitcoin.it/wiki/Transactions

Clone this wiki locally