-
Notifications
You must be signed in to change notification settings - Fork 48
Use the HadoopCryptoLedger UDF to ease processing of Bitcoin specific data in Hive
We assume you have followed the instructions on Using Hive to analyze Bitcoin Blockchain data and deployed the Hive UDF.
We assumed that you created in the database 'blockchains' the table 'BitcoinBlockChain' using the Hive Serde and make the UDF as temporary functions available. If you make them as permanent functions available you need to add the prefix "hcl." to the function
The following example display all outputs of transactions to determine their destinations (depending on the amount of available Blockchain data this might return a long list!)
SELECT hclBitcoinScriptPattern(expout.txoutscript) FROM (select * from BitcoinBlockchain LATERAL VIEW explode(transactions) exploded_transactions as exptran) transaction_table LATERAL VIEW explode (exptran.listofoutputs) exploded_outputs as expout;
Please see here what are the outputs of the function.
The following example displays the transaction hash of a given transaction (the so-called txid). You can use the transaction hashes to determine which output has been used as input in another transaction and ultimately to build a transaction graph to trace the route of Bitcoins in the blockchain
SELECT hclBitcoinTransactionHash(transactions[0]) FROM BitcoinBlockChain LIMIT 1;
This transaction hash can be joined with the field prevTransactionHash within the input of a BitcoinTransaction to build the transaction graph describing transactions from one Bitcoin address to another. For full graph-analytics we recommend to use a dedicated graph solution as described in this example. If you want to search for the hash in popular Bitcoin explorers (e.g. https://blockchain.info), you need to reverse the order of bytes in the array (see examples in the Hive examples of this library).
The following example displays the transaction hash including Segwit data (the so-called wtxid):
SELECT hclBitcoinTransactionHashSegwit(transactions[0]) FROM BitcoinBlockChain LIMIT 1;
Find more examples here: https://github.com/ZuInnoTe/hadoopcryptoledger/blob/master/examples/hive-bitcoin/hive-bitcoin-udf.sql
Some standard Script patterns to evaluate the output script of a Bitcoin transaction: https://en.bitcoin.it/wiki/Script#Scripts
Understanding the destination of Bitcoin addresses and why they cannot be compared with transactions occuring in the current financial system: https://en.bitcoin.it/wiki/From_address
I will write soon more about this in a blog on analysing the Bitcoin Ecosystem using the HadoopCryptoLedger library.
Understanding the structure of Bitcoin data:
Blocks: https://en.bitcoin.it/wiki/Block
Transactions: https://en.bitcoin.it/wiki/Transactions