As I described previously, cryptographic hash functions have the ability to compress an arbitrarily large block of data into a fixed size. The result, or hash, then acts as a unique identifier for the original data. Merkle trees do all this and more. The idea is that, as with a hash function, a large block of data can be represented by a single fixed-size identifier, known as the Merkle root or hash root. This much is exactly the same as with a regular hash function. However, if the data is broken down into a collection of individual parts, then it is possible to verify each part individually. This is different from regular hash functions, where the entire block of data is required in order to recompute the hash and verify the data correctness. Some example uses of Merkle trees are:
- Transmission of large collections of files or databases. Instead of downloading a large database over the internet, we can first download its Merkle root. Then, the individual entries of the database can be downloaded as needed, possibly from different sources, and separately verified to be correct.
- Efficient storage of large amounts of user permissions. Only the Merkle root needs to be stored then, when a user wants to verify his or her permissions, this can be done individually rather than requiring access to the entire database.
- Bitcoin stores the Merkle root of the transaction data in the block header. This can be used by thin clients to verify that individual transactions are included in a block without having to download and store the entire blockchain. Only the headers are required to be stored.
Suppose that we have an array of N data blocks. Then, the procedure is as follows:
- The Merkle root of the array of data is computed, which is a binary string of fixed length (256 bits is sufficient).
- For any data block in the array, we can transmit this along with a short piece of information known as the Merkle proof, to any other party in possession of the Merkle root. Using this, they are able to verify that the data block is correct and was indeed in the original data.
- The other party does not need to trust the person transmitting the data. They only need to be confident that they have the correct Merkle root. Then, the proof is sufficient for the verification, since it is impossible to create a valid proof for any data which was not in the original array.
- The size of the Merkle proof is of the order of the logarithm of N. This is bounded for practical purposes. For example, it is very unlikely that we would ever have arrays longer than 2100 so, effectively, the base-2 logarithm of N is bounded by 100.
The magic here is that the other party only needs to have the 256 bit Merkle root stored locally. Then, to prove that a block of data is in the array, we only have to transmit this block along with a short proof, regardless of the size of the original array of data.
Constructing the Merkle Tree
I now go into the mechanics of a Merkle tree. The first thing that we need is a cryptographic hash function, which maps any block of data to an essentially unique binary string of fixed size. For example, the SHA256 hash function can be used, which creates hashes of 256 bits (Bitcoin actually uses a double, or repeated, SHA256 hash). Throughout this post, I assume a hash function has been chosen, which will be denoted by H.
We build a Merkle tree from a sequence of blocks of data as follows. First, take the hash of each item. Then, arrange them as the leaves of a binary tree, where each internal node is assigned a value equal to the hash of its two children concatenated together. For a sequence of 16 data items, labelled t0 through t15, this is as in figure 1. Here, I have labelled each node by its hash value hb for a binary string b indicating the position of the node in the tree. So, for example, we have h1001 = H(t9) and h100 = H(h1000h1001). These hash values can all be computed by starting at the leaf nodes and, iteratively, working up the tree. Finally, the Merkle root is the hash value h at the very top.
