The application of hash functions that I want to describe in this post, is as the glue used to hold together the blocks in a blockchain. Put another way, it is a hash function that puts the chain into blockchain.
A blockchain is just an ordered sequence of blocks of data, with new blocks constantly getting added. While it needs to be flexible enough that new blocks can be added, it also requires immutability, so that already existing blocks are never altered. This is achieved with the help of hash functions. The idea is that every block contains a field which is set to the hash of the previous block. The initial or genesis block is the only one without a previous block to refer to so, for that one only, its previous block hash field is set to zero. An example of a blockchain containing only 5 blocks is shown in figure 1 below.
The rectangles labelled Bn refer to the blocks, which contain data including one field taking the value h(Bn-1), which is the hash of Bn-1. Finally, we compute the hash of last block, h(B5), which serves as an identifier for the blockchain. In the case of Bitcoin, these are double (repeated) SHA256 hashes, and the blocks used here are actually the block headers. The header contains only a hash (specifically, the Merkle root) of the transactions instead of the transaction data itself, so is much smaller than the entire block. Otherwise, it makes no conceptual difference.
While the final block hash h(B5) does not enable us to reconstruct the blockchain, it does act as a unique identifier. Given a hash, then we can verify that it corresponds to the final block of a chain B1,B2,…,Bn by the following steps
- Check that B1 has previous block hash 0.
- For each i = 1,2,…n - 1, compute the hash h(Bi) and check that this equals the previous block hash stored in Bi+1.
- Compute the hash h(Bn) and check that this equals the given final block hash.
Assuming that no hash collisions occur (which we do assume), then this is the only blockchain with the given final block hash.
When creating a new block to add on to the end of the chain, it is straightforward to ensure that the hashes are all correct. We simply set the previous hash for the new block equal to the hash of the currently final block on the chain. No earlier blocks need to be modified, and we obtain a new blockchain longer by one.
Making changes to blocks which are not close to the end of the chain is difficult, as is adding or removing such blocks. This is one of the strengths of this method, as it helps ensure immutability. That is, once a block has been added, and sufficiently many others have been added on top, it should be there in perpetuity without ever changing. In practice, blocks very close to the end of the chain can change. Considering figure 1, suppose a miner adds a new block increasing the length of the chain to 6 but, shortly afterwards, another miner also adds an alternative 6th block. Which do we accept as correct? It is possible that nodes validating the blockchain could switch, and accept the new alternative block. This is a blockchain reorganisation, and results in the final block being changed. Transactions existing in the block submitted by the first miner may not be present in the one submitted by the second miner or, even worse, alternative transactions for the same coins may exist, but pay out to different addresses. This is fine, and is just the way that blockchains are supposed to work. It is why we do not treat bitcoin transactions as having confirmed just because we see them in a block. Instead, we wait for several confirmations to have occurred, so that there are already several blocks on top of it in the chain.
The method of previous block hashes does help ensure immutability of all blocks in the chain except, possibly, for the few near the end. Looking at figure 1 again, suppose that we make a change to the second block. This changes its hash so that the third block no longer refers to it and, hence, the previous block hash of B3 also needs to be changed. As this changes the hash of B3 the previous block hash of B4 needs to be updated and, similarly, the previous block hash of B5 also needs to be changed. The result of all this is that, if we try modifying a block in any way, then the previous block hashes of all blocks after it also need changing. However, the proof of work (or similar) protocol makes recreating blocks difficult. Even just changing the previous block hash field requires it to be re-done, so we need to redo the proof of work for all blocks in the chain after the one modified. This is practically impossible, so once a block is sufficiently far from the end of the chain, it is highly unlikely that it will ever be changed in the version of the blockchain accepted as valid.