Here’s an interesting experiment. Go to any Bitcoin block explorer, and look at some blocks. I chose the recent block numbers (or heights) 686710 through 686719. Next, look at their hashes. These are 256 bit numbers, usually expressed as length 64 hexadecimal strings and, for the heights I selected, they are:
000000000000000000044e63ddeb30d097e0b3ccd9cca2069137cb7297e424bd
0000000000000000000230e3d3dea9fbbb65723ca07a33c1b9b3089c994983d9
0000000000000000000074c2db5661b85341d10e106e2e773549a38e1ec4d16c
0000000000000000000688941d5031bd7167880b62b333f274e61cc8e381a028
00000000000000000004aa907cf6a1e9d44e60ec4638242be5d767a0df07f772
0000000000000000000cbf470e7497e224c598609d72c7921991aec121433883
0000000000000000000b1a18922652ebc989f899b59b050dd0a9f6a312fa5998
0000000000000000000655ec73770127f0d28937af9d6413237f46592a215340
0000000000000000000a6e8c366aa51b534ce5ab399e9e8c72df290873bb9f76
000000000000000000001ded7e91b5d29a08e8166044ef3fc06a0944e9877a95
One thing immediately stands out. All of the hashes start with a lot of zeros. Statistically, this should be virtually impossible unless the blocks have been chosen specifically to have such unlikely hash values. Interpreting the 256 bit strings as binary integers and dividing through by 2^{256} gives a number between 0 and 1 (which I will denote by p). For the hash of an arbitrary block of data, the p values will be uniformly distributed on the unit interval but, for the blocks above, they are:
height  p
686710  3.56e-24
686711  1.81e-24
686712  3.77e-25
686713  5.40e-24
686714  3.86e-24
686715  1.06e-23
686716  9.18e-24
686717  5.24e-24
686718  8.63e-24
686719  9.67e-26
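These p values are easy to reproduce. A couple of lines of Python, just interpreting the hex string as an integer and dividing by 2^{256} as described above, recover the table entries:

```python
# Interpret a 64-digit hex block hash as a 256-bit integer,
# normalised to the unit interval.
def hash_to_p(block_hash_hex: str) -> float:
    return int(block_hash_hex, 16) / 2**256

# Hash of block 686710, taken from the table above:
p = hash_to_p("000000000000000000044e63ddeb30d097e0b3ccd9cca2069137cb7297e424bd")
print(f"{p:.2e}")  # → 3.56e-24
```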
All of these values are minuscule, being less than about 10^{-23}. By chance, this could only happen with probability about 10^{-23} independently for each block. In fact, due to the pseudorandom properties of the SHA256 hash function, there is really only one way that this can happen. Many, many different candidate blocks must have been produced each time (of the order of 10^{23} of them), with only those having small hash values being transmitted to the network and accepted as valid.
This is, of course, the proof of work (POW) protocol, or consensus mechanism. It ensures that miners who create new Bitcoin blocks have to perform significant work, regulating the rate of block production and, vitally, ensuring immutability of the blockchain. Once a block has been added to the chain, and is already underneath several others (i.e., has several confirmations), then it is virtually impossible for it to be altered. This would require redoing all of the proof of work for the block in question and for all blocks above it, and doing this quickly enough to catch up with the main chain.
The Protocol
To describe how a proof of work blockchain works, I update figure 1 from the post on block hashes, by adding an additional field to each block (or, block header). This is the ‘nonce’ (number used only once), which is just a data field that does not contribute anything other than affecting the block hash. Hence, each block will contain the previous block hash, the nonce, and any additional data (including transaction data or hash).
Next, the protocol defines a target for each block which, in the case of 256 bit hashes as used by Bitcoin, will also be a 256 bit number. From now on, I assume 256 bit hashes and targets, although this could potentially differ for different blockchains.
A block is only considered to be valid if its hash is less than or equal to the target.
h_{n} ≤ target. 
When a miner creates a new block, then they need to ensure that this is satisfied in order for it to be accepted by the network. The only practical way of achieving this is to compute the hash and, if it is above the target, modify the block and try again. This is where the nonce is used. The miner can keep changing the nonce and recalculating the hash until, eventually, it is less than or equal to the target. As the hashes will be independent and uniformly distributed, then they each have probability
p* = (target + 1) / 2^{256}
of satisfying the threshold. The total number of hashes required to produce a valid block is random, following a geometric distribution (approximately exponential, since p* is tiny) with mean
M = 1/p* = 2^{256} / (target + 1).
Dividing this by the hash rate that the miner is using gives the expected time for him to produce a valid block and, dividing by the total hash rate of the entire network of miners, gives the expected time for a new block to be produced.
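The nonce search described above can be sketched in a few lines of Python. This is a toy illustration only: a real Bitcoin miner double-SHA256s an 80-byte serialised block header, whereas here I simply hash a string with made-up fields:

```python
import hashlib

def mine(prev_hash: str, data: str, target: int) -> tuple[int, str]:
    """Increment the nonce until the block hash is at or below the target.
    Toy version: a single SHA256 of a string, not a real serialised header."""
    nonce = 0
    while True:
        block = f"{prev_hash}|{data}|{nonce}".encode()
        h = hashlib.sha256(block).hexdigest()
        if int(h, 16) <= target:
            return nonce, h
        nonce += 1

# An easy target (top 16 bits zero), so about 2^16 hashes are expected:
target = 2**240 - 1
nonce, block_hash = mine("00" * 32, "transaction data", target)
print(nonce, block_hash)  # the winning hash starts with at least 4 hex zeros
```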
The actual value of the target can vary from one block to the next and, for Bitcoin, uses a difficulty adjustment (that I describe below) to try and keep the block production rate at about one every 10 minutes on average.
The difficulty is defined to be inversely proportional to the target and, for Bitcoin, is usually set as
difficulty = 2^{-32} M = 2^{224} / (target + 1).
For example, blocks 686710 through 686719 that I looked at above have a difficulty of 21,047,730,572,451.56 according to the block explorer. This gives the expected number of hashes required to produce a block as about M = 9.04e+22, and the corresponding probability per hash is p* = 1.11e-23. This explains why all the p values in the table above were below this threshold.
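This back-of-the-envelope calculation can be checked directly from the formulas above:

```python
difficulty = 21_047_730_572_451.56  # reported difficulty for blocks 686710-686719

M = difficulty * 2**32  # expected number of hashes per block
p_star = 1 / M          # probability that any single hash is valid

print(f"M = {M:.2e}")        # → M = 9.04e+22
print(f"p* = {p_star:.2e}")  # → p* = 1.11e-23
```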
Note 1: Bitcoin block headers have a nonce field (also referred to as the counter) that is 32 bits long, so there are 2^{32} possible values that it can take. This is much less than the expected number of hashes required to generate a block which, as we saw above for the example blocks, is about 10^{23} ≈ 2^{76}. The Bitcoin counter is too small to have even a reasonable chance of generating a block so, to increase the effective size of the nonce, mining algorithms often also make use of the dummy input script field for the first (coinbase) transaction of the block as an ‘extraNonce’.
Note 2: The hash function used for the proof of work algorithm does not have to be the same as that used for linking blocks together in the blockchain. Litecoin, for example, uses a double SHA256 for the previous block hash, but the scrypt hashing function for its proof of work. This is why, unlike with Bitcoin, its block hashes do not start with lots of zeros.
Longest Chain Wins
The explanation above shows how the protocol requires a proof of work in order for blocks to be considered as valid. However, what happens if a blockchain validator (node) receives two valid blockchains? It needs some way of comparing them in order to decide which one should be accepted as the true blockchain. The basic idea is that we should accept the longest chain, the one containing the most blocks. This is to ensure immutability, so that it is very difficult for an attacker to modify any blocks that are already accepted.
Consider the blockchain shown in figure 1 above, and suppose that a rogue miner modifies the second block to remove or change a transaction inside it. This changes its hash, so that the proof of work needs to be redone. As the hash has changed, it is no longer referred to by the third block, so his modified blockchain is only of length two and will not be accepted by the network. He needs to create another three blocks himself, performing the proof of work for each of them, to have any chance of his modified chain being accepted. Even if he is able to do this, by that time the public blockchain will likely have had new blocks added, making it longer than 5 blocks. So, the rogue miner needs to work faster than all the other miners combined, in order for his modified chain to catch up with the public one and be accepted in its place. This is the essence of the proof of work method and, so long as the network has a very large combined hashrate, makes it virtually impossible for anyone to alter any of the blocks other than, possibly, the ones near the very end of the chain which do not have many confirmations (i.e., not many blocks on top of them).
While measuring the longest chain by simply counting the number of blocks does seem reasonable, and is how Bitcoin worked when it was originally created, it does have issues. This is due to the difficulty adjustments which I will explain further below. If the rogue miner is able to construct his blocks in such a way that the protocol gives a lower difficulty level than for the public chain, it is possible that he could produce blocks faster than all the other miners combined, even if he has a lower hash rate. It is theoretically possible, then, that his private blockchain eventually has more blocks than the public one, so would be accepted by the validators. However, the rogue miner’s chain will still have a lower total difficulty, or chainwork.
The chainwork is the expected number of hashes that would be required to recreate the entire blockchain. Writing M_{n} and target_{n} for the expected number of hashes and target for the nth block, the chainwork is computed by summing over all blocks in the chain,

chainwork = ∑_{n} M_{n} = ∑_{n} 2^{256} / (target_{n} + 1).
This is proportional to the total difficulty given by summing the difficulty of each block,

total difficulty = ∑_{n} difficulty_{n} = 2^{-32} × chainwork.
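In code, given the list of targets for each block, the chainwork can be sketched as follows (using integer division, in the same spirit as Bitcoin Core's per-block work calculation):

```python
def chainwork(targets: list[int]) -> int:
    # Expected total number of hashes to recreate the chain:
    # the sum of 2^256 / (target_n + 1) over all blocks.
    return sum(2**256 // (t + 1) for t in targets)

# Three blocks sharing an (artificially easy) target of 2^224 - 1
# each contribute 2^32 expected hashes:
work = chainwork([2**224 - 1] * 3)
```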
Then, the network validators will accept the valid blockchain with the greatest chainwork as the true public blockchain. This ensures that, regardless of difficulty levels, the rogue miner will have to generate more hashes, on average, than all other miners on the network combined. Due to the infeasibility of this, the use of the chain with the largest chainwork ensures immutability of the blockchain.
Difficulty Adjustments
The proof of work method is explained above for given difficulty, or, target values. However, this is an incomplete description without also explaining how the protocol defines the difficulty. In Bitcoin, at least, every block header contains the following information.
- The hash target, stored in a compact floating-point format.
- A timestamp recording when the block was created, stored as Unix time.
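The compact target format mentioned above ('nBits') packs a one-byte base-256 exponent and a three-byte mantissa into 32 bits. A minimal decoder (ignoring the negative-number edge case, which never occurs for valid targets) might look like:

```python
def decode_nbits(nbits: int) -> int:
    """Decode Bitcoin's compact target encoding: the high byte is a
    base-256 exponent and the low three bytes are the mantissa."""
    exponent = nbits >> 24
    mantissa = nbits & 0x007FFFFF  # top mantissa bit is a sign flag, always 0 here
    return mantissa * 256**(exponent - 3)

# The genesis block's compact target of 0x1d00ffff corresponds to difficulty 1:
target = decode_nbits(0x1D00FFFF)
```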
The idea is that the hash target is adjusted to keep the average rate of block production at a steady predefined value. Clearly, the miner does not get to choose its value, which is instead set according to clearly defined rules. For Bitcoin, this causes a difficulty adjustment every 2016 blocks (about every 2 weeks) to keep the block production rate at one every 10 minutes on average. The idea is straightforward. Take the ratio of the target production time to the recent average time of block production, and use this to scale the difficulty. I give a more precise description of the difficulty calculation for Bitcoin further below.
The timestamp field does deserve further consideration. When validating a block, it is impossible for a node to know exactly when it was created, so cannot validate that the timestamp correctly records the production time of the block. Therefore, this cannot be enforced by the protocol. Instead, Bitcoin only requires the following for the block to be valid.
- It must be greater than the median timestamp of the previous 11 blocks.
- It must be less than the network-adjusted time plus two hours. Here, 'network-adjusted time' is the median time of all other nodes connected to the one doing this validation.
Requiring timestamps to strictly increase might seem like a natural condition but, instead, they are only compared to a median of previous timestamps. This handles the case where a block is produced with a timestamp too far in the future (up to 2 hours ahead), which would otherwise prevent other miners from immediately building a new block on top of it if strictly increasing timestamps were enforced.
The second condition is interesting. All other requirements for a Bitcoin blockchain to be valid are internal. That is, the validity of a chain depends deterministically on its data, and not on who is doing the validation or any other external factors. Here, however, it depends on the network-adjusted time computed by the validator. Technically, it is possible for two nodes to disagree about the validity of a blockchain, or for a single validator to disagree with themselves if they redo the validation. However, the requirements are such that this is a relatively mild externality. Once a node has validated a blockchain then, as time increases, it should remain valid and, shortly, all other validators should also consider it valid. There are still potential issues regarding this part of the protocol though. See this blog post from 2011: Timejacking & Bitcoin.
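The two timestamp rules can be sketched as a simple validity check. Here, all times are in Unix seconds, and network_adjusted_time is the validator's own median peer time (so, as discussed, the result can differ between nodes):

```python
import statistics

def timestamp_valid(ts: int, prev_11_timestamps: list[int],
                    network_adjusted_time: int) -> bool:
    # Rule 1: strictly greater than the median of the previous 11 timestamps.
    median_past = statistics.median(prev_11_timestamps)
    # Rule 2: strictly less than network-adjusted time plus two hours.
    return median_past < ts < network_adjusted_time + 2 * 3600
```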
To complete this explanation of the difficulty calculation, here is a more precise description of how it is done by Bitcoin.
- The first, or genesis, block has a difficulty of 1.
- Otherwise, if the block height (or, number) is not a multiple of 2016 then the difficulty is equal to that of the previous block.
- If the block height n is a multiple of 2016, which happens about once every 2 weeks, then a difficulty adjustment is applied. To do this, the average time to produce each of the previous 2015 blocks is computed (this should really be 2016, but Bitcoin has a small bug),
average = (Timestamp of B_{n-1} − Timestamp of B_{n-2016}) / 2016. As block production should target one every 10 minutes, a correction factor is computed,
correction = 10 minutes / average. A factor greater than 1 means that blocks are being produced too quickly and, less than one, means that they are too slow. To avoid sudden big changes in difficulty, this is capped and floored to be between 1/4 and 4. Finally, the difficulty of the block is set to the difficulty of the previous block multiplied by the correction factor.
The mentioned bug creates two idiosyncrasies in the Bitcoin blockchain:
- The production time for the first block after a difficulty adjustment (i.e., whose height is a multiple of 2016) does not contribute to any future difficulty calculations.
- Bitcoin actually targets a block production rate of 10 minutes multiplied by 2016/2015, or about 10 minutes and 0.3 seconds.
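Putting the retarget rule together, including the off-by-one just described, a sketch in Python:

```python
def next_difficulty(prev_difficulty: float, t_first: int, t_last: int) -> float:
    """Difficulty for a block at a height that is a multiple of 2016.
    t_first and t_last are the Unix timestamps of blocks n-2016 and n-1,
    so the measured span actually covers 2015 block intervals (the bug)."""
    average = (t_last - t_first) / 2016
    correction = (10 * 60) / average
    correction = max(0.25, min(4.0, correction))  # floored at 1/4, capped at 4
    return prev_difficulty * correction
```

For example, if the previous 2016-block window took exactly twice as long as intended, the difficulty halves; if it took half as long, it doubles.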