The Lightning network is gaining in adoption, and is currently being used worldwide to enable fast and cheap bitcoin payments. I will describe some of the issues that led to its invention and, hopefully, help to explain why it works in the way that it does.
Picture the scene. As the owner of a coffee shop, you make the decision to start accepting bitcoin payments. Your first customer, Alice, enters the shop and orders a coffee. She takes out her mobile phone, opens the app for her bitcoin wallet, and scans the QR code for your bitcoin address, which you have displayed by the counter. All she needs to do now is send the correct payment. Easy!
Well…not quite so easy. To be certain that the transaction is secure, we should really wait for it to have the requisite six confirmations before giving Alice the coffee. That is, it should be added to the blockchain, and have a further five blocks added on top. As it takes an average of ten minutes for each block to be produced, we can expect to have to wait around one hour. This is rather a long time to keep people waiting.
What can be done? As it is just for a coffee — not a major payment — we might be ok with fewer confirmations. Maybe just one will be sufficient, meaning that we simply wait for the transaction to be included in a block. It still takes ten minutes though, and remember that this is just the average. The time taken for a block to be produced is random so, if we are unlucky, it could take considerably longer. An hour for a block is not unheard of.
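To put a number on how unlucky we might be, here is a minimal sketch. It assumes that blocks arrive as a Poisson process with a ten-minute mean, which is the usual idealisation of proof-of-work mining and ignores changes in hash rate between difficulty adjustments. The function name is my own choice for illustration.

```python
import math

MEAN_BLOCK_MINUTES = 10.0  # average block interval targeted by Bitcoin

def prob_wait_exceeds(minutes: float, mean: float = MEAN_BLOCK_MINUTES) -> float:
    """P(next block takes longer than `minutes`), assuming an exponential wait."""
    return math.exp(-minutes / mean)

for t in (10, 30, 60):
    print(f"P(wait > {t:2d} min) = {prob_wait_exceeds(t):.4f}")
```

Under this assumption, more than a third of blocks take longer than the ten-minute average, and roughly one block in four hundred takes over an hour. With about 144 blocks per day, an hour-long gap would be expected every few days.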
We really want to be able to accept Alice’s payment within a few seconds. There is no way around it: we are going to have to accept zero-confirmation bitcoin transactions. Either Alice sends her transaction to the network and we monitor Bitcoin nodes to check that it has been received, even if it has not yet made its way into the blockchain, or she sends her transaction directly to us and we make sure that it is transmitted to the network to be included in a block.
Accepting transactions with zero confirmations does leave us vulnerable to double-spend attacks, which are one of the main things that the blockchain was invented to prevent. This is where Alice, possibly with an accomplice, spends the same coins multiple times with different counterparties. Even if we were to do this, and risk losing payments to double spends, we would still need to ensure that the bitcoin transaction fee is sufficiently large to be accepted by miners for inclusion in a block. So, Alice will be rather upset to find that this fee can be several times larger than the cost of the coffee she is buying. More fundamentally, the Bitcoin blockchain simply does not have the capacity to include lots of such small everyday payments by millions of people around the world.
Summarizing the issues raised so far, the following three problems need to be solved in order for Bitcoin to be useful as a mechanism for small, quick and common payments such as buying a coffee.
- Speed. We should be able to accept payments in a matter of seconds.
- Cost. The fee for a transaction should be small. No more than a few cents at most.
- Capacity. The network should be able to handle millions of people around the world making such transactions regularly.
The final two points are directly related. It is because of the limited capacity of the blockchain that miners are selective about which transactions to include, only choosing those with the largest fees.
To be more precise, we can put some numbers to these issues. As discussed above, it takes on average one hour for a bitcoin transaction to achieve the required six confirmations, which is about a thousand times slower than ideal. At the time of writing, 2 October 2021, average transaction fees are about $2. They have been much higher than this at times of high activity and high bitcoin prices, such as in April 2021, when they briefly peaked at over $60. In terms of capacity, the 1MB size limit allows for around two to three thousand transactions per block which, at one block every ten minutes, averages out to about three to five transactions per second. Compare this to Visa, which handles an estimated 1,700 transactions per second.
The Lightning network uses zero-confirmation transactions to address all three of the problems above, without making any fundamental changes to the base Bitcoin protocol. As this is not the only solution, I briefly mention some of the other approaches.
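The capacity figure is simple arithmetic, sketched below using only the numbers quoted above: two to three thousand transactions per block, one block per ten minutes, and the estimated Visa rate. The function name is illustrative.

```python
BLOCK_INTERVAL_SECONDS = 10 * 60  # one block every ten minutes on average

def transactions_per_second(tx_per_block: int) -> float:
    """Average on-chain throughput for a given number of transactions per block."""
    return tx_per_block / BLOCK_INTERVAL_SECONDS

low = transactions_per_second(2000)
high = transactions_per_second(3000)
visa_estimate = 1700  # transactions per second, as quoted in the text

print(f"Bitcoin: {low:.1f} to {high:.1f} transactions per second")
print(f"Visa's estimated rate is about {visa_estimate / high:.0f} times the upper figure")
```

Even at the optimistic end, the gap is a factor of several hundred, which is why modest tweaks to the block size cannot close it.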
First, it would be possible to change the protocol and simply increase the maximum allowed block size above the 1MB limit. This would help solve the second and third issues above, and has long been a subject of controversy in the Bitcoin community. Many argue that very large blocks would make it harder for people to run the nodes that we rely on to validate the blockchain state, increasing centralization and decreasing security. In 2017, Bitcoin Cash forked from the core Bitcoin protocol to create a new version with larger block sizes, although it has not achieved close to the level of adoption of the core blockchain. While there is no doubt some flexibility to increase block sizes without having too much impact on decentralization and security, achieving thousands of transactions per second in this way is probably asking too much.
A similar idea is to increase the block frequency. While a block is added to the Bitcoin chain on average once every ten minutes, some blockchains are much faster. Ethereum has a block time of about 15 seconds, and some proof of stake chains are even faster. Solana, for example, has a block time under half a second. It is an ongoing debate as to whether such blockchains can achieve the same levels of security and decentralization as Bitcoin.
For proof of work blockchains, the general rule is that block production time should be considerably larger than the time taken for a block to propagate through the network of nodes. Otherwise, the network would not be able to reach a consistent state, causing many short-lived forks. Miners would have difficulty determining the correct leading block to build on, and would spend a lot of time producing blocks which are then orphaned (not included in the main chain). This would waste their work, reducing the security of the main chain.
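A rough way to see the effect is sketched below. It assumes Poisson block arrivals and treats a block as at risk of being orphaned whenever a competing block is found while it is still propagating. The ten-second propagation delay is an assumed figure chosen purely for illustration, not a measurement, and the model ignores everything else about real networks.

```python
import math

def orphan_risk(propagation_seconds: float, block_interval_seconds: float) -> float:
    """Chance that a competing block appears while ours is still propagating."""
    return 1.0 - math.exp(-propagation_seconds / block_interval_seconds)

PROPAGATION_SECONDS = 10.0  # assumed delay, for illustration only

for name, interval in (("10-minute blocks", 600.0), ("15-second blocks", 15.0)):
    print(f"{name}: {orphan_risk(PROPAGATION_SECONDS, interval):.1%}")
```

With these assumed numbers the risk is under 2% for ten-minute blocks, but approaches one half when the block interval is comparable to the propagation delay.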
For reasons including those described above, changes to the Bitcoin protocol to increase transaction rates are very rare, and relatively conservative. Really, the only example so far is the SegWit upgrade which, by separating out signature data from the transaction body, allowed a modest increase in the number of transactions per block. Solutions to the scalability problems listed above are left to layer 2 implementations, such as the Lightning network.
Finally, there is a rather obvious but unsatisfactory solution. This is for users to deposit their Bitcoin with certain trusted parties. These would be very much like banks, with which we hold checking or current accounts. In the scenario above, Alice would not send a bitcoin transaction to pay for her coffee. Instead, her bank would simply record the payment from her account to ours. The respective banks would then settle at the end of each day, rather than sending transactions for each individual payment between account holders. This approach is unsatisfactory, since it throws away the benefits of Bitcoin as a decentralized peer-to-peer payment system without having to trust intermediaries and, instead, relies on centralized organizations.
Satoshi’s Snack Machine
The idea of using zero confirmations for fast bitcoin transactions was discussed as far back as 17 July 2010, by Satoshi Nakamoto himself. This appeared in the bitcointalk forum thread Bitcoin snack machine (fast transaction problem), in the context of purchasing from a vending machine. Here, we would not want to wait more than a few seconds for the machine to respond and dispense our choice of tasty snack. Quoting Satoshi directly:
I believe it’ll be possible for a payment processing company to provide as a service the rapid distribution of transactions with good-enough checking in something like 10 seconds or less.
The network nodes only accept the first version of a transaction they receive to incorporate into the block they’re trying to generate. When you broadcast a transaction, if someone else broadcasts a double-spend at the same time, it’s a race to propagate to the most nodes first. If one has a slight head start, it’ll geometrically spread through the network faster and get most of the nodes.
A rough back-of-the-envelope example:
1      0
4      1
16     4
64     16
80%    20%
So if a double-spend has to wait even a second, it has a huge disadvantage.
The payment processor has connections with many nodes. When it gets a transaction, it blasts it out, and at the same time monitors the network for double-spends. If it receives a double-spend on any of its many listening nodes, then it alerts that the transaction is bad. A double-spent transaction wouldn’t get very far without one of the listeners hearing it. The double-spender would have to wait until the listening phase is over, but by then, the payment processor’s broadcast has reached most nodes, or is so far ahead in propagating that the double-spender has no hope of grabbing a significant percentage of the remaining nodes.
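Satoshi’s back-of-the-envelope figures can be reproduced with a toy model. The sketch below assumes, as the quoted numbers suggest, that the number of nodes holding a transaction multiplies by four at each step, and that the double spend starts one step late. The function and parameter names are mine.

```python
def spread_race(fanout: int = 4, head_start: int = 1, steps: int = 3):
    """Node counts reached at each step by the first and second transaction.

    The count of nodes holding a transaction multiplies by `fanout` per step;
    the second transaction starts `head_start` steps after the first.
    """
    first = [fanout ** s for s in range(steps + 1)]
    second = [fanout ** (s - head_start) if s >= head_start else 0
              for s in range(steps + 1)]
    return first, second

first, second = spread_race()
for a, b in zip(first, second):
    print(a, b)

share = first[-1] / (first[-1] + second[-1])
print(f"{share:.0%} vs {1 - share:.0%}")
```

This prints the rows 1 0, 4 1, 16 4 and 64 16, followed by the 80% to 20% split. A delay of one step is enough to leave the double spend with a fifth of the nodes, and a delay of two steps would leave it with about a seventeenth.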
This idea is not the approach currently used by the Lightning network, for reasons which I will get into in a moment. Thinking about how it would work, suppose that Alice uses bitcoin to purchase a snack. Then, we (the snack machine company) would not simply wait for her to send her transaction to the network. Instead, she would send it directly to us, and we would deal with submitting it. That way, we can send it to as many bitcoin nodes as possible, to maximise the chance that it will be seen by the next miner to create a block. Assuming that it is not possible to send it directly to every node, it is best to select them at random. That way, it would be difficult for an attacker to try and target the specific set of nodes that we sent the transaction to.
There is one more step before we dispense the snack. After a short delay of just a few seconds, we check to ensure that the transaction has been transmitted across the network, and has been received by a significant proportion of nodes. To do this, we monitor a large collection of bitcoin nodes. If any of them has received a double spend (i.e., a different transaction spending the same inputs as Alice’s) then this is evidence of fraud and the process is terminated without dispensing the goods. Otherwise, we check to see how many of the nodes have received Alice’s transaction. This will give us a good statistical estimate of the total proportion of nodes across the network that have received it. If it is a sufficiently large fraction, we dispense the snack and are done. The idea is that even if a double spend was attempted after this, it would not be able to reach sufficiently many nodes for the next miner to see and include it in a block. As a large majority of nodes would already have the original transaction, and would not accept a new one spending from the same inputs, it is much more likely that the miner would still include Alice’s original transaction.
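The decision rule just described can be sketched as follows. This is only an illustration of the logic: the `NodeView` type, the function name and the 90% threshold are assumptions of mine, and querying real nodes is left out entirely.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass(frozen=True)
class NodeView:
    """What one monitored node reports for the inputs Alice is spending."""
    seen_txid: Optional[str]  # txid the node holds for those inputs, or None

def should_dispense(views: Iterable[NodeView], alice_txid: str,
                    threshold: float = 0.9) -> bool:
    """Accept only if no conflict was seen and enough nodes hold Alice's tx."""
    views = list(views)
    if not views:
        return False
    # Any different transaction spending the same inputs is evidence of fraud.
    if any(v.seen_txid not in (None, alice_txid) for v in views):
        return False
    # The sampled fraction estimates the proportion across the whole network.
    seen = sum(v.seen_txid == alice_txid for v in views)
    return seen / len(views) >= threshold
```

For example, if 19 of 20 sampled nodes report Alice’s transaction and the remaining one has seen nothing, the snack is dispensed. A single node reporting a conflicting transaction stops the sale, however many others have the original.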
The approach suggested here is not being used in either the Lightning network or any other fast transaction methods, for several reasons.
- The vending machine company is assuming that the transaction which has been propagated to the most nodes will be the one to be included in the blockchain. While we would usually expect this to be the case, it does go against the proof-of-work idea that the blocks are determined by the hashpower of miners, and not the number of nodes. By design, it is very difficult to attack the blockchain by gaining a majority of the hashpower (i.e., a 51% attack). However, it would be much easier to create a large number of nodes in order to facilitate a double spend of zero confirmation transactions.
- The approach above relies on the node behaviour that, once they have received Alice’s bitcoin transaction, they will not then replace it by a newer one spending the same inputs. If the newer one paid a larger fee then it would certainly be preferred by miners, and the replace-by-fee node policy specifically allows for this behaviour, with some constraints. For example, if a transaction had too small a fee, so was not being selected by miners for inclusion in blocks, it can be perfectly legitimate to try and replace this by a new transaction spending the same inputs, but with a larger fee.
- The method assumes that a transaction which has been transmitted to the network will soon be included in a block. If the total number of transactions in the mempool is small enough that they can all fit in a single block, then it is a safe assumption that a miner will include it. While this state of affairs may have been true in 2010, it is no longer the case.
- The idea attempts to speed up the time taken for a vendor to accept a bitcoin transaction, but it does nothing to either reduce the fees or address the limited capacity of the blockchain to handle such large numbers of small transactions.
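The second objection in the list above can be made concrete with a toy mempool. This is a deliberately simplified sketch: the only replacement rule modelled is "the new transaction must pay more than the ones it conflicts with", whereas the actual replace-by-fee policy (specified in BIP 125) has further conditions that are omitted here. All class and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tx:
    txid: str
    inputs: frozenset  # outpoints spent, e.g. {"alice:0"}
    fee: int           # fee in satoshis

class ToyMempool:
    def __init__(self, replace_by_fee: bool):
        self.replace_by_fee = replace_by_fee
        self.txs = {}

    def submit(self, tx: Tx) -> bool:
        """Return True if the node accepts `tx` into its mempool."""
        conflicts = [t for t in self.txs.values() if t.inputs & tx.inputs]
        if conflicts:
            if not self.replace_by_fee:
                return False  # first-seen rule: keep the original
            if tx.fee <= sum(t.fee for t in conflicts):
                return False  # replacement must pay more
            for t in conflicts:
                del self.txs[t.txid]
        self.txs[tx.txid] = tx
        return True

payment = Tx("payment", frozenset({"alice:0"}), fee=200)
double_spend = Tx("double_spend", frozenset({"alice:0"}), fee=500)
```

A node following the first-seen rule rejects `double_spend` once it holds `payment`, which is the behaviour the snack machine relies upon. A node allowing replacement evicts `payment` in favour of the higher fee, and the vendor’s propagation check offers no protection.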
Some supporters of Bitcoin forks which have increased block size limits, such as Bitcoin Cash, still advocate a solution along the lines described above. For the Bitcoin core blockchain, however, the obstacles mentioned above do seem to rule out this approach.
Let us go back to the situation where Alice is trying to pay for a coffee with bitcoin. An issue with accepting the transaction before it has any confirmations is that we would be vulnerable to double spends. There is one way around this. Alice could spend from a multisignature account, which requires both her and us, as the shop owner, to sign. Since we would not sign any fraudulent double spends, we are safe to accept this without any confirmations.
There is an obvious problem with using multisig accounts. Alice and the shop would need to cooperate beforehand, by sharing public keys, and Alice would need to pre-fund the account before it is available to spend. Also, we have replaced the single transaction required to pay for the coffee by two separate transactions: one to fund the account, and one to make the payment. If we do this cleverly, these problems can actually be a blessing. If the coffee shop owner just takes the payment transaction and saves it, without sending it to the network, then the account can be reused as many times as Alice visits the shop. This means that all payments over an extended period of time can be combined into just two transactions on the blockchain: one to fund the multisig account at the beginning, and one to close it out at the end.
Referring to the shop owner as ‘Bob’, the situation is shown in figure 1 below. At the start, Alice pays funds into the joint account. For the illustration here, I choose 100 ‘coins’ as a nice round number. These are just in whatever denomination makes sense. Clearly, they are not units of 1 bitcoin, which would make for extremely expensive coffee at today’s prices! Units of 10,000 sats, or 0.0001 bitcoin, make more sense. There is the problem that, once Alice pays into the joint account, she requires Bob’s permission to get her money back. So, before funding it, she asks Bob to sign a transaction returning the 100 coins back to her. She does not sign and transmit this transaction though, since it is just in case she wants to close the account before actually making any purchases.
On day 1, Alice buys a coffee, so signs a transaction paying 1 coin from the joint account to Bob and the remainder back to herself. This transaction is given to Bob just in case he needs to close out the account but, otherwise, he keeps hold of it without actually signing or sending it to the network. Bob also gives a version of this transaction to Alice containing his signature, in case she needs to close out the account.
On day 2, she buys another coffee, so adds this to the initial payment. As this is a total of 2 coins spent, she creates a new transaction paying 2 coins to Bob and the remainder to herself. This is handled in the same way as previously, with both Alice and Bob keeping a version of the transaction with the other one’s signature, so they can close out the account if needed.
This continues. On day 3, Alice buys two coffees and on day 4 she buys another. Then Alice decides to close the account, so Alice and Bob sign a final transaction to close out the account, giving the total paid amount of 5 coins to Bob and the remaining 95 returned back to Alice.
Note that, in all this, there are only two transactions being sent to the blockchain, regardless of how many times Alice visits the shop. These are the first, funding transaction, and the final one to close out the account. The remaining transactions are just exchanged between the counterparties without being added to the blockchain, so remain as zero-confirmation. In effect, Alice and Bob are merely keeping a tally of how much is owed to each party, and only settling this on-chain at the end. The commitment transactions will only ever be used in case there is a dispute or one of the two parties cannot be contacted, so the other can use it to unilaterally close the account.
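The tally over the four days can be sketched in a few lines. This models only the balances in each commitment, not signatures or real transactions, and the class and field names are chosen for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Channel:
    capacity: int                 # coins Alice paid into the joint account
    paid_to_bob: int = 0
    commitments: list = field(default_factory=list)  # (to_alice, to_bob)

    def __post_init__(self):
        # Refund commitment signed by Bob before the account is funded.
        self.commitments.append((self.capacity, 0))

    def pay(self, coins: int) -> tuple:
        """Record a payment by creating a new commitment with updated balances."""
        if coins <= 0 or self.paid_to_bob + coins > self.capacity:
            raise ValueError("payment must be positive and within the balance")
        self.paid_to_bob += coins
        latest = (self.capacity - self.paid_to_bob, self.paid_to_bob)
        self.commitments.append(latest)
        return latest

channel = Channel(capacity=100)
for coffees in (1, 1, 2, 1):      # days 1 to 4
    channel.pay(coffees)

print(channel.commitments)
on_chain_transactions = 2         # funding and closing, however many payments
```

The commitments run from (100, 0) through (99, 1), (98, 2) and (96, 4) to (95, 5), matching the final settlement of 5 coins to Bob and 95 back to Alice.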
While the use of a multisig account has removed much of the possibility of double spends, it has not been eliminated altogether. Bob would not sign a fraudulent double spend but, due to the reuse of the account, he has already signed alternative transactions. After day 4, for example, Alice has spent a total of 5 coins. However, she still has the old commitment transactions from previous days which have Bob’s signature. She could send one of these to the network. She could even submit the very first commitment that was created when the account was opened, and send all 100 coins back to herself with Bob receiving none.
Somehow, to eliminate these fraudulent double spends, Bob needs to invalidate the previous day’s transaction before accepting the new payment. Unfortunately it is a design feature of Bitcoin that, if a transaction is valid to spend, then it remains valid forever or until any of its inputs are spent. Furthermore, Bob does not want to have to do anything on-chain anyway, since this would be slow and expensive. He cannot stop Alice from spending an old commitment transaction. However, there is a way that he can recover the funds if Alice pulls such a stunt.
The idea is that Alice’s versions of the commitment transactions come with a timelock, so that if she does try to add one to the blockchain then she has to wait a set period of time before the funds returned to herself can be spent. Let’s suppose that this is set at one day. Also, it comes with a second spend method which comes without any time lock, but requires both Alice and Bob to sign. Then, when Alice pays for a coffee, she does not just sign a new commitment transaction. She also signs a transaction to pay all of her received funds from the previous commitment back to Bob. This is the breach remedy transaction, and the previous commitment is now considered to be revoked. This means that she can no longer use the previous commitment since, if she did, she would have to wait a day to be able to access her funds and, in the intervening time, Bob would see the transaction on the Bitcoin blockchain and use the breach remedy to return all funds to himself, leaving Alice with nothing.
With this set-up, when Alice decides to close the account, then she would prefer not to do this unilaterally using her commitment transaction, since her funds would be locked up for a day. It is better to cooperate with Bob to create a new transaction paying the correct funds and with no such restriction. This is why, in figure 1, the final transaction is separate from the last commitment even though they pay the same funds. It is also important that Bob monitors the blockchain with a frequency greater than once a day, just in case Alice does try to spend a revoked commitment transaction, giving him time to apply the breach remedy before Alice withdraws the funds.
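The incentives can be summarised in a small sketch. It models only the outcomes, assuming the one-day timelock used above, and reduces Bob’s monitoring to a single number: how often he checks the chain. The names are hypothetical.

```python
from dataclasses import dataclass

TIMELOCK_HOURS = 24  # delay on Alice's output: one day, as in the text

@dataclass(frozen=True)
class Commitment:
    index: int      # 0 is the initial refund commitment
    to_alice: int
    to_bob: int

def settle(broadcast: Commitment, latest_index: int,
           bob_check_interval_hours: float) -> dict:
    """Final payout if Alice unilaterally broadcasts `broadcast`."""
    total = broadcast.to_alice + broadcast.to_bob
    revoked = broadcast.index < latest_index
    bob_in_time = bob_check_interval_hours < TIMELOCK_HOURS
    if revoked and bob_in_time:
        # Bob uses the breach remedy to take Alice's time-locked output too.
        return {"alice": 0, "bob": total}
    # Otherwise the commitment pays out as written once the timelock expires.
    return {"alice": broadcast.to_alice, "bob": broadcast.to_bob}

old = Commitment(index=0, to_alice=100, to_bob=0)
latest = Commitment(index=4, to_alice=95, to_bob=5)
print(settle(old, latest.index, bob_check_interval_hours=6))
print(settle(latest, latest.index, bob_check_interval_hours=6))
```

If Bob checks every six hours, broadcasting the revoked commitment costs Alice everything, whereas the latest commitment pays out 95 and 5 as agreed. The cheat only succeeds if Bob looks at the chain less often than the timelock allows.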
What I describe above is not intended to be a full description of Lightning channels. It only explains how they solve the problems in the scenario above, with Alice making regular small payments at a coffee shop. The channel here is one-directional, meaning that there is no benefit to Bob in transmitting an old commitment transaction. This would simply result in loss of funds to himself. In bi-directional channels with payments made in both directions, Alice also needs to guard against Bob using an old commitment, so his versions of the transactions will also have a time-lock before he can recover his funds and he will have to sign breach-remedy transactions for Alice.
A single Lightning channel between Alice and the coffee shop is rather restrictive, since it only allows her to spend her funds at that one establishment. In practice, a network is used, which is made up of a lot of connected channels. For example, suppose that Alice tries to purchase a snack from a vending machine using her bitcoin. Rather than having to open a new channel with the vending company, the one that she already has with the coffee shop can be used. If the coffee shop also had a Lightning channel open with the vending company, then when she pays for the snack, this payment can be routed via the shop.
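Finding such a route is a graph search over the open channels. The sketch below uses a plain breadth-first search to find the path with the fewest hops. It ignores channel balances and routing fees, which a real implementation would have to take into account, and the node names are just those of the example.

```python
from collections import deque

def find_route(channels: dict, source: str, target: str):
    """Breadth-first search for the shortest chain of channels, or None."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for peer in channels.get(path[-1], ()):
            if peer not in visited:
                visited.add(peer)
                queue.append(path + [peer])
    return None

# Each key lists the parties with whom that node has an open channel.
channels = {
    "Alice": ["CoffeeShop"],
    "CoffeeShop": ["Alice", "VendingCo"],
    "VendingCo": ["CoffeeShop"],
}
print(find_route(channels, "Alice", "VendingCo"))
```

Here the only route from Alice to the vending company runs through the coffee shop, so the payment takes two hops.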
The Lightning network and channels as explained here were first suggested in 2015 by Joseph Poon and Thaddeus Dryja.
There are alternative approaches which achieve fast and cheap payments in a similar manner to Lightning channels, but are set up differently. I will briefly mention some here, which also rely on a multisig account with commitment transactions as in figure 1 above, but use different methods of preventing either party from stealing funds by sending old commitments to the network. Each method starts by agreeing on a fixed future date by which time the channel must be closed. See also the Bitcoin Wiki for more information on alternative ideas.
First, as the channel described above is uni-directional, it can be implemented without Alice having to handle commitment transactions. Instead, Alice signs each commitment for Bob to hold, so that he can unilaterally close out the channel if he wants. There is no risk of Bob using an old commitment transaction, since this would only cost him money. Alice still needs some way of recovering her funds in case Bob becomes unresponsive. For this, we allow her the option of recovering all funds back to herself after the future agreed date, which can be done by setting up the multisig account such that it also allows a single sig spend by Alice after the given date. Alternatively, Bob can sign a transaction returning the funds to Alice, using the nLockTime field to specify that it is only valid after the stated date. In this case, they must agree to close out the channel by the final date or, otherwise, Bob will have to unilaterally close it using the last commitment transaction to avoid losing funds. This is similar to a suggestion of Jeremy Spilman in 2013.
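The outcomes of this one-directional scheme can be sketched as below. Days are plain integers, the commitments are represented only by the cumulative amount each one pays to Bob, and all names are my own.

```python
from typing import Optional, Sequence

def one_way_outcome(capacity: int, commitments_to_bob: Sequence[int],
                    bob_closes_on_day: Optional[int], expiry_day: int) -> dict:
    """Payouts for a one-directional channel with a time-locked refund.

    `commitments_to_bob` holds the cumulative amount paid to Bob in each
    commitment Alice has signed, oldest first.
    """
    if bob_closes_on_day is not None and bob_closes_on_day < expiry_day:
        # An old commitment only pays Bob less, so he broadcasts the best one.
        to_bob = max(commitments_to_bob, default=0)
        return {"alice": capacity - to_bob, "bob": to_bob}
    # Bob never closed: after the agreed date Alice can recover everything.
    return {"alice": capacity, "bob": 0}

print(one_way_outcome(100, [1, 2, 4, 5], bob_closes_on_day=10, expiry_day=30))
print(one_way_outcome(100, [1, 2, 4, 5], bob_closes_on_day=None, expiry_day=30))
```

The second call shows why Bob must close the channel before the agreed date: if he does not, the refund path leaves him with nothing.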
Another approach was — according to Mike Hearn — originally suggested by Satoshi.
Here is how Satoshi explained it to me, in his words:
An unrecorded open transaction can keep being replaced until nLockTime. It may contain payments by multiple parties. Each input owner signs their input. For a new version to be written, each must sign a higher sequence number (see IsNewerThan). By signing, an input owner says “I agree to put my money in, if everyone puts their money in and the outputs are this.” There are other options in SignatureHash such as SIGHASH_SINGLE which means “I agree, as long as this one output (i.e. mine) is what I want, I don’t care what you do with the other outputs.”. If that’s written with a high nSequenceNumber, the party can bow out of the negotiation except for that one stipulation, or sign SIGHASH_NONE and bow out completely.
The parties could create a pre-agreed default option by creating a higher nSequenceNumber tx using OP_CHECKMULTISIG that requires a subset of parties to sign to complete the signature. The parties hold this tx in reserve and if need be, pass it around until it has enough signatures.
One use of nLockTime is high frequency trades between a set of parties. They can keep updating a tx by unanimous agreement. The party giving money would be the first to sign the next version. If one party stops agreeing to changes, then the last state will be recorded at nLockTime. If desired, a default transaction can be prepared after each version so n-1 parties can push an unresponsive party out. Intermediate transactions do not need to be broadcast. Only the final outcome gets recorded by the network. Just before nLockTime, the parties and a few witness nodes broadcast the highest sequence tx they saw.
For the payments between Alice and Bob above, this is as in figure 1. We first choose a date in the future by which the channel has to be closed, and each of the commitment transactions has nLockTime set so that they cannot be added to the blockchain before this date. This does mean that if either party unilaterally closes the channel using a commitment transaction, they have a significant wait before they can actually enact this. Next, each commitment has the nSequenceNumber field set on its input, and each subsequent commitment increases the value of nSequenceNumber. The idea is that nodes are supposed to accept the transaction with the largest sequence number, so that old commitment transactions will be over-ridden. The problem is that there is no incentive to miners to use the transaction with the largest nSequenceNumber, and it also leaves the network vulnerable to denial of service (DoS) attacks where a user sends massive numbers of transactions which are all the same, but with different sequence numbers. So, this use of nSequenceNumber has since been disabled in Bitcoin, and the approach given here would not work as intended.
There is another method that has been suggested using nLockTime, but not relying on sequence numbers. Again, we choose a date significantly far in the future by which time the channel must be closed. Then, as before, the set of transactions is as in figure 1. However, the first commitment transaction has nLockTime set to the chosen future date. The second commitment has it set to the day before this, and so on. Each commitment transaction has nLockTime set to the day before the previous commitment’s lock time.
With this process, nLockTime keeps getting moved earlier and earlier with each payment. The channel will then be closed out before the lock time reaches the current date. With this setup, if anyone tried to submit an old commitment transaction, then it would be over-ridden by a newer one with a lower value of nLockTime, since that would be valid for including on the blockchain first.
While this final approach would work, it seriously limits the number of transactions that the channel can be used for since each one moves nLockTime a day earlier. It also means that if either party wants to unilaterally close the account, they have a long wait. Conversely, Lightning channels can be used for unlimited payments so long as they have the required funds, and if any party unilaterally closes it then there is a relatively short wait before they can access the funds.
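The limitation of the decreasing lock time scheme is easy to quantify. The sketch below assumes one-day steps, as in the description, and uses example dates of my own choosing.

```python
from datetime import date, timedelta

def commitment_locktimes(close_by: date, payments: int) -> list:
    """Lock time of the initial commitment and of each later one, a day earlier each."""
    return [close_by - timedelta(days=i) for i in range(payments + 1)]

def max_payments(today: date, close_by: date) -> int:
    """Payments possible while every lock time stays after today."""
    return max((close_by - today).days - 1, 0)

today = date(2021, 10, 2)
close_by = date(2021, 11, 1)   # example closing date, 30 days ahead

locks = commitment_locktimes(close_by, payments=3)
print([d.isoformat() for d in locks])
print(max_payments(today, close_by))
```

The newest commitment always has the earliest lock time, so it becomes valid before any older one. The price is that a channel closing 30 days ahead supports at most 29 payments, which is the restriction that Lightning’s revocation mechanism avoids.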