Lightning Channels

The Lightning network is a layer 2 solution allowing for cheap and near-instant bitcoin payments. As with standard ‘on-chain’ transactions, it works in a peer-to-peer fashion without requiring trust in third parties. This network is built up from a collection of connected payment channels between pairs of individuals, each of which operates by exchanging zero-confirmation bitcoin transactions which are not actually submitted for inclusion in the blockchain. This reduces load on the chain, and allows settlement to occur in fractions of a second — much less than the 10 minutes average block production time.

Here, I concentrate on explaining how a single payment channel between two parties operates. The method by which these channels are connected to route payments will be explained in a separate post. I did previously discuss the use of zero-confirmation transactions to speed up payments, and the considerations leading to the idea of Lightning channels. Now, I will give a more detailed description of how this works. For more information, see the Lightning whitepaper. Also, medium.com has a pretty good explanation of how and why payment channels work in the way that they do.

To aid the exposition, let us suppose that two parties named Alice and Bob set up a payment channel between themselves. It is initially funded with 100 coins, with 50 from each of the two parties. At today’s prices, 100 bitcoin would be rather a lot to commit to a channel. I choose this quantity simply for a nice round number, and you can suppose that it is in units of 0.0001 bitcoin or 10,000 Satoshi, for more reasonable values.

Next, Alice and Bob start making payments to each other. As they already have coins locked up in the channel, this can be done by simply updating the quantities of these coins belonging to each party. See figure 1 below where, once the channel is funded, one payment is made on each of four days. They keep a tally of the amount of coins owed to each person, updated after every transaction. After this, the channel is closed and Alice and Bob receive back their current balance of coins.

This is, of course, a much simplified example. In practice, many thousands of payments could be completed before the channel is closed. Also, multiple payments could be made per second, rather than just once per day as here.

How does this look, in terms of transactions actually submitted for inclusion in the blockchain? The situation is shown in figure 2. There are only two on-chain transactions. The first of these is to fund the channel and has two inputs, one from Alice and the other from Bob, so they both must sign for it to be valid. The output is to a multi-sig address requiring both Alice and Bob to sign to release the coins.

Nothing further is done on-chain until the channel is closed. At this time, a single transaction is made from the multi-sig address and paying each of Alice and Bob their closing channel balances. Since it is multi-sig, both Alice and Bob must sign this transaction. Continue reading “Lightning Channels”

Zero-Confirmation Transactions

The Lightning network is gaining in adoption, and is currently being used worldwide to enable fast and cheap bitcoin payments. I will describe some of the issues that led to its invention, and hopefully help to understand why it works in the way that it does.

Picture the scene. As the owner of a coffee shop, you make the decision to start accepting bitcoin payments. Your first customer, Alice, enters the shop and orders a coffee. She takes out her mobile phone, opens the app for her bitcoin wallet, and scans the QR code for your bitcoin address which you have displayed by the counter. All she needs to do now, is to send the correct payment. Easy!

Well…not quite so easy. To be certain that the transaction is secure, we should really wait for it to have the requisite six confirmations before giving Alice the coffee. That is, it should be added to the blockchain, and have a further five blocks added on top. As it takes an average of ten minutes for each block to be produced, we can expect to have to wait around one hour. This is rather a long time to keep people waiting.

What can be done? As it is just for a coffee — not a major payment — we might be ok with fewer confirmations. Maybe just one will be sufficient, meaning that we simply wait for the transaction to be included in a block. It still takes ten minutes though, and remember that this is just the average. The time taken for a block to be produced is random so, if we are unlucky, it could take considerably longer. An hour for a block is not unheard of.

We really want to be able to accept Alice’s payment within a few seconds. There is no way around it, we are going to have to accept zero-confirmation bitcoin transactions. Either Alice sends her transaction to the network, and we monitor Bitcoin nodes to check that it has been received, even if it has not made its way into the blockchain yet. Alternatively, Alice can send her transaction directly to us and we will make sure that it is transmitted to the network to be included in a block.

Accepting transactions with zero confirmations does leave us vulnerable to double spend attacks, which is one of the main things that the blockchain was invented to solve. This is where Alice, possibly with an accomplice, spends the same coins multiple times with different counterparties. Even if we were to do this, and risk losing payments to double spends, we still need to ensure that the bitcoin transaction fee is sufficiently large to be accepted by miners for inclusion in a block. So, Alice will be rather upset that this fee is several times larger than the cost of the coffee she is buying. More fundamentally, the Bitcoin blockchain simply does not have the capacity to include lots of such small every-day payments by millions of people around the world.

Scalability Problems

Summarizing the issues raised so far, the following three problems need to be solved in order for Bitcoin to be useful as a mechanism for small, quick and common payments such as buying a coffee.

• Speed. We should be able to accept payments in a matter of seconds.
• Cost. The fee for a transaction should be small. No more than a few cents at most.
• Capacity. The network should be able to handle millions of people around the world making such transactions regularly.

Alternative Signed Scripts?

The soon-to-be live Taproot upgrade introduces the possibility of associating a collection of ‘alternative’ scripts to a Bitcoin address. These allow coins to be locked into a contract containing a collection of different clauses by which they can be unlocked and, when spending, we only reveal the script corresponding to the specific clause being activated. So long as all parties to the address authorise the transaction, a standard Taproot spend can be used simply by providing the digital signature, in which case, none of the alternative scripts need to be used or represented on the blockchain at all. As I described in an earlier post, the scripts are represented by a Merkelized Alternative Script Tree (MAST, for short). The Merkle root is used to modify the public key and, when activating a particular script for spending, it, along with its Merkle proof, is included in the spending transaction’s witness field.

There is another, simpler, way in which alternative scripts could have been incorporated, which I will refer to as Alternative Signed Scripts (or ASS, if you prefer…). This is a suggestion as to how it could easily have been done, and not a description of the actual Taproot implementation as it is. Hence the question mark in the post title — why did Bitcoin devs not implement the method which I will describe here? Is there a particular reason, or was it just not considered?

First, recall that a Taproot address contains a public key in the scriptPubKey of the sending transaction, and the corresponding Schnorr signature in the witness field of the spending transaction.

 P2TR — Pay to Taproot witness `` scriptPubKey `OP_1 `

To be a valid spend, the signature needs to be valid for the provided public key and with the transaction as the message. To spend using an alternative signed script, we would have exactly the same Taproot format for the scriptPubKey associated with the Bitcoin address. The only difference is in the witness field of the transaction spending the coins.

 P2TR — Pay to Taproot witness `

Here, `<script>` is a serialization of the alternative spend script to be used, and `<signature>` is a Schnorr signature for the spend script. When validating the spend, we first use the public key to check that the signature is valid using the specified script as the message. Next, the items `<sig_1>` through `<sig_n>` are pushed onto the stack. Finally, `<script>` is deserialized and executed. As usual, if the value at the top of the stack is ` True` then the spend is valid, otherwise it is invalid.

That is all there is to my hypothetical ASS upgrade. Continue reading “Alternative Signed Scripts?”

Proof of Stake

I previously described how Bitcoin is secured by a Proof of Work (PoW) protocol. Now, I describe an alternative method of achieving the same goal — Proof of Stake (PoS). This is currently being used for some blockchains, such as Cardano, and there are also plans for Ethereum to move from its current PoW incarnation to a PoS system. Both methods have their benefits and drawbacks, with PoS requiring much less energy but, on the other hand, is more complex with more possible attack vectors. Opinion is divided on which is the better,with defenders of either system having very strong convictions. In this post, I concentrate on describing how proof of stake works and only briefly touch on the possible benefits of one method over another. There are many different implementations of proof of stake so, here, I give an overview of the methods used without concentrating on any one particular utilization.

As explained in an earlier post, whichever method is used to secure a blockchain, it should require miners to spend some resource in order to build blocks. For PoW chains like Bitcoin, this resource is computing power or hashrate. For PoS chains, the resource is the underlying tokens or cryptocurrency represented by the blockchain itself. More precisely, as it just requires miners to lock up coins for a period, it is the opportunity cost of these coins that they spend. This is an internal resource of the blockchain rather than an external one as with PoW. Before going any further, there is a slight matter of terminology. The people who build blocks for a proof of work chain are called miners whereas, in proof of stake, they are called validators. Other terms, such as forgers, are also in use. Personally, I think it would have been simpler to use the same word regardless of the protocol but, to be consistent with the common usage, I will refer to block builders as validators.

Let’s just refresh our understanding of what a protocol needs to do. It should;

• allow anyone with the necessary resources to participate in building the blockchain.
• allow the network to reach agreement on a single global blockchain state at each time.
• (immutability) ensure that, once a block is added and considered to be confirmed, then it will remain in the chain indefinitely.

Furthermore, these properties should continue to hold even in the event that a minority of validators are acting dishonestly or, even, are collaborating to attack the network.

In proof of stake, individuals register to become a validator by locking up a quantity of the blockchain asset. The protocol then divides up time into a sequence of successive slots, one for each block. It needs to assign one validator per slot, and do this in a way such that, on average, the number of slots assigned to each validator is in proportion to their stake size. Then, when their time slot is reached, it is the job of the validator to collect together transactions from the mempool, and construct a valid block to append to the chain. In return, they receive a reward which can be one of, or a combination of, the transaction fees in the block and the block reward.

Each block is appended to the one constructed in the previous time slot so that, as time progresses, the validators continually add blocks on top of blocks, building the chain. Figure 1 shows this procedure with the cryptopunks acting as validators. Unlike with PoW where anyone who is able to create a valid block can submit it for inclusion, here the protocol needs to verify that the block used in each time slot has indeed been created by the validator assigned to it. This can be achieved with public key cryptography where, when they purchase a stake, each validator has to supply a public key. They then sign their blocks using the associated private key, proving that they were created by or, at least, approved by the assigned validator. Continue reading “Proof of Stake”

Taproot

The highly anticipated Bitcoin upgrade Taproot is locked in and set to be activated in November 2021. This will have various advantages, such as supporting Schnorr signatures and incorporating multiple possible unlocking scripts in an efficient fashion. The aim is to increase scalability, privacy and security, especially with more complex ‘smart’ transactions. These updates are all about how bitcoin utxos (unspent transaction outputs) can be spent, by introducing new ways of parsing the scripts. I previously posted about Bitcoin Script, describing both the legacy and SegWit methods. The current post extends the description to include Taproot.

The first thing to note, is that Taproot is a kind of SegWit implementation. This means that the scriptPubKey in the transaction output takes a standard form, and the witness field of the corresponding transaction input is used to validate the spend. As with other SegWit methods, the scriptSig of the transaction input is not used, so is left empty. The standard Taproot spend method is then as follows.

 P2TR — Pay to Taproot witness `` scriptPubKey `OP_1 `

The Bitcoin core script recognizes a scriptPubKey of this special form, consisting only of the version number ` OP_1` followed by a 256 bit (32 byte) public key, and interprets as a Taproot spend. Then, for a witness field consisting of a single command, this is interpreted as the standard Taproot spend method with the witness field containing just the signature. To be valid, the signature needs to be a valid Schnorr signature using the provided public key and using the spending transaction as the message.

Note that this is very similar to the standard SegWit P2PKH method, with three main differences.

• The scriptPubKey contains the public key itself, rather than a hash of it.
• Schnorr signatures are used, instead of the legacy ECDSA signatures.
• The scriptPubKey has version number 1 instead of 0.

I note that Taproot is a soft fork, since any Bitcoin node which does not include the Taproot upgrade will see the P2TR scriptPubKey as being an ‘anyone can spend’ address. As it would be interpreted as putting the public key on the top of the stack, such spends will still be seen as valid. This avoids a fork of the blockchain where non-upgraded nodes would potentially disagree with upgraded nodes on which is the valid chain. However, non-upgraded nodes would not be completely validating Taproot spends (potentially accepting invalid spends from Taproot addresses), reducing their effectiveness in securing the integrity of the blockchain.

Already, the simple spend method described above has some advantages, due to the use of the Schnorr signature method which can incorporate key aggregation to allow multisig to be used, as well as allowing more efficient batch validation of a collection of transactions in one go. However, the power of Taproot is in it incorporating alternative spend scripts, described below. Continue reading “Taproot”

Schnorr Signatures

As part of the upcoming taproot upgrade, Bitcoin will support the Schnorr digital signature algorithm, in addition to the existing ECDSA implementation. In many ways, Schnorr signatures are superior to ECDSA. Not only are they simple to implement, they are also theoretically easier to understand, more amenable to mathematical proofs of security, have smaller signature strings, and are consistent with key and signature aggregation as well as batch verification. Probably, the main reason that Schnorr signatures were not used from the start, is likely due to the fact that it was under patent until 2008, so was not standard at the time of Bitcoin’s creation. Both the ECDSA and Schnorr implementations are based on the same secp256k1 elliptic curve. I give a description of Schnorr signatures, and how they are implemented by Bitcoin, in this post.

The Setup

The Schnorr signature method requires us to first agree on the following components, which define the implementation.

• A cyclic group E of prime order p. This is usually taken to be an elliptic curve over a finite field which, for Bitcoin, is taken to be the secp256k1 curve. This is the curve defined by the equation y2 = x3 + 7, with components (x,y) belonging to the field of integers modulo the 256 bit prime number
 q = 2256 – 232 – 29 – 28 – 27 – 26 – 24 – 1.

The group itself is not of size q. Rather, it is cyclic with order equal to another 256 bit prime number

 p=`0xfffffffffffffffffffffffffffffffebaaedce6af48a03bbfd25e8cd0364141`

To be precise, we should also specify exactly how elements of the group E are represented as binary strings, so that they can be transmitted across the internet and stored in Bitcoin transactions. I will describe this further below.

There is also a minor choice of notation to be made, although it does not impact the implementation at all. Groups can be written either multiplicatively or additively. In this post, I will use the additive notation. This is just a personal choice in writing out the formulas, but should be mentioned since some descriptions such as the Wikipedia page use multiplicative notation, meaning that the formulas look slightly different.

• A generator G for the group. As the group is of prime order, a generator is just any element other than the identity. Although it is arbitrary, for completeness I state the choice used by Bitcoin, which is (x,y) with,
 x=`0x79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798` y=`0x483ada7726a3c4655da4fbfc0e1108a8fd17b448a68554199c47d08ffb10d4b8`
• A cryptographic hash function h which, in the Bitcoin implementation, is SHA256.

With these choices made, we need to describe three things to have a complete digital signature algorithm. These are, the method of constructing a public/private key pair, the algorithm to verify the digital signature associated with a message, and an algorithm for the owner of the private key to construct such signatures.

Key Pairs

The first step in any digital signature algorithm is the selection of a public/private key pair. This is performed by the user who needs to be able to validate messages or transactions. They should choose a private key at random, and compute the public key from this. The public key is, as the name suggests, distribute publicly, while the private key must be kept secret. Anyone in possession of the private key would be able to sign messages, and spend the Bitcoin secured by the associated public key.

In the Schnorr algorithm, the private key is just an integer x modulo p. This is represented by a unique integer value in the range 0 ≤ x < p. For security, the private key should be chosen at random.

The associated public key is a point in the group E. It is computed by multiplying the base point G by the private key,

 P = x.G. (1)

As long as x is chosen at random, then P will be a completely random point of E.

Recall that the private key should be kept secret, since anyone in possession of this would be able to sign transactions. However, looking at equation (1), it appears that the private key can simply be backed out from the public one. By construction, it is the unique integer (mod p) which, when multiplied by G, gives the public key. In principle this can indeed be done, but the idea is that this is a hard problem for the choice of group E. This is the discrete logarithm problem (in our choice of additive group notation, it looks more like a discrete division problem). For elliptic curves such as the secp256k1 used by Bitcoin, there is no known way to solve for x other than a brute-force search which, given the huge size of the group, is computationally infeasible.

It is the hardness of the discrete logarithm problem that gives this method its security. Reconstructing private keys from public ones or, alternatively, producing valid signatures without knowledge of the private key, requires solving this problem which, for the Bitcoin elliptic curve, has no known feasible method of solution. More specifically, given an element P of the group, chosen uniformly at random, it should be impossible to solve (1) for x with anything other than a vanishingly small probability of success. Continue reading “Schnorr Signatures”

The Group Structure of Elliptic Curves

In the previous discussion of elliptic curves we saw that, once we know some points on the curve, then these can be used to algebraically construct new curve points. Specifically, from two points, we take the line joining them and solve for the third point of intersection. This construction gives a binary operator. In mathematics, one of the most useful type of binary operators are those defining a group. There is a lot of mathematical theory developed for groups so, if we have a group operator, then it is important to realise this. Unfortunately, the operator defined on an elliptic curve as just described does not satisfy the requirements for a group. Specifically, it is not associative, meaning that if we combine together more than two elements then the result depends on the order of the operations. However, it turns out that, with a slight modification, the binary operator on an elliptic curve does define a group. Such groups are used in the digital signature algorithms employed by Bitcoin and other cryptocurrencies.

Recall that an elliptic curve is the solution set to a third order polynomial equation. For now let’s start with the relatively simple case where the equation is in Weierstrass normal form,

 y2 = x3 + ax + b. (1)

Here, the constant coefficients a and b lie in some base field F, as do the solutions for (x,y). In addition, there is one point at infinity 𝒪, which was required so that the binary operator is everywhere defined, and comes from the fact that we should consider algebraic curves as lying in projective space and, hence, can have points on the line at infinity. Also, it is assumed that the curve does not have singular points which, for the one given by (1), means that the cubic on the right hand side does not have a double root. This is equivalent to 4a3 + 27b2 being nonzero.

As it is symmetric in y, the solutions to (1) are symmetric in reflection about the x-axis. Specifically, for a point P = (x,y) on the curve, then (x,-y) is also on the curve. I will denote this reflected point by Pr. By convention, the point at infinity is equal to its own reflection, 𝒪r = 𝒪. As in the post on elliptic curves, I use PQ to denote the third point of intersection of the line joining P and Q with the curve. Combining this with a reflection about the x-axis does indeed give a group operation.

Theorem 1 Let E be an elliptic curve in Weierstrass normal form, and point at infinity 𝒪. Then, it is a commutative group under the binary operator

 P + Q ≡ (P○Q)r

Furthermore, the group identity is 𝒪 and the inverse of an element P is -P = Pr.

The group operator ‘+’ is as shown in figure 1 below. It is not at all obvious from looking at the plot that this operator satisfies associativity. In fact this is a rather tricky property to prove, and I go into more detail on how to prove this further below.

Group Theory

Groups are a foundational algebraic concept used across many different branches of both pure and applied mathematics. In particular, the ECDSA and newer Schnorr digital signature algorithms used by Bitcoin are based on group theory. I will go over a small amount of the theory here, including the parts that are relevant for the signature algorithms.

Simple examples of groups include the real (or, rational, integer, complex, etc) numbers under the operation of addition, or the nonzero real (or rational, or complex) numbers under multiplication. Generally, a group is a set G together with a binary operator, which I will denote by ‘·’,

 G × G → G, (a, b) ↦ a · b.

For elements a,b of the group, the value a·b is referred to as their product or group product and, for brevity, it is also often written as ab. There are certain algebraic properties or axioms that are required in order for G to be a group. The first is associativity,

 (a · b) · c = a · (b · c), (1)

which must hold for all a,b,c in G. Continue reading “Group Theory”

Projective Curves

Here, I will fill in some of the gaps (quite literally) from the earlier post on algebraic and elliptic curves. There, I showed how we can, algebraically, construct points on a curve in terms of already known ones. For curves of degree 2, or conics, this led to a parameterisation of the entire curve. For degree 3, or elliptic curves, this gave a binary operator allowing us to construct one new point out of any two existing points which, as I will cover later, leads to the group structure used in the Bitcoin digital signature algorithms. If you have not already, I would suggest reading that post before coming back to this one. However, there was one small problem that we encountered. For some special cases, the construction is not well-defined and directly applying the algebraic formulas would give divide-by-zero errors. We interpreted this as giving points at infinity. However, it was not clear what the points at infinity are, or how they should be handled algebraically. I will state upfront, this post consists of the mathematical background to better understand elliptic curves. If you are happy with the explanation given in the previous post, and to take at face value how to handle the point at infinity of the curve used by Bitcoin, then this post can be skipped. On the other hand, if you want to understand the mathematics better, then read on.

As an example, consider the hyperbola in figure 1 given by the solutions to ${x^2-y^2=1}$. Most lines that intersect the curve at all, will intersect it at precisely two points. There are tangents to the curve, that will intersect at one point, but with multiplicity two. However, lines which are perpendicular to one of the asymptotes have gradient 1 or -1 and will intersect at most once. To fill in this missing point, we interpret these lines as also intersecting the curve at a point at infinity. All parallel lines intersect at the same point at infinity and, in the case of the hyperbola, the asymptotes themselves (which have gradient 1 or -1 and pass through the origin) are interpreted as intersecting a point at infinity with multiplicity 2.