No, I haven’t gone crazy. Named after a Harry Potter spell rendering the target incapable of speaking coherently, mimblewimble is a blockchain protocol which scrambles up the transaction quantities and addresses, giving both a high degree of privacy and an efficient representation allowing massive savings on block space. According to its anonymous inventor, “I call my creation Mimblewimble because it is used to prevent the blockchain from talking about all user’s information”. Remarkable properties include not just hiding transaction quantities, but also allowing all transactions to be merged into a single big transaction, and spent outputs combined with their corresponding inputs and stripped from the blockchain. This last point has potential huge savings on blockchain space and validation time.
The protocol was implemented by the Grin and Beam cryptocurrencies, launched in January 2019. More recently, it was introduced into Litecoin in the MWEB upgrade (Mimblewimble extension block). This formally locked in on 3 May 2022 and went live a couple of weeks later.
As this post is rather long, I include a list of contents:
▪ Benefits and Drawbacks
▪ Validating a Transaction
▪ Signing the Transaction
▪ Transaction Cut-Through
▪ Range Proofs
▪ Ring Signatures
The history of mimblewimble is rather mysterious, as really should be expected. It started on the morning of 2nd August 2016 at 4:35 UTC. A user going by the pseudonym Tom Elvis Jedusor logged into the Bitcoin wizards IRC page, under the id ‘majorplayer’, posted the comment “hi, i have an idea for improving privacy in bitcoin. my friend who knows technology says this channel would have interest,” along with a link to a text document on a hidden and now defunct Tor service, and left. Tom Elvis Jedusor is the original name of Lord Voldemort in the French translation of Harry Potter, and is an anagram of “je suis Voldemort”!
The aforementioned document was discussed in a reddit thread, Mimblewimble: Non-interactive CoinJoin and better scaling properties using Confidential Transactions. This also contains a link to the original paper by Jedusor. Mimblewimble is based on the confidential transactions of Greg Maxwell, originating in a 2013 Bitcoin forum post by Adam Back. In October 2016, Andrew Poelstra posted a paper fleshing out and expanding on the ideas contained in Jedusor’s post, as well as fixing at least one mistake in the original.
In November 2016 another character appeared on the same IRC chat. Ignotus Peverell, inventor of the invisibility cloak in the Harry Potter universe, posted a github link to the first implementation of mimblewimble. This was to eventually lead to the Grin cryptocurrency. In parallel, Beam was also developing its own implementation. Both of these launched in January 2019. After a couple of years in development and testing, the MWEB upgrade introducing mimblewimble to Litecoin went live in May 2022.
For some more information on this early history, see the post A Short History of Mimblewimble: From Hogwarts to Mobile Wallets, posted by Beni Issembert in October 2018. Also, Peverell posted a technical Introduction to Mimblewimble and Grin on github in March 2017. A high level overview with some technical detail is given by Tari Labs. In the current post, I attempt an overview of the technical ideas and mathematical techniques used.
Benefits and Drawbacks
Significant features and benefits of Mimblewimble, when compared to other protocols are:
- Privacy: the blockchain does not record quantities of the transactions. Rather, it associates a commitment with each transaction output. This is a bit like a private key (or address), but is mixed in with the quantity in a way in which the quantities are completely hidden. To an external observer, these commitment values are entirely random.
- Transactions themselves do not need to be stored in a block. Instead, only the list of inputs and outputs is required, in any order (along with some signature and range proof data). It is not necessary to know how they were originally grouped into transactions. This is both space saving and privacy enhancing.
- Spent transaction outputs can be entirely stripped from the blockchain, along with their matched inputs. This has potential huge savings on space since, if all this data is removed, all that remains is some proof of work and signature data along with the utxo (unspent transaction output) dataset. As a result, the space required for a blockchain implementing Mimblewimble will grow with the size of the utxo set, but not much with the total length of the chain and number of transactions.
- Even though it is not possible to determine the quantity associated with individual outputs, the utxo set does come along with a zero knowledge proof that the sum of the outputs does equal the correct value. So, it can be verified that there is no unexpected inflation of coins.
Along with these benefits, there are obviously also some drawbacks:
- Opaqueness: This is the flipside of the privacy feature. If the entire transaction history is not stored, then users could have less confidence if they cannot see how it is being used or that everything is working as claimed. In addition, privacy could be viewed as an issue to regulators or other entities who want the blockchain to be monitored.
- Creating a transaction requires the sender and receiver to communicate. So, it is not possible for someone to publish a public address and allow people to simply send coins to it without further interaction.
- The privacy feature requires all outputs to contain range proofs. These are zero knowledge proofs that the output does indeed contain a positive quantity within a required range. These will take additional time to validate, and also take up a significant amount of space. This means that a Mimblewimble based blockchain with a large utxo set could take up more space than one without the privacy features. It would be possible, in theory, to allow individual outputs to specify the quantity instead of having a range proof, so sacrificing the privacy in order to reduce space and while still retaining the remaining Mimblewimble features.
- It is not possible to have general locking scripts for outputs, as with Bitcoin Script. This makes it more difficult, or even impossible, to implement some more advanced blockchain features such as oracles, etc.
- The protocol is based, fundamentally, on elliptic curve cryptography. If this was ever broken, by quantum computers for example, then the entire blockchain would be compromised.
I also mention another significant difference between Mimblewimble and blockchains such as Bitcoin. In Bitcoin, it is possible to trace the history of a coin through its previous transactions all the way back to its creation as block rewards. Mimblewimble breaks with this idea by combining all transactions, so all that we see is the total unspent transaction outputs and block rewards so far, without making any assignment from individual utxos to specific inputs.
Mimblewimble is based on the utxo model, as used by Bitcoin, Litecoin, and various other blockchains. Each transaction contains a set of inputs and a set of outputs along with their quantities. Each input is linked to a previous unspent transaction output (utxo), with its quantity being given by that of the corresponding utxo. The sum of the output quantities can be no greater than the sum of the inputs, with the difference being the transaction fee paid to the miner.
Consider the following scenario. Alice is the owner of 100 units of a mimblewimble based asset. This means that she has access, via knowledge of a private key, to a utxo with quantity 100. She decides to send 30 units to Bob, while paying a transaction fee of 5, meaning that she receives 65 units change. This transaction consists of the following.
- an input of quantity 100, which is linked to Alice’s previous utxo.
- an output of quantity 30, going to Bob.
- an output of quantity 65, returned to Alice.
- a fee of quantity 5.
This is in figure 1 below. All quantities are integer valued, so we must make sure that the units are small enough to not require fractional valued transactions. The fee is a kind of output which, ordinarily, does not have to be explicitly listed in the transaction since it can be computed by the sum of the inputs minus the sum of the outputs. We include it here though, and will be needed once the quantities are hidden.
Denote the transaction quantities by ai, so that Alice’s input is a1 = 100, Bob’s output is a2 = 30, and Alice’s change is a3 = 65. Denote the fee by f = 5. The condition is that the sum of the ai plus the fee, using a negative sign for inputs, evaluates to zero.
|–a1 + a2 + a3 + f = 0.|
Next, we choose a private and public key pair for each input and output. For this, elliptic curve cryptography is used along similar lines to that described in the post on Schnorr signatures. We fix a cyclic group E with size equal to a huge prime number p, typically a 256 bit prime for Bitcoin related applications. In practice, E is an elliptic curve, although that is not really important for the description given here. A generator G is fixed, which is just any nonzero element of the group. Then, a private key x is just a number taken modulo p, and the associated public key is constructed by multiplying this by the generator, P = x.G. The idea is that, for generic points P of the group, it is practically impossible to ever back out the private key x. Although it is theoretically possible by a brute-force search, the size p of the group is so huge that we will never find the solution. So long as x is chosen randomly in the range 0 to p – 1, the associated public key can be revealed to the world while x is kept secret.
In mimblewimble, we do not reveal the ‘public key’ x.G either. Instead, it is used as a blinding factor to hide the quantity a of the output or input. Another generic group element H is fixed by the protocol. It is important that no-one is ever able to find a multiplier k such that H = k.G. If they could, the whole protocol would fall apart allowing people to generate transactions with arbitrary inputs and outputs, not even summing to zero. For example, we might choose H whose coordinates are given by an SHA256 hash of an agreed value, such as the bits of the generator G. Either way, users should be able to have confidence that H is a generic group point.
Now that the group points G and H have been fixed by the protocol, Pedersen commitments are defined for each input and output. If the quantity is a units and the ‘private key’ is x, then the commitment is given by combining these,
|C = a.H + x.G.||(1)|
Effectively, the commitment is the public key shifted by an amount proportional to the value associated with the input or output. The value x is also known as a blinding key since it hides the quantity a. This is as in figure 2 below, where instead of associating the quantity with each input or output, we use the commitment value Ci = ai.H + xi.G. There is no public address, as with Bitcoin transactions. Instead, both the quantities and keys are combined in the commitments created on-the-fly as part of the transaction construction.
Each participant chooses the private key xi for their output at random in the range 0 to p – 1, and keeps it secret. From this, they compute the commitment value associated with the output, and must also construct a range proof. This is a zero knowledge proof that the commitment C is indeed of the form (1) for some positive quantity a lying in a preset range defined by the protocol. More about these below.
For inputs, xi is the same as for the output which is being spent (so, the person spending an output must know its private key). This ensures that the commitment value of an input equals the output commitment being spent.
Each commitment is an entirely random element of the group E, hiding the quantities and private keys from prying eyes. Only the fee does not have a blinding factor, so is revealed to the outside world. This is necessary, since it needs to be known to the miner.
The condition that the signed quantities and fee sum to zero can no longer be directly verified by an outside validator since they are not known. Instead, consider the sum of the commitments and fee, using a negative sign for inputs.
|C||= –C1 + C2 + C3 + f.H|
|= –a1.H – x1.G + a2.H + x2.G + a3.H + x3.G + f.H|
|= (-a1 + a2 + a3 + f).H + (-x1 + x2 + x3).G|
As the signed transaction quantities sum to zero, the multiples of H cancel out and we are left with a multiple x = –x1 + x2 + x3 of the generator G.
This means that the summed or excess commitment C is effectively the public key associated with private key x. Although the value of x may not be known to any one individual, the participants in the transaction each know their blinding key xi, which can be combined, effectively generating a multi-party signature for the transaction.
I’ll finish this section by commenting that every commitment C appearing in any output, technically, does simultaneously commit to every possible quantity a! That is, for every quantity a, there does exist a secret blinding factor x satisfying (1). Indeed, since G is a generator then, by definition, C – a.H must equal x.G for some value of x. The reason why this is not a problem, and the transactions don’t collapse into a meaningless mess, is that computing a discrete logarithm is intractible. Actually computing the value of x is effectively impossible except for the specific known values of a and x used in building the output.
Validating a Transaction
For an outside party to be able to validate the transaction, the following needs to be provided.
- The utxo set of the current blockchain state. This is just the list of commitments of unspent transaction outputs in the blockchain.
- The list of input and output commitments of the transaction.
- The transaction fee.
- A range proof for each output commitment.
- A digital signature corresponding to the public key equal to the excess commitment C.
- A message to which the signature applies. As explained below, this can be fixed to be the empty string, in which case it does not need to be provided.
The first step is to check that the transaction input commitments correspond to outputs in the existing utxo set. Unlike with Bitcoin, there is no script or signature data associated with each input, since this is taken care of by the transaction signature.
Next, the range proofs are used to verify that the output commitments are all of the form C = a.H + x.G for the transaction quantity a being in a specific range fixed by the protocol. For example, we could require it to be in the range 0 < a ≤ 264. These are noninteractive zero knowledge proofs that the commitments are indeed of the claimed form, without revealing any further information about the values of a and x themselves. Range proofs are a vital part of the protocol since, without them, transactions could be constructed with some negative output quantities, allowing other output sizes to be greater than the total inputs. Similarly, it would be possible to create outputs with a massive quantity which, due to overflow, still apparently sum to the value of the inputs. Either way, this would allow arbitrary inflation of the supply through unrestricted creation of new coins. Restricting the quantities to a small enough range will ensure both that they are positive and that there can be no overflow when they are summed.
As range proofs can be up to several kilobytes in length, these will take up a significant amount of the blockchain space. Finding better range proofs is still an active area for development. I will discuss these more below.
The final part of the validation is to compute the excess commitment C by summing the inputs, outputs and fee, and checking that the signature is valid for the given public key C and provided message. In Bitcoin, the message being signed is equal to the transaction (excluding the signature data itself, as that would be circular). It is vital that the signature applies only to the transaction being validated, otherwise it would be insecure since any third party on seeing a transaction sent to the mempool could replace it with their own transaction using the same signature, stealing the coins. With mimblewimble this is not necessary, since the excess commitment used as the public key is already specific to the transaction. The signed message can be anything we want, so would usually take it to be the empty string for simplicity. This is especially important when aggregating and combining transactions, since the individual transaction data can be dropped from the blockchain altogether.
Recall that the example transaction above has a single output with no blinding factor — the fee. However, some transactions will have a single explicit input also with no blinding factor. This is the block reward plus fees paid to the miner. Such explicit inputs and outputs should be combined into a single value. Once all of the transactions in the block have been combined, this will lead to a single explicit input equal to the block reward defined by the protocol.
Signing the Transaction
I described how transactions are defined in mimblewimble above, but did not say how they can be constructed in the first place. Consider the example as in figure 1, with Alice paying a quantity of coins to Bob and returning a change amount to herself. Each party to the transaction can construct the commitment for their respective input or output simply by choosing a blinding factor x at random and applying equation (1).
In Bitcoin, the person spending money can create the transaction and send to the mempool which, in the example considered here is Alice. She just needs to know the address for Bob to send the coins to. Only Alice’s input needs to be signed, which she is capable of doing herself. Once the transaction is included in a block, Bob receives his money.
On the other hand, for a mimblewimble transaction, Alice is not able to sign on her own. This is because to find the private key (or, excess) of the transaction, she would need to know Bob’s private blinding key x2. Bob cannot reveal this secret value, since doing so would allow Alice to steal back the coins he receives.
However, it is possible for Alice to send the necessary information to Bob in order that he can create the transaction himself and send it for inclusion in a block, so that he can receive his payment. Alice sends him the following information.
- The commitments for her input and change output, C1 and C3.
- The quantity a2 which she is paying to Bob.
- The signed sum of her private keys, –x1 + x3.
Once he receives this, Bob can compute the private key x = –x1 + x2 + x3 and use this to generate the signature. Combining with his own commitment C2 and associated range proof generates the transaction.
This method of creating payments was described by Jedusor in his original paper introducing mimblewimble. It is non-interactive since Alice only has to send one package of data to Bob, and does not need to enter into two-way communication with him. It does require Alice to provide some private information to Bob, specifically the signed sum –x1 + x3 of her blinding keys. However, as long as they are chosen entirely at random, this does not reveal the individual values of x1 and x3. It is a bit restrictive in that it only applies to a payment to a single individual.
An alternative approach is to use a signature aggregation method, such as a variant of musig, for the parties to produce a joint signature without revealing any information on their private keys. Recall that a Schnorr signature for private key C and message m consists of an integer s taken modulo the size p of the group E and a public nonce R in E satisfying
|s.G = R + h(R‖m).C.||(2)|
Here, h() is a cryptographically secure hash function such as SHA256. This is easily solved for someone in possession of the private key x satisfying C = x.G. They simply choose a (secret) nonce r at random in the range 0 through p – 1 and set R = r.G and s = r + h(R‖m)x. On the other hand, without knowledge of the private key x, finding a valid signature is effectively impossible.
For the multi-signature case, we have a number n parties, each with a private key xi summing to x = x1 + x2 + ⋯ + xn, although no single individual is in possession of this joint private key. This is the situation above, where multiple parties to a mimblewimble transaction each know the private key used to generate the commitment for their inputs and outputs (reversing the signs for the transaction inputs), but not those of the others.
A joint signature can be created by performing the following steps.
- Each party chooses a secret nonce ri at random in the range and computes their public nonce Ri = ri.G.
- The parties share their public nonces Ri, and all compute the joint nonce R = R1 + R2 + ⋯ + Rn.
- Each party computes and shares their signature component si = ri + h(R‖m)xi.
- The combined signature is computed as (s, R) where s = s1 + s2 + ⋯ + sn.
It can be checked that the resulting signature does satisfy (2).
Some care needs to be taken sharing the public nonces in the second step above. For security, it is in each person’s interest that resulting combined nonce is entirely random. Any participant can enforce this by selecting their nonce uniformly at random, and independently of the other’s choices. However, this does not work if any of the other parties chooses their nonce after seeing the other ones, since this could affect their choice and independence is lost. For example, if one person chose their value to be negative the sum of everyone else’s, then the summed value R would simply be 0. To avoid this, before sharing, we can start by having everyone commit to their nonce without sharing its value. This can be done by sharing their hashes H(Ri) first, after which Ri are shared. Then, each person can verify that everyone’s nonce does have the correct hash and no-one has changed their mind upon seeing other people’s choices.
This simple aggregation method works, and allows a signature to constructed for the transaction without having anyone reveal their private key to anyone else. However, it does require some rounds of communication, in both directions, between all participants. It is not possible for Alice to simply send a block of data to Bob and be done with it. Unlike the method described by Jedusor above, the musig approach here is much more general, but is interactive.
Let the real mimblewimble magic begin — transaction cut-throughs! A typical block of a blockchain will contain many individual transactions. With mimblewimble, all of the transactions can be combined into a single big one in such a way that it is not possible to disentangle it back into the original list of discrete small transactions. This offers security benefits and saves some space.
Actually, we can go further and combine transactions across blocks. Even across the entire blockchain so that it consists of just one huge transaction! Going even further, whenever a transaction output is spent by the input of a later transaction, these inputs and outputs can be cancelled out, by a process known as transaction cut-through. Taking this to its conclusion, the blockchain will only contain a single massive transaction, whose outputs consist of the current utxo set. Its inputs will only contain the initial creation of coins, such as through block rewards.
Transaction cut-through offers potentially massive savings of blockchain space since all spent coins are stripped out, and the size of the blockchain grows linearly in the size of the current utxo set, but not in the number of historical transactions. Actually, as we will see, some signature data of individual transactions will be retained even after applying transaction cut-through, but this is relatively small (32 bytes per transaction).
The basic idea is simple enough. Each transaction consists of a list of inputs and outputs or, more precisely, the commitment values for these inputs and outputs. In addition, range proofs are provided for the outputs To combine transactions, then, we simply concatenate the lists of inputs and also the lists of outputs, together with their range proofs. If any output was spent by another transaction also in the list, then it is deleted from the combined transaction along with the corresponding output of the same commitment value.
This is all straightforward. If that was everything there was to it, all we would end up with is a list of inputs and outputs with range proofs. There would be no way to validate the result, so we cannot tell if the list of outputs really was derived from the inputs through a series of valid transactions. With Bitcoin, we need to validate the signature associated with each transaction input, using the transaction itself as the message being signed. We would still need to iterate through all of the original transactions validating all the signatures, so there would be no gain from applying transaction cut-through just described.
Mimblewimble is different! First, recall that there is just one signature per transaction, and the associated message is just the empty string, not the entire message. So, to validate the combined big transaction, we need to keep the original list of signatures and loop through these validating them against the empty message. Validating a signature also, obviously, involves the public key to which it is associated which, in our case, is the excess commitment value of the transaction. However, now there is no need to keep hold of the original transactions which added up to the combined big transaction being validated. By additivity, the excess commitment of the combined transaction is just the sum of the excesses of the original transactions, and this can be checked instead. Hence, after combining all original transactions and applying cut-through, we end up with the following
- A list of input commitments.
- A list of output commitments together with range proofs.
- A list of excess commitment values Ci, together with signatures.
In addition, due to transaction fees and block rewards, one of the inputs or outputs can be an explicit amount with no blinding factor. As these are easily combined, there should be at most one.
The master transaction can now be validated with the following steps.
- Iterate through and validate the inputs. They should be unspent outputs of earlier blocks (if we haven’t already combined all the blocks), an explicit amount not exceeding the block rewards, or possibly other inputs allowed by the protocol for minting new coins.
- Iterate through the outputs checking their range proofs.
- Compute the transaction excess C as the sum over output minus input commitment values.
- Verify that the provided list of excesses Ci add up to the combined transaction excess,
C1 + C2 + ⋯ + Cn = C.
- Iterate through the list of signatures using them to validate the empty message using the Ci as public keys.
This is as described by Jedusor in the original mimblewimble paper, so along with the list of inputs and outputs, we have a list of excess values (public keys) and signatures. In the description of a mimblewimble transaction above, they were required to have one signature. If this is generalised a bit to allow an arbitrary number of signatures and associated public keys, then when they are combined, the result looks just the same as an original individual transaction.
There are still a couple of issues remaining. First, one of the advantages of combining transactions is that it is no longer possible to split it back up into the original ones. However, using the provided excesses Ci, this is not quite true. As was pointed out by user andytoshi in a 2016 reddit thread, this is easily solved. We simply allow each transaction to contain an additional arbitrary ‘excess’ k.G, which is added to the transaction excess to be used as the public key to be signed. When combining transactions, public keys are stored, but only the sum of the k.G values are stored. So, the combined transaction still has a single additional k.G value, and the list of public keys cannot be related directly to the inputs and outputs of the original transactions. This may be unnecessary if the signature data is further reduced so that Ci are not stored, as I will now describe.
Another issue is that we are still storing one public key and signature for each of the original transactions, but it is possible to do better. As mentioned above, it is possible to get this down to 32 bytes per transaction (whereas a public key and signature will be about 3 times this).
Suppose that we have a set of n transactions, with excesses Ci summing to C. According to (2), a Schnorr signature for each of these consist of integers si and public nonces Ri lying in the elliptic curve group E satisfying
|si.G = Ri + h(Ri).Ci.|
To make this linear in Ci, divide through by h(Ri) and, after a slight change of variables, we obtain
|si.G = h(Ri)-1.Ri + Ci.||(3)|
Summing over all transactions, we obtain
|s.G = h(R1)-1.R1 + ⋯ + h(Rn)-1.Rn + C||(4)|
where s = s1 + ⋯ + sn. To validate, we only require the value s, the list of Ri, and the excess C of the combined transaction. If (4) is satisfied, then the transaction is valid. Per transaction, all that is retained is the nonce value Ri, which can be represented using 32 bytes. This also throws away the individual excesses Ci and, as such, removes information which could be used to decompose into the original transactions. The retained nonce values should be selected at random independently of the transaction, so do not contain any information.
This approach does not lose any security over retaining all the individual signature information since, given s and Ri satisfying (4), it is straightforward to choose a list si summing to s and back out Ci from (3). These would pass the validation check described above for signing all the individual excesses, so validating each individual excess can be no more secure than just checking (4).
There are still some issues that I have skipped over, but need to be addressed in any implementation. For example, we need to be sure that it is not possible for someone to modify an existing transaction to spend coins that they should not have access to. For example, we could try switching all inputs and outputs in a transaction, and reversing the sign of the signature, in order to undo the existing transaction. This is not a problem with Schnorr signatures, but does affect some other schemes. To avoid this, the paper by Andrew Poelstra suggests using sinking signatures which incorporate the block height. They also suggest using a variant of BLS signatures instead of Schnorr, since these can be aggregated without having to retain any per-transaction information such as the Ri above. Such signature schemes do, however, require some additional cryptographic assumptions to be secure.
One final point. So far, I have concentrated on transactions alone but, in any blockchain, the transactions have to be anchored to the blocks. Without this, we cannot be sure that the provided transaction or transaction list is indeed contained in the blockchain. Usually, this means that each block header contains a commitment to the transactions in the block. In Bitcoin, for example, this is done by including the Merkle root of the transaction list.
With mimblewimble, once transactions are combined, it would no longer be possible to validate the block header by checking that it correctly commits to the block transactions. A way around this is to, instead, have the block header commit to the entire utxo set at the given block height. This works, although it does mean that when validating a blockchain we would not want to validate the utxo commitment in every historical block header, as this would be computationally expensive and would require transaction information for every block. Instead, we just need to validate the utxo commitment at the current block height. In this case, the only remaining role played by the historical block headers is to check that they link all the way back to the first (genesis) block, and satisfy the necessary proof of work conditions. In the paper by Andrew Poelstra, it is argued that it is possible to also remove most historical block headers from the chain, resulting in a blockchain growing only logarithmically in the height instead of linearly.
Every output of a mimblewimble transaction should contain a commitment value C, and also a range proof. This is to verify that C is of the form (1)
|C = a.H + x.G.|
for some positive quantity a lying in a predetermined range, such as 0 ≤ a < 264. This is vital to ensure that users cannot create transactions with negative quantities or with huge quantities which overflow, as this would allow other outputs to be greater than the inputs and so arbitrarily generate new coins. We fix the upper bound 2N big enough to allow sufficiently large payments, yet small enough to avoid overflows. However, it is also important not to reveal the values of x and a. This requires a range proof to be a zero knowledge proof of the statement:
I know of integers a and x satisfying (1) and such that 0 ≤ a < 2N.
Using Bitcoin as an example, the minimum monetary unit is the satoshi, 100 million of which equals one bitcoin. The total number of bitcoin which can ever exist is less than 21 million — or 2.1 quadrillion satoshi. This is less than 251, so using N = 64 is more than big enough for a transaction quantity upper bound. On the other hand, assuming that we use an elliptic curve whose size is a 256 bit number, it would be impossible for overflow to occur unless the number of outputs exceeds 2192, which is a ridiculously high number. In practice, we may also want to exclude zero quantity transactions, but this is easily achieved by applying the range proof to C – H instead of C, so I ignore this point here.
As a range proof is required for every single unspent transaction output, they can easily make up most of the blockchain space. It is, therefore, important that they use as little space as possible. This is an area of ongoing research, with the paper Bulletproofs: Short Proofs for Confidential Transactions and More describing a particularly efficient method. See, also, Bulletproofs and mimblewimble for a more easily readable high-level description. In the current post, rather than looking at bulletproofs, I will detail one especially simple range proof construction. This is to give an idea of how it is possible, even though it does lead to much larger proofs than the state-of-the-art.
For the first step, let us consider a binary expansion,
|a = a0 + 2a1 + 22a2 + ⋯ + 2N - 1aN - 1|
where the ai are each equal to either 0 or 1. The idea is to convert our original range proof for a to a sequence of much simpler ones, one for each of the binary digits. As these digits can only take 2 different values, which is a very small range, it will be straightforward to build range proofs for these. Split the private key x as a sum of terms
|x = x0 + x1 + ⋯ + xN - 1|
The xi terms should be chosen uniformly at random in the range 0 to p – 1 subject only to this constraint. Defining new ‘commitment’ values
|Ci = 2iai.H + xi.G||(5)|
these automatically sum up to
|C = C0 + C1 + ⋯ + CN - 1|
As they are uniformly random points of the elliptic curve E subject only to this constraint, revealing the values of Ci does not leak any information, so is zero knowledge.
It is only required to produce a range proof for each Ci, verifying that it satisfies (5) for integers ai and xi satisfying 0 ≤ ai < 2. This is a vastly reduced range compared to what we had originally, so can be handled more directly. Summing (5) over i will then complete the required proof that 0 ≤ a < 2N in (1).
Note that, for any fixed index i, if we set P1 = Ci and P2 = Ci – 2i.H then a range proof for (5) is equivalent to proving that we know of a value x such that either P1 = x.G or P2 = x.G. This can be achieved using ring signatures described below.
To complete the construction of range proofs we need to produce a zero knowledge proof that, for two points (public keys) P1 and P2, we either know of a value x such that P1 = x.G or we know of an x satisfying P2 = x.G. Things would be easy if, instead, we were trying to show that we know of a value x such that and we know of an x satisfying . Just produce two digital signatures, one for each of P1 and P2. This is standard assuming, of course, that we do know both of the corresponding private keys.
So, how can we change the ‘and’ into ‘or’? In fact, it can be done in a very similar way to producing a pair of Schnorr signatures, one for each of the points P1 and P2. It only differs in that, for the equations validating these signatures, the hash values of the nonces R1 and R2 are exchanged.
s1.G = R1 + h(R2).P1,
s2.G = R2 + h(R1).P2.
The ring signature consists of the values (s1, s2, R1, R2). That is all! The first line involves h(R2) and the second h(R1). If these terms were exchanged, then it would just be the validation formulas for a pair of Schnorr signatures (s1, R1) and (s2, R2).
If we know of a private key x for P1 which, by definition, satisfies x.G = P1, creating a ring signature is easy. Just choose integers r1 and s2 uniformly at random modulo p and set,
R1 = r1.G,
R2 = s2.G – h(R1).P2,
s1 = r1 + h(R2)x.
Similarly, producing a signature is easy if we know the private key for P2.
The idea of ring signatures extends in a straightforward fashion to any finite number of public keys (points on the curve E) P1, P2, …, Pn. A zero-knowledge proof that we know of an integer x satisfying x.G = Pi for some value of i consists of pairs (si, Ri) satisfying
|si.G = Ri + h(Ri - 1).Pi|
for all i = 1, 2, …, n. Note that each equation depends on the preceding one through the h(Ri - 1) term and, for i = 1 we take R0 = Rn. This arranges the equations in a circle, with each depending on the previous one.
Finally, lets look at the size of a range proof constructed as above for 0 ≤ a < 2N. For each 0 ≤ i < N we only need to store the 4 values (s1, s2, R1, R2) as P1 and P2 can be backed out from (6). Supposing that we use an elliptic curve defined modulo a 256 bit prime, as used by Bitcoin, each of the curve points Ri and signature values si can be represented using 256 bits, or 32 bytes. This gives 32 × 4 × N = 128N bytes for storing the ring signatures, or 8kB of data. This can be reduced to 6kB either by aggregating the signatures in a similar fashion as was described above for combining transactions, or by using a base-4 expansion instead of binary. By contrast, according to the paper Bulletproofs: Short Proofs for Confidential Transactions and More, bulletproofs would only be 688 bytes in length.