First, let’s recap what a zkSTARK is. The name is an acronym for zero-knowledge Scalable Transparent ARgument of Knowledge. While they are often phrased as proofs of existence of some quantity (e.g., existence of 3-colorings of a graph), more properly, they are a proof that the person constructing the STARK has knowledge of this quantity. We can be a bit more precise. The setup must include some algorithm or computable function F and a value b which it may or may not return. This information will be publicly known, so that anyone can verify what F(a) evaluates to on a given input a. Anyone can also make a statement such as:
I know of a value a such that F(a) evaluates to b.
However, such a statement does not itself prove that they do really know of such a value. An argument of knowledge is a method of proving the truth of this statement. As an example, consider digital signatures. In this scenario, the value a is a private key and F is the calculation deriving the associated public key b. A digital signature is an argument of knowledge of a private key. For elliptic curve based schemes as used in ECDSA and Schnorr signatures, it can be shown mathematically that for any public key there does exist an associated private key, although it can be practically impossible to actually find it. Hence, a digital signature is not a proof that a private key exists, which we know already. Instead, it is a proof that it was known to the person constructing the signature.
One method of proving a statement like the one given above is to simply display the value of a, so that anyone can evaluate F(a) and verify that it does indeed equal b. There are two reasons why this may not be a desired or viable approach, corresponding to the other words making up the acronym zkSTARK. Firstly, the value of a may be a secret, such as with the digital signature example. It should be zero knowledge. Ideally, the proof does not reveal any information beyond the fact that the prover knows of a valid value for a. Secondly, it should be scalable.
By scalable, we mean that the size of a zkSTARK should remain small and reasonably quick to validate, even if F is very computationally expensive to compute. For example, we could take the value a and hash it billions of times to get the result b. If we want to convince others of the result, then expecting them to go through the whole tedious calculation might not be realistic. Instead, a STARK can be constructed which proves that the result is b as claimed, and anyone can check it without having to redo the entire calculation. This is especially useful for blockchains, where the use of STARKs can greatly reduce the work required by validators and avoid using up a lot of the limited block space. A long list of transactions with their associated digital signatures could be replaced by a much shorter proof of the combined effect of these transactions.
The scalability property is really the magic part of STARKs and the closely related SNARKs. A very long calculation will take a long time to perform. However, once one person has done this, they are able to prove the result to others without requiring them to also go through the long calculation. We really do not have any right to expect that this is even possible, but luckily it is!
Next, STARKs are transparent, which means that there is no trusted setup. This is in contrast to the closely related SNARKs (succinct non-interactive arguments of knowledge), where a procedure is required to initialize the construction of the SNARKs, involving data which must be kept secret by those involved. Revealing this data would allow ‘proofs’ to be constructed even though the prover does not have the knowledge being claimed. STARKs do not have this issue.
Finally, I mention that STARKs are non-interactive. Any argument of knowledge involves two parties, the prover and the verifier, with the former trying to convince the latter of the truth of a statement. Interactive proofs involve messages being passed back and forth between the two. Essentially, the verifier asks questions of the prover, and checks his responses, until she is convinced of the proof of the original statement. A non-interactive proof just consists of a single message sent from the prover to the verifier. This is what is needed in many blockchain applications, since the proof can simply be constructed and added to the chain. Anyone can be the verifier by reading the data from the blockchain, such as done by validators.
As with any argument of knowledge, STARKs are sound, meaning that it is practically impossible for someone without the claimed knowledge to construct a valid proof. However, I should point out that, in theory, it is always possible to just guess at a valid argument by blind luck. For this reason, any such construction will come with a soundness parameter. This is an upper bound on the probability that, without the claimed knowledge, any parameter choices made in its construction lead to a valid proof by chance. The idea is that this should be tiny to avoid such false positives. It is true that an untrustworthy prover could try over and over, choosing different parameters each time, to try and brute-force a solution. As long as the soundness parameter is small enough — say, about 2^{−100} or lower — then it becomes practically impossible to even brute-force a solution.
The toy calculation used in this post is the elementary cellular automaton Rule 30. The idea is that we have a one-dimensional array of cells, indexed by the integers. Each cell can be in one of two states, unset or set, which I will label by the numbers 0 and 1. We then iteratively update this array, one step at a time. At every step of the calculation, each cell is updated according to its own value and those of its immediate neighbours.
Using s_{i}(t) to represent the state of cell i at time t, I will also denote this by just s_{i}, suppressing the time variable for brevity. Its value s_{i}(t + 1) at time t + 1, which I denote by s′_{i}, is some function of s_{i−1}, s_{i}, s_{i+1},

s′_{i} = S(s_{i−1}, s_{i}, s_{i+1}).
The rule is defined by the function S and, given any initial state, this can be applied to compute the state at every future time. There are many different choices for S giving different cellular automata with different properties. I choose rule 30, for which the state s′_{i} is determined from s_{i−1}s_{i}s_{i+1} by looking up its value in the following table (111 → 0, 110 → 0, 101 → 0, 100 → 1, 011 → 1, 010 → 1, 001 → 1, 000 → 0):
As there are 8 possible values which s_{i  1}s_{i}s_{i + 1} can take, and each of these can lead to two possible states for s′_{i}, there are 2^{8} = 256 distinct elementary cellular automata. This particular one is called ‘rule 30’ since the second row of figure 1 is the binary expansion of the number 30.
To avoid dealing with infinite arrays, I will use a fixed window width of 200. That is, rather than an infinite array of cells, we just have 200 of them labelled as s_{i} for 0 ≤ i < 200. At the edges of the window, we allow the rule to wrap around, so that, on the left, s_{−1} is defined to equal s_{199} and, on the right, s_{200} is defined to be s_{0}.
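A single update of this windowed rule is easy to express in code; a minimal sketch in Python (the function name is mine):

```python
def rule30_step(cells):
    # s'_i = s_{i-1} XOR (s_i OR s_{i+1}), with indices wrapping at the edges:
    # cells[-1] is the last cell, and (i + 1) % n wraps the right neighbour
    n = len(cells)
    return [cells[i - 1] ^ (cells[i] | cells[(i + 1) % n]) for i in range(n)]

# a single set cell grows into the familiar expanding pattern
state = [0, 0, 0, 0, 1, 0, 0, 0, 0]
state = rule30_step(state)  # [0, 0, 0, 1, 1, 1, 0, 0, 0]
```

Iterating this function on a width-200 list reproduces the evolution shown in the figures.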
Then, if we start with the single cell number 50 set, and the others unset, repeated applications of the rule give figure 2 below. Here, the time starts from the top of the figure and increases towards the bottom. Each row of pixels represents the state at the corresponding time, with black pixels representing cells which are set and white those which are unset.
The pattern of 0s and 1s expands to fill the whole window, and becomes rather chaotic with randomly appearing triangular gaps below any consecutive sequence of 1s. Let’s try again with the following starting state:
0101101001100101011100100110111100100000010010110110111001101111011101110110110001100101011001000110011101100101
This is the ASCII data for the string “Zero Knowledge”, which I pad with zeros to be 200 cells wide. Starting with this state, the first couple hundred iterations are displayed in figure 3.
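A sketch of this encoding, assuming plain 8-bit ASCII codes concatenated in order (the function name is mine):

```python
def message_to_cells(message, width=200):
    # concatenate the 8-bit ASCII code of each character, then zero-pad
    # on the right up to the window width
    bits = [int(b) for ch in message for b in format(ord(ch), "08b")]
    return bits + [0] * (width - len(bits))

cells = message_to_cells("Zero Knowledge")  # 112 message bits, then 88 zeros
```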
Now, let’s keep going. Starting with the same state as used in figure 3, but continuing for a billion iterations, the final couple hundred iterations are shown in figure 4.
The first hundred cells of the final row are:

(1) 
So, if we start with the “Zero Knowledge” message and apply rule 30 one billion times, the first 100 cells are above. We consider building a zkSTARK to represent the statement that we know of a message such that, if rule 30 is applied one billion times, then the above sequence is obtained.
This is a kind of hash function, where we start with an initial message and, after applying the rule, we obtain the output string above. Going in the opposite direction from the final string to the initial message is effectively impossible. This is very much like applying the SHA256 hash function. In fact, the ideas used here could in principle also be used for applications of SHA256. I use rule 30 since it is very simple, and we have the added complexity of using a large number of iterations so that, even if someone knew the initial message, it would take a fair amount of computing power to directly verify.
The first step in building a STARK is to convert the execution rules for the calculation into algebraic equalities, known as the Algebraic Intermediate Representation (AIR). That is, they should only involve the operations of addition, subtraction and multiplication. Rule 30 can be expressed using the logical OR and XOR operations.
s′_{i} = s_{i−1} XOR (s_{i} OR s_{i+1})
We will interpret the variables s_{i} as lying in a finite field 𝔽. I will take this to be the integers modulo a prime number p. For the moment, the important facts are that we can perform addition, subtraction and multiplication in 𝔽. That the cells s_{i} only take the values 0 or 1 in the field can be expressed by the algebraic identity
s_{i}^{2} = s_{i}.  (2) 
Next, in any field, the logical operations of OR and XOR on 0–1 valued quantities can be written algebraically by

x OR y = x + y − xy,
x XOR y = x + y − 2xy.
Let’s apply this to rule 30. It turns out to be a bit simpler if we rearrange it so that the XOR operation is on the left hand side, avoiding iterated logical operations in a single step,

s′_{i} XOR s_{i−1} = s_{i} OR s_{i+1}.
Writing the logical operators using the algebraic representations above expresses rule 30 as the second order expression
s′_{i} + s_{i−1} − 2s_{i−1}s′_{i} = s_{i} + s_{i+1} − s_{i}s_{i+1}.  (3)
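As a sanity check, we can verify by brute force over all eight 0–1 input triples that identity (3) agrees with the logical form of rule 30:

```python
from itertools import product

# rule 30 in logical form: s'_i = s_{i-1} XOR (s_i OR s_{i+1})
for left, centre, right in product((0, 1), repeat=3):
    s_new = left ^ (centre | right)
    # identity (3): s' XOR s_{i-1} on the left, s_i OR s_{i+1} on the right
    lhs = s_new + left - 2 * left * s_new
    rhs = centre + right - centre * right
    assert lhs == rhs
```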
The final condition is that the first hundred values of the final state are as claimed. Letting a_{i} be the 0–1 values in (1), we simply state the equality
s_{i}(N) = a_{i}  (4) 
for 0 ≤ i < 100.
The idea is to represent the evolution of each cell over the iterations by a polynomial function. Assume that N iterations of rule 30 are applied so that, in our case, N is equal to one billion. Specifically, for each index i, cell number i will be represented by a polynomial f_{i}. Fix an element ω of the field such that the powers ω^{n} are all distinct over the range 0 ≤ n ≤ N. We will take ω to be a root of unity of some order 2^{s} > N. Then, let f_{i} be the polynomial over our field 𝔽 which traces out the values of cell i after n iterations,
f_{i}(ω^{n}) = s_{i}(n) 
as n runs through the integer values from 0 up to N. By Lagrange interpolation, there exists exactly one such polynomial of degree less than or equal to N. Of course, this is a very high degree and storing the coefficients of f_{i} would take up a lot of space. Since N is a billion, this means storing a billion coefficients lying in the field 𝔽, which is even more space than is required just to store the original binary digits s_{i}(n). However, the polynomial will not be stored in the STARK. Instead, a hash — or Merkle root — of its values is stored. Using SHA256, this is 256 bits or 32 bytes, regardless of the degree N. In addition, its values at a pseudorandomly selected set of points will be stored — just enough points to statistically verify the claimed properties, which will be orders of magnitude fewer than N.
It will be convenient to represent f_{i} on the set of all powers of ω,
W = {1, ω, ω^{2}, ω^{3}, …} .  (5) 
As ω is chosen to be of order 2^{s}, these powers will start to repeat once W is of size 2^{s}. This will be a bit bigger than N, so it does leave some elements of W on which f_{i} is not specified, where it can be set to whatever we want. To keep the redundancy to a minimum, 2^{s} should be taken to be the smallest power of two greater than N, which is 2^{30} when N is one billion. The result is that f_{i} is defined on W and can be extrapolated as a polynomial of degree less than 2^{s}.
We define Z_{N} to be the polynomial vanishing over the execution trace,

Z_{N}(X) = ∏_{n=0}^{N−1} (X − ω^{n}).  (6)
The fact that the s_{i}(n) are 0–1 valued over the range 0 ≤ n < N means that f_{i}^{2} – f_{i} vanishes at the points ω^{n} by identity (2). Using polynomial factorization, this is equivalent to the identity
f_{i}^{2} – f_{i} = g_{1i}Z_{N}.  (7) 
for some polynomial g_{1i} of degree less than 2^{s}.
Similarly, the algebraic representation (3) of rule 30 can be expressed by a polynomial identity. Using f_{i}○ω to represent the polynomial f_{i}(ωX), it is equivalent to

f_{i}○ω + f_{i−1} − 2f_{i−1}(f_{i}○ω) − f_{i} − f_{i+1} + f_{i}f_{i+1} = g_{2i}Z_{N}  (8)
for some polynomial g_{2i} of degree less than 2^{s}.
The final state (4) says that f_{i} – a_{i} vanishes at the point ω^{N} or, equivalently,
f_{i} − a_{i} = (X − ω^{N})g_{3i}.  (9)
over 0 ≤ i < 100 for some polynomials g_{3i} of degree less than 2^{s}.
The original claim that we know of an initial state for which N applications of rule 30 gives the values a_{i} is equivalent to the existence of polynomials satisfying identities (7,8,9). It is only really necessary for the prover to reveal values of f_{i}, since the g polynomials can be computed from these.
g_{1i} = (f_{i}^{2} − f_{i})/Z_{N},
g_{2i} = (f_{i}○ω + f_{i−1} − 2f_{i−1}(f_{i}○ω) − f_{i} − f_{i+1} + f_{i}f_{i+1})/Z_{N},
g_{3i} = (f_{i} − a_{i})/(X − ω^{N}).  (10)
The claim that the prover needs to show is that these are also polynomials of degree less than 2^{s}. It is enough to show this on any subset S of the field 𝔽 of size at least twice the claimed degree. Then, by polynomial extrapolation, (7,8,9) hold everywhere and, in particular, hold on the execution trace.
Using the ideas above, the argument of knowledge will require polynomials with degree up to 2^{s}, which can be of the order of billions. Although, by any usual standards, these are very big polynomials, they are often referred to as ‘small degree’ in the literature! This is because an arbitrary function defined on a subset S of a finite field 𝔽 would usually have degree just one less than the size of S, so anything less than this places a big restriction on the set of allowable functions. In fact, there is a special name for them. They are Reed–Solomon codes, used for error correction by projecting a received message onto this relatively small subset of allowed polynomials.
Revealing all of the polynomial coefficients would take up a lot of space and require the verifier to compute a lot of terms, which is not what we want. It would also not be zero knowledge. Instead, the prover will just reveal its values at a small number of randomly chosen points. However, this also raises the problem that the prover could be just making up the values as they go, rather than being consistent with a preconstructed polynomial.
These problems are addressed by having the prover first send a commitment to the polynomial. This is a Merkle root of the function values on the domain. Then, when he sends the function values at the selected points, a Merkle proof is also sent to guarantee that the values are indeed those of the function to which he has committed.
As an example, consider a function defined on a set S of size 2^{10} = 1024. Label the points in order x_{0}, x_{1}, …, x_{1023}. In the STARK, these points will follow a geometric sequence x_{n} = cω^{n}. We suppose that the function values are 128 bit integers, so can be represented by 16 byte strings. This gives a binary Merkle tree of depth 10. To achieve zero knowledge, the function values can be hidden by concatenating with a nonce, which I take to be a random 16 byte string, before taking its hash. This ensures that, even if you were to guess the value of the function at any point, revealing the hash does not provide any information on whether the guess is correct, since it is effectively impossible to also guess the nonce. So, the only information that the prover reveals is the function values at the specified points.
I use the SHA256 hash function, so that the hash values and Merkle root are all 32 byte strings, which I show as 64 character hexadecimal numbers. Suppose that the prover reveals the Merkle root as:
Merkle root = 06e893aec8533e367ebadf5da0cfe17ce7b90d01c7bb014f97b8a43e0f71e5e7 
Consider selecting a random point. Say, x_{469}. I chose this by taking the rightmost 10 binary digits of the SHA256 hash of the Merkle root, giving the binary expansion 0111010101, or i = 469 in decimal. This ensures that it was not cherry-picked. The prover should reveal not only the value of the function at this point, but also its Merkle proof and nonce. In this case:
Value = 775c8c7091445e6dcd26d45c6a525ff5 
Nonce = bb8ea60a3e38ed487f3f5794fbff155e 
Merkle proof = [ 
57c6a608ae818aa1227bbd274b31b87aa681e2726a8a72540699c0c7d2ae5ca7,
62adae7f75814e50f06e9516a370e06cb35ea5fbd02900ed37ec2f4124254521,
0368314612e835c54e1d5e2367fdd9debdc5d68c5f5708fc37eb16c2c661c65c,
af8e55b2a734b42082928e57dac5d46259cff6433968632b106c1cff8e0d8fcd,
279de58f7b64e5287a7df215176f832883b008bc9d8bc73c469c6667f6207ac1,
9ccd17e7215de3386dd405c430f29b53fa63fdf365a10927abe155a6af43f42c,
95eeecb30f68037d23ddc0849aaa2f5a430205590f0f8fbb7d32f6cb6499902a,
cc2c3eb3959c05890d9dd80088c63f2552417e7b4a6609dd097886e7cb94aff2,
a716419a1b9095874dafab2b5f8c6e42c4aa99aac360e9bd2fd5ca97d4686b3b,
40a947966de220a4a6a5537f576e18dde4bd5ae41b8e28a8ca4e78ae1a6acb01
] 
The verifier can confirm that the provided value is indeed from the function with the specified commitment by applying the Merkle proof. This means performing the following calculations, where ‘mp’ denotes the Merkle proof and ‘h’ is the hash being computed. The concatenation of arguments in the call to the hash function is denoted ‖, with the order of arguments being determined by the path up the Merkle tree — when the corresponding digit of the index 0111010101 is 0, the hash h appears on the left, otherwise it is on the right.
h = SHA256(value ‖ nonce)
h = SHA256(mp[0] ‖ h)
h = SHA256(h ‖ mp[1])
h = SHA256(mp[2] ‖ h)
h = SHA256(h ‖ mp[3])
h = SHA256(mp[4] ‖ h)
h = SHA256(h ‖ mp[5])
h = SHA256(mp[6] ‖ h)
h = SHA256(mp[7] ‖ h)
h = SHA256(mp[8] ‖ h)
h = SHA256(h ‖ mp[9])
h = 06e893aec8533e367ebadf5da0cfe17ce7b90d01c7bb014f97b8a43e0f71e5e7
The result agrees with the Merkle root given above, showing that the prover did provide the correct function value.
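The commit, prove, and verify steps used in this example can be sketched in Python; this is a minimal illustration with my own function names, not necessarily the exact tree layout used above:

```python
import hashlib
import os

def sha(data):
    return hashlib.sha256(data).digest()

def merkle_commit(values):
    # each leaf hides its value behind a random 16-byte nonce for zero knowledge
    nonces = [os.urandom(16) for _ in values]
    layer = [sha(v + n) for v, n in zip(values, nonces)]
    tree = [layer]
    while len(layer) > 1:
        layer = [sha(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
        tree.append(layer)
    return tree, nonces  # tree[-1][0] is the Merkle root

def merkle_prove(tree, i):
    # collect the sibling hash at each level on the path to the root
    proof = []
    for layer in tree[:-1]:
        proof.append(layer[i ^ 1])
        i //= 2
    return proof

def merkle_verify(root, i, value, nonce, proof):
    h = sha(value + nonce)
    for sibling in proof:
        # the index bit at each level decides left/right concatenation order
        h = sha(sibling + h) if i & 1 else sha(h + sibling)
        i //= 2
    return h == root

# commit to eight 16-byte values, then open index 5
values = [bytes([i]) * 16 for i in range(8)]
tree, nonces = merkle_commit(values)
proof = merkle_prove(tree, 5)
assert merkle_verify(tree[-1][0], 5, values[5], nonces[5], proof)
```

The number of values must be a power of two here; padding would handle other sizes.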
The example given here was only for a domain of size about 1000. This was just to keep it reasonably short for the purpose of displaying here. We can easily go up to a size of a billion, a million-fold increase, while only tripling the length of the Merkle proof.
So far, so good. If the prover has computed a function on a finite domain, then Merkle trees provide an efficient way to commit to this function, and reveal its values at an arbitrary set of points. For the STARK we need to construct here, the functions will be polynomials with values specified on some large number N of points, where N can be of the order of billions. We could try computing the polynomial coefficients directly using Lagrange polynomials or something similar, then evaluate separately at every point of the domain S. Each polynomial coefficient and each evaluation takes of the order of N operations to compute. So, the total calculation would take the prover of order N^{2} time, which can be huge. Do we really expect the prover to go through such a long-winded and, possibly, infeasibly long calculation? Fortunately, we do not need to, as Fast Fourier Transform methods can be applied to reduce the complexity to N log N.
Consider a function f specified at a number d of points, which will be taken to be a geometric sequence c, cω, cω^{2}, …, cω^{d−1} in the field 𝔽. Representing f as a polynomial of degree less than d,

f(X) = ∑_{m=0}^{d−1} b_{m}X^{m}.

Now consider evaluating at the points of the geometric sequence,

f(cω^{n}) = ∑_{m=0}^{d−1} (b_{m}c^{m}) ω^{mn}.

If ω is a primitive d’th root of unity, then this is just the Fourier transform of the terms b_{m}c^{m}. We would usually choose d to be a power of 2 so that efficient Fast Fourier Transforms (FFT) can be used. If the values of f are specified on the d points of the domain, an inverse FFT generates all of the coefficients at once. Then, for any other field element c, we can scale by powers of c and apply an FFT to compute f at the points {c, cω, cω^{2}, …, cω^{d−1}}, which we denote as the set cW.
All this means is that the prover can use the FFT algorithm to evaluate f at all points of a subset S of our field, as long as this domain is closed under multiplication by ω. Equivalently, S is the collection of points c_{i}ω^{n} for a number M of field points c_{i} and over the range 0 ≤ n < d. This gives a domain of size Md and, by performing M applications of the FFT, computes the function on S in a time of order Md log d.
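A minimal recursive implementation of this evaluation might look as follows, illustrated with the small prime 337 and ω = 85, an 8th root of unity modulo 337 (function names are mine, not from any particular library):

```python
def ntt(coeffs, omega, p):
    # evaluate the polynomial sum(b_m X^m) at the points omega^0, omega^1, ...
    # len(coeffs) must be a power of two and omega a root of unity of that order
    n = len(coeffs)
    if n == 1:
        return list(coeffs)
    # Cooley-Tukey split: f(x) = E(x^2) + x * O(x^2)
    even = ntt(coeffs[0::2], omega * omega % p, p)
    odd = ntt(coeffs[1::2], omega * omega % p, p)
    out, w = [0] * n, 1
    for i in range(n // 2):
        t = w * odd[i] % p
        out[i] = (even[i] + t) % p
        out[i + n // 2] = (even[i] - t) % p  # omega^(i + n/2) = -omega^i
        w = w * omega % p
    return out

def eval_on_coset(coeffs, c, omega, p):
    # f(c * omega^n): scale the coefficients b_m -> b_m * c^m, then transform
    scaled = [b * pow(c, m, p) % p for m, b in enumerate(coeffs)]
    return ntt(scaled, omega, p)
```

The same code works unchanged with the 128-bit prime used later, just with larger inputs.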
While this is much faster than the Md^{2} time required to separately perform polynomial evaluation at every point, it will still be a rather long calculation for large degrees d. It is therefore required that the prover has significant computing power at their disposal — at least, when compared to the verifier who only has to verify the polynomial identities at a comparatively small number of points.
To build a STARK style argument of knowledge for an execution trace of length N, we start by choosing a power of two, 2^{s} > N. The idea is that our polynomials representing the execution will have degree less than this and be represented on roots of unity of this order. For N around 1 billion, we take s = 30.
The next step is to choose the finite field 𝔽, which we will take to be the integers modulo a large prime p. The restrictions are that p – 1 should be a multiple of 2^{s} in order that the required roots of unity exist, and also that 1/p should be a negligible probability. This last condition is so that there is a negligible chance of the verifier randomly choosing one of a small number of field values for which the prover could construct an argument without actually having the claimed knowledge. I take p to be slightly less than 2^{128}, since numbers less than this can be represented conveniently in 16 byte blocks, although larger values can be used for more security. So, we can take p of the form 2^{128} – 2^{30}m + 1, and searching for the smallest value of m making this prime gives,
p = 2^{128} – 36.2^{30} + 1. 
Next, we need to find a root of unity ω of order 2^{s}. To do this, just choose an integer x and set ω = x^{(p−1)/2^{s}} (mod p). Fermat’s little theorem guarantees that this is a 2^{s}’th root of unity but, to ensure that it has exactly this order (rather than a proper divisor of it), we check that ω raised to the power of 2^{s−1} is −1 (mod p). If it is not, we just try again with a different x. In our case, this succeeds for the choice x = 3, giving,
ω = 0x17ead9889fdb09b21c85d0cfd3bdee85. 
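A sketch of this search, assuming p is prime with p − 1 divisible by 2^{s} (the function name is mine; by the text above, x = 3 is the first value that succeeds for our prime):

```python
def find_root_of_unity(p, s, x=2):
    # omega = x^((p-1)/2^s) is a 2^s'th root of unity by Fermat's little theorem;
    # it has exact order 2^s precisely when omega^(2^(s-1)) = -1 (mod p)
    while True:
        omega = pow(x, (p - 1) >> s, p)
        if pow(omega, 1 << (s - 1), p) == p - 1:
            return omega
        x += 1

p = 2**128 - 36 * 2**30 + 1
omega = find_root_of_unity(p, 30)
```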
The powers of ω define the set W (see equation 5) on which to represent the execution trace.
Next is to choose a domain S in our field on which the prover will commit to the polynomial values which, in order that the execution values are not revealed, should be disjoint from the powers of ω. We will want its size to be a small multiple M (say, 4) of 2^{s} and, in order to be able to apply Fourier transforms as discussed above, S should be the union of scaled copies cW of W. This can be achieved by choosing a sequence c_{1}, …, c_{M} in the field, and letting S consist of elements of the form c_{i}ω^{n},
S = c_{1}W ∪ c_{2}W ∪ ⋯ ∪ c_{M}W.  (11) 
We can set,
{c_{1}, c_{2}, c_{3}, c_{4}} = {2, 3, 4, 5} 
The sets c_{i}W will be disjoint, which can be checked by raising each c_{i} to the power of 2^{s} (mod p), and seeing that the resulting values are distinct. Also, none of the c_{i} raised to the power of 2^{s} equals 1 modulo p, so S is disjoint from W.
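These checks can be written directly in code, using the prime and scaling factors chosen above:

```python
p = 2**128 - 36 * 2**30 + 1
s = 30
scales = [2, 3, 4, 5]

# cW and c'W coincide exactly when (c/c')^(2^s) = 1, i.e. when the markers
# c^(2^s) mod p agree, so pairwise-distinct markers mean disjoint cosets
markers = [pow(c, 1 << s, p) for c in scales]
assert len(set(markers)) == len(markers)

# a marker equal to 1 would mean the coset is W itself
assert all(m != 1 for m in markers)
```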
The setup above consists of the choice of prime p defining the field, the root of unity ω of order 2^{s}, the scaling factors c_{1}, c_{2}, c_{3}, c_{4} defining the evaluation domain, and the polynomial identities (10) to be verified. These are all fixed before moving on to the argument of knowledge. This will be scalable in the sense that, even for statements which require a very long computation to directly verify, the procedure described here should not require very many calculations by the verifier or require a lot of information to be processed. The statement regarding the result of a billion iterations of Rule 30 above should involve much less computation by the verifier to convince herself of the truth of the result.
I will first describe the interactive version, consisting of messages between the prover 𝒫 and verifier 𝒱. The non-interactive version will be derived from this later.
The polynomials here are computed from the execution trace of the ‘rule 30’ algorithm, taking values
f_{i}(ω^{n}) = s_{i}(n) 
over 0 ≤ n ≤ N. The prover needs to extrapolate these to the set S as polynomials of degree less than 2^{s}, which can be done using Fourier transforms as described above. Using FFT algorithms, this takes of order 2^{s}s mathematical operations which, using s = 30, is about 30 billion. This is the most computationally heavy part of the whole proof for the prover. After this, the prover needs to build up the Merkle tree and, finally, sends just the Merkle root to the verifier. At a later stage of the procedure, when the prover sends function values, he will also send Merkle proofs in order to guarantee that he is indeed sending values of the function committed to at this step.
The verifier needs to be able to trust that the f_{i} are all polynomials of degree less than 2^{s}, as are the functions g_{ji} defined by (10). If that is true, then they would indeed represent an execution trace for the Rule 30 automaton with the claimed final states.
Note that we have hundreds of functions here. Fortunately, there is a trick to reduce it to a single one. Simply take a random linear combination of them. If the prover can show this to be of the required degree, then almost certainly the original functions are too.
Theorem 1 Let 𝔽 be a finite field of size p and f_{0}, f_{1}, f_{2}, …, f_{n} be functions from a subset S to 𝔽. Let λ_{1}, λ_{2}, …, λ_{n} be independent and uniformly distributed random field elements.
Suppose that the f_{i} are not all polynomials of degree less than d. Then the probability of the linear combination

f_{0} + λ_{1}f_{1} + λ_{2}f_{2} + ⋯ + λ_{n}f_{n}

being a polynomial of degree less than d is at most 1/p.
Assuming that 1/p is a negligible probability, this theorem says that showing the linear combination to be of degree less than 2^{s} is sufficient to imply that the same is true of all of the f_{i}. So, the first thing the verifier does is to choose random coefficients in order to reduce the proof to a single polynomial.
The idea is that these coefficients define a new function on the domain,

h_{0} = ∑_{i} λ_{i}f_{i} + ∑_{j,i} λ′_{ji}g_{ji}.
We could ask the prover to also send a commitment to this linear combination. I will not do this and, instead, the verifier can compute it directly as soon as the prover reveals values of f_{i}.
It remains for the prover to convince the verifier that the values of h_{0} are indeed chosen according to a polynomial of degree less than 2^{s} or, at least, are chosen in this way on most of the domain. By polynomial interpolation, any function can be matched by such a polynomial on up to 2^{s} points, so it sounds like we would need to sample h_{0} at more values than this, which defeats the whole scalability idea behind STARKs. Fortunately, there is a trick which can be used to efficiently prove that h_{0} satisfies the claimed property with a high degree of certainty. This is known as a Fast RS IOPP, an FRI or, in full, a Fast Reed–Solomon Interactive Oracle Proof of Proximity.
The idea is to first break h_{0} up into two polynomials of half the degree. I use S^{2} to denote the set of squares x^{2} of elements x of S.
Theorem 2 Let S be a set of nonzero elements of a field 𝔽 such that, for every x in S, −x is also in S. Then, any function h from S to 𝔽 can be uniquely decomposed as

h(x) = u(x^{2}) + xv(x^{2})  (12)

for functions u, v from S^{2} to 𝔽. Furthermore, h is a polynomial of degree less than d if and only if u, v are both polynomials of degree less than d/2.
Equation (12) is easily inverted to calculate u, v from h,

u(x^{2}) = (h(x) + h(−x))/2,
v(x^{2}) = (h(x) − h(−x))/(2x).
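As a quick numerical check of this inversion, take p = 97 and h(x) = x³ + 2x² + 3x + 4, which decomposes as u(y) = 2y + 4 and v(y) = y + 3:

```python
p = 97
h = lambda x: (x**3 + 2 * x**2 + 3 * x + 4) % p
inv = lambda a: pow(a, p - 2, p)  # multiplicative inverse modulo the prime p

for x in range(1, p):
    u = (h(x) + h(p - x)) * inv(2) % p        # u(x^2) = (h(x) + h(-x)) / 2
    v = (h(x) - h(p - x)) * inv(2 * x) % p    # v(x^2) = (h(x) - h(-x)) / (2x)
    assert u == (2 * x * x + 4) % p           # u(y) = 2y + 4 at y = x^2
    assert v == (x * x + 3) % p               # v(y) = y + 3 at y = x^2
```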
So, we have replaced our polynomial by two of half the degree. On its own, this has not simplified matters. However, the same trick as above can be used by exploiting theorem 1 again. If the verifier chooses a random field element μ, consider the new function

h_{1} = u + μv.

If the prover can show that this has degree less than 2^{s−1} then, using theorem 2, we can be confident (up to a probability of 1/p) that h_{0} has degree less than 2^{s}.
This process can be repeated all the way until it reduces to showing that a function is constant, for which the prover just has to send the constant value rather than a Merkle root commitment.
So, let us define new sets S_{0}, S_{1}, …, S_{s} iteratively by setting S_{0} = S and S_{k + 1} = S_{k}^{2}. Recalling that the domain S was defined by equation (11),
S_{k} = c_{1}^{2^{k}}W^{2^{k}} ∪ c_{2}^{2^{k}}W^{2^{k}} ∪ ⋯ ∪ c_{M}^{2^{k}}W^{2^{k}}

and W^{2^{k}} is just the set of powers of the 2^{s−k}’th root of unity ω^{2^{k}}. With these domains defined, we iteratively reduce the problem to polynomials of lower and lower degree by the following steps, run in order from k = 1 up to k = s.
The claim from the prover is that h_{k} is related to h_{k−1} by,

h_{k}(x^{2}) = (h_{k−1}(x) + h_{k−1}(−x))/2 + μ_{k}(h_{k−1}(x) − h_{k−1}(−x))/(2x)  (13)
for all x in S_{k−1}. So long as the prover does really construct the functions h_{k} in this way, then they are guaranteed to be polynomials of degree less than 2^{s−k}. So h_{s} will be constant. This can be enforced by, on step 4k above with k = s, having him return the constant value of h_{s} instead of a Merkle root commitment. Then, if equation (13) is really satisfied, we can be confident (to within a negligible probability) that each function h_{k} really does have degree less than 2^{s−k} and, in particular, h_{0} has degree less than 2^{s}.
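A single folding step of equation (13) can be sketched as follows, over a small illustrative prime; the function name and layout are my own:

```python
def fri_fold(values, domain, mu, p):
    # apply (13): h_k(x^2) = (h(x) + h(-x))/2 + mu * (h(x) - h(-x))/(2x);
    # 'domain' must consist of nonzero points closed under negation, with the
    # values of h_{k-1} listed in the same order
    index = {x: i for i, x in enumerate(domain)}
    inv2 = pow(2, p - 2, p)
    folded = {}
    for x, hx in zip(domain, values):
        y = x * x % p
        if y in folded:
            continue  # x and -x map to the same point of the squared domain
        hmx = values[index[(p - x) % p]]  # h(-x)
        u = (hx + hmx) * inv2 % p
        v = (hx - hmx) * inv2 * pow(x, p - 2, p) % p
        folded[y] = (u + mu * v) % p
    return folded

# example over p = 97: h(x) = 3x^3 + 5x^2 + x + 7 splits as u(y) = 5y + 7 and
# v(y) = 3y + 1, so folding with mu = 10 should give the line 35y + 17
domain = list(range(1, 97))
values = [(3 * x**3 + 5 * x**2 + x + 7) % 97 for x in domain]
folded = fri_fold(values, domain, 10, 97)
assert all(v == (35 * y + 17) % 97 for y, v in folded.items())
```

Each fold halves both the domain size and the degree bound, which is what drives the logarithmic number of rounds.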
So far, all that the verifier has received is commitments for the functions f_{i} and h_{k} on their respective domains. These are Merkle roots, which will just appear as random 32 byte strings, so the verifier has learned nothing yet. The next stage is to actually ask for the values of the functions at selected points on their domains, so that the required identities can be checked.
We must decide the number of points at which to evaluate each of the functions. Call this number n; the larger its value, the more secure the argument. That is, the less likely it is that a prover can fool the verifier into believing a false statement. The choice of n can be decided by the verifier in the interactive procedure, otherwise it is decided upfront and is part of the initial setup for the argument. The following steps are performed for each value of k from 1 to s.
For the first iteration with k = 1, by saying that the prover sends the values of h_{0}, we really mean that he sends the values of the f_{i} and the verifier computes h_{0} from these using the random linear combination chosen earlier. At the final iteration k = s, the prover does not really send values of h_{s}, since this is a constant function and its value was already sent as the commitment.
This is all of the messages which need to be transmitted between prover and verifier. All that remains is for the verifier to perform some checks that the information received from the prover is as claimed. If these succeed, the argument of knowledge is validated, otherwise it is invalid.
Note that as the verifier can only check the identity (13) at the finite randomly chosen set of n points, she cannot validate that it holds everywhere. The best that can be said is that she statistically verifies that it holds at most points, so cannot be certain that h_{0} is a polynomial everywhere of the correct degree. Actually, it is sufficient that she determines that it is a polynomial of degree less than 2^{s} on at least 2^{s+1} points. As both sides of identities (7,8,9) are of degree less than 2^{s+1}, agreement on this many points extends to the entire domain and, in particular, holds on the execution trace.
Since we chose a domain S of size four times that of the domain W of the execution, we only need to show that identity (13) holds on about a proportion ρ = 1/2 of the domain. The probability of this holding by chance, if the prover had not chosen a polynomial, will be no more than ρ^{n}. So long as this is a negligible probability, the proof is sound. Taking n to be about 100 to 200 should be sufficient. This argument for the probability bound is very rough and not rigorous; for more details see a paper on STARKs, such as Scalable, transparent, and post-quantum secure computational integrity.
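As a quick sanity check on these numbers, a couple of lines of Python confirm that the cheating probability ρ^{n} is negligible for n in the suggested range (the value ρ = 1/2 is taken from the text; the precise bound depends on details of the setup not reproduced here):

```python
# Soundness estimate: a cheating prover whose committed values agree with a
# low degree polynomial on at most a proportion rho of the domain passes
# each random spot check with probability at most rho, so passes all n
# independent checks with probability at most rho**n.
rho = 0.5  # proportion from the text; the exact value depends on the setup
for n in (100, 150, 200):
    print(f"n = {n}: cheating probability <= {rho ** n:.3e}")
```

Even at the lower end n = 100, the bound 2^{−100} is far below any practically relevant failure probability.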
Note the importance of the order of the interactions between prover and verifier. The verifier chooses the points at which to verify equation (13) only after receiving all of the polynomial commitments. This prevents the prover from cheating by selecting function values which satisfy the identities only at the verified points and not elsewhere. Similarly, steps 3k and 4k are run in order from k = 1 to k = s so that, again, the prover cannot know the later values of μ_{k} and cannot cherry-pick functions whose values satisfy the identities for these μ_{k} but for no other choices.
Looking at scalability, steps 3k, 4k, 5k and 6k need to be repeated s times, which is of the order of the logarithm of the number of iterations of Rule 30. Each of the function values revealed by 𝒫 requires 𝒱 to verify a Merkle proof of length s, so of the order of (log_{2}N)^{2} ≈ 900 calculations, mainly consisting of computing SHA256 hashes for the Merkle proofs. These are repeated for n iterations, but this factor remains roughly constant as N becomes large.
We have described an interactive scalable argument of knowledge, although the procedure above is not zero-knowledge. The prover does not directly reveal the secret message used to initialize the calculation, nor does he reveal the execution trace. However, the functions f_{i} and h_{k} are computed from the execution trace, and are revealed at a number of points. Although these points are on a domain which does not include the execution trace, so do not directly reveal any hidden values, they do depend on them and so potentially leak some partial information to the verifier. It is better if the procedure can be made provably zero-knowledge, so that it reveals nothing other than the fact that the prover has a secret message with the required property, along with some random values.
First, the prover can hide any information that h_{0} may contain by adding a polynomial blinding factor. This is a polynomial g of degree 2^{s} chosen uniformly at random, so has independent and uniformly chosen coefficients. It is to be added to the linear combination (13) when constructing h_{0},
(14) 
As a result, h_{0} will be a random polynomial independent of the f_{i}. As the h_{k} are derived from this, these are also independent of the f_{i}, so do not leak any information. Hence, step 1 is replaced by:
Also, the verifier uses equation (14) instead of (13) incorporating the blinding factor when evaluating values of h_{0}.
Since the prover is required to reveal values of f_{i} at points of the domain S, it is still not quite zero-knowledge. This can be remedied without changing any calculation from the verifier’s perspective. All the prover needs to do is to add a random factor to the f_{i} before computing their commitments. He can define new polynomials f′_{i} by
f′_{i}(X) = f_{i}(X) + Z_{N + 1}(X)u(X)  (15) 
where u(X) is a random polynomial of low degree d.
As Z_{N + 1} vanishes on the execution trace, this still satisfies all of the required properties of f_{i} except that its degree is now up to N + d. Sampling f′_{i} at up to d points will give entirely random values, and leaks no information. As the procedure above results in the prover revealing the values of f_{i} at 4n points, we just need to ensure that d is at least this large.
So long as 2^{s} is greater than N + d, the resulting polynomial f′_{i} will still have degree less than 2^{s}, so it can be used in place of f_{i} without impacting the algorithm at all. It may require increasing s by one if we are unlucky, but this is minor and most likely unnecessary, since d will be orders of magnitude less than N. In our case, with N equal to 1 billion and s = 30, there is room for the number of iterations n to be anywhere up to 18 million without having to increase s. Using of the order of 100 to 200 iterations is no problem.
Actually, the prover does not even have to explicitly compute the random factor u(X) or apply equation (15). All that is necessary is that for at least d points of the domain W on which f_{i} is constructed, but outside of the execution trace of size N + 1, the values of f_{i} are set to uniformly random values. This automatically adds a random factor u(X) while hardly changing the calculations performed by the prover. The format of the messages sent and the procedure carried out by the verifier is not affected at all.
Making these simple modifications turns the argument of knowledge described above into a zero knowledge proof.
Finally, to build our zkSTARK, the above interactive argument of knowledge needs to be expressed in a non-interactive format. This is just a block of data which anyone can check to verify that it was constructed with the claimed knowledge. In our case, this is knowledge of an initial message from which, after 1 billion iterations of Rule 30, the claimed result (1) is obtained.
The Fiat–Shamir heuristic will be used, and the zkSTARK itself will just consist of a list of all of the messages sent by the prover according to the interactive protocol described above. This is:
For the edge cases, the values of h_{0}(±x) do not need to be stored, since they are computed from the f_{i} and g. Similarly, values of h_{s}(x) do not need to be stored, since it has constant value given by its commitment.
To create the zkSTARK, the prover just needs to go through the steps of the interactive protocol above recording his messages. While this also involves a verifier 𝒱, her role can instead be simulated by a pseudorandom method. Recall that the verifier makes various random choices, and that it is important when they are transmitted to the prover. This is so that the prover cannot use knowledge of these when constructing commitments for the various polynomials. Specifically, 𝒱 makes the following choices.
The Fiat–Shamir method uses a cryptographically secure random number generator for these choices. Theoretically, this is done with a random oracle and, in practice, a hash function h() such as SHA256 is used. This will give outputs which are effectively uniformly random 256 bit numbers, and can be used to define the verifier’s choice of field elements. Specifically, each choice made by 𝒱 is constructed from the bits of h(u‖v), where u is the concatenated list of commitments made by 𝒫 before the verifier’s choice, and v is just a number incremented after each choice.
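The following Python sketch shows one way such a choice could be derived. The transcript format here (concatenating raw 32 byte Merkle roots with an 8 byte counter) is an illustrative assumption, not a fixed standard, and the modular reduction introduces a tiny bias which a production system would handle more carefully:

```python
import hashlib

def fiat_shamir_choice(commitments: list[bytes], counter: int, p: int) -> int:
    """Derive one pseudorandom verifier choice in F_p from the transcript.

    `commitments` is the list of prover messages sent so far (u in the text)
    and `counter` is incremented after each choice (v in the text).
    """
    u = b"".join(commitments)
    v = counter.to_bytes(8, "big")
    digest = hashlib.sha256(u + v).digest()
    return int.from_bytes(digest, "big") % p  # field element in F_p

# Example: two (hypothetical) 32 byte Merkle-root commitments, then three
# successive verifier choices derived deterministically from them.
p = 49 * 2**30 + 1  # the small STARK-friendly prime suggested later on
roots = [hashlib.sha256(b"f_i commitment").digest(),
         hashlib.sha256(b"h_0 commitment").digest()]
choices = [fiat_shamir_choice(roots, v, p) for v in range(3)]
assert all(0 <= c < p for c in choices)
```

Because the choices are a deterministic function of the commitments, any validator can recompute them, which is exactly what makes the argument non-interactive.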
This defines the zkSTARK! To construct it, the prover steps through the protocol above using the described heuristic for the verifier’s choices, and records his messages. To validate, we would use the stored values for the prover’s messages and the heuristic for the verifier’s. Then, steps 7 and 8 of the protocol above are performed to check that the STARK is valid.
As it stands, constructing a zkSTARK exactly as above would be rather inefficient. Although it will be much shorter and faster to process than storing or checking the entire execution trace of a billion Rule 30 iterations, it is still much bigger than is necessary. We have 200 polynomials f_{i} and about 30 polynomials h_{k}, all evaluated at of the order of 100 to 300 points, and each value takes up around 16 bytes. These all come with Merkle proofs, each of which will contain around 32 hashes of 32 bytes each. This all adds up very quickly. Fortunately, much of this can be eliminated without affecting security. I didn’t do this above for simplicity, but will finish off by outlining some ways in which the size of the zkSTARK can be significantly reduced.
h(x) = u_{0}(x^{4}) + xu_{1}(x^{4}) + x^{2}u_{2}(x^{4}) + x^{3}u_{3}(x^{4}) 
for polynomials u_{i} of degree 2^{s − 2}. These can be computed using
4x^{i}u_{i}(x^{4}) = h(x) + α^{3i}h(αx) + α^{2i}h(−x) + α^{i}h(−αx) 
where α is a primitive 4’th root of unity (i.e., a square root of −1) in the field. This is just an inverse Fourier transform of size 4.
Using this would require four evaluations of h_{k − 1} per iteration of step 6k, but we are taking twice as big steps, so it works out the same. However, if the Merkle tree is set up so that x, αx, −x, −αx are stored in neighbouring leaves, they can share the same Merkle proof. Hence, we halve the total number of Merkle proofs of the functions h_{k}.
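We can check this inverse transform identity numerically. The sketch below works in the toy field 𝔽_13, where α = 5 satisfies α² = −1, and verifies the identity for a random polynomial h of degree less than 8 at every nonzero point of the field; the field size and degree are purely illustrative choices:

```python
# Numerical check of the size-4 inverse Fourier transform identity used to
# split h into four polynomials u_0..u_3 of quarter degree.
# Toy field F_13, with alpha = 5 a primitive 4th root of unity (alpha^2 = -1).
import random

p = 13
alpha = 5
assert (alpha * alpha) % p == p - 1  # alpha^2 = -1, so alpha^4 = 1

random.seed(0)
h = [random.randrange(p) for _ in range(8)]   # h of degree < 8, ascending coeffs
u = [h[i::4] for i in range(4)]               # u_i takes coefficients = i mod 4

def ev(poly, x):
    # Horner evaluation of poly (ascending coefficients) at x, mod p.
    acc = 0
    for c in reversed(poly):
        acc = (acc * x + c) % p
    return acc

for x in range(1, p):
    for i in range(4):
        lhs = 4 * pow(x, i, p) * ev(u[i], pow(x, 4, p)) % p
        rhs = (ev(h, x)
               + pow(alpha, 3 * i, p) * ev(h, alpha * x % p)
               + pow(alpha, 2 * i, p) * ev(h, (p - x) % p)
               + pow(alpha, i, p) * ev(h, (p - alpha * x % p) % p)) % p
        assert lhs == rhs
```

The four evaluation points in the right hand side are exactly x, αx, −x, −αx, which is why storing them in neighbouring Merkle leaves lets them share one proof.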
In our example, there are s = 30 of these polynomials to commit to. If we stop the algorithm at k = s – 8 and, instead of its commitment, return the 2^{8} = 256 polynomial coefficients, this reduces us to 22 polynomials.
For this to be the case, it is only necessary for p – 1 to be a small multiple of 2^{30}. For example, we could use p = 49·2^{30} + 1. This leads to far fewer bits (36 instead of 128) per evaluation, which would significantly reduce the work done by the prover (specifically, when doing the Fourier transforms) and save a small amount of space in the zkSTARK.
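The required structure of the field is easy to verify in code. The sketch below checks the well-known STARK-friendly prime 3·2^30 + 1 (chosen here because its primality and a generator are simple to confirm programmatically; the suggested 49·2^30 + 1 can be tested in exactly the same way) and derives a primitive 2^30-th root of unity:

```python
# The Fourier transforms need the trace domain of size 2^30 to embed in the
# multiplicative group F_p^*, i.e. 2^30 must divide p - 1.
def is_prime(n: int) -> bool:
    # Deterministic Miller-Rabin, valid for all n < 3.3 * 10^24.
    if n < 2:
        return False
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for a in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if a % n == 0:
            continue
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

p = 3 * 2**30 + 1
assert is_prime(p) and (p - 1) % 2**30 == 0

# p - 1 = 2^30 * 3, so g generates F_p^* iff g^((p-1)/2) != 1 != g^((p-1)/3).
g = next(g for g in range(2, 100)
         if pow(g, (p - 1) // 2, p) != 1 and pow(g, (p - 1) // 3, p) != 1)
omega = pow(g, (p - 1) // 2**30, p)   # primitive 2^30-th root of unity
assert pow(omega, 2**30, p) == 1 and pow(omega, 2**29, p) != 1
```

The root ω generated this way has exact order 2^30, so it supplies the evaluation domain for the radix-2 Fourier transforms.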
From steps 2 onwards, the field containing the random verifier choices λ_{ji} and μ_{k} does need to be large in order to keep the soundness parameter small. These can be chosen in a finite extension of 𝔽_{p}, which will be of size p^{r} for extension degree r. Taking r = 4 should be sufficient.
While these terms do not take up space in the zkSTARK itself, the verifier is required to compute the values of Z_{N}, involving multiplying N terms for each evaluation. As N is of the order of a billion, this will take significant resources. It is possible to make use of the identity
Z_{N}(ωX)(X − ω^{N − 1}) = ω^{N}Z_{N}(X)(X − ω^{−1})  (16) 
enabling its values to be computed at consecutive points xω^{i} in constant time for each successive evaluation. This is very useful for the prover, who needs its values on the entire domain S. For the verifier, it is not so much help, as she only requires its values at a small number of nonconsecutive random points.
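Identity (16) is easy to confirm numerically. This sketch uses a toy field 𝔽_97 with ω of order 32 and N = 20, all purely illustrative, and checks the identity at every nonzero point:

```python
# Check identity (16): Z_N(w*x)*(x - w^(N-1)) = w^N * Z_N(x)*(x - w^(-1)),
# which lets the prover update Z_N along consecutive points x, w*x, w^2*x,...
# in constant time per step.
p = 97
# p - 1 = 96 = 2^5 * 3, so g generates F_p^* iff g^48 != 1 != g^32.
g = next(g for g in range(2, p) if pow(g, 48, p) != 1 and pow(g, 32, p) != 1)
w = pow(g, (p - 1) // 32, p)          # primitive 32nd root of unity

N = 20
def Z(x):
    # Z_N(x) = prod_{i < N} (x - w^i), evaluated directly.
    acc = 1
    for i in range(N):
        acc = acc * (x - pow(w, i, p)) % p
    return acc

w_inv = pow(w, p - 2, p)
for x in range(1, p):
    lhs = Z(w * x % p) * (x - pow(w, N - 1, p)) % p
    rhs = pow(w, N, p) * Z(x) % p * (x - w_inv) % p
    assert lhs == rhs
```

Given Z_{N}(x), the prover obtains Z_{N}(ωx) with one multiplication and one division, instead of an N-term product.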
One method is for the prover to commit to and send the values of Z_{N} along with those of f_{i}, g. This adds a tiny bit of extra data to the zkSTARK. A random multiple of Z_{N} is added to the linear combination (14) to verify that Z_{N} is indeed a polynomial of degree less than 2^{s}. The verifier checks that identity (16) holds at the sampled values stored in the zkSTARK. This removes the computational complexity but, if we go through the maths, it will result in a slight increase in the soundness parameter.
An alternative method is to make use of the identity
Z_{2^{s}}(X) = X^{2^{s}} − 1 
to write Z_{N} as
Z_{N}(X) = (X^{2^{s}} − 1) ∕ ∏_{i = N}^{2^{s} − 1}(X − ω^{i}). 
If N is very close to 2^{s}, this can be used by the verifier to quickly compute the values of Z_{N}. For this reason, it can be useful to extend the execution trace length to be as close to 2^{s} as possible, by adding in extra iterations of the rule even though they may not be used in the original calculation.
The protocol was implemented by the Grin and Beam cryptocurrencies, launched in January 2019. More recently, it was introduced into Litecoin in the MWEB upgrade (Mimblewimble extension block). This formally locked in on 3 May 2022 and went live a couple of weeks later.
As this post is rather long, I include a list of contents:
The history of mimblewimble is rather mysterious, as really should be expected. It started on the morning of 2nd August 2016 at 4:35 UTC. A user going by the pseudonym Tom Elvis Jedusor logged into the Bitcoin wizards IRC channel under the id ‘majorplayer’, posted the comment “hi, i have an idea for improving privacy in bitcoin. my friend who knows technology says this channel would have interest,” along with a link to a text document on a hidden and now defunct Tor service, and left. Tom Elvis Jedusor is the original name of Lord Voldemort in the French translation of Harry Potter, and is an anagram of “je suis Voldemort”!
The aforementioned document was discussed in a reddit thread, Mimblewimble: Noninteractive CoinJoin and better scaling properties using Confidential Transactions. This also contains a link to the original paper by Jedusor. Mimblewimble is based on the confidential transactions of Greg Maxwell, originating in a 2013 Bitcoin forum post by Adam Back. In October 2016, Andrew Poelstra posted a paper fleshing out and expanding on the ideas contained in Jedusor’s post, as well as fixing at least one mistake in the original.
In November 2016 another character appeared on the same IRC chat. Ignotus Peverell, inventor of the invisibility cloak in the Harry Potter universe, posted a github link to the first implementation of mimblewimble. This was to eventually lead to the Grin cryptocurrency. In parallel, Beam was also developing its own implementation. Both of these launched in January 2019. After a couple of years in development and testing, the MWEB upgrade introducing mimblewimble to Litecoin went live in May 2022.
For some more information on this early history, see the post A Short History of Mimblewimble: From Hogwarts to Mobile Wallets, posted by Beni Issembert in October 2018. Also, Peverell posted a technical Introduction to Mimblewimble and Grin on github in March 2017. A high level overview with some technical detail is given by Tari Labs. In the current post, I attempt an overview of the technical ideas and mathematical techniques used.
Significant features and benefits of Mimblewimble, when compared to other protocols are:
Along with these benefits, there are obviously also some drawbacks:
I also mention another significant difference between Mimblewimble and blockchains such as Bitcoin. In Bitcoin, it is possible to trace the history of a coin through its previous transactions all the way back to its creation as block rewards. Mimblewimble breaks with this idea by combining all transactions, so all that we see is the total unspent transaction outputs and block rewards so far, without making any assignment from individual utxos to specific inputs.
Mimblewimble is based on the utxo model, as used by Bitcoin, Litecoin, and various other blockchains. Each transaction contains a set of inputs and a set of outputs along with their quantities. Each input is linked to a previous unspent transaction output (utxo), with its quantity being given by that of the corresponding utxo. The sum of the output quantities can be no greater than the sum of the inputs, with the difference being the transaction fee paid to the miner.
Consider the following scenario. Alice is the owner of 100 units of a mimblewimble based asset. This means that she has access, via knowledge of a private key, to a utxo with quantity 100. She decides to send 30 units to Bob, while paying a transaction fee of 5, meaning that she receives 65 units change. This transaction consists of the following.
This is shown in figure 1 below. All quantities are integer valued, so we must make sure that the units are small enough to not require fractional valued transactions. The fee is a kind of output which, ordinarily, does not have to be explicitly listed in the transaction, since it can be computed as the sum of the inputs minus the sum of the outputs. We include it here though, as it will be needed once the quantities are hidden.
Denote the transaction quantities by a_{i}, so that Alice’s input is a_{1} = 100, Bob’s output is a_{2} = 30, and Alice’s change is a_{3} = 65. Denote the fee by f = 5. The condition is that the sum of the a_{i} plus the fee, using a negative sign for inputs, evaluates to zero.
–a_{1} + a_{2} + a_{3} + f = 0. 
Next, we choose a private and public key pair for each input and output. For this, elliptic curve cryptography is used along similar lines to that described in the post on Schnorr signatures. We fix a cyclic group E with size equal to a huge prime number p, typically a 256 bit prime for Bitcoin related applications. In practice, E is an elliptic curve, although that is not really important for the description given here. A generator G is fixed, which is just any nonzero element of the group. Then, a private key x is just a number taken modulo p, and the associated public key is constructed by multiplying this by the generator, P = x.G. The idea is that, for generic points P of the group, it is practically impossible to ever back out the private key x. Although it is theoretically possible by a brute-force search, the size p of the group is so huge that we will never find the solution. So long as x is chosen randomly in the range 0 to p – 1, the associated public key can be revealed to the world while x is kept secret.
In mimblewimble, we do not reveal the ‘public key’ x.G either. Instead, it is used as a blinding factor to hide the quantity a of the output or input. Another generic group element H is fixed by the protocol. It is important that no-one is ever able to find a multiplier k such that H = k.G. If they could, the whole protocol would fall apart, allowing people to generate transactions with arbitrary inputs and outputs, not even summing to zero. For example, we might choose H whose coordinates are given by an SHA256 hash of an agreed value, such as the bits of the generator G. Either way, users should be able to have confidence that H is a generic group point.
Now that the group points G and H have been fixed by the protocol, Pedersen commitments are defined for each input and output. If the quantity is a units and the ‘private key’ is x, then the commitment is given by combining these,
C = a.H + x.G.  (1) 
Effectively, the commitment is the public key shifted by an amount proportional to the value associated with the input or output. The value x is also known as a blinding key, since it hides the quantity a. This is as in figure 2 below where, instead of associating the quantity with each input or output, we use the commitment value C_{i} = a_{i}.H + x_{i}.G. There is no public address, as with Bitcoin transactions. Instead, both the quantities and keys are combined in the commitments created on the fly as part of the transaction construction.
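To illustrate the algebra, here is a toy Pedersen commitment in Python. It replaces the elliptic curve by the additive group of integers modulo a prime q, in which discrete logarithms are trivially computable, so this version is not hiding or secure in any way; the values of q, G and H are arbitrary illustrative choices:

```python
# Toy Pedersen commitment C = a.H + x.G in the additive group of integers
# mod q, standing in for an elliptic curve. In this toy group discrete logs
# are trivial, so it is NOT secure -- it only demonstrates the algebra.
import secrets

q = 2**61 - 1                   # toy (prime) group order, hypothetical choice
G, H = 2, 1234567891011         # "generators": arbitrary nonzero elements

def commit(a: int, x: int) -> int:
    return (a * H + x * G) % q

x = secrets.randbelow(q)        # blinding key, kept secret
C = commit(100, x)              # commitment to a quantity of 100 units
# Opening the commitment means revealing (a, x) and re-deriving C; with a
# fixed blinding key the commitment binds to the committed quantity.
assert C == commit(100, x)
assert commit(100, x) != commit(30, x)
```

In a real implementation the group would be an elliptic curve such as secp256k1, where recovering a and x from C is computationally infeasible.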
Each participant chooses the private key x_{i} for their output at random in the range 0 to p – 1, and keeps it secret. From this, they compute the commitment value associated with the output, and must also construct a range proof. This is a zero knowledge proof that the commitment C is indeed of the form (1) for some positive quantity a lying in a preset range defined by the protocol. More about these below.
For inputs, x_{i} is the same as for the output which is being spent (so, the person spending an output must know its private key). This ensures that the commitment value of an input equals the output commitment being spent.
Each commitment is an entirely random element of the group E, hiding the quantities and private keys from prying eyes. Only the fee does not have a blinding factor, so is revealed to the outside world. This is necessary, since it needs to be known to the miner.
The condition that the signed quantities and fee sum to zero can no longer be directly verified by an outside validator since they are not known. Instead, consider the sum of the commitments and fee, using a negative sign for inputs.
C  = –C_{1} + C_{2} + C_{3} + f.H 
= –a_{1}.H – x_{1}.G + a_{2}.H + x_{2}.G + a_{3}.H + x_{3}.G + f.H  
= (–a_{1} + a_{2} + a_{3} + f).H + (–x_{1} + x_{2} + x_{3}).G  
= x.G. 
As the signed transaction quantities sum to zero, the multiples of H cancel out and we are left with a multiple x = –x_{1} + x_{2} + x_{3} of the generator G.
This means that the summed or excess commitment C is effectively the public key associated with private key x. Although the value of x may not be known to any one individual, the participants in the transaction each know their blinding key x_{i}, which can be combined, effectively generating a multiparty signature for the transaction.
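The cancellation can be replayed in the same toy additive group standing in for the elliptic curve (again, completely insecure and purely illustrative):

```python
# The excess-commitment calculation for the example transaction
# (input 100, outputs 30 and 65, fee 5) in a toy additive group mod q.
import secrets

q = 2**61 - 1                   # toy (prime) group order, hypothetical choice
G, H = 2, 1234567891011         # arbitrary nonzero "generators"

def commit(a, x):
    return (a * H + x * G) % q

x1, x2, x3 = (secrets.randbelow(q) for _ in range(3))
C1, C2, C3 = commit(100, x1), commit(30, x2), commit(65, x3)
fee = 5

# Quantities cancel because -100 + 30 + 65 + 5 = 0, leaving a multiple of G.
excess = (-C1 + C2 + C3 + fee * H) % q
x = (-x1 + x2 + x3) % q
assert excess == x * G % q
```

The excess is exactly the ‘public key’ x.G, even though no single party ever knows x on its own.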
I’ll finish this section by commenting that every commitment C appearing in any output does, technically, simultaneously commit to every possible quantity a! That is, for every quantity a, there does exist a secret blinding factor x satisfying (1). Indeed, since G is a generator then, by definition, C – a.H must equal x.G for some value of x. The reason why this is not a problem, and the transactions don’t collapse into a meaningless mess, is that computing a discrete logarithm is intractable. Actually computing the value of x is effectively impossible, except for the specific known values of a and x used in building the output.
For an outside party to be able to validate the transaction, the following needs to be provided.
The first step is to check that the transaction input commitments correspond to outputs in the existing utxo set. Unlike with Bitcoin, there is no script or signature data associated with each input, since this is taken care of by the transaction signature.
Next, the range proofs are used to verify that the output commitments are all of the form C = a.H + x.G for the transaction quantity a lying in a specific range fixed by the protocol. For example, we could require it to be in the range 0 < a ≤ 2^{64}. These are non-interactive zero-knowledge proofs that the commitments are indeed of the claimed form, without revealing any further information about the values of a and x themselves. Range proofs are a vital part of the protocol since, without them, transactions could be constructed with some negative output quantities, allowing other output sizes to be greater than the total inputs. Similarly, it would be possible to create outputs with a massive quantity which, due to overflow, still apparently sum to the value of the inputs. Either way, this would allow arbitrary inflation of the supply through unrestricted creation of new coins. Restricting the quantities to a small enough range ensures both that they are positive and that there can be no overflow when they are summed.
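The toy group from before also shows why range proofs are essential: a ‘negative’ quantity is just a huge positive one modulo the group order, and the commitment sum check alone cannot tell the difference (all values illustrative):

```python
# Why range proofs matter: in the commitment arithmetic, a "negative"
# quantity is indistinguishable from a huge positive one mod the group
# order q. Toy additive group mod q as before; not secure, illustrative only.
import secrets

q = 2**61 - 1
G, H = 2, 1234567891011

def commit(a, x):
    return (a * H + x * G) % q

x1, x2, x3 = (secrets.randbelow(q) for _ in range(3))
# Input of 100, but outputs of 200 and "-100" (encoded as q - 100). The
# commitments still sum to a multiple of G, so without a range proof this
# inflationary transaction would pass the balance check.
C1 = commit(100, x1)
C2 = commit(200, x2)
C3 = commit(q - 100, x3)            # acts as a quantity of -100
excess = (-C1 + C2 + C3) % q
assert excess == (-x1 + x2 + x3) % q * G % q
```

A valid range proof on C3 would be impossible to construct here, since q − 100 lies far outside the permitted range, and that is exactly what blocks this attack.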
As range proofs can be up to several kilobytes in length, these will take up a significant amount of the blockchain space. Finding better range proofs is still an active area for development. I will discuss these more below.
The final part of the validation is to compute the excess commitment C by summing the inputs, outputs and fee, and checking that the signature is valid for the given public key C and provided message. In Bitcoin, the message being signed is essentially the transaction itself (excluding the signature data, as that would be circular). It is vital that the signature applies only to the transaction being validated, since otherwise any third party, on seeing a transaction sent to the mempool, could replace it with their own transaction using the same signature, stealing the coins. With mimblewimble this is not necessary, since the excess commitment used as the public key is already specific to the transaction. The signed message can be anything we want, so we would usually take it to be the empty string for simplicity. This is especially important when aggregating and combining transactions, since the individual transaction data can be dropped from the blockchain altogether.
Recall that the example transaction above has a single output with no blinding factor — the fee. However, some transactions will have a single explicit input also with no blinding factor. This is the block reward plus fees paid to the miner. Such explicit inputs and outputs should be combined into a single value. Once all of the transactions in the block have been combined, this will lead to a single explicit input equal to the block reward defined by the protocol.
I described how transactions are defined in mimblewimble above, but did not say how they can be constructed in the first place. Consider the example as in figure 1, with Alice paying a quantity of coins to Bob and returning a change amount to herself. Each party to the transaction can construct the commitment for their respective input or output simply by choosing a blinding factor x at random and applying equation (1).
In Bitcoin, the person spending money, which in the example considered here is Alice, can create the transaction and send it to the mempool. She just needs to know the address for Bob to send the coins to. Only Alice’s input needs to be signed, which she is capable of doing herself. Once the transaction is included in a block, Bob receives his money.
On the other hand, for a mimblewimble transaction, Alice is not able to sign on her own. This is because to find the private key (or, excess) of the transaction, she would need to know Bob’s private blinding key x_{2}. Bob cannot reveal this secret value, since doing so would allow Alice to steal back the coins he receives.
However, it is possible for Alice to send the necessary information to Bob in order that he can create the transaction himself and send it for inclusion in a block, so that he can receive his payment. Alice sends him the following information.
Once he receives this, Bob can compute the private key x = –x_{1} + x_{2} + x_{3} and use this to generate the signature. Combining with his own commitment C_{2} and associated range proof generates the transaction.
This method of creating payments was described by Jedusor in his original paper introducing mimblewimble. It is non-interactive, since Alice only has to send one package of data to Bob and does not need to enter into two-way communication with him. It does require Alice to provide some private information to Bob, specifically the signed sum –x_{1} + x_{3} of her blinding keys. However, as long as these are chosen entirely at random, this does not reveal the individual values of x_{1} and x_{3}. It is a bit restrictive in that it only applies to a payment to a single individual.
An alternative approach is to use a signature aggregation method, such as a variant of musig, for the parties to produce a joint signature without revealing any information on their private keys. Recall that a Schnorr signature for public key C and message m consists of an integer s, taken modulo the size p of the group E, and a public nonce R in E satisfying
s.G = R + h(R‖m).C.  (2) 
Here, h() is a cryptographically secure hash function such as SHA256. This is easily solved for someone in possession of the private key x satisfying C = x.G. They simply choose a (secret) nonce r at random in the range 0 through p – 1 and set R = r.G and s = r + h(R‖m)x. On the other hand, without knowledge of the private key x, finding a valid signature is effectively impossible.
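These formulas can be played out in the same toy additive group used earlier in place of the elliptic curve (trivially breakable, purely to exercise equation (2)):

```python
# Schnorr signing in a toy additive group mod q: given private key x with
# C = x.G, produce (R, s) satisfying equation (2): s.G = R + h(R||m).C.
# Illustrative only -- a real implementation uses an elliptic curve group.
import hashlib, secrets

q = 2**61 - 1
G = 2

def h(R: int, m: bytes) -> int:
    # Hash the nonce and message down to a scalar mod q.
    return int.from_bytes(hashlib.sha256(R.to_bytes(8, "big") + m).digest(),
                          "big") % q

def sign(x: int, m: bytes):
    r = secrets.randbelow(q)          # secret nonce
    R = r * G % q
    s = (r + h(R, m) * x) % q
    return R, s

def verify(C: int, m: bytes, R: int, s: int) -> bool:
    return s * G % q == (R + h(R, m) * C) % q

x = secrets.randbelow(q)
C = x * G % q
R, s = sign(x, b"")                  # mimblewimble signs the empty message
assert verify(C, b"", R, s)
```

The verification equation only involves public data (C, R, s, m), while producing a valid s requires knowing both the nonce r and the private key x.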
For the multisignature case, we have a number n of parties, each with a private key x_{i}, summing to x = x_{1} + x_{2} + ⋯ + x_{n}, although no single individual is in possession of this joint private key. This is the situation above, where multiple parties to a mimblewimble transaction each know the private key used to generate the commitment for their inputs and outputs (reversing the signs for the transaction inputs), but not those of the others.
A joint signature can be created by performing the following steps. Each party i chooses a secret nonce r_{i} at random and computes the public nonce R_{i} = r_{i}.G. The public nonces are shared among the participants and summed to give the combined nonce R = R_{1} + R_{2} + ⋯ + R_{n}. Each party then computes and shares the value s_{i} = r_{i} + h(R‖m)x_{i}, and the joint signature consists of R together with the sum s = s_{1} + s_{2} + ⋯ + s_{n}.
It can be checked that the resulting signature does satisfy (2).
Some care needs to be taken sharing the public nonces in the second step above. For security, it is in each person’s interest that the resulting combined nonce is entirely random. Any participant can enforce this by selecting their nonce uniformly at random, and independently of the others’ choices. However, this does not work if any of the other parties chooses their nonce after seeing the other ones, since this could affect their choice and independence is lost. For example, if one person chose their value to be the negative of the sum of everyone else’s, then the summed value R would simply be 0. To avoid this, we can start by having everyone commit to their nonce before sharing its value. This can be done by sharing the hashes h(R_{i}) first, after which the R_{i} are shared. Then, each person can verify that everyone’s nonce does have the correct hash and no-one has changed their mind upon seeing other people’s choices.
This simple aggregation method works, and allows a signature to be constructed for the transaction without having anyone reveal their private key to anyone else. However, it does require some rounds of communication, in both directions, between all participants. It is not possible for Alice to simply send a block of data to Bob and be done with it. Unlike the method described by Jedusor above, the musig approach here is much more general, but is interactive.
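The rounds described above can be sketched as follows, again in the toy additive group (a single script plays all n parties, so the communication rounds are only simulated):

```python
# Sketch of the joint-signature procedure: parties commit to their nonces by
# hash, reveal them, then each contributes a partial s_i; the sums give a
# signature satisfying (2) for the joint key C = (x_1 + ... + x_n).G.
# Toy additive group mod q; insecure, illustrative only.
import hashlib, secrets

q = 2**61 - 1
G = 2

def h(R, m):
    return int.from_bytes(hashlib.sha256(R.to_bytes(8, "big") + m).digest(),
                          "big") % q

n, m = 3, b""                                           # empty message
xs = [secrets.randbelow(q) for _ in range(n)]           # private keys
C = sum(x * G for x in xs) % q                          # joint public key

# Round 1: choose secret nonces and share hash commitments to R_i first.
rs = [secrets.randbelow(q) for _ in range(n)]
Rs = [r * G % q for r in rs]
hashes = [hashlib.sha256(R.to_bytes(8, "big")).digest() for R in Rs]

# Round 2: reveal R_i; everyone checks the commitments before proceeding.
assert all(hashlib.sha256(R.to_bytes(8, "big")).digest() == c
           for R, c in zip(Rs, hashes))
R = sum(Rs) % q                                         # combined nonce

# Round 3: each party sends s_i = r_i + h(R||m).x_i; sum to get s.
s = sum((r + h(R, m) * x) % q for r, x in zip(rs, xs)) % q
assert s * G % q == (R + h(R, m) * C) % q               # equation (2) holds
```

No x_{i} ever leaves its owner; only the nonces R_{i} and partial sums s_{i} are shared, which is what makes this usable between mutually distrusting transaction participants.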
Let the real mimblewimble magic begin — transaction cut-throughs! A typical block of a blockchain will contain many individual transactions. With mimblewimble, all of the transactions can be combined into a single big one, in such a way that it is not possible to disentangle it back into the original list of discrete small transactions. This offers security benefits and saves some space.
Actually, we can go further and combine transactions across blocks, even across the entire blockchain, so that it consists of just one huge transaction! Going even further, whenever a transaction output is spent by the input of a later transaction, these inputs and outputs can be cancelled out, by a process known as transaction cut-through. Taking this to its conclusion, the blockchain will only contain a single massive transaction, whose outputs consist of the current utxo set. Its inputs will only contain the initial creation of coins, such as through block rewards.
Transaction cut-through offers potentially massive savings of blockchain space, since all spent coins are stripped out and the size of the blockchain grows linearly in the size of the current utxo set, but not in the number of historical transactions. Actually, as we will see, some signature data of individual transactions will be retained even after applying transaction cut-through, but this is relatively small (32 bytes per transaction).
The basic idea is simple enough. Each transaction consists of a list of inputs and outputs or, more precisely, the commitment values for these inputs and outputs. In addition, range proofs are provided for the outputs. To combine transactions, then, we simply concatenate the lists of inputs and also the lists of outputs, together with their range proofs. If any output is spent by an input of another transaction in the list, then that input is deleted from the combined transaction along with the output of the same commitment value.
This is all straightforward. If that were everything there was to it, all we would end up with is a list of inputs and outputs with range proofs. There would be no way to validate the result, so we could not tell if the list of outputs really was derived from the inputs through a series of valid transactions. With Bitcoin, we need to validate the signature associated with each transaction input, using the transaction itself as the message being signed. We would still need to iterate through all of the original transactions validating all the signatures, so there would be no gain from applying the transaction cut-through just described.
Mimblewimble is different! First, recall that there is just one signature per transaction, and the associated message is just the empty string, not the entire transaction. So, to validate the combined big transaction, we need to keep the original list of signatures and loop through these, validating them against the empty message. Validating a signature also, obviously, involves the public key to which it is associated which, in our case, is the excess commitment value of the transaction. However, now there is no need to keep hold of the original transactions which added up to the combined big transaction being validated. By additivity, the excess commitment of the combined transaction is just the sum of the excesses of the original transactions, and this can be checked instead. Hence, after combining all original transactions and applying cut-through, we end up with the following
In addition, due to transaction fees and block rewards, one of the inputs or outputs can be an explicit amount with no blinding factor. As these are easily combined, there should be at most one.
The master transaction can now be validated with the following steps.
C_{1} + C_{2} + ⋯ + C_{n} = C. 
This is as described by Jedusor in the original mimblewimble paper, so along with the list of inputs and outputs, we have a list of excess values (public keys) and signatures. In the description of a mimblewimble transaction above, each transaction was required to have one signature. If this is generalised a bit to allow an arbitrary number of signatures and associated public keys, then, when transactions are combined, the result looks just the same as an original individual transaction.
There are still a couple of issues remaining. First, one of the advantages of combining transactions is that it should no longer be possible to split the result back up into the original ones. However, using the provided excesses C_{i}, it can be. As was pointed out by user andytoshi in a 2016 reddit thread, this is easily solved. We simply allow each transaction to contain an additional arbitrary ‘excess’ k.G, which is added to the transaction excess to be used as the public key to be signed. When combining transactions, the individual public keys are retained, but only the sum of the k.G values is stored. So, the combined transaction still has a single additional k.G value, and the list of public keys cannot be related directly to the inputs and outputs of the original transactions. This may be unnecessary if the signature data is further reduced so that the C_{i} are not stored, as I will now describe.
Another issue is that we are still storing one public key and signature for each of the original transactions, but it is possible to do better. As mentioned above, we can get this down to 32 bytes per transaction (whereas a public key and signature together take about three times this).
Suppose that we have a set of n transactions, with excesses C_{i} summing to C. According to (2), a Schnorr signature for each of these consists of an integer s_{i} and a public nonce R_{i} lying in the elliptic curve group E satisfying
s_{i}.G = R_{i} + h(R_{i}).C_{i}. 
To make this linear in C_{i}, divide through by h(R_{i}) and, after a slight change of variables, we obtain
s_{i}.G = h(R_{i})^{-1}.R_{i} + C_{i}.  (3) 
Summing over all transactions, we obtain
s.G = h(R_{1})^{-1}.R_{1} + ⋯ + h(R_{n})^{-1}.R_{n} + C  (4) 
where s = s_{1} + ⋯ + s_{n}. To validate, we only require the value s, the list of R_{i}, and the excess C of the combined transaction. If (4) is satisfied, then the transaction is valid. Per transaction, all that is retained is the nonce value R_{i}, which can be represented using 32 bytes. This also throws away the individual excesses C_{i} and, as such, removes information which could be used to decompose into the original transactions. The retained nonce values should be selected at random independently of the transaction, so do not contain any information.
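To see the algebra of (3) and (4) in action, here is a minimal Python sketch. It replaces the elliptic curve with a toy group (integers modulo a prime, with generator G = 1, so x.G is just x mod q), which is completely insecure since discrete logs are trivial; all names and parameters are illustrative only.

```python
import hashlib
import random

# Toy stand-in for the elliptic curve group E: the additive group of integers
# modulo a prime q, with generator G = 1. NOT secure; demo of (3) and (4) only.
q = 2**127 - 1   # a prime, standing in for the group order p

def h(R):
    # hash of a group element, reduced modulo q
    return int.from_bytes(hashlib.sha256(str(R).encode()).digest(), "big") % q

random.seed(1)
n = 5
sigs = []
for _ in range(n):
    c = random.randrange(1, q)   # excess private key, so C_i = c.G
    r = random.randrange(1, q)   # secret nonce, so R_i = r.G
    C, R = c % q, r % q
    # choose s_i so that s_i.G = h(R_i)^{-1}.R_i + C_i, as in (3)
    s = (pow(h(R), -1, q) * r + c) % q
    sigs.append((s, R, C))

# aggregate: retain only s = sum(s_i), the nonces R_i, and C = sum(C_i)
s_total = sum(s for s, _, _ in sigs) % q
C_total = sum(C for _, _, C in sigs) % q
Rs = [R for _, R, _ in sigs]

# check the combined equation (4)
lhs = s_total % q
rhs = (sum(pow(h(R), -1, q) * R for R in Rs) + C_total) % q
assert lhs == rhs
```

The individual s_{i} and C_{i} are discarded after aggregation; only s_total, the R_{i}, and the combined excess survive, exactly as in the text.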
This approach does not lose any security over retaining all the individual signature information since, given s and R_{i} satisfying (4), it is straightforward to choose a list s_{i} summing to s and back out C_{i} from (3). These would pass the validation check described above for signing all the individual excesses, so validating each individual excess can be no more secure than just checking (4).
There are still some issues that I have skipped over, but which need to be addressed in any implementation. For example, we need to be sure that it is not possible for someone to modify an existing transaction to spend coins that they should not have access to. One attack would be to switch all inputs and outputs in a transaction, and reverse the sign of the signature, in order to undo the existing transaction. This is not a problem with Schnorr signatures, but does affect some other schemes. To avoid this, the paper by Andrew Poelstra suggests using sinking signatures, which incorporate the block height. It also suggests using a variant of BLS signatures instead of Schnorr, since these can be aggregated without having to retain any per-transaction information such as the R_{i} above. Such signature schemes do, however, require some additional cryptographic assumptions to be secure.
One final point. So far, I have concentrated on transactions alone but, in any blockchain, the transactions have to be anchored to the blocks. Without this, we cannot be sure that the provided transaction or transaction list is indeed contained in the blockchain. Usually, this means that each block header contains a commitment to the transactions in the block. In Bitcoin, for example, this is done by including the Merkle root of the transaction list.
With mimblewimble, once transactions are combined, it would no longer be possible to validate the block header by checking that it correctly commits to the block transactions. A way around this is to, instead, have the block header commit to the entire utxo set at the given block height. This works, although it does mean that when validating a blockchain we would not want to validate the utxo commitment in every historical block header, as this would be computationally expensive and would require transaction information for every block. Instead, we just need to validate the utxo commitment at the current block height. In this case, the only remaining role played by the historical block headers is to check that they link all the way back to the first (genesis) block, and satisfy the necessary proof of work conditions. In the paper by Andrew Poelstra, it is argued that it is possible to also remove most historical block headers from the chain, resulting in a blockchain growing only logarithmically in the height instead of linearly.
Every output of a mimblewimble transaction should contain a commitment value C, and also a range proof. This is to verify that C is of the form (1)
C = a.H + x.G. 
for some positive quantity a lying in a predetermined range, such as 0 ≤ a < 2^{64}. This is vital to ensure that users cannot create transactions with negative quantities or with huge quantities which overflow, as this would allow other outputs to be greater than the inputs and so arbitrarily generate new coins. We fix the upper bound 2^{N} big enough to allow sufficiently large payments, yet small enough to avoid overflows. However, it is also important not to reveal the values of x and a. This requires a range proof to be a zero knowledge proof of the statement:
I know of integers a and x satisfying (1) and such that 0 ≤ a < 2^{N}.
Using Bitcoin as an example, the minimum monetary unit is the satoshi, 100 million of which equals one bitcoin. The total number of bitcoin which can ever exist is less than 21 million — or 2.1 quadrillion satoshi. This is less than 2^{51}, so using N = 64 is more than big enough for a transaction quantity upper bound. On the other hand, assuming that we use an elliptic curve whose size is a 256 bit number, it would be impossible for overflow to occur unless the number of outputs exceeds 2^{192}, which is a ridiculously high number. In practice, we may also want to exclude zero quantity transactions, but this is easily achieved by applying the range proof to C – H instead of C, so I ignore this point here.
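The bounds quoted above can be checked directly:

```python
# total possible satoshi supply, as quoted above
max_satoshi = 21_000_000 * 100_000_000      # 2.1 quadrillion
assert max_satoshi < 2**51                  # so N = 64 is ample headroom
# with a 256-bit group order, 64-bit quantities could only overflow the
# modulus once the number of outputs exceeds 2^(256-64) = 2^192
assert 2**(256 - 64) == 2**192
```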
As a range proof is required for every single unspent transaction output, they can easily make up most of the blockchain space. It is, therefore, important that they use as little space as possible. This is an area of ongoing research, with the paper Bulletproofs: Short Proofs for Confidential Transactions and More describing a particularly efficient method. See, also, Bulletproofs and mimblewimble for a more easily readable high-level description. In the current post, rather than looking at bulletproofs, I will detail one especially simple range proof construction. This is to give an idea of how it is possible, even though it does lead to much larger proofs than the state-of-the-art.
For the first step, let us consider a binary expansion,
a = a_{0} + 2a_{1} + 2^{2}a_{2} + ⋯ + 2^{N-1}a_{N-1} 
where the a_{i} are each equal to either 0 or 1. The idea is to convert our original range proof for a to a sequence of much simpler ones, one for each of the binary digits. As these digits can only take 2 different values, which is a very small range, it will be straightforward to build range proofs for these. Split the private key x as a sum of terms
x = x_{0} + x_{1} + ⋯ + x_{N-1} 
The x_{i} terms should be chosen uniformly at random in the range 0 to p – 1 subject only to this constraint. Defining new ‘commitment’ values
C_{i} = 2^{i}a_{i}.H + x_{i}.G  (5) 
these automatically sum up to
C = C_{0} + C_{1} + ⋯ + C_{N  1} 
As they are uniformly random points of the elliptic curve E subject only to this constraint, revealing the values of C_{i} does not leak any information, so is zero knowledge.
It is only required to produce a range proof for each C_{i}, verifying that it satisfies (5) for integers a_{i} and x_{i} satisfying 0 ≤ a_{i} < 2. This is a vastly reduced range compared to what we had originally, so can be handled more directly. Summing (5) over i will then complete the required proof that 0 ≤ a < 2^{N} in (1).
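A short Python sketch of this digit decomposition, again using an insecure toy group (integers modulo a prime) in place of the curve, with hypothetical generators G and H:

```python
import random

# Toy stand-in for the curve: integers modulo a prime q, with generators
# G = 1 and H an arbitrary fixed element. Insecure (H's discrete log is
# trivially known here); the point is only to check the algebra of (5).
q = 2**127 - 1
G = 1
H = 1234567890123456789 % q   # hypothetical second generator

random.seed(2)
N = 64
a = 123456789                 # committed amount, 0 <= a < 2^N
x = random.randrange(q)       # blinding factor
C = (a * H + x * G) % q       # the commitment (1)

bits = [(a >> i) & 1 for i in range(N)]          # binary digits a_i

# split x into N uniformly random shares summing to x modulo q
xs = [random.randrange(q) for _ in range(N - 1)]
xs.append((x - sum(xs)) % q)

# per-digit commitments C_i = 2^i.a_i.H + x_i.G, as in (5)
Cs = [(2**i * bits[i] * H + xs[i] * G) % q for i in range(N)]

assert sum(Cs) % q == C       # they automatically sum back to C
```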
Note that, for any fixed index i, if we set P_{1} = C_{i} and P_{2} = C_{i} – 2^{i}.H then a range proof for (5) is equivalent to proving that we know of a value x such that either P_{1} = x.G or P_{2} = x.G. This can be achieved using ring signatures described below.
To complete the construction of range proofs, we need to produce a zero knowledge proof that, for two points (public keys) P_{1} and P_{2}, we either know of a value x such that P_{1} = x.G or we know of an x satisfying P_{2} = x.G. Things would be easy if, instead, we were trying to show that we know of a value x such that P_{1} = x.G and we know of an x satisfying P_{2} = x.G. Just produce two digital signatures, one for each of P_{1} and P_{2}. This is standard assuming, of course, that we do know both of the corresponding private keys.
So, how can we change the ‘and’ into ‘or’? In fact, it can be done in a very similar way to producing a pair of Schnorr signatures, one for each of the points P_{1} and P_{2}. It only differs in that, for the equations validating these signatures, the hash values of the nonces R_{1} and R_{2} are exchanged.
s_{1}.G = R_{1} + h(R_{2}).P_{1},
s_{2}.G = R_{2} + h(R_{1}).P_{2}. 
(6) 
The ring signature consists of the values (s_{1}, s_{2}, R_{1}, R_{2}). That is all! The first line involves h(R_{2}) and the second h(R_{1}). If these terms were exchanged, then it would just be the validation formulas for a pair of Schnorr signatures (s_{1}, R_{1}) and (s_{2}, R_{2}).
If we know of a private key x for P_{1} which, by definition, satisfies x.G = P_{1}, creating a ring signature is easy. Just choose integers r_{1} and s_{2} uniformly at random modulo p and set,
R_{1} = r_{1}.G,
R_{2} = s_{2}.G – h(R_{1}).P_{2}, s_{1} = r_{1} + h(R_{2})x. 
Similarly, producing a signature is easy if we know the private key for P_{2}.
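The construction above can be sketched in Python with the same insecure toy group (integers modulo a prime, generator G = 1). We know the private key for P_1 only; the resulting signature nonetheless verifies against both equations of (6):

```python
import hashlib
import random

# Toy group: integers modulo a prime q, generator G = 1. Insecure, demo only.
q = 2**127 - 1

def h(R):
    return int.from_bytes(hashlib.sha256(str(R).encode()).digest(), "big") % q

random.seed(3)
x = random.randrange(1, q)    # private key we know
P1 = x % q                    # P1 = x.G
P2 = random.randrange(1, q)   # someone else's key (toy stand-in)

# sign: choose r1 and s2 uniformly at random, then follow the recipe above
r1 = random.randrange(1, q)
s2 = random.randrange(1, q)
R1 = r1 % q
R2 = (s2 - h(R1) * P2) % q           # forces the second line of (6) to hold
s1 = (r1 + h(R2) * x) % q            # satisfies the first line of (6)

# verify (6): note the hash arguments are swapped between the two lines
assert s1 % q == (R1 + h(R2) * P1) % q
assert s2 % q == (R2 + h(R1) * P2) % q
```

The trick is that R_2 is back-solved from the randomly chosen s_2, so no knowledge of P_2's private key is needed.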
The idea of ring signatures extends in a straightforward fashion to any finite number of public keys (points on the curve E) P_{1}, P_{2}, …, P_{n}. A zeroknowledge proof that we know of an integer x satisfying x.G = P_{i} for some value of i consists of pairs (s_{i}, R_{i}) satisfying
s_{i}.G = R_{i} + h(R_{i-1}).P_{i} 
for all i = 1, 2, …, n. Note that each equation depends on the preceding one through the h(R_{i-1}) term and, for i = 1, we take R_{0} = R_{n}. This arranges the equations in a circle, with each depending on the previous one.
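A sketch of the n-key generalisation, again over the insecure toy group. Knowing only the private key for one P_j, we pick the remaining s_i at random and solve for the R_i around the ring, closing the loop back at j:

```python
import hashlib
import random

q = 2**127 - 1   # toy group: integers mod q, generator G = 1 (demo only)

def h(R):
    return int.from_bytes(hashlib.sha256(str(R).encode()).digest(), "big") % q

random.seed(4)
n, j = 5, 2                        # n keys; we know only the key for P_j
keys = [random.randrange(1, q) for _ in range(n)]
P = [k % q for k in keys]          # public keys (trivial in the toy group)
x = keys[j]

# walk around the ring starting just after j: pick s_i at random and
# back-solve R_i = s_i.G - h(R_{i-1}).P_i
s = [None] * n
R = [None] * n
r_j = random.randrange(1, q)
R[j] = r_j % q
for step in range(1, n):
    i = (j + step) % n
    s[i] = random.randrange(1, q)
    R[i] = (s[i] - h(R[i - 1]) * P[i]) % q   # R[-1] wraps to R[n-1]
# only the final equation needs the known private key
s[j] = (r_j + h(R[j - 1]) * x) % q

# verify: s_i.G = R_i + h(R_{i-1}).P_i for every i, with R_0 = R_n
for i in range(n):
    assert s[i] % q == (R[i] + h(R[i - 1]) * P[i]) % q
```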
Finally, let’s look at the size of a range proof constructed as above for 0 ≤ a < 2^{N}. For each 0 ≤ i < N we only need to store the 4 values (s_{1}, s_{2}, R_{1}, R_{2}), as P_{1} and P_{2} can be backed out from (6). Supposing that we use an elliptic curve defined modulo a 256 bit prime, as used by Bitcoin, each of the curve points R_{i} and signature values s_{i} can be represented using 256 bits, or 32 bytes. This gives 32 × 4 × N = 128N bytes for storing the ring signatures or, with N = 64, 8kB of data. This can be reduced to 6kB either by aggregating the signatures in a similar fashion as was described above for combining transactions, or by using a base-4 expansion instead of binary. By contrast, according to the paper Bulletproofs: Short Proofs for Confidential Transactions and More, bulletproofs would only be 688 bytes in length.
Merge mining, or merged mining, is the process of mining more than one blockchain simultaneously. This is possible for some proof-of-work (POW) chains, where a miner puts the same hash power to work creating blocks on multiple chains and earns the associated rewards on each of them, without having to pay separate energy costs for each. Here, I explain the ideas behind merge mining.
For a blockchain incorporating the proof-of-work protocol, miners use hash power in order to win the chance of appending their block. For leading chains such as Bitcoin, this is an energy intensive process where each individual hash has a minuscule chance of winning. They need to perform a huge number of hashes to be in with any chance. Where there are multiple proof-of-work chains available, the miner needs to choose which one to contribute his hash rate towards. So long as his hardware is compatible with the specific hashing algorithm used (such as SHA256), he is free to mine on either chain, and switch between them as desired. Usually, it is not possible to employ the same hash power simultaneously on both chains. This is because each hash is applied to the block header for the chain in question, so can only be used for solving the POW problem for that specific blockchain. However, if the protocols for all (or all but one) of the blockchains in question have been specifically designed to allow for it, then it is possible for each hash function application to contribute to the proof of work on each chain simultaneously. This is known as auxiliary proof-of-work (AuxPOW).
There are several blockchains that can be merge mined along with Bitcoin. The first such case was Namecoin (NMC), which was created in April 2011 and upgraded in October 2011 to support merge mining. Another example is Rootstock (RSK), which was created in January 2018 and is a sidechain of Bitcoin supporting smart contracts. According to the article The Growth of Bitcoin Merge Mining from October 2020, over 90% of the Bitcoin hashrate is involved in merge mining. This is shown in figure 2 below, borrowed from the same article and showing the proportion of Bitcoin blocks which contain an auxiliary proof-of-work in the coinbase transaction, indicating that it was merge mined along with another blockchain. One notable example not involving Bitcoin is Dogecoin, which is merge mined along with Litecoin, using the scrypt hash function.
By construction, if two blockchains can be merge mined, then they necessarily use the same POW hash function. However, it should be understood that they can be completely independent of each other. While it will depend on the specific setup of the chains in question, they can have different rates of block production, different rewards and different difficulty levels. This is as in Figure 1 above, with a hypothetical blockchain ‘X’ being mined along with Bitcoin. Both blockchains have different block production times, and the miner occasionally succeeds in producing blocks on each of the chains at different rates. Merge mined blockchains do not have to be mined together and can in theory exist independently of each other. There is not generally any link between blocks on the separate chains, and each chain can have its own network of nodes independently performing validation without reference to the other.
Interestingly, the idea of merge mining was suggested by Satoshi himself back in December 2010 in the Bitcointalk forum. To quote him directly:
I think it would be possible for BitDNS to be a completely separate network and separate block chain, yet share CPU power with Bitcoin. The only overlap is to make it so miners can search for proofofwork for both networks simultaneously.
The networks wouldn’t need any coordination. Miners would subscribe to both networks in parallel. They would scan SHA such that if they get a hit, they potentially solve both at once. A solution may be for just one of the networks if one network has a lower difficulty.
I think an external miner could call getwork on both programs and combine the work. Maybe call Bitcoin, get work from it, hand it to BitDNS getwork to combine into a combined work.
Instead of fragmentation, networks share and augment each other’s total CPU power. This would solve the problem that if there are multiple networks, they are a danger to each other if the available CPU power gangs up on one. Instead, all networks in the world would share combined CPU power, increasing the total strength. It would make it easier for small networks to get started by tapping into a ready base of miners.
Satoshi seems to be envisaging merge mining all POW chains in a single combined POW effort, although that currently does not seem to be a likely scenario.
When multiple blockchains are merge mined together then, in theory, there does not have to be any special ordering between them. In practice though, there tends to be one which is more established with a significantly higher hashrate, miner rewards, and total secured value. This is referred to as the parent (or main, or primary) blockchain, with the remaining ones being the child (or auxiliary, or secondary) chains. Also, as I will explain below, the protocol usually makes a distinction between the parent chain whose protocol uses regular POW with no awareness of merge mining, and the child chains using AuxPOW in an attempt to benefit from the parent’s hash power. In the examples of NMC and RSK mentioned above, these are children to the Bitcoin blockchain. More interestingly, according to the protocol, Dogecoin is merge mined as a child of Litecoin although, due to major price increases, the rewards for mining Dogecoin are currently more than twice that of Litecoin.
As mentioned above, when multiple blockchains are merge mined, their difficulty levels and average block production times can be completely independent. However, with the standard AuxPOW algorithms, if the miner does solve the POW for one block, then he simultaneously solves it for all merge mined chains with a lower difficulty. This is because the generated hash value will be below the threshold required for that block, so will also be below the larger threshold of all blocks of lower difficulty. For example, if he is merge mining RSK along with Bitcoin then, whenever a valid Bitcoin block is produced, an RSK one will also be created. On the other hand, he will likely generate RSK blocks without an accompanying valid Bitcoin one.
The aim of this post is to explain auxiliary proof-of-work and how merge mining is possible, rather than to evaluate a list of all the benefits and drawbacks. It is good, however, to have an idea of some of the main points before going into the technical details. Starting with the benefits:
Unsurprisingly, there are also drawbacks to merge mining and, in some cases, it can be controversial.
Blind merge mining (BMM) has been suggested as an alternative approach which avoids these issues, although it currently does not seem to have gained much traction in real-world use cases. In BMM, the blocks of the child chain are not constructed by the miner of the parent chain. Instead, they are built by a separate party, who pays a miner to include the child hash in his blocks for the parent chain. Then, the miner does not need to know anything about the child chain. He simply accepts payment for including some additional data in his blocks. In this post I will concentrate on regular old common or garden merge mining, rather than BMM.
I now go into the theory and some technical detail of how merge mining is possible. To start, consider the usual POW algorithm for mining a Bitcoin block.
The miner collects together the sequence of transactions to be used in the block. Then, a hash root (Merkle root) of these is computed. The block header is constructed containing this hash, together with other required fields such as a version number, previous block hash, timestamp, difficulty level and the nonce. The hash of this block header is computed (a double SHA256 hash, in the case of Bitcoin). If it is below the threshold for the required difficulty level, then the block is valid. Almost certainly, the hash will not satisfy the difficulty requirement. So, the miner must repeatedly modify the block header and recompute its hash until, eventually, it does satisfy the requirement and can be published. The nonce field is provided for this purpose, since it impacts nothing but the hash value. The miner can keep modifying the nonce and recomputing the hash until either he wins the race and produces a block hash satisfying the requirement, or another miner finds a block and he needs to start over. This is shown in figure 3 below, with the Merkle root of the transactions denoted by h.
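As a rough sketch of this mining loop in Python, with a made-up header layout and an artificially easy difficulty so the demo terminates quickly (real thresholds are enormously smaller):

```python
import hashlib

def block_hash(header: bytes) -> int:
    # Bitcoin hashes the block header with double SHA-256
    return int.from_bytes(
        hashlib.sha256(hashlib.sha256(header).digest()).digest(), "big")

def mine(header_prefix: bytes, threshold: int, max_tries: int = 10**6):
    """Grind the nonce until the header hash falls below the threshold."""
    for nonce in range(max_tries):
        if block_hash(header_prefix + nonce.to_bytes(4, "little")) < threshold:
            return nonce
    return None

# hypothetical header fields; an easy threshold of 2^246 means roughly
# 1 in 2^10 hashes succeeds
threshold = 2**246
nonce = mine(b"version|prev_hash|merkle_root_h|time|bits|", threshold)
assert nonce is not None
```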
As described above, POW involves repeatedly computing the hash of the block header. So, it can only apply to the chain for which this block is intended. It is not possible for a single hash computation to be valid for two separate blocks in two distinct blockchains.
What can be done? How is it possible for a single hash computation to be valid for the POW requirement simultaneously in two blocks in different chains? To answer this, note that the use of the hash of the block header for the POW condition is due to the protocol defined for the chain. If we were to design our own blockchain from scratch, we would not have to set it up in exactly this way. All that is really necessary is that the POW hash function involves the header in an essential way, so cannot be computed before the block has been constructed. We could take the block and combine it with any arbitrary additional data before taking the hash. Or, we can take its hash, combine it with additional data and then take the hash again. In fact, we do exactly this in auxiliary proof-of-work.
Consider four hypothetical blockchains W, X, Y, and Z, and suppose that we have constructed a block for each of them. Suppose that we compute their hash values, denoted by h_{W}, h_{X}, h_{Y} and h_{Z} respectively. If the POW difficulty condition was applied separately to all of these, then we would need to keep recomputing all four of these hashes (with modified nonces in their block headers) until the conditions are satisfied. This is just performing proof of work separately for each of the blocks.
We can propose the following alternative approach. Simply concatenate the block hashes into a single block of data h_{W}h_{X}h_{Y}h_{Z} and compute its hash. In fact, since we are doing this, we may as well include the nonce n here, rather than including it in each individual block header. So, we compute the hash h of the concatenated data h_{W}h_{X}h_{Y}h_{Z}n. If it meets the proof of work condition for any one of the blocks W, X, Y, Z, then it can be submitted for inclusion in the corresponding blockchain. Otherwise, we keep updating the nonce and recomputing the hash h until, eventually, it does satisfy the POW condition for one of the blocks. This is as in figure 4 below.
In this setup, once the hashes of the individual blocks have been computed, we only need to apply the hash function once for each iteration of updating the nonce and recomputing h. Even so, the computed hash applies to the proofofwork condition for all four underlying blocks.
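A sketch of this combined loop in Python, with made-up chain data and artificially easy thresholds; note that each iteration performs just one outer hash, checked against every chain's threshold:

```python
import hashlib

def H(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

# hypothetical chains W, X, Y, Z with their own (easy) difficulty thresholds
thresholds = {"W": 2**245, "X": 2**247, "Y": 2**246, "Z": 2**244}
block_hashes = {c: hashlib.sha256(c.encode()).digest() for c in "WXYZ"}
concat = b"".join(block_hashes[c] for c in "WXYZ")   # h_W h_X h_Y h_Z

nonce, solved = 0, {}
while not solved and nonce < 10**6:
    h = H(concat + nonce.to_bytes(8, "little"))
    # a single hash evaluation, checked against every chain's threshold
    solved = {c: nonce for c, t in thresholds.items() if h < t}
    nonce += 1
```

Whichever chains appear in `solved` can have this nonce and hash submitted for their POW condition.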
This describes a proposed method of mining all four chains simultaneously. However, we still need to describe the protocol for each of these chains in such a way as to be able to recognise such a block as valid. Suppose, for example, that the hash h satisfies the necessary difficulty level for block X. We cannot simply append this block on its own to the blockchain, since it is impossible for a validator to recompute h without knowing the other blocks and the nonce. The blocks in chain X need to be bundled with some additional data for validating the POW condition. For the procedure outlined above, block X should be packaged along with the prefix h_{W} and suffix h_{Y}h_{Z}n. Then, the protocol for blockchain X can be defined so that we first compute the hash h_{X} of the block header, then concatenate this with the supplied prefix and suffix to obtain (h_{W})h_{X}(h_{Y}h_{Z}n). Finally, the hash of this is taken and the POW condition is declared to be satisfied if it is below the threshold of the block’s difficulty level.
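The validator's side of this scheme might look as follows (a Python sketch; the prefix/suffix layout and thresholds are illustrative). A small grinding loop is included only to produce data that passes the check:

```python
import hashlib

def H(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def aux_pow_valid(block_header: bytes, prefix: bytes, suffix: bytes,
                  threshold: int) -> bool:
    # hash the child header, splice in the supplied prefix and suffix,
    # then test the outer hash against the child chain's own threshold
    h_X = hashlib.sha256(block_header).digest()
    return H(prefix + h_X + suffix) < threshold

# produce data passing the check: the prefix is h_W, the suffix h_Y h_Z n
header = b"child block X header"
prefix = hashlib.sha256(b"W").digest()
rest = hashlib.sha256(b"Y").digest() + hashlib.sha256(b"Z").digest()
threshold = 2**248   # easy difficulty for the demo

nonce = 0
while not aux_pow_valid(header, prefix, rest + nonce.to_bytes(8, "little"),
                        threshold) and nonce < 10**6:
    nonce += 1
assert aux_pow_valid(header, prefix, rest + nonce.to_bytes(8, "little"),
                     threshold)
```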
The described protocol is a version of auxiliary proof-of-work. Comparing with regular proof-of-work used by Bitcoin:
So, if there are multiple blockchains all defined with such an AuxPOW protocol, and using the same hash function for computing h, then they can be merge mined with the procedure above. There are some technical details which would need to be looked at, such as ensuring that miners cannot simultaneously create multiple blocks on the same chain (which I discuss more below), but these are easily addressed. Also, if we wanted to mine a large number of blockchains simultaneously, then organising the block hashes into a Merkle tree would be more efficient than simply concatenating them. This is all well and good, but what if we want to merge mine with Bitcoin? It is likely to be our preferred choice, so that the other chains inherit the benefit of Bitcoin’s huge hash rate. It would be nice if Bitcoin employed such an AuxPOW protocol, but it doesn’t, and this situation is unlikely to change any time soon.
Now let’s consider how we can merge mine a child blockchain along with something like Bitcoin, which uses a standard POW algorithm based on the block header hash, rather than auxiliary proof-of-work of the form described above. This is the situation encountered in practice. First, since we use the Bitcoin block hash for the POW, the child block hashes must be included somewhere inside the Bitcoin block if the hash is to depend on them. Really, they must be included in one of the transactions and, since this procedure is performed by the miner, they are included in the coinbase transaction created by the miner to pay their rewards.
Consider, again, figure 3 above. This shows the construction of a Bitcoin block and the calculation of its hash. As the coinbase transaction is the first one of the block then, if we are to insert the child block hash into it, we obtain something like figure 5 below. Since we do not want to have to distribute the entire Bitcoin block along with the child blocks for the AuxPOW, all transactions are stripped out other than the coinbase and its Merkle proof consisting of the hash values h_{001}, h_{01}, h_{1} in figures 3 and 5.
There are two places in the coinbase transaction where we can conveniently include arbitrary data such as a child block hash:
Actually, the RSK specification allows this data to appear anywhere in the coinbase transaction, although in an OP_RETURN output is usual.
With the method described here, there is no real need for the child blocks to contain a nonce, since that is already taken care of by the Bitcoin header. Regardless of the precise format and whether the data is stored in the scriptSig or an OP_RETURN output, if the Bitcoin block is below the child difficulty threshold, the child block is valid and can be appended to its blockchain. The following needs to be distributed alongside the child block:
This is sufficient data for a network node to be able to determine if a child block satisfies the AuxPOW condition. Any such validator should perform the following steps:
h_{000} = H(t_{1}), 
h_{00} = H(h_{000}h_{001}), 
h_{0} = H(h_{00}h_{01}), 
h = H(h_{0}h_{1}). 
If these checks succeed, the child block is valid! I stress that the AuxPOW validator need not know anything about the Bitcoin blockchain itself, and the supplied Bitcoin header need not be an actual block in the canonical chain. Typically, if the child chain has a lower difficulty level, then this will not satisfy the correct POW threshold for Bitcoin anyway. Also, the validator may not know the difference between Bitcoin headers and headers of very similar blockchains such as Bitcoin Cash, so it could also be merge mined with these and the child network would be none the wiser.
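The Merkle recomputation in the steps above can be sketched as follows (Python, using double SHA-256 as Bitcoin does for transactions; the toy tree is built only to supply a valid proof):

```python
import hashlib

def Hd(data: bytes) -> bytes:
    # Bitcoin hashes transactions and internal Merkle nodes with double SHA-256
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root_from_coinbase(t1: bytes, proof: list) -> bytes:
    # follow the steps above: h_000 = H(t1), then hash in each sibling
    # (h_001, h_01, h_1); the coinbase is leftmost, so siblings sit on the right
    node = Hd(t1)
    for sibling in proof:
        node = Hd(node + sibling)
    return node

# toy data: build an 8-leaf tree, then check the proof for the coinbase leaf
leaves = [Hd(bytes([i])) for i in range(8)]
tree = [leaves]
while len(tree[-1]) > 1:
    level = tree[-1]
    tree.append([Hd(level[i] + level[i + 1]) for i in range(0, len(level), 2)])
root = tree[-1][0]
proof = [tree[0][1], tree[1][1], tree[2][1]]   # h_001, h_01, h_1
assert merkle_root_from_coinbase(bytes([0]), proof) == root
```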
So far, I described how auxiliary proof-of-work can be used to merge mine a child blockchain along with Bitcoin, or another POW chain. But what happens if we want to mine more than two chains at once, so that there are multiple children? This is not particularly difficult, although there are a couple of ways of going about it. One method is to include multiple block hashes in the coinbase transaction, one for each of the child blocks that we want to mine. For example, using the RSK method of putting the hash into an OP_RETURN output, we can simply use multiple OP_RETURN outputs, one for each child block that we wish to mine.
Another approach is to merge multiple child block hashes into one, and just include this in the coinbase transaction. While this saves some space in the Bitcoin block, it does require additional information to be packaged in the AuxPOW data stored along with the child block. For hypothetical child chains W, X, Y, and Z, this could work similar to figure 4 above (but without the nonce). We just concatenate the child hashes together and take the hash of this to be included in the Bitcoin block. Then, the AuxPOW information should include the prefix and suffix data, so that these can be concatenated with the child hash to verify that it does indeed have the correct hash value.
Alternatively, the child hashes can be combined in a Merkle tree. This would be more efficient if many children are being mined together, since the AuxPOW would only need to include the Merkle proof in order to recompute the Merkle root h. This is as in figure 6 below, where the Merkle root h is put in the Bitcoin block in place of the individual child hashes. If the miner solves the POW difficulty condition for block X, for example, then its AuxPOW information would include the proof consisting of hashes h_{W} and h_{YZ}.
This is the approach used by Namecoin, so that the 32 byte hash can represent either a single child block hash, or a Merkle root of several children. Whichever is the case can be determined by the 4 byte ‘merkle size’ field, which must be a power of 2 (for a nicely balanced tree). This represents the number of leaves of the tree, so if it is equal to 1 then it is just the hash of a single child block. On the other hand, if it is greater than 1, then it is the Merkle root of an array of multiple block hashes for several child chains (the number of children can be less than merkle size, since the tree can be padded out with dummy hashes not corresponding to any child).
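As a hedged sketch of this construction: the function below places each child hash at a slot derived from its chain ID (here simply chain ID mod merkle size, a simplification of the actual Namecoin index computation, which also mixes in the merkle nonce), pads the remaining leaves with dummy hashes, and returns the root for the coinbase:

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def child_merkle_root(child_hashes: dict, merkle_size: int) -> bytes:
    # hypothetical slot rule: chain_id % merkle_size; unused leaves are
    # padded with dummy hashes not corresponding to any child
    assert merkle_size & (merkle_size - 1) == 0   # must be a power of 2
    leaves = [H(b"dummy%d" % i) for i in range(merkle_size)]
    for chain_id, child_hash in child_hashes.items():
        leaves[chain_id % merkle_size] = child_hash
    level = leaves
    while len(level) > 1:
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# two hypothetical children whose chain IDs map to distinct slots
children = {1: H(b"namecoin block"), 2: H(b"syscoin block")}
root = child_merkle_root(children, merkle_size=4)
```

Because each chain ID maps to a fixed slot, at most one block hash per chain can appear in the tree, which matters for the attack discussed below.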
So, all child blockchains whose protocols implement the Namecoin AuxPOW specification can be mined simultaneously with Bitcoin while sharing the same Merkle root in the coinbase. This includes Namecoin, of course, but also various other chains such as Syscoin (SYS).
There is one drawback to combining child hashes, such as is done for the Namecoin specification. It is rather opaque. If we see that the coinbase of a Bitcoin block contains a 44 byte block of data starting with the ‘magic’ code fabe6d6d then we know that it has probably been merge mined along with some child chains. However, it can be difficult, or even impossible, to say what they are. It could include a Namecoin block, a Syscoin block, or any of the other blockchains which adhere to this specification. On the other hand, we can easily tell if it was mined along with Rootstock, since there will be an OP_RETURN output containing a 41 byte block of data with the prefix ‘RSKBLOCK:’.
There is a very important point which should be considered when looking at protocols for merge mining with multiple child blocks: it must not be possible to mine more than one block from the same chain at the same time. If this was possible, then we could mine a whole string of blocks in the chain while only paying the POW required for a single one. This would make a mockery of the idea of the chain POW, and make it very easy to perform 51% attacks without having close to 51% of the actual hash rate. Even simultaneously mining blocks in different branches of the same blockchain would be a big problem. This is because, in the event of a fork, miners would be incentivised to mine blocks in all branches at once. That way, they would get to keep any block reward regardless of which one wins out while only paying the POW to mine a single one. This would absolve miners of the responsibility of choosing the branch to build on, which is a vital part of the Nakamoto consensus since it allows the network to settle on one single canonical blockchain branch.
For a protocol such as that used by RSK, it is straightforward to ensure that multiple blocks from the chain cannot be simultaneously mined. The AuxPOW protocol just needs to require that the coinbase transaction contains only a single valid substring consistent with the RSKBLOCK fields.
For protocols such as that used by NMC, where multiple block hashes are combined into a single Merkle root, it is more difficult. Even if the AuxPOW information contains a Merkle proof showing that a given NMC block hash is contained in the Merkle tree, it is impossible to say in general what other blocks may be included in the other leaves. It is for this reason that the protocol requires each blockchain to assign a unique hardcoded number (the chain ID) which maps to the index for the block hash position in the tree. So, all block hashes corresponding to the same blockchain would have to be put in the same slot, meaning that there can be no more than one of them. The ‘merkle nonce’ field mentioned above is intended to scramble the map from chain ID to index to avoid clashes between different blockchains but, due to a bug in the protocol, this is completely useless and only applies a shift to the index. Instead, the merkle size field should be chosen large enough to avoid such clashes, and the nonce is usually just set to 0.
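The index calculation can be sketched as below. The constants follow the published merged mining specification (treat the details as illustrative rather than authoritative), and the assertions demonstrate the bug: the nonce merely shifts the effective chain ID before the final affine map, so whether two chain IDs clash does not depend on the nonce at all.

```python
def aux_slot(nonce: int, chain_id: int, merkle_size: int) -> int:
    """Expected leaf index for a chain's block hash in the AuxPOW Merkle tree.

    Constants as in the merged mining specification; arithmetic is mod 2^32.
    """
    rand = (nonce * 1103515245 + 12345) % 2**32
    rand = (rand + chain_id) % 2**32
    rand = (rand * 1103515245 + 12345) % 2**32
    return rand % merkle_size

# Changing the nonce cannot separate clashing chain IDs: IDs differing by a
# multiple of merkle_size always clash, and other pairs never do.
for nonce in (0, 1, 99999):
    assert aux_slot(nonce, 1, 16) == aux_slot(nonce, 17, 16)   # always clash
    assert aux_slot(nonce, 1, 16) != aux_slot(nonce, 2, 16)    # never clash
```

This is why, in practice, the merkle size is chosen large enough to keep the assigned slots distinct and the nonce is left at 0.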
As an example, I randomly picked a recent Bitcoin block (height 733,528, mined 25 April 2022 by ViaBTC pool) and looked at its coinbase transaction. This contains a single input (the coinbase) as standard, but also 4 outputs. The first of these outputs is paying the mining reward of 6.43 bitcoins, and the remaining 3 outputs are all of type OP_RETURN containing some data.
The input scriptSig pushes some values onto the stack, including the following 44 bytes (in hex). I split them up according to the description of Namecoin merge mining given in the Bitcoin wiki.
Namecoin merge mining data:
  magic:         fabe6d6d
  merkle root:   250978ac44fd7d6e0e837ee74d4d786d7869991ffae36d593045ffd612f988af
  merkle size:   10000000
  merkle nonce:  00000000
This shows that the Bitcoin block was mined along with child blocks whose hashes have the given Merkle root. While it is impossible, in general, to back out from this what the child blocks are, we can check a Namecoin explorer to see if any blocks have AuxPOW information corresponding to it. In particular, we see that it corresponds to the Namecoin block at height 609,282. The explorer linked to shows the AuxPOW information in the ‘Raw Block’ tab. It could also include other merge mined blocks of additional child chains, but these are not easily visible from the Merkle root alone.
The first OP_RETURN output contains the following 41 bytes of data fitting the format for RSK blocks.
Rootstock merge mining data:
  RSKBLOCK:      52534b424c4f434b3a
  RskBlockInfo:  2b7125dd7547680378797d8e831e22141287ac1749f4826870900f3800410bd2
So, it was also merge mined along with RSK and, from an RSK explorer, can be seen to correspond with block 4,262,866 (RskBlockInfo corresponds to Hash For Merged Mining in the explorer).
The second OP_RETURN output contains 36 bytes of data fitting the template for merge mining Vcash.
Vcash merge mining data:
  magic:  b9e11b6d
  hash:   4869fa8eeb9618a6791a4000a6487ca1f0a4dc277b48051563ac656dd228a044
See the merge mining info on GitHub for details. Thanks to Murch for identifying the Vcash merge mined data in a bitcoin.stackexchange answer. Again, checking a block explorer, we see that this corresponds to Vcash block 525,469.
The final OP_RETURN output contains the Merkle root of the block’s witness data, indicated by the first four bytes of data being ‘aa21a9ed’, so has nothing to do with merge mining. We have seen that Bitcoin block 733,528 was merge mined along with NMC, RSK, and Vcash. Also, the NMC merge mining info is a Merkle tree possibly containing further merge mined chains.
One interesting aspect of merge mining, which can be considered as either a benefit or — in some cases — a problem, is the possibility for cross-chain consensus conditions. For example, if a child blockchain X is merge mined along with Bitcoin, then it is possible for the child to force rules on the parent. This could be useful if the child is used as a sidechain, which is supposed to interact with Bitcoin in some way, although it could also be useful for the mining process. This is explained in the article Modern merge mining, but I also give a brief description here.
As a purely hypothetical example of such a rule, suppose that the blocks of chain X are only valid if the parent (Bitcoin) block pays 0.1 BTC to some address specified by the child blockchain state. This could be to impose some additional cost to the mining process and pay income to stakers of token X, with the miner receiving compensation in the child chain. The condition could be enforced by the protocol of X requiring an SPV proof (i.e., Merkle proof) of the payment in the parent block, which should be stored along with the AuxPOW data. This means that the information used in the AuxPOW as described in figure 5 above would be updated to something like figure 7 below. The hash values shaded yellow represent the SPV proof of the 0.1 BTC payment.
Consider the effect on the mining process. If the miner did not include such a payment in his Bitcoin block, then any child block that he produces would not be accepted by the validators of chain X. So, he needs to include this payment. However, even if he does this, it is not necessarily true that any such payment is actually made in reality. If the miner solves the POW problem for the child block then, if it has a significantly lower difficulty level than Bitcoin, it is likely that the hash will not be below Bitcoin’s difficulty threshold. Hence, he can transmit the X block to be added to the chain but, as he has not also produced a valid Bitcoin block, the 0.1 BTC payment is not made. On the other hand, if the miner does produce a hash below Bitcoin’s difficulty threshold, and he transmits this for inclusion in the Bitcoin blockchain, then the payment is made. Of course, the miner could opt to avoid the payment by not transmitting the Bitcoin block but, if he did this, then he would be giving up the block reward and would, effectively, not be mining Bitcoin at all.
We can see from the argument above that such a cross-chain rule would not cause the 0.1 BTC payments to be made in a way corresponding deterministically to the production of blocks of chain X. Rather, payments would be made whenever Bitcoin blocks are produced by miners who are also merge mining with chain X. This is a random process, with the rate of such payments being, on average, proportional to the fraction of the Bitcoin mining hashrate which is being used to merge mine with X.
While this example is entirely hypothetical, and I am not aware of a merge mined chain using such a consensus rule, it is not difficult to imagine use cases such as allowing transactions in the child chain to trigger events on Bitcoin, effectively introducing a communication channel between the two blockchains. The article Modern merge mining does describe a very mild cross-chain rule for RSK only involving the block timestamps. See that article also for further discussion of cross-chain consensus rules and merge mining in general.
Consider the scenario: Alice owns bitcoin, which she wants to use to purchase ether from Bob. The exchange rate (BTC/ETH) at the time is about 13. So, they agree to a swap where she sends one bitcoin to Bob and, in return, he sends 13 ether to Alice’s Ethereum account address.
This is fine in theory but, before sending her payment transaction to the Bitcoin blockchain, she has second thoughts. What happens if Bob does not submit his corresponding transaction? In that case, she will have sent him the bitcoin with nothing in return. Can he really be trusted? So, she asks Bob to submit his transaction first. However, Bob is also not really sure that he can trust Alice. If he submits his Ethereum transaction before Alice’s payment is confirmed, how can he be sure that he will be paid? To be safe, both parties in this transaction want the other to confirm their payment first. Consequently, they do not go ahead with the exchange.
What can be done to resolve the issue of trust and counterparty risk? This is not something specific to cryptocurrencies. Whenever two parties want to exchange any items of value, there is the possibility that one of them will not follow through with their side of the agreement. Here are three possible solutions.
The blockchain contracts and transfers for this solution are outlined in figure 1 above. Rather than directly facing each other, Alice and Bob both face Carol. This is a tri-party agreement. If all goes well, we end up with the bitcoin in a contract controlled by Bob, and the ether in one controlled by Alice, shown in green. If either party defaults by failing to make their payment on time, Carol returns the opposing payment to the sender, shown in red.
This method will involve some costs, since Carol will expect to be paid for her services. It also does not entirely remove the issue of trust but, instead, replaces it with the requirement to trust the third party, Carol. This is effectively what we are doing when we make transactions on a centralized exchange like Coinbase or Binance. We need to place trust in the exchange itself that they will return to us the correct amount of assets that we are due, after fees have been taken. We do not place trust in the other users of the exchange who are on the opposite side of any trades that we make. It is not ideal, though, if we want to make peer-to-peer swaps directly with the other party.
If the assets exist on different blockchains, then the transactions representing each leg necessarily live on their respective chains, so cannot be combined. However, it is still possible to link them via a secret code which is revealed by the transactions of one leg. This can then be used to unlock the transactions in the other leg. Effectively, this transfers information across from one chain to the other. The two legs are then linked in such a way as to remove the counterparty risk and trust issues. I explain these atomic swaps in this post.
Atomic swaps, whereby Alice and Bob can exchange one asset for another, can be constructed with the use of Hash Time Locked Contracts (HTLCs). In some ways, this is similar to the use of an intermediary to remove trust issues, as described in solution 2 above. The transactions are very similar to those shown in figure 1 except, now, the intermediary Carol is not needed. The contracts which she controls are instead replaced by HTLCs, which are controlled cooperatively by Alice and Bob themselves. This is as in figure 2 below, which I will explain in more detail in a moment.
The first step is for both Alice and Bob to make their payments into intermediate addresses, or contracts. In order to lock these contracts until both parties have paid in, we need an additional constraint restricting how the assets can be withdrawn. The idea is to use a secret code (or, key) which needs to be chosen by one of the two parties. By exposing this secret code, the contracts can be unlocked so that each party receives the agreed amount of the asset.
To start, Alice chooses a secret code S to lock the intermediate contracts. This should be a random binary string with sufficiently many bits that it is infeasible for any other party to guess. She then computes its hash H = h(S). By the properties of cryptographic hash functions (e.g., SHA256), knowing the value of H does not help to find the secret S, other than by an infeasible brute force trial-and-error. She can then send H to Bob without revealing the secret code S.
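This step can be sketched in a few lines of Python (illustrative only, using SHA256 as the hash function h):

```python
import hashlib, secrets

# Alice picks a random 256-bit secret S and publishes only its hash H = h(S).
S = secrets.token_bytes(32)
H = hashlib.sha256(S).digest()

def hashlock_ok(candidate: bytes) -> bool:
    """The contract only releases funds to a spender revealing a preimage of H."""
    return hashlib.sha256(candidate).digest() == H

assert hashlock_ok(S)                            # revealing S unlocks the funds
assert not hashlock_ok(secrets.token_bytes(32))  # guessing S is infeasible
```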
Alice then creates a transaction on the Bitcoin blockchain which can only be unlocked by Bob with his private key. However, an additional constraint is added, so that Bob can only extract the bitcoin if he also provides the secret value S. This is done in the transaction output script by checking that any spending transaction not only provides a valid signature for Bob’s public key, but also a value S whose hash is equal to H. This script only needs to contain the hash value H, and not the secret itself. So, by doing this, Alice keeps the secret code safe while using it to encumber the contract. This is known as a hashlock. In the same way, Bob also creates a transaction on the Ethereum blockchain sending his ether to a hashlock contract, which can only be unlocked by Alice signing with her private key in conjunction with the secret code.
For the second step, Alice uses her secret code and private key to withdraw the ether from Bob’s hashlock contract in the Ethereum blockchain. This is the transaction paying into Alice’s address displayed in green on the left of figure 2.
Here comes the clever point, on which atomic swaps rely for their security. Blockchains are publicly visible. So, any transaction which Alice uses to withdraw the ether from Bob’s contract will also contain the secret code S. Since Bob can see this, he can read the value of S, and hence withdraw the bitcoin from Alice’s hashlock contract. This is the transaction paying into Bob’s address shown in green on the right of figure 2. Of course, Alice could simply send the secret directly to Bob so that he can receive the bitcoin. The important point is that he does not need to rely on her to do this. It is not possible for Alice to withdraw the ether without also revealing the secret value which Bob needs to obtain the bitcoin.
There is still a big security issue with the procedure outlined so far. After Alice pays into the hashlock contract, what happens if Bob becomes unresponsive and never pays the ether into the corresponding one? This means that he will not be able to obtain the bitcoin but, the problem is, neither will Alice. Instead, it will be forever stuck in a contract requiring both Bob’s private key and the hashlock to be accessed. Similarly, after Bob pays into his hashlock contract, if Alice never reveals her secret then she will not be able to access the ether, but neither will Bob.
These issues can be resolved by having the intermediate contracts return the coins to the sender after some period of time. For the one which Alice pays the bitcoin into, she adds an alternative spend method where she is able to spend the coins herself, but only after a specified time. Similarly, the contract into which Bob pays has an alternative spend method by which he can recover the ether after a given time. These alternative spend methods are known as timelocks, and are represented in figure 2 by a small clock following the name of the individual who is able to spend from the contract after the lock expires.
Contracts like these which have two spend methods, one with a hashlock and the other with a timelock, are the HTLCs mentioned above. The idea is very similar to the use of an intermediary, Carol, described in solution 2 higher up in this post. There, Carol would return payments to the sender if the other party defaulted on their payment. HTLCs allow us to automate this process and remove Carol from the picture.
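Putting the two spend paths together, an HTLC can be sketched as follows (a toy model of my own, not real contract code; the names and the integer timeout are illustrative assumptions):

```python
import hashlib

class HTLC:
    """Toy model of a Hash Time Locked Contract with its two spend paths:
    a hashlock for the recipient and a timelock refund for the sender."""

    def __init__(self, sender: str, recipient: str, hash_h: bytes, timeout: int):
        self.sender, self.recipient = sender, recipient
        self.hash_h, self.timeout = hash_h, timeout

    def spend(self, who: str, now: int, preimage=None) -> bool:
        # Path 1: the recipient reveals S with h(S) = H, at any time.
        if who == self.recipient and preimage is not None:
            return hashlib.sha256(preimage).digest() == self.hash_h
        # Path 2: the sender reclaims the funds once the timelock expires.
        if who == self.sender and now >= self.timeout:
            return True
        return False

S = b"alice's secret"
htlc = HTLC("alice", "bob", hashlib.sha256(S).digest(), timeout=100)
assert htlc.spend("bob", now=10, preimage=S)   # Bob spends by revealing S
assert not htlc.spend("alice", now=10)         # Alice cannot reclaim early
assert htlc.spend("alice", now=100)            # refund path after the timeout
```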
Putting together the ideas explained above gives the following steps.
This describes the atomic swap assuming that there are no problems, and that it successfully completes. They do need to wait after each of steps 2 and 3 for the HTLC transactions to be confirmed on their respective blockchains before moving on to the next step. After it is initiated by Alice in step 2, there are only two sources of counterparty risk, each of which merely causes the swap to be cancelled.
The constraint on the timelocks is that T_{A} must be sufficiently greater than T_{B} to give Bob plenty of time to withdraw the bitcoin payment and for his final transaction to be confirmed, without risking the timelock running out and Alice taking it back.
The generally accepted history of atomic swaps is that they were first described by Sergio Demian Lerner in 2012 by a post to the bitcointalk forum. These ideas were fleshed out in more detail on the same forum a year later by Tier Nolan. It was not until 2017 that the first atomic swap was performed, exchanging Litecoin for the Decred cryptocurrency. An atomic swap between Litecoin and Bitcoin was performed a few days later by Charlie Lee.
Interestingly though, the underlying idea was already mentioned very briefly by Satoshi Nakamoto back in December 2010 in the bitcointalk forum thread BitDNS and Generalizing Bitcoin. This was while discussing the idea of using a separate blockchain for the BitDNS project, which eventually became Namecoin. Even if they were running on different blockchains, as Satoshi noted, it would still be possible to set up risk free trades between them. Note, in particular, the final paragraph of Satoshi’s post transcribed below:
Piling every proof-of-work quorum system in the world into one dataset doesn’t scale.
Bitcoin and BitDNS can be used separately. Users shouldn’t have to download all of both to use one or the other. BitDNS users may not want to download everything the next several unrelated networks decide to pile in either.
The networks need to have separate fates. BitDNS users might be completely liberal about adding any large data features since relatively few domain registrars are needed, while Bitcoin users might get increasingly tyrannical about limiting the size of the chain so it’s easy for lots of users and small devices.
Fears about securely buying domains with Bitcoins are a red herring. It’s easy to trade Bitcoins for other non-repudiable commodities.
If you’re still worried about it, it’s cryptographically possible to make a risk free trade. The two parties would set up transactions on both sides such that when they both sign the transactions, the second signer’s signature triggers the release of both. The second signer can’t release one without releasing the other.
This is lacking in detail, and does not describe HTLC contracts, as we would expect for a brief comment within a larger discussion. However, it does explain that for one party to retrieve their coins by signing the transaction, they necessarily provide some secret information allowing the counterparty to also retrieve their payment.
I previously discussed zero knowledge proofs, where a party Bob has some secret data satisfying certain properties. For example, this could be the private key associated with a publicly known Bitcoin address, or it could be a file whose SHA256 hash is equal to a given value, or maybe he knows a 3-colouring for a specified graph. In fact, it could be any data satisfying a clearly defined computable property. Bob wants to prove that he has such information to Alice but, as it is secret, he does not want to send her the data itself. Ideally, he wants to prove this without revealing any knowledge besides the simple fact that he knows of something satisfying the claimed properties. As was discussed, this can be achieved by an interactive proof where he exchanges messages with Alice until she is able to conclude with a high degree of confidence that he does indeed have the claimed secret. This is as in figure 1 below, where Bob is acting as the prover and Alice is the verifier. As I will build on such zero-knowledge proofs here, if you are not already familiar with them it is suggested to first read the earlier post.
Alice verifies that Bob’s messages satisfy some required properties and, if they do, she accepts his proof. Otherwise, she rejects it. Recall the important properties:
To ensure soundness, Alice should select her messages at random according to a specific procedure. Essentially, she is challenging Bob with unpredictable questions, which he is very unlikely to be able to answer all correctly unless he has the claimed knowledge. If Alice sticks to this method, she is an honest verifier. Similarly, if Bob has the claimed information then, to ensure completeness and zero-knowledge, he should also construct his messages according to a specific procedure which, again, is randomized to ensure that he does not leak any secret information. If he does this, then he is an honest prover.
In many situations, the setup described above is not desirable or is not possible at all. It may be preferable for Bob to send a single proof to Alice for her to verify without having to respond with further questions. Maybe there is no communication channel from Alice to Bob, or maybe she will check the proof at a later date when she no longer has contact with Bob. This is the case, for example, where Bob submits a transaction containing the proof to be included in a blockchain, and Alice is a network validator who later verifies this transaction. These are non-interactive proofs, so figure 1 should be updated to only include a single message sent from the prover to the verifier.
Technically, it is not possible for valid non-interactive proofs to be zero-knowledge, at least by the standards described in the previous post. For one thing, Alice could take the message sent from Bob, and pass it off as her own to try and prove that she has access to the secret data, which is incorrect. In fact, Bob may have already done this, and simply passed off a proof originally from someone else as his own. So, the best that we can expect is for the message to prove that someone had access to the secret and constructed the message.
Furthermore, let us denote the message by m and the verification procedure by a function F(m). This is a computable function returning either True if it successfully verifies the proof, and False otherwise. If Alice does not have access to the original secret data, then she would not be able to construct a message m such that F(m) evaluates to True. However, if Bob sends her the proof, then he is directly giving her such an m. This is information in itself so, strictly speaking, the proof cannot be zero knowledge. Such considerations did not apply to interactive proofs, where Bob is demonstrating that he can correctly respond to messages chosen randomly by Alice.
There is a way around these issues, if we relax the zero-knowledge requirements slightly and make some assumptions regarding random oracles. The idea is that Bob can simulate an interactive proof, generating his messages in the usual way and using a pseudorandom number generator to generate the messages which (honest) Alice would otherwise have sent. The random numbers can use Bob’s previous messages as a seed to ensure that they are not predictable beforehand. The proof then consists of Bob’s sequence of messages from the simulation. A verifier can later reuse these messages and the same random number generator to rerun the simulation and check if it succeeds. This is the Fiat–Shamir heuristic, and is shown in figure 3 using h to denote the pseudorandom number generator.
For cryptographically secure pseudorandom numbers, we use a hash function. Under the random oracle model, these are considered to output random values, but are deterministic in the sense that if they are reevaluated with the same input, then they regenerate the same output. While this is not strictly possible, hash functions do behave very much like this and there is no known way to predict the output of a secure hash function other than by computing it. The result is that we obtain non-interactive proofs which are zero-knowledge in the sense that, other than possession of the secret data, the only information that they reveal involves the values of the hash function or random oracle for the input values used in the proof.
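As a minimal sketch of the simulation in figure 3, both parties can derive the ‘verifier’ challenges by hashing the transcript of the prover’s messages so far (my own illustration; real schemes fix a precise encoding of the transcript):

```python
import hashlib

def challenge(transcript: list[bytes]) -> int:
    """Simulated verifier: the next challenge is a hash of the prover's
    messages so far, standing in for Alice's random choice."""
    return int.from_bytes(hashlib.sha256(b"".join(transcript)).digest(), "big")

# The prover computes challenges while building the proof; a verifier later
# recomputes the same values from the same transcript, with no interaction.
msgs = [b"commitment-1", b"commitment-2"]
assert challenge(msgs) == challenge(msgs)       # deterministic, so checkable
assert challenge(msgs[:1]) != challenge(msgs)   # but depends on every message
```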
While the points above suggest that non-interactive proofs can be trickier to construct and to analyse than their interactive versions, there is at least one factor which benefits the non-interactive case. For interactive proofs, the verifier has the freedom to choose her messages in whatever way she likes, taking into account any messages already received from the prover. We cannot assume that she is honest, and follows any particular scheme. She could be trying to extract secret information, rather than following the standard procedure to verify the proof. On the other hand, we know exactly how the verifier constructs messages in non-interactive proofs. This is because she does not really exist, and all of her messages are explicitly simulated by defined functions. Effectively, the ‘verifier’ is honest, allowing more flexibility in arranging the proofs while retaining the zero-knowledge property.
As mentioned at the top of the post, non-interactive zero-knowledge proofs are a kind of digital signature algorithm. The proof itself is just a message which can be verified to determine that it was constructed by someone in possession of some secret data satisfying defined properties. Validating the proof requires applying the hash function to the prover’s messages to simulate the verifier’s choices. This can be modified by appending an arbitrary message m in the hash function argument which, using the Fiat–Shamir heuristic described above, is the same as prepending m to the prover’s list of messages. This has the effect of changing the random number seed, so we still obtain a zero-knowledge proof of the same fact, but with respect to a different pseudorandom number scheme. Such a proof can be verified to confirm that it was constructed by someone in possession of both the secret data and the message m. Effectively, then, the proof acts as a digital signature for the message m, and the secret data is the private key. From this viewpoint, the public key is some data encoding the precise properties that the secret data is required to satisfy.
Taking the example where Bob’s secret data is a private key associated with a given public key, the non-interactive proof is just a signature scheme corresponding to these keys. Alternatively, if Bob’s secret data is a file with a specified SHA256 hash, then it gives a signature scheme where the file is the private key and the hash is the public one. Where Bob is claiming to have a 3-colouring of a given graph, the colouring is the private key and the graph is the public key.
When looking at the security of non-interactive proofs, we need to take into account that the prover could use a brute-force attack. For example, suppose that we start with an interactive proof where, if Bob does not have the claimed secret data, he has a 1 in a million chance of fooling Alice that he does. This may be an acceptable level of security. For the corresponding non-interactive one constructed using the Fiat–Shamir heuristic, Bob builds the proof in private. If he does not have the secret data, this will still only have a 1 in a million chance of being valid. However, Bob could construct lots of proofs, making small changes to his choices of messages in each iteration. It is quite possible that he could construct millions of them, and then have a high chance of finding a valid proof and fooling the verifier.
Such brute-force attacks need to be considered when looking at the probability of someone without the secret data being able to prove that they do. The probability p of the interactive proof being broken should be multiplied by a large number N representing the number of brute-force iterations to give the updated probability bound Np. Equivalently, if p is the acceptable level of soundness for the interactive proof, then this should be divided by N for the non-interactive one. Suppose that Alice accepts a probability of 1 in a million as being an acceptable chance of incorrectly believing that Bob has the secret, but also that Bob could potentially run a billion iterations of the construction. Then, she should be aiming for a probability bound of 1 in a quadrillion for each attempted proof construction.
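The bound Np here is just the union bound, which is easy to sanity-check numerically with the numbers used above:

```python
import math

# Probability of at least one success in N independent attempts, each
# succeeding with probability p, is 1 - (1 - p)**N, bounded above by N*p.
p = 1e-15      # 1 in a quadrillion per attempted proof construction
N = 10**9      # a billion brute-force iterations
exact = 1 - (1 - p) ** N
assert exact <= N * p                     # the union bound N*p
assert math.isclose(N * p, 1e-6)          # Alice's 1-in-a-million target
```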
In practice, a probability of 2^{–128} of breaking the soundness and finding a valid proof is considered sufficient to allow for a huge number of iterations using all the computing power in the world over a long period of time, while still leaving a negligible probability. This consideration is as with any digital signature or secure hashing algorithm, where we must suppose that an attacker has a massive amount of computing power over a long period of time (many, many years) to try and brute-force a solution.
A further consideration is that, in a proof built up from many messages, Bob could try and brute-force each individual message rather than the whole proof at once. For example, consider the first proof of private key described in the previous post. After Bob’s first message consisting of a point on the curve, Alice has a binary decision to make, knowing that if Bob does not have the secret, he can only answer one of these two options. So, Bob can only have at most a 50% chance of fooling Alice. Repeating this n times, he only has a 2^{–n} probability. If we converted this using Fiat–Shamir as in figure 3, then it consists of a sequence of n exchanges with Alice’s choice at each time being determined by a hash of Bob’s previous messages. Even though brute-forcing the entire construction would require an infeasible 2^{n} attempts on average, he can instead attack each message separately. At each of the n stages, Bob could try multiple choices for his message until he can correctly respond to Alice’s choice, which requires an average of only 2 attempts. So, Bob is able to brute-force a proof that the verifier would accept, using an average of just two tries for each stage.
The issue of brute-forcing each individual stage can be addressed by just looking at 3-message proofs. These consist of an initial random message from Bob, followed by a randomized question from Alice and a response from Bob. If our non-interactive proof is not of this form, and contains multiple messages from Alice, we can try to reorder the messages to put it in this form. This may not be possible in the interactive version while retaining the zero-knowledge property. For non-interactive proofs, the fact that the verifier is always honest, since she is simulated by a random oracle, makes this much easier to do. Updating figure 3 gives the 3-message version shown in figure 4 below. The hash function is only applied once, to Bob’s initial message.
As an example, consider the second proof of private key given in the previous post. Recall that the keys are defined using an elliptic curve E, which is a cyclic group with generator point G and order equal to a huge prime number p. The private key is an integer x in the range from 0 to p – 1 inclusive, and the corresponding public key is given by multiplying this by the group generator, P = x.G. The proof that Bob knows x is a 3-message interactive protocol consisting of the following exchanges.
As long as Alice chooses e at random in the range 0 to p – 1, the probability that Bob can give a valid response without knowing the private key x is bounded by 1/p. We noted that this is not really zero-knowledge, which Alice can exploit by choosing e in a way that depends on R. In the interactive case, this was fixed by adding an additional message where Alice commits to her choice of e upfront. In the non-interactive version, this does not matter, since we fix Alice’s message to be honest by simulating it with the hash function h. We can take e = h(R). While this does technically depend on R, under the random oracle model it is effectively a random value so, statistically, is independent of R. It also assumes that the hash function output ranges over the integers from 0 to p – 1. Bob’s simulation of the interactive proof is now as follows.
The non-interactive proof sent to Alice just consists of Bob’s two messages:
A verifier can check the proof by rerunning Bob’s simulation, which is the following procedure.
If Bob knows the private key, then he produces a valid zero-knowledge proof by selecting his messages in the same way as in the interactive case, which gives the following procedure.
Note that the non-interactive proof provided here is just the standard Schnorr signature corresponding to an empty message, and the verification procedure is the same as for Schnorr signatures. In fact, if we use a string m to seed the random numbers by using h(R‖m) in place of h(R), we recover the Schnorr signature and verification procedure for message m. We have reinvented Schnorr signatures!
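As a sketch of this, the snippet below applies the Fiat–Shamir transform to produce and check a Schnorr-style signature. To keep it self-contained and runnable, it stands in a small multiplicative group (the squares modulo a safe prime, written multiplicatively, so x.G becomes g^x mod q) for the elliptic curve; the tiny parameters and helper names are illustrative only, not secure and not from the post.

```python
import hashlib
import secrets

# Toy stand-in for the elliptic curve: the squares modulo the safe prime q
# form a subgroup of prime order p (illustrative small parameters, not secure).
q, p, g = 2039, 1019, 4

def h(*parts):
    # Hash the concatenated parts down to an integer in 0..p-1.
    data = "|".join(str(part) for part in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % p

def sign(x, m):
    # Interactive proof with Alice's challenge replaced by e = h(R||m).
    t = secrets.randbelow(p)      # Bob's random nonce
    R = pow(g, t, q)              # first message, R = t.G
    e = h(R, m)                   # simulated challenge
    s = (t + e * x) % p           # response, s = ex + t
    return R, s

def verify(P, signature, m):
    R, s = signature
    e = h(R, m)
    # The check s.G = e.P + R reads g^s == P^e * R in this notation.
    return pow(g, s, q) == (pow(P, e, q) * R) % q
```

Setting m to the empty string recovers the non-interactive proof of private key described above.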
I have already demonstrated how one of the proofs of private key from the previous post can be made non-interactive, leading to the Schnorr signature algorithm. However, a different interactive proof was also described in the previous post, consisting of the following steps.
In step 2, Alice selects from her two options with a 50% probability for each. If Bob does not know the private key, then he will only be able to give a correct response for one of these. Hence, the probability that he can fool Alice into accepting the proof is 1/2. Since this is much too large, the steps above are repeated some number n times to reduce the probability bound to a negligible value 2^{–n}.
If Bob does know the private key x, then he selects R for the first message equal to t.G, where t is chosen at random from 0 to p – 1 inclusive. He can then use s equal to either t or ex + t in the final message, depending on Alice’s choice.
Although it is much less efficient than the Schnorr signatures described above, to demonstrate the ideas, I will describe how this method can also be made non-interactive. The first problem with this method is that it consists of multiple stages, in each of which Bob has a relatively large probability of 1/2 of erroneously proving he has the secret. Although these probabilities compound to a negligible value, in the non-interactive case he could brute-force each stage in turn by trying different choices of message. To fix this, we reorder the interactions to obtain a 3-message proof, with each message consisting of an array of n submessages combining the stages from above.
Here, we are assuming that Alice is honest, so that her binary choices are made independently of Bob’s message. Using the Fiat–Shamir heuristic gives the following proof performed by Bob alone.
This construction requires us to use a hash function h with at least n binary digits. The non-interactive proof that Bob sends to Alice is given by his messages from the simulated proof:
To verify the proof, Alice reruns the steps above using the given values for Bob’s messages.
If Bob knows the private key, he can construct a valid proof in the same way as described for the interactive case. Specifically, he chooses integers t_{i} randomly in the range 0 to p – 1 inclusive, and sets R_{i} = t_{i}.G. If the i‘th bit of the hash is zero, he takes s_{i} = t_{i}, otherwise s_{i} = x + t_{i} (modulo p).
If Bob does not know the private key then, as for the interactive case, for any choice of points R_{1}, R_{2}, …, R_{n}, the chance that he can find the corresponding integers s_{i} is no more than 2^{–n}.
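The n-round construction can be sketched as follows, again using a small multiplicative group (squares modulo a safe prime) as a stand-in for the elliptic curve, with the hash bits playing Alice's binary choices. All parameter values and helper names are illustrative.

```python
import hashlib
import secrets

q, p, g = 2039, 1019, 4   # toy group: squares mod q, prime order p (not secure)
n = 32                    # number of rounds; soundness error is 2^-n

def challenge_bits(Rs):
    # Simulate Alice's n binary choices by hashing Bob's initial messages.
    digest = hashlib.sha256("|".join(map(str, Rs)).encode()).digest()
    value = int.from_bytes(digest, "big")
    return [(value >> i) & 1 for i in range(n)]

def prove(x):
    ts = [secrets.randbelow(p) for _ in range(n)]
    Rs = [pow(g, t, q) for t in ts]                    # R_i = t_i.G
    bits = challenge_bits(Rs)
    ss = [(t + b * x) % p for t, b in zip(ts, bits)]   # s_i = t_i or x + t_i
    return Rs, ss

def verify(P, proof):
    Rs, ss = proof
    bits = challenge_bits(Rs)
    # Bit 0 checks s_i.G = R_i; bit 1 checks s_i.G = P + R_i.
    return all(pow(g, s, q) == (pow(P, b, q) * R) % q
               for R, s, b in zip(Rs, ss, bits))
```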
For the final example, I will look at how the zero-knowledge proof of a 3-colouring for a specified graph can be made non-interactive. Since this is an NP-complete problem, every other problem in class NP can be converted to this and, hence, be described by a non-interactive zero-knowledge proof. That is, in theory. It would be rather impractical to convert problems to 3-colourings and apply the method here. Instead, zkSNARKs and zkSTARKs can be used, which are outside of the scope of this post.
Bob can interactively prove to Alice that he knows a 3-colouring by the following steps.
The commitment that Bob sends to Alice in the first step is a Merkle root of the tree whose leaves are the vertex colours (concatenated with random strings, to avoid revealing information). If Bob does not know a 3-colouring, then there must be at least one edge whose vertices are not assigned distinct allowed colours. So, if Alice chooses at random from the m graph edges, the probability of Bob giving a valid response is no more than 1 – 1/m. The steps are repeated a number n times to reduce this probability to (1 – 1/m)^{n}. Choosing n large enough, this can be made negligible. If Bob does know of a 3-colouring, then he can use this in step 1, combined with a random permutation of the colours to make it zero-knowledge.
As in the alternative proof of private key, all of the n iterations can be combined into a 3-message proof, and the Fiat–Shamir heuristic gives a non-interactive proof.
The hash function used must have enough binary digits to apply step 3. This is easiest if the number of edges m is a power of two, m = 2^{k}, so that k digits of the hash can be used to select the edge.
The non-interactive proof sent to Alice consists of Bob’s messages.
This is verified by Alice going through the same steps as for Bob above, using the values of Bob’s messages taken from the proof.
Alice is busy trying to find Waldo in the picture above (or, Wally, for those outside of North America). After some time without success, she is doubtful that he is even in the image at all. Bob assures her that Waldo is there, since he already found him. However, this is not enough to satisfy Alice. She needs proof before spending any more time on the task. How is it possible for Bob to prove this to her, without giving the game away by pointing out his location?
Assuming that the picture is printed out on a sheet of paper, Bob could take a large piece of cardboard and cut out a small Waldo-shaped hole. Then, while keeping the hole covered, he slides the paper underneath and, after correctly positioning it, he uncovers the hole to reveal Waldo! Alice can look with her own eyes and see his face. She agrees that Waldo is in the image but, since all of the rest of the picture is covered by the cardboard, Bob has not revealed the location. So, she continues with her task in the knowledge that, at least, it is possible.
What Bob has done here is provide a zero-knowledge proof that Waldo is in the picture. That is, he did this while providing no information to Alice other than the fact that he is there. You could argue that he did provide some information. Specifically, he showed Alice the exact size and appearance of Waldo’s face. However, if we assume that this information is no secret and is already publicly known, then it is true that Bob did manage to prove Waldo’s existence in the image without revealing any other previously unknown information.
Zero-knowledge proofs are a very useful cryptographic technique, finding important applications in cryptocurrencies. This includes zkSNARKs, zkSTARKs and zkRollups, in which there is a growing interest. I do not go into the details of such uses in this post, and concentrate on the idea of zero-knowledge. There are many other simple examples that can be given along the lines of the ‘Where’s Waldo?’ one above, but I will not go through these here. We will look instead at practical cases which can be performed by transferring digital information over a communication channel, such as the internet. For more examples similar to the one above, I refer to the Wikipedia article, which includes the ‘Ali Baba cave’ and ‘two balls’ demonstrations.
The idea is that one party, Bob (the prover) is privy to some secret information. Maybe he has the private key associated with a publicly known Bitcoin address. Or, he knows how to prove some previously unsolved mathematical conjecture. Or, he has a file whose SHA256 hash is equal to a given value. He wants to prove to Alice (the verifier) that he has this information without revealing the information itself.
I consider interactive proof systems, which consist of the prover and verifier exchanging messages until, eventually, the verifier either accepts or rejects the prover’s claim to know the secret information. This is as in the figure below, showing the flow of information including messages sent between Alice and Bob, until Alice accepts or rejects the proof.
There are a few points worth noting. The last message will always be from the prover (Bob) to the verifier (Alice). This has to be the case, because if Bob does not respond to Alice’s final message then it cannot play any part in convincing Alice of the claim. Next, all of Alice’s messages will be chosen at random. Otherwise, if they are deterministic, then Bob would be able to predict her messages, so they would contain no information and would not be necessary.
The three properties that a zero-knowledge proof system should satisfy are:
The first two properties should hold for any proof system, and do not relate to the zero-knowledge property. Starting with the first property, if Bob has the claimed knowledge, there has to be some protocol which he can use to construct messages that convince Alice of this fact. For the second property, we cannot assume that Bob is honest, or is following any specific protocol. If Alice is not prepared to accept at face value that he is telling the truth about having the claimed knowledge, why would she trust him to be constructing his messages according to an agreed protocol? So, regardless of how he constructs his messages, if he does not have the claimed knowledge then it should not be possible to convince Alice that he does. There is a slight technical point here. Due to the randomness inherent in these proofs, there will be some probability that Alice erroneously accepts Bob’s claim. The idea is that this probability can be made negligible.
The third property is what concerns us in this post. The proof should be zero-knowledge, so that Bob does not leak any information about his secret. In fact, he should not leak any information beyond the fact that he has access to the claimed knowledge. It is tricky to make this idea mathematically precise. Bob does send messages but, what does it mean to say that these do not contain any information? The general approach is that Bob randomizes his messages to obscure any information, beyond the fact that he has access to the secret. I will look at how we can define and prove the zero-knowledge property later.
To summarise the ideas in an interactive proof protocol between prover Bob and verifier Alice:
There are actually two types of statements regarding existence of a secret that we could look at trying to prove.
We are only concerned with the second type of statement here. The prover Bob is trying to convince Alice that he knows of a secret satisfying some specific properties, rather than just that it exists. If he succeeds, then Alice will know that the secret exists, since she is convinced that Bob has this information. She does not have the knowledge of what it is, so is not able to prove that she does to any third party. Interactive zero-knowledge proofs considered here are only able to convince the specific verifier who is taking part in the procedure. Bob would need to repeat the process each time the proof needs to be given to a new third party. For blockchain applications, the prover will typically be constructing a contract or transaction to be submitted to the chain. The verifier is any third party who is validating the blockchain. As such, there are no messages sent from the verifier to the prover, so interactive proofs cannot be used. For these uses, any of the interactive proofs discussed in this post would need to be transformed into a non-interactive format, which I will not look at here, but will follow up in a later post.
A colouring of a graph consists of assigning a colour to each vertex, so that no two vertices sharing an edge are assigned the same colour. It is called a k-colouring if it uses no more than k colours in total. While it is easy to check whether a graph has a k-colouring for k = 1 and k = 2, for any larger values of k, it is an NP-complete problem. I will consider 3-colourings.
Suppose that, for a specific graph, Bob has found a 3-colouring and wants to prove this fact to Alice without giving any information on the colouring itself besides the fact that it exists. The following interactive proof can be used.
This example could be carried out either physically with a graph drawn on a piece of paper, or with digital data transmitted between Alice and Bob. In the physical case, in step one he ‘commits’ to a colouring by actually filling in each vertex using a red, green or blue pen, and covers them with small pieces of paper to hide the colours from Alice. In the third step, he uncovers them by removing the paper covering the two vertices of the selected edge.
For the situation involving a digital communication channel, we would agree on an ordering of the vertices so that a colouring is given by an array consisting of the symbols ‘R’, ‘G’ and ‘B’. To commit to a specific colouring while ‘covering up’ his choice, the following could be done. For each vertex, he selects a secret random string, starting with its colour symbol and long enough that it is not feasible for Alice to guess. He then computes their hashes, and sends the array of these hash values to Alice. Due to preimage resistance, she is not able to work out his colouring from this. In the third step, he ‘uncovers’ two vertices by sending their strings to Alice. She can check that they have the correct hash, and see Bob’s colour choice by looking at their initial characters. Due to collision resistance, it is not possible for Bob to change his colour choice after committing to them in the first step. For efficiency reasons, with large graphs Bob would likely just send Alice the Merkle root of his array of hashes rather than each individual vertex hash, but that is not important for the current discussion.
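The commit-and-reveal step described above can be sketched in a few lines. This is a minimal illustration of the hash commitment, not the Merkle-tree optimisation mentioned at the end; the helper names are my own.

```python
import hashlib
import secrets

def commit(colour):
    # Commit to 'R', 'G' or 'B' by prepending it to a long random string
    # and hashing; the hash is sent, the opening string is kept secret.
    opening = colour + secrets.token_hex(16)
    return hashlib.sha256(opening.encode()).hexdigest(), opening

def check_reveal(commitment, opening, colour):
    # Alice verifies the hash and reads the colour from the first character.
    return (hashlib.sha256(opening.encode()).hexdigest() == commitment
            and opening[0] == colour)

c, o = commit("R")
assert check_reveal(c, o, "R")
# Collision resistance: Bob cannot reopen the commitment to a different colour.
assert not check_reveal(c, "G" + o[1:], "G")
```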
As finding graph-colourings is NP-complete, a zero-knowledge proof for this implies that all problems in the NP complexity class have a zero-knowledge proof. That is, whatever secret data Bob claims to have, if there is a polynomial-time algorithm verifying that it satisfies the required properties, then we could construct a zero-knowledge proof that Bob has the data. This does not necessarily result in a practical procedure though.
Soundness: If the above proof steps are carried out multiple times, so that Alice is convinced that Bob will always correctly reveal two distinct colours in step 3 regardless of her choice of edge, then she would also agree that Bob has a correct 3-colouring. This is because, if the pair of vertices associated with every choice of edge reveals two distinct colours of either red, green, or blue then, by definition, it is a 3-colouring.
We can be a bit more precise. Suppose that, in step 2, Alice chooses one of the m graph edges uniformly at random. If Bob does not have a 3-colouring, then at least one of these choices would fail to be correctly verified in step 3. The probability of this happening is at least 1/m. Suppose that the procedure above is repeated n times, and Alice chooses her edge in step 2 independently and at random each time. Then, the probability of Bob revealing valid colours in step 3 every time is bounded by (1 – 1/m)^{n}.
The full interactive procedure is as follows. First, an integer n is chosen large enough that (1 – 1/m)^{n} is negligible. The 3 steps above are executed in order n times with, at each stage, Alice making her choice in step 2 entirely randomly. If Bob reveals valid colours in step 3 for each of these runs, Alice concludes that he has a 3-colouring.
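A quick back-of-envelope check shows how large n needs to be. The edge count m and the 'negligible' threshold below are illustrative values I have chosen, not figures from the post.

```python
import math

# How many repetitions make (1 - 1/m)^n drop below a target cheating
# probability? m and target are illustrative choices.
m = 1000                       # number of graph edges
target = 2.0 ** -80            # 'negligible' threshold

n = math.ceil(math.log(target) / math.log(1 - 1 / m))
assert (1 - 1 / m) ** n <= target
# Roughly n is about 80 * m * ln(2), i.e. tens of thousands of rounds
# for m = 1000, which is why this construction is so inefficient.
```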
Completeness: If he really does have a 3-colouring, then it is straightforward to ensure that, whenever the steps of the interactive proof are performed, valid colours are always revealed in step 3. All he has to do is commit to a valid colouring in step 1. So long as he does this, Alice will conclude that he has a 3-colouring.
Zero-Knowledge: Bob does have to use some care to ensure that he does not reveal any information about his 3-colouring to Alice. Suppose, for example, multiple runs of the procedure are carried out, but Bob commits to the same 3-colouring every time. Alice could select different edges on each run so that, by the time she has chosen edges connecting every one of the vertices, she would know his entire 3-colouring.
To avoid such issues, Bob can do the following. He starts with a specific colouring using only red, green and blue. Before step 1 above for each run, he applies a random permutation to the colours. This means that step 3 will always reveal a pair of distinct colours chosen uniformly at random from red, green and blue. Since this is a known fixed distribution, not depending on Bob’s colouring, Alice gains no further information.
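The colour permutation step can be sketched as follows; the example colouring and helper name are illustrative. Since a permutation is a bijection on the colour set, a valid 3-colouring always stays valid.

```python
import random

def permute_colours(colouring):
    # Apply a fresh, uniformly random permutation of the three colours.
    perm = dict(zip("RGB", random.sample("RGB", 3)))
    return {vertex: perm[colour] for vertex, colour in colouring.items()}

base = {"a": "R", "b": "G", "c": "B", "d": "R"}
shuffled = permute_colours(base)
# Equal colours stay equal and distinct colours stay distinct,
# so the permuted colouring is valid exactly when the original is.
assert shuffled["a"] == shuffled["d"]
assert len({shuffled[v] for v in "abc"}) == 3
```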
In a public key cryptosystem, a participant starts by choosing a key pair consisting of a private (or secret) key and a public key. As the names suggest, the value of the private key is kept secret, whereas the public one can be freely distributed. The way that this works is that the private key is chosen at random by the participant, making it virtually impossible for any third person to guess its value. The public key is computed from this by a one-way function, so that there is no known way to invert it to recover the private key. As discussed in the post on digital signatures, the private key is used by the participant to digitally sign messages, which can then be verified by any third party in possession of the public key.
Public key cryptography, such as that used by Bitcoin, is often based on an elliptic curve E. Assume that this curve is a cyclic group of order a huge prime number p. For example, the secp256k1 curve used by Bitcoin has order
0xfffffffffffffffffffffffffffffffebaaedce6af48a03bbfd25e8cd0364141 
I use additive notation for elliptic curves, so that the group operation applied to two points P and Q is written as their sum P + Q. The product of an integer x and group element P is x.P. Note that multiplicative notation is often used, in which case the product is instead written as the power P^{x}, but this is nothing more than a notational difference. A public key cryptosystem starts by fixing a base element (or generator) G of the group.
A private key is nothing more than an integer x in the range from 0 to p – 1 inclusive. This is chosen at random, and the public key is simply the product of x with the group generator,
This setup is as I described previously for Schnorr signatures, but the ECDSA algorithm also has exactly the same setup for the key pair, differing only in the way signatures are constructed and verified.
Suppose that Bob claims to have the private key associated with a known public key P. How can he convince a third party, such as Alice, that he does indeed have this information? Clearly, he does not want to give away the private key, since this would also give access to any Bitcoin secured by it. One method, which is used in practice, is for Bob to sign a message of Alice’s choice (within reason…he would not sign a Bitcoin transaction giving Alice access to the coins). Alice can then verify the signature. This is not truly zero-knowledge. Even though Alice has no way of recovering the private key from a digital signature, she has still gained knowledge of a valid signature for that specific message which, if she was not trustworthy, she could try and pass off as her own signature to another party.
Instead, I look at a zero-knowledge approach by which Bob can convince Alice that he knows the private key. Consider the following exchange of messages between Bob and Alice.
Soundness: In the second step, Alice has a binary choice to make. If Bob is able to respond correctly, regardless of her choice, then we can argue that he has access to the private key. This is because he must be able to come up with two integers s_{1} and s_{2} satisfying s_{1}.G = R and s_{2}.G = P + R. Taking the difference of these, (s_{2} – s_{1}).G = P = x.G.
So, if Bob is able to produce numbers s_{1} and s_{2} corresponding to Alice’s available choices at the second step, he just needs to take their difference to obtain the private key, x = s_{2} – s_{1} (modulo p). Hence, if Bob is consistently able to send a valid integer s in step 3 for multiple runs of the protocol above, Alice will conclude that he has access to the private key.
We can be a bit more precise. Suppose that, in step 2, Alice makes her choice of Q at random, with equal chance of picking R and P + R. If Bob did not know the private key, then he would only be able to give a successful response in step 3 for one of these, which has a probability of 1/2. So, it is possible that Bob manages to fool Alice entirely by chance with a single run through the procedure, but only with a probability of 1/2. Suppose that we were to repeat the procedure a number n times, with Alice making her choice in step 2 independently for each run. If Bob does not know the private key, then the probability that he successfully sends a valid value for s in step 3 on all runs is no more than 2^{–n}.
The full interactive procedure is to first pick n large enough that 2^{–n} is negligible (e.g., n = 128). The 3 steps above are executed in order n times with, at each stage, Alice making her choice in step 2 entirely randomly. If Bob is able to send a valid integer s in every one of these runs, Alice concludes that he has the private key.
Completeness: Suppose that Bob starts by choosing an integer t and sets R = t.G. Then, if Alice chooses Q = R in step 2, he responds with s = t in step 3. On the other hand, if she chooses Q = P + R, then he responds with s = x + t (modulo p). This satisfies the requirements since, in the first case, s.G = t.G = R and, in the second, s.G = (x + t).G = P + R.
Hence, Alice will be convinced that he knows the private key x.
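The completeness argument can be sketched end to end, with Alice's binary choice played by a random bit. As in my other sketches, a small multiplicative group (squares modulo a safe prime, written multiplicatively) stands in for the elliptic curve, and all parameter values are illustrative.

```python
import secrets

q, p, g = 2039, 1019, 4       # toy group: squares mod q, prime order p

x = secrets.randbelow(p)      # Bob's private key
P = pow(g, x, q)              # public key, P = x.G

def run_round():
    t = secrets.randbelow(p)          # step 1: Bob sends R = t.G
    R = pow(g, t, q)
    wants_sum = secrets.randbelow(2)  # step 2: Alice asks for R (0) or P + R (1)
    s = (t + wants_sum * x) % p       # step 3: Bob answers s = t or s = x + t
    # Alice checks s.G equals her chosen Q, which reads g^s == P^choice * R here.
    return pow(g, s, q) == (pow(P, wants_sum, q) * R) % q

assert all(run_round() for _ in range(128))   # honest Bob always passes
```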
Zero-Knowledge: In the procedure followed by Bob in the completeness argument, he needs to be careful about how his initial number t is chosen. For example, if he used the same value for two separate runs, so that Alice is sent the same point R, then she would be able to learn the private key. To do this, she simply makes a different choice in step 2 of each of these runs. Suppose she chooses R and P + R in the two runs, and Bob responds with numbers s_{1} and s_{2} respectively. Alice verifies that s_{1}.G = R and s_{2}.G = P + R. Taking the difference gives (s_{2} – s_{1}).G = P, so that she can compute the private key as x = s_{2} – s_{1} (modulo p).
To avoid leaking such information, Bob can make his choice of t uniformly at random over the integers from 0 to p – 1 inclusive, and independently for each run of the procedure. Then, R = t.G will be a uniformly random point of the curve. It follows that P + R is also a uniformly random point, and the integer s sent at the final step will also be uniformly random, for each of the choices that Alice can make. The only information that Alice ends up with is a random integer s and a random point on the curve equal to s.G, which she is able to compute herself already. So, there is no information to be gained other than the fact that Bob could respond regardless of which choice she made, giving a zeroknowledge proof. This is a little vague, and we still do not have a precise definition of ‘zeroknowledge’, but it gives the idea and I will make this a bit more precise in a moment.
In the examples above, I argued that they were zero-knowledge proofs, so long as Bob randomizes his messages in the proposed fashion. My arguments were a bit hand-wavy, which is inevitable since we have not yet given a proper logical description of what ‘zero-knowledge’ even means. It is important to have a definition, so that we are able to evaluate whether or not a proof is really zero-knowledge. We need this since, otherwise, it is possible for it to leak information in ways that we had not considered.
The proof system described above for Bob to convince Alice that he has possession of a private key x corresponding to known public key P could be generalized in the following way, known as the Schnorr protocol.
The procedure given previously was just the same as this but, effectively, only allowed Alice to select e equal to 0 or 1 in the second step. This updated version is also both sound and complete. Suppose that, for two different possible choices e_{1} and e_{2} in step 2, Bob is able to answer with valid values s_{1} and s_{2} respectively in step 3. This implies that he knows solutions to s_{1}.G = e_{1}.P + R and s_{2}.G = e_{2}.P + R. Taking the difference gives (s_{2} – s_{1}).G = (e_{2} – e_{1}).P, allowing him to easily compute the private key x as (e_{2} – e_{1})^{–1}(s_{2} – s_{1}) modulo p. So, if Bob does not have access to the private key, there is at most one value of e for which he could give a valid response in step 3. If Alice chooses e uniformly at random, the probability of selecting this specific value is 1/p, which is negligible, and much better than the previous 1/2 bound. So, only a single run through the steps should be enough to convince Alice that Bob has access to the private key.
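This extraction argument is easy to check numerically. The sketch below uses a small multiplicative group (squares modulo a safe prime) as a stand-in for the curve, with illustrative key and challenge values: a prover who answers two distinct challenges with the same R leaks the private key.

```python
q, p, g = 2039, 1019, 4   # toy group: squares mod q, prime order p

x = 777                   # Bob's private key (example value)
P = pow(g, x, q)          # public key

t = 123                   # nonce reused for two different challenges
R = pow(g, t, q)
e1, e2 = 5, 9
s1 = (t + e1 * x) % p     # valid responses satisfy s_i.G = e_i.P + R,
s2 = (t + e2 * x) % p     # i.e. g^s_i == P^e_i * R in this notation
assert pow(g, s1, q) == (pow(P, e1, q) * R) % q
assert pow(g, s2, q) == (pow(P, e2, q) * R) % q

# Anyone seeing both transcripts recovers x = (e2 - e1)^{-1}(s2 - s1) mod p.
recovered = (pow(e2 - e1, -1, p) * (s2 - s1)) % p
assert recovered == x
```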
For completeness, Bob can start by selecting an integer t and setting R = t.G in step 1. Then, in the final step, he can respond with s equal to the value ex + t (modulo p) to convince Alice that he has knowledge of the private key.
At first, you might think that this procedure is also zero-knowledge. After all, if Bob acts as just described and selects the value t uniformly at random in the range 0 to p – 1 inclusive, then his R value will be a uniformly random point on the curve. For each specific choice of e by Alice in step 2, the value of s that Bob responds with will also be uniformly random. However, it is not zero-knowledge.
Recall the Schnorr digital signature algorithm. A valid signature for a message string m is equivalent to a triple (R,e,s), for a curve point R and integers e and s (modulo p) satisfying s.G = R + e.P, where e = h(R‖P‖m).
Here, h is the hash function with argument being the concatenation of digital representations of R, P and m. If, in the interactive proof procedure outlined above, Alice computed the value of e in this way, then Bob’s value of s in the final step would provide her with the digital signature. That is, the interactive proof procedure above allows Alice to trick Bob into signing any messages that she wants! This is not only not zero-knowledge, but gives away knowledge that could be catastrophic for Bob and allow his Bitcoin to be stolen.
We need a proper definition of zero-knowledge which ensures that Bob does not give away any sensitive information.
It is possible to rescue the second proof above that Bob knows a private key, and make it zero-knowledge. The idea is to ensure that Alice’s choice of e does not depend on R in any way. This can be done by requiring Alice to commit to e by sending its hash before she receives R from Bob.
The proof of soundness and completeness follows in much the same way as above. We could also make Alice concatenate e with a random string before taking its hash, just to make sure that Bob is not able to guess its value before choosing R. However, since e is chosen randomly from such a large set that it is infeasible for Bob to check, this is not important. It is still the case that, if Bob does not know the private key, then the probability that he can fool Alice that he does is negligible, at about 1/p.
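Alice's commitment step can be sketched as follows, including the optional random string just discussed. The group order and helper names are illustrative.

```python
import hashlib
import secrets

p = 1019   # toy group order (illustrative; real schemes use 256-bit primes)

# Before receiving R, Alice picks e and sends only its (salted) hash.
e = secrets.randbelow(p)
salt = secrets.token_hex(16)
commitment = hashlib.sha256(f"{e}|{salt}".encode()).hexdigest()

def opened_correctly(e, salt, commitment):
    # After Bob sends R, Alice reveals e (and the salt); Bob checks the
    # commitment, so her challenge cannot have been chosen to depend on R.
    return hashlib.sha256(f"{e}|{salt}".encode()).hexdigest() == commitment

assert opened_correctly(e, salt, commitment)
```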
It is no longer possible for Alice to trick Bob into signing messages, since she cannot choose e to depend on R. Given that the previous proof leaked sensitive information when, at first glance, it seemed fine, we might still be a bit uneasy about using this modified version. However, as we will see, it is indeed zero-knowledge.
An interactive proof procedure can be shown to be zero-knowledge by using a simulation to replace the role of the prover, Bob. This simulator is bound by the same rules as Bob, but does not have access to any private information. It is only allowed access to knowledge that Alice already has. At the same time, we ask that the simulation is able to fool Alice, via the interactive proof, that it does have access to the secret data. The idea is that, if Bob was to use the interactive proof to convince Alice that he knows the secret but, at each step, his messages have the same random distribution as the simulator’s, then it must be zero-knowledge. This is because he is only providing information that Alice can already compute by running the simulation by herself.
There is a rather big and obvious problem with this idea. If we can find a simulator which can fool Alice into believing that it has the secret data, then the interactive proof system cannot be sound. It is a basic requirement that it is not possible for anyone without knowledge of the secret data to be able to fool Alice into believing that they do. At least, not outside of a tiny probability.
To have any chance of finding a simulation which can fool Alice, we need to give it some ability not granted to any real prover. Specifically, the simulator is granted unlimited do-overs. This means that, at any time, it is allowed to effectively rewind time to an earlier point of the procedure and try again.
Consider the graph-colouring problem. A simulator could be designed such that, at the first step, it colours each of the vertices independently red, green or blue at random. This is unlikely to be a valid colouring, but never mind. At step 3, when the simulated ‘Bob’ reveals the two vertex colours, they will both be independent and uniformly random. So, there is a 1 in 3 chance that they are the same, and he fails the test. If this happens, he requests a do-over, goes back to step 1, and starts again with a new random colouring of the vertices. If it goes wrong again, he just requests another do-over, and so on, until eventually in step 3 the two uncovered vertices have distinct colours. When that happens, they will be uniformly distributed over all possible pairs of distinct colours from the allowed choices of red, green and blue. This is just the same as for the real Bob, who uses an actual 3-colouring with a random permutation applied to the colours.
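The do-over loop of this simulator can be sketched as follows. The vertex names are illustrative, and the challenged edge is passed in up front since, thanks to rewinding, the simulator effectively gets to answer each challenge afresh.

```python
import itertools
import secrets

vertices = ["a", "b", "c", "d"]   # illustrative graph vertices

def simulate_round(challenged_edge):
    # Colour every vertex at random; do-over until the revealed pair differs.
    u, v = challenged_edge
    for attempts in itertools.count(1):
        colouring = {w: secrets.choice("RGB") for w in vertices}
        if colouring[u] != colouring[v]:   # passes with probability 2/3
            return colouring, attempts

colouring, attempts = simulate_round(("a", "b"))
assert colouring["a"] != colouring["b"]   # revealed pair is always distinct
```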
Consider the first interactive proof that Bob uses to convince Alice that he possesses the private key. This can also be done by simulation. At the first step, simulated Bob chooses a random integer s in the range from 0 to p – 1 and sets Q = s.G. He also randomly selects a curve point R equal to either Q or Q – P, both choices with 50% probability. This is the value he sends to Alice; it will be uniformly distributed over the curve and, independently, Q is equal to R or P + R, both with 50% probability. In step 2, Alice has a 50% chance of choosing the same value that the simulator has for Q, in which case it responds with its value of s. Otherwise, it requests a do-over and starts again. When the process successfully terminates, R will be uniformly distributed on the curve, just as for the real Bob who sets R = t.G for a random integer t.
We can also try building a simulation for the second, non-zero-knowledge proof of possessing the private key, and see what goes wrong. Simulated Bob would independently choose random integers s and e in the range from 0 to p – 1 and set R = s.G – e.P. If, in step 2, Alice chooses the same value for e, then he responds with his value of s, otherwise he requests a do-over. While this technically works, the probability of Alice choosing the correct value of e is only 1/p, which is tiny. The expected number of do-overs is then p, which is huge, and is similar to simply trying to crack the private key by a brute-force search of the whole space. While theoretically possible, this is not feasible, and the simulation would never end in any reasonable length of time.
The third proof that Bob has the private key does have a practical simulation. In step 2, simulated Bob chooses R however he likes. Then, after Alice reveals e in step 3, Bob rewinds, chooses an integer s uniformly at random, and replaces R by s.G – e.P. If he sends this same value of s in step 4, then the proof succeeds. This value of R is uniformly randomly distributed, just as with the real prover Bob, so we can conclude that it is a zero-knowledge proof.
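This rewinding trick can be sketched numerically, again with a small multiplicative group (squares modulo a safe prime) standing in for the curve and illustrative parameters. The simulator sees only the public key, yet its transcript always passes Alice's check.

```python
import secrets

q, p, g = 2039, 1019, 4   # toy group: squares mod q, prime order p
P = pow(g, 777, q)        # public key only; the simulator never uses the exponent

def simulated_transcript():
    # After learning e, rewind and choose R = s.G - e.P for a random s.
    e = secrets.randbelow(p)                   # Alice's revealed challenge
    s = secrets.randbelow(p)                   # uniformly random response
    R = (pow(g, s, q) * pow(P, -e, q)) % q     # forces the check to pass
    return R, e, s

R, e, s = simulated_transcript()
assert pow(g, s, q) == (pow(P, e, q) * R) % q  # s.G = e.P + R holds
```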
These considerations show that we should put some restriction on the complexity of the simulation. A reasonable way to prove that an interactive proof is zero-knowledge, then, is to construct a simulation with unlimited do-overs which almost surely terminates with a reasonable amount of computation. By ‘reasonable’ here, we mean that it can feasibly be performed by Alice. A more mathematical condition is that it runs in probabilistic polynomial time. If the resulting messages have the same joint distribution as ones involving the real Bob, who has access to the secret data, then we say that the proof is zero knowledge.
In the graph colouring problem, the simulated probability of having to redo the steps was 1/3 each time, so the total expected number of repetitions is, on average, just

1/(1 − 1/3) = 3/2.
Similarly, for the first proof of private key possession, the probability of repeating was 1/2, so the expected number of repetitions is

1/(1 − 1/2) = 2.
The argument for why the existence of such a simulation implies zero knowledge is that Alice could perform it all by herself. Since the messages have the same distribution as those from the real Bob, they do not provide any information that Alice cannot compute on her own, other than the simple fact that Bob is able to successfully pass the test without resorting to do-overs. Since the simulation can be performed by anyone without access to the secret data, Alice does not obtain any knowledge beyond the fact that Bob knows the secret.
Simulation also shows why zero-knowledge proofs are probabilistic. At any step, the simulator has a nonzero chance of not requiring a do-over, so there is a nonzero, but possibly vanishingly small, probability that it makes it all the way through the process without any do-overs at all. Such a run would fool Alice into erroneously accepting that it has access to the data.
A more technical definition is obtained by replacing the roles of the prover Bob, the verifier Alice, and the simulator by probabilistic Turing machines. These compute the messages to be sent from the previously received messages. Let us suppose that the prover Bob’s messages are computed by Turing machine P. Suppose also that, for any choice of verifier Turing machine V which runs in probabilistic polynomial time (PPT), there is a simulator S which is also a Turing machine running in PPT. This simulator generates both Alice’s and Bob’s messages, with the same joint distribution as the original prover/verifier combination. Then, we say that the interactive proof is zero-knowledge. In practice, the simulator would work by running the verifier Turing machine for Alice’s messages, and rewinding to an earlier state when it is not able to continue.
While your first answer to the question above is probably “no, this is not possible”, there is one way in which it can be done, albeit with a rather large caveat. Bitcoin transaction outputs incorporate a script, which is a simple stack-based programming language used to restrict how the coins can be spent. Output scripts starting with the OP_RETURN
opcode cannot be spent in any way. More than this, they are provably unspendable, so are ignored by Bitcoin nodes, which do not even record such outputs in the UTXO set. The output quantity would likely be set to zero, so as to avoid irretrievably destroying any bitcoin associated with the output. This means that we can include any arbitrary data following the OP_RETURN
to be recorded on the blockchain, which is simply ignored by all validating nodes. This is one of the uses of Bitcoin, as an immutable and decentralized store of data.
So, all we have to do is include smart contract code in OP_RETURN
outputs using whatever language, such as EVM, that we want. Of course, this does not form part of the protocol, so is ignored by Bitcoin nodes. Blockchain explorers would show the code as raw data, but would not interpret the contract language. This is a rather large caveat, and you may well reply that it is not really ‘running’ on Bitcoin and that, really, all we are doing is using the blockchain as data storage. However, it is interesting to consider and, even if it is not very efficient for reasons to be discussed below, these methods are implemented by the Omni Layer and also lead on to ideas such as Proof of Transfer.
While standard Bitcoin nodes would not attempt to validate such smart contract code, and simply ignore it, it would not be difficult to write applications which scan the blockchain for all OP_RETURN
statements. Any contained data would be handled as a transaction in the smart contract language, such as EVM. The application would store the ‘smart chain’ state and parse each of these contracts in turn, rejecting it if it is not valid and, otherwise, using it to update the state. This is as in figure 1, with OP_RETURN
Bitcoin outputs marked in green, which are then arranged in order. In the figure, arrows are drawn from each Bitcoin block header to the previous block, as well as from each valid smart contract to the previous one. Any which are not valid smart contracts with respect to the current state are marked in red and ignored, with the remaining valid ones processed in sequence. Anyone with a copy of this application would be able to run it and see the current smart-chain state.
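The scanning step can be sketched as follows. This is a hypothetical helper, not part of any real node software, which extracts the data payload from a raw OP_RETURN output script; only the common single-push script encodings are handled.

```python
# A sketch of extracting OP_RETURN payloads from raw output scripts.
OP_RETURN = 0x6a

def extract_op_return_data(script: bytes):
    """Return the data payload of an OP_RETURN output script, or None
    if the script is not of that form. Only the common single-push
    encodings (direct push and OP_PUSHDATA1) are handled here."""
    if not script or script[0] != OP_RETURN:
        return None
    body = script[1:]
    if not body:
        return b""                    # bare OP_RETURN, no payload
    op = body[0]
    if 1 <= op <= 75:                 # opcodes 1-75 push that many bytes
        return bytes(body[1:1 + op])
    if op == 0x4c and len(body) >= 2: # OP_PUSHDATA1: next byte is length
        return bytes(body[2:2 + body[1]])
    return None

# A hypothetical output script pushing the 4-byte marker 'omni':
script = bytes([OP_RETURN, 4]) + b"omni"
assert extract_op_return_data(script) == b"omni"
assert extract_op_return_data(b"\x51") is None   # OP_1, a spendable script
```

An application would run this over every output of every transaction, collect the non-None payloads in block order, and feed them to the smart contract interpreter.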
What we are doing here is effectively building a new blockchain on top of Bitcoin with the ability to run a smart contract language. This means that we do not need to build a new consensus mechanism, such as the proof-of-work (PoW) used by Bitcoin. As such, we do not need any new miners for our chain, or a network of nodes validating it. All of that hard work is handled automatically by the Bitcoin network. All we need to do is launch our application with access to a copy of the Bitcoin blockchain, and we can see the state and all transactions for our new ‘smart’ chain. The work of miners creating blocks and of nodes validating the chain is entirely separated from the work of interpreting the smart contracts.
This idea raises the question: why don’t all smart chains separate the process of validating the blockchain from interpreting the smart contract language? In fact, they could do this, but it would create some issues. Miners are rewarded by being paid in a native chain asset, and contracts have to pay in this same asset in order to be included. At the very least, this effectively forces miners to process all transactions which can impact the validity of these payments. Anyone designing the protocol would likely want to include such an asset in the base protocol to ensure that miners are paid, giving the chain a reasonable chance of success. In that case, we either include the smart contract ability in the protocol, or else it will be the case that transactions in the native asset cannot be controlled, or affected in any way, by the smart contracts. The latter is exactly the situation when building a smart chain on top of Bitcoin as described here.
If we were to build a smart chain on Bitcoin as described, it is possible (and desirable) to allow our smart contracts to see the Bitcoin state. Since each contract exists in the Bitcoin blockchain, it has a specific Bitcoin state. To allow contracts which depend on the Bitcoin state — such as smart contracts which are triggered by a payment of bitcoin to a specific address — we should allow our smart contract language to have keywords referencing the Bitcoin state. However, the other way round is not possible. As Bitcoin nodes do not care about the contents of OP_RETURN
statements, the ‘smart chain’ state cannot impact Bitcoin transactions in any way. Still, we would have a chain supporting quite general smart contracts which can see the Bitcoin state, has the same security as Bitcoin, and without requiring building our own consensus mechanism or needing miners or validators.
The ideas discussed above have been implemented in the Omni Layer. This does not use a fully functional programming language such as EVM, but it does allow custom digital assets to be represented, transacted and traded directly on the Bitcoin blockchain. The Omni Layer was launched back in August 2013, along with its own asset, Omni. This predates smart chains capable of handling custom assets, such as Ethereum, which was officially launched in 2015. The Omni Layer was initially popular for handling alternative digital assets. Notably, the Tether stablecoin pegged to the US dollar was launched on Omni but now mainly uses other platforms such as Ethereum. For documentation of the protocol, see the GitHub page.
From the Omni Explorer, at the time of writing, it can be seen that many Bitcoin blocks have no Omni Layer transactions at all, and those that do contain only a handful. These are mainly in Tether, with a few other altcoins also appearing (Omni token, MaidSafeCoin, …). This is to be expected since, for the reasons to be discussed below, representing transactions directly on Bitcoin is rather inefficient.
As an example, consider Bitcoin block 736873, which contains 4 Omni Layer transactions, all in Tether. Choosing one of these, a transfer of 742 Tether, it is represented by a Bitcoin transaction with the Omni Layer code in an OP_RETURN statement. Looking in a Bitcoin explorer, the transaction’s OP_RETURN output contains the following 20 bytes of data. I express this in hexadecimal and break it down into the fields specified by the Omni protocol.
Omni Layer Transaction
prefix    6f6d6e69 (‘omni’)
version   0000
type      0000 (simple send)
currency  0000001f (31 = Tether)
quantity  0000001146aa2600
The quantity of coins is scaled by 100,000,000 so that the number in this example, which is 74,200,000,000 in decimal, represents 742 Tether. The receiving and sending addresses are given by the remaining two output addresses of the Bitcoin transaction.
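As a check on the field breakdown above, the payload can be parsed in a few lines, assuming the standard Omni field widths of a 4-byte prefix, 2-byte version, 2-byte type, 4-byte currency identifier, and 8-byte quantity:

```python
import struct

# Parse the 20-byte Omni 'simple send' payload from the example above.
payload = bytes.fromhex("6f6d6e6900000000" "0000001f" "0000001146aa2600")

prefix, version, tx_type, currency, quantity = struct.unpack(">4sHHIQ", payload)

assert prefix == b"omni"
assert tx_type == 0             # simple send
assert currency == 31           # Tether
assert quantity / 10**8 == 742  # amounts are scaled by 100,000,000
```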
There are some drawbacks to this idea, which are enough of a deal-breaker to make it impractical for many use cases.
For the remainder of this post, I consider how we can overcome the first two of these points (spoiler: we will end up with a protocol along the lines of proof-of-transfer as used by the Stacks blockchain). The first idea is to bundle together multiple contracts into the same OP_RETURN
output. This will be a small efficiency improvement, but the entire contents of each smart contract are still included, taking up space in the Bitcoin blockchain and incurring large costs. There is no way to avoid this, other than to group transactions together and severely compress them so that they only use up a small amount of valuable blockchain space. Reversible compression algorithms are not going to attain anything like the required space savings. Instead, we need to use something like a hash function. This converts an arbitrarily large block of data into a fixed size, such as 256 bits for the SHA256 hash. Second preimage resistance means that the hash uniquely identifies the original data, although we cannot invert it to recover the data. Instead, we need to have separate access to the block, and validate it by checking that it has the correct hash value.
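As a minimal sketch of this commitment step (the length-prefixed block encoding is an arbitrary illustrative choice):

```python
import hashlib

# Batch several contract transactions into one block and commit to it
# with a single SHA256 hash; only the 32-byte digest needs to go into
# the OP_RETURN output, however large the block grows.
contracts = [b"contract 1 ...", b"contract 2 ...", b"contract 3 ..."]
block = b"".join(len(c).to_bytes(4, "big") + c for c in contracts)
commitment = hashlib.sha256(block).digest()

assert len(commitment) == 32
```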
The proposal is now as shown in figure 2. The hash value of a block of smart contracts is included in a Bitcoin OP_RETURN
output. Our application now needs to have access to all such blocks. It can then parse the Bitcoin blockchain checking for OP_RETURN
statements including their hashes, which are referred to as commitment transactions. Then, it can arrange these blocks of smart contracts in order and parse them in sequence, as previously suggested.
We have reduced the large data requirements to including only a relatively small number of hashes in the Bitcoin blockchain, solving the problems identified above. At the same time, however, this introduces some significant new issues. First, it requires people to provide the service of collecting together the individual contracts that users have submitted, blocking them together, and submitting Bitcoin transactions containing their hash. There needs to be some incentive to do this, and to cover the cost of the Bitcoin transaction fees. A solution is for our smart contract language to include a native asset. Users can then include a fee, in this asset, to be paid to the person creating the blocks. Once the total transaction fee value of submitted contracts exceeds the Bitcoin transaction fee, people will be incentivized to create the transaction blocks and add the commitment transactions to the Bitcoin blockchain.
The next issue is that users of our smart contract platform will need access to the blocks. They cannot be constructed from the Bitcoin blockchain alone, so we will have to build a separate network in order to share them. The main issue, however, is that the updated proposal loses immutability. For example, someone could submit a commitment transaction to Bitcoin, but fail to send the associated smart contract block to the network. The result is that the commitment transaction would not contain a hash of any publicly known block. So, it would be rejected and its transactions would not be included in our smart contract state. If they later submitted the block, then it would be belatedly inserted into the smart contract chain, requiring the state to be recomputed, and any later transactions could now become invalid. This is totally unacceptable since, at any time, the entire smart contract chain could be reorganised.
A partial fix for the immutability problem is for each block to include a reference to the previous one. This stops people from ‘inserting’ a new block into the chain. The reference could either be included in the header data for the block of transactions or, alternatively, in the data of the commitment transaction itself. The latter approach is a bit more expensive, since it takes up more Bitcoin blockchain space, but it also simplifies things by making it possible to construct the chain of block hashes from parsing the Bitcoin blockchain alone.
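The second approach, with the previous hash included in the commitment data itself, can be sketched as follows; the genesis value and encoding are illustrative assumptions:

```python
import hashlib

# Each commitment covers the previous commitment together with the hash
# of the new block of contracts, so a withheld block cannot later be
# 'inserted' into the middle of the chain.
def commitment(prev: bytes, block: bytes) -> bytes:
    return hashlib.sha256(prev + hashlib.sha256(block).digest()).digest()

genesis = b"\x00" * 32
c1 = commitment(genesis, b"block of contracts 1")
c2 = commitment(c1, b"block of contracts 2")

# Anyone re-parsing the Bitcoin blockchain can rebuild and check the links.
assert commitment(genesis, b"block of contracts 1") == c1
assert c1 != c2
```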
Now, if someone adds a commitment transaction to the Bitcoin chain, but does not submit a corresponding smart contract block to the network, subsequent block builders are going to ignore it and build on an earlier block instead. If they were to later submit their block, it would no longer be a part of the main chain. However, we are still vulnerable to large reorganisations of the chain. Someone could add an entire sequence of commitment transactions, each of them building on their earlier ones. If they did not also submit their blocks to the network, the result would be a fork. There would be a public branch, and also a private one visible only to the rogue block builder. If he later submits his blocks, then there could be a reorganisation. The fix for this is to make valid commitment transactions expensive to create by requiring them to spend a significant amount of bitcoin on top of the transaction fee. The block reward paid in the native smart contract asset would need to be large enough to cover this expense, so that people are still incentivized to create blocks, so long as they are part of the publicly recognised chain.
The proposed method, with the efficiency modifications, consists of linked commitment transactions in the Bitcoin blockchain, each containing a hash of a block of smart contracts and spending a quantity of bitcoin, to be compensated by a block reward and transaction fees paid in the native smart-chain asset. This is reinventing the proof-of-transfer protocol, already implemented by the Stacks blockchain and described in an earlier post.
Bob (the prover) has performed a long and intensive calculation, and sent the result to Alice (the verifier). To be sure, Alice wants to verify the result, so requests that Bob also send a proof of its correctness. The problem is that Alice does not have the time or computational resources to replicate the entire calculation, so it is necessary for Bob’s proof to be significantly shorter than this. So, Bob computes polynomial interpolants of the execution trace, and Alice requests their values at a small number of random points. She checks that these values satisfy some simple algebraic identities which encode the calculation steps. If they are satisfied, then she agrees that Bob sent the correct result. However, this assumes that Bob can be trusted. He could be sending any values selected to satisfy the identities. If we are not willing to simply trust that he has performed the original calculation correctly, why would we trust that he is really evaluating the polynomials? Alice needs a way of checking that the received values are consistent with polynomial evaluation. We will discuss how this can be done, and the solution to this problem forms a major component of STARK implementations for blockchain scaling.
Recall the setup. All polynomials are defined over a finite field F of size q = p^{r} for a fixed prime number p and positive integer exponent r. The case q = p is just the integers modulo p, although binary fields of size 2^{r} are also common. Generally, such fields are chosen to have a very large size, such as 2^{256}, for reasons similar to the choice of large finite fields in digital signature algorithms. It ensures that the chances of selecting one of a small number of ‘bad’ values entirely at random are negligible.
If Bob’s calculation involves some number N of steps, the execution trace will be represented by polynomials of degree less than this,
f(X) = c_{0} + c_{1}X + c_{2}X^{2} + ⋯ + c_{N − 1}X^{N − 1}  (1)
The coefficients c_{i} are in the field F and the bound N on the degree is typically large, maybe of the order of a few million. Despite this, such polynomials are referred to as low degree. This is because the point of comparison is the size q of the field. By interpolation, every function on F can be represented by a polynomial. Most of these will have degree close to the full size q of the field so, compared to this, N is indeed low. Such functions, consistent with a low degree polynomial, are also known as Reed–Solomon codes.
Now, Alice is able to request the values of f(x) for some randomly chosen field points x. She does not have the processing power to perform on the order of N operations, so is not able to evaluate the polynomial herself to check that Bob is telling the truth. She cannot even parse all of its coefficients. The aim is to find a method for Alice to verify that the values f(x) do indeed come from a fixed polynomial of degree less than N. This is low degree testing, and needs to be done with far fewer than N operations by Alice. Something like O(log N), or a small power of this, is a reasonable level of complexity to aim for. More realistically, Alice needs to statistically check that the large majority of Bob’s values are consistent with a polynomial. This process, which is the entire focus of the current post, is known as a Fast Reed–Solomon Interactive Oracle Proof of Proximity (FRI). See the 2017 paper by Eli Ben-Sasson, Iddo Bentov, Yinon Horesh, and Michael Riabzev for a detailed description of a specific algorithm.
The prover, Bob, needs to commit to a particular function f upfront, which Alice wants to determine is a polynomial. She has oracle access to this function, meaning that she can request its value at field points of her choosing, and verify that the returned values are indeed given by Bob’s preselected function. In theory, Bob could send the polynomial coefficients to Alice so that she can check the values for herself, but this requires a calculation of complexity O(N) which, as discussed above, is much too large for Alice’s limited computing capacity. An alternative is for Bob to precompute f at every point on its domain, and send a hash of this array of values. More specifically, he sends a Merkle root.
Even with his relatively large computing capacity, computing the values at every point in the field F would be far too much to be even remotely feasible. Recall that the field was chosen to be very large, of the order of 2^{256} or similar. Instead, we restrict the domain of the function to be a much more manageable subset S of the field,

S ⊆ F.
Alice is then restricted to requesting the values of f at points x in the domain S.
The size M = |S| of the domain should be chosen larger than the polynomial degree N to be of any use. Indeed, if it were no larger than N, then any function on S could be fitted by such a polynomial. Also, recall the argument from the previous post. If f and g are distinct polynomials of degree less than N, then they can only coincide at fewer than N points. Hence, if Alice chooses a random point x in the domain, the probability of them evaluating to the same value is bounded by,
P(f(x) = g(x)) < N/M.  (2)
So, if she declares the polynomials to be equal if they evaluate to the same thing at x, then the probability of being wrong is bounded by N/M. This does not need to be a negligible value since Alice can repeat the process n times to reduce the bound to (N/M)^{n}, but the larger M is the better. On the other hand, since Bob needs to evaluate the polynomial at M points upfront, the larger it is, the more work he needs to do. The best choice for the size of S is, therefore, a tradeoff between the work required by the prover and that required by the verifier. The important quantity here is the ratio ρ = N/M, known as the rate parameter. This should always be between 0 and 1, with lower values enabling faster proofs, but requiring the prover to commit to the function f on a larger domain. The choice of rate parameter will depend on the implementation but, for example, Vitalik Buterin uses a domain of size 1 billion for a computation of size N equal to 1 million in his introduction to STARKS. This corresponds to ρ = 0.001.
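The agreement bound (2) behind this trade-off can be checked exhaustively for a tiny field; the field size and coefficients below are purely illustrative choices:

```python
# Two distinct polynomials of degree < N agree at fewer than N points,
# since their difference is a nonzero polynomial with fewer than N roots.
p, N = 97, 8
f_coeffs = [3, 1, 4, 1, 5, 9, 2, 6]
g_coeffs = [2, 7, 1, 8, 2, 8, 1, 8]

def poly_eval(coeffs, x):
    acc = 0
    for a in reversed(coeffs):       # Horner's rule
        acc = (acc * x + a) % p
    return acc

agreements = sum(poly_eval(f_coeffs, x) == poly_eval(g_coeffs, x)
                 for x in range(p))
assert agreements < N  # so P(f(x) = g(x)) < N/p for uniformly random x
```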
An example Merkle tree commitment is shown in figure 2. Bob computes the values of f on the whole domain S which, here, consists of the integers 1 through 16. This is just for ease of displaying in a figure. In practice, it can require computing values at millions or billions of points. A Merkle tree is constructed, by computing the hash of each value. Pairs of hashes are concatenated and hashed again, continuing iteratively until there is just a single hash, the Merkle root. Bob sends the root value h to Alice which, if the SHA256 hashing algorithm is used, just consists of a single 256 bit value.
When Alice requests the value of f at a point in the domain, Bob sends the value along with its Merkle proof. In the example shown in figure 2, Alice has requested the value of f(6), and Bob returns this together with the proof consisting of hashes h_{00}, h_{011}, h_{0100}. By the standard algorithm for Merkle trees, Alice can use this proof to verify that f(6) is equal to the value previously committed to by Bob. I note that this does add some additional computation for Alice, but it is bounded by the length O(log M) of the Merkle proof. Now that this has been explained, we do not need to think about Merkle trees in the remainder of the post. We simply state that Bob commits to a polynomial f on a domain S, and that Alice can request its values (has oracle access) at any points she wishes. The use of Merkle trees is implicit.
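A minimal sketch of such a Merkle commitment, assuming a power-of-two number of leaves for simplicity:

```python
import hashlib

# Bob commits to the values of f on a 16-point domain with a single
# root hash; Alice verifies one value from a logarithmic-size proof.
def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    level = [H(x) for x in leaves]
    while len(level) > 1:
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Sibling hashes needed to recompute the root from leaves[index]."""
    level, proof = [H(x) for x in leaves], []
    while len(level) > 1:
        proof.append(level[index ^ 1])          # sibling of current node
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def merkle_verify(leaf, index, proof, root):
    h = H(leaf)
    for sibling in proof:
        h = H(h + sibling) if index % 2 == 0 else H(sibling + h)
        index //= 2
    return h == root

# Stand-in for the values of f on a 16-point domain.
values = [str(x * x % 17).encode() for x in range(1, 17)]
root = merkle_root(values)                      # Bob's 32-byte commitment
proof = merkle_proof(values, 5)                 # Alice asks for the 6th value
assert merkle_verify(values[5], 5, proof, root)
assert not merkle_verify(b"bogus", 5, proof, root)
```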
Since Alice’s oracle access to the function f only allows her to query its values at individual points, there is no way that she will be able to prove anything about every point. The best that she can do is to sample a small number of values, and determine statistical properties of f. For example, she cannot say whether f is everywhere equal to a specified polynomial, although she can generate statistical bounds on the proportion of its values that agree with the polynomial. For this reason, we do not attempt to prove that f exactly coincides with a polynomial but, rather, that it agrees with a polynomial evaluation at a large majority of points. Recall, from above, that FRI algorithms are Fast Reed–Solomon Interactive Oracle Proofs of Proximity. We do not attempt to prove that f is exactly a polynomial but, rather, that it is approximately one.
The relevant metric here is the relative Hamming distance. Given two functions f and g with the same domain S, the relative Hamming distance between them is the proportion of points of S at which they disagree,

d(f, g) = |{x ∈ S : f(x) ≠ g(x)}| / |S|.
Suppose that we are given two functions f and g which are supposed to be equal, and want to verify this. A straightforward algorithm is to sample them both at a number n of randomly chosen points, and compare. We declare that they are equal if they agree on the sample. Clearly, this will be the correct answer if they do agree. On the other hand, if they differ by at least ε in the relative Hamming metric, then the probability that we incorrectly declare them to be equal is no more than (1 − ε)^{n}. For a given tolerance ε, we need to ensure that n is sufficiently large that this bound is negligible. If the functions differ at less than a fraction ε of the points, then it does not matter whether we correctly distinguish them or not. They are close enough for the purposes of the algorithm. A typical value for the tolerance ε is 0.1.
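The sampling test can be sketched as follows; the domain size, corruption pattern, and sample count are illustrative choices:

```python
import random

# Sample-based equality testing in the relative Hamming metric.
def rel_hamming(f, g, S):
    """Fraction of the domain on which f and g disagree."""
    return sum(f[x] != g[x] for x in S) / len(S)

def probably_equal(f, g, S, n=200):
    """Declare f = g if they agree at n randomly sampled points."""
    return all(f[x] == g[x] for x in random.choices(S, k=n))

S = list(range(1000))
f = {x: x % 7 for x in S}
g = dict(f)
for x in range(0, 1000, 10):           # corrupt 10% of the points
    g[x] = f[x] + 1

assert rel_hamming(f, g, S) == 0.1
# With epsilon = 0.1 and n = 200, the false-accept probability is at
# most (1 - 0.1)^200, roughly 7e-10, so this assertion is safe:
assert not probably_equal(f, g, S)
```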
The notation RS[F, S, ρ] is used for the Reed–Solomon codes of rate parameter ρ. That is, the functions from S to F which are given by a polynomial of degree less than ρ|S|. So, the aim of the FRI algorithm is to determine that Bob’s commitment on domain S is within a small distance ε of RS[F, S, ρ]. As previously discussed, two distinct polynomials of degree less than N can only agree at fewer than N points. Equivalently, two distinct elements f and g of RS[F, S, ρ] must be at least a distance

1 − ρ
apart. This is equivalent to (2) above. So long as ρ + 2ε ≤ 1, it follows that a function f can only be within a distance ε of at most a single code in RS[F, S, ρ].
Fortunately, it is not really necessary to know that Bob’s commitment is exactly equal to a polynomial of degree less than N = ρ|S|. It is enough to know that it is within a small distance ε of one. All that happens is that bounds such as (2) are slightly modified. If f and g are functions on domain S within a relative distance ε of distinct elements of RS[F, S, ρ], their distance is bounded by

d(f, g) ≥ 1 − ρ − 2ε.
Hence, if we sample them at a randomly chosen point x of the domain, the probability that they agree is

P(f(x) = g(x)) ≤ ρ + 2ε.
Polynomials’ greatest strength is also their greatest weakness. Ok, that may be a rather dramatic way of stating it, but this is the situation here. The important property of polynomials allowing them to be used for representing a calculation as described in the previous post on STARKS, is that they can be used to interpolate an arbitrary set of values. However, the fact that they can be used to interpolate a sequence of values means that it is difficult to distinguish them from an arbitrary function. If we sample the function f at N or fewer points, then there is always a polynomial interpolant of degree less than N agreeing with f on this sample. So, it is impossible to distinguish any function from such a low degree polynomial.
Possibly, you might think that there is some statistical way of distinguishing polynomials from arbitrary functions. Since Bob commits to the function f upfront, you might think that the statistical properties of a fixed polynomial at a randomly chosen set of points differ in some way from those of an arbitrary function. This is still not the case. Suppose, for example, that f is initially drawn uniformly at random from the set of all polynomials of degree less than N. This means that its N coefficients, as in (1), are chosen independently and uniformly at random from the field F. Equivalently, each of the q^{N} polynomials corresponding to each possible sequence of coefficients occurs with equal probability q^{–N}. Then, the values of f at a sequence of N or fewer points will be independent and uniformly distributed on F. This situation is indistinguishable from one where the value of f at each point of the domain is chosen independently and uniformly in F.
Theorem 1 Let x_{0}, x_{1}, …, x_{n} be a sequence of distinct points of the field F with n < N. If f is a polynomial of degree less than N chosen uniformly at random, then the values

f(x_{0}), f(x_{1}), …, f(x_{n})
are independent and uniformly distributed on F.
Theorem 1 is straightforward to prove. Consider the number of polynomials of degree less than N taking values y_{i} at the points x_{i}. If f_{1} is any one such polynomial, then any other one can be written as

f = f_{1} + g
where g is a polynomial of degree less than N vanishing at the points x_{i}. Since the number of choices for g does not depend on the values y_{i}, every possible sequence of values has the same probability of occurring.
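Theorem 1 can be checked by brute force over a tiny field, counting how often each value occurs; the parameters are illustrative:

```python
from collections import Counter

# Count how many polynomials of degree < N over GF(p) take each value
# at a fixed point x0. Every value occurs equally often, as Theorem 1
# asserts. Tiny parameters permit an exhaustive check.
p, N, x0 = 5, 2, 3
counts = Counter((c0 + c1 * x0) % p for c0 in range(p) for c1 in range(p))
assert all(v == p ** (N - 1) for v in counts.values())  # 5 polynomials each
```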
These considerations show that it is impossible to test whether Bob’s committed function is a low degree polynomial by sampling it at N or fewer points. All is not lost, but it does show that we need Bob to provide more information than just the values of f at Alice’s chosen points.
A common algorithmic technique is divide-and-conquer. This breaks a complex problem down into two or more smaller problems. By iterating the procedure, it is eventually broken down into a number of simple calculations, giving an efficient overall approach to the original problem. This is used, for example, by fast Fourier transforms (FFT), which reduce the complexity of computing a discrete Fourier transform from O(N^{2}) to O(N log N), a significant improvement.
We do something similar. The polynomial f of degree less than N given by (1) can be broken down into two polynomials of degree less than N/2. For example, choosing a positive integer M < N, f can be broken up as
f(X) = u(X) + X^{M}v(X)  (3)
where u and v are polynomials of degrees less than M and N − M respectively. Choosing M equal to N/2 rounded up to the nearest integer, both u and v have degree less than N/2. Alice can concentrate on showing that they indeed have degree less than N/2, and then that f is equal to u + X^{M}v.
There are many ways of breaking f into lower degree polynomials besides (3). We can, alternatively, split it into odd and even powers of X,
f(X) = u(X^{2}) + Xv(X^{2})  (4)
where u and v are given by

u(X) = c_{0} + c_{2}X + c_{4}X^{2} + ⋯,  v(X) = c_{1} + c_{3}X + c_{5}X^{2} + ⋯.
Again, the degrees of u and v are both less than N/2. Decomposing f using (4) has several advantages over alternatives such as (3). For one thing, we can express u and v without explicit reference to the polynomial coefficients,
u(x^{2}) = (f(x) + f(−x))/2,  v(x^{2}) = (f(x) − f(−x))/(2x)  (5)
These identities require dividing by 2, so are only possible in fields where this can be done. The field F of size p^{r} has characteristic p, meaning that we can divide by any integer which is not a multiple of p. Hence, (5) can be applied so long as F is not a binary field, of size 2^{r}. For now, we suppose that this is the case, so that (5) applies, and will deal with the characteristic 2 case in a moment. So long as, for each x in the domain S, −x is also in the domain, Alice can use (5) to compute u and v herself.
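The identities (5) are easy to verify numerically for a concrete polynomial; the field and coefficients below are arbitrary illustrative choices:

```python
# Verify the even/odd split identities (5) for a polynomial over GF(97).
p = 97
coeffs = [3, 1, 4, 1, 5, 9, 2, 6]      # f has degree < 8

def poly_eval(c, x):
    acc = 0
    for a in reversed(c):              # Horner's rule
        acc = (acc * x + a) % p
    return acc

u = coeffs[0::2]                       # even-index coefficients
v = coeffs[1::2]                       # odd-index coefficients
inv2 = pow(2, -1, p)                   # 1/2 exists since p is odd

for x in range(1, p):                  # x = 0 excluded: (5) divides by x
    fx, fmx = poly_eval(coeffs, x), poly_eval(coeffs, (-x) % p)
    assert poly_eval(u, x * x % p) == (fx + fmx) * inv2 % p
    assert poly_eval(v, x * x % p) == (fx - fmx) * inv2 * pow(x, -1, p) % p
```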
Another advantage of using (5) is that u and v are not defined on the domain S but, rather, on

S_{1} = {x^{2} : x ∈ S}.
If, again, we suppose that −x is in the domain S whenever x is, then the squaring map is 2-to-1, implying that S_{1} has half the size of S. This reduces the complexity of the algorithm.
So far, we have reduced the problem of showing that a function has degree less than N, to showing that two separate functions have degree less than N/2. You may well ask, so what? Is it really any simpler than where we started? This is where the next trick comes in. We can combine u and v by taking a random linear combination, and it is sufficient to show that this single function has degree less than N/2.
For a uniformly random field value α, if u + αv has degree less than some value N, then so do both of u and v.
While this statement is not exactly true, it has negligible probability of giving an incorrect result. This is because, if the degrees of u and v were not both within the stated bound, then there would be at most a single field value α for which the degree of u + αv satisfies the bound. This has a probability 1/q of happening, which is negligible so long as the field F is sufficiently large. More precisely, we need to know that if u and v are not both close to a low degree polynomial, then neither is u + αv. This is because Alice does not check the identities at every point of the domain but, rather, only at a random sample, allowing her to check closeness in the relative Hamming metric instead of exact identity. For more details on such estimates, see the 2017 paper by Eli Ben-Sasson, Iddo Bentov, Yinon Horesh, and Michael Riabzev, specifically lemmas 4.3 and 4.4. For now, we note that it is sufficient to prove that u + αv has sufficiently low degree.
We reduced the problem of showing that a function f defined on S has degree bounded by N to showing that the function f_{1} = u + αv defined on S_{1} has degree bounded by N/2. This method of breaking the problem into multiple smaller problems, which are combined into a single smaller problem, is known as split-and-fold. We can iterate this procedure until it reaches the point that we just need to show that a function f_{m} is constant on its domain S_{m}, as shown in figure 3 below.
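The split-and-fold step is easy to experiment with numerically. Below is a minimal Python sketch over the small prime field of size 97 (the field size, random seed and degree bound are arbitrary choices for illustration, and the helper names are mine rather than from any library):

```python
# A minimal sketch of one split-and-fold step over the prime field F_97.
p = 97

def poly_eval(coeffs, x):
    # Horner evaluation of a polynomial given by its coefficient list, mod p.
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

def fold(coeffs, alpha):
    # f(X) = u(X^2) + X v(X^2): u takes the even coefficients, v the odd ones.
    u, v = coeffs[0::2], coeffs[1::2]
    v = v + [0] * (len(u) - len(v))
    return [(a + alpha * b) % p for a, b in zip(u, v)]  # f1 = u + alpha*v

import random
random.seed(1)
f = [random.randrange(p) for _ in range(8)]   # degree < 8
alpha = random.randrange(1, p)
f1 = fold(f, alpha)                           # degree < 4: half the bound

# Alice's consistency check at any nonzero x, expressing u and v via f:
#   f1(x^2) = (f(x) + f(-x))/2 + alpha*(f(x) - f(-x))/(2x)
inv2 = pow(2, p - 2, p)
for x in range(1, p):
    u_val = (poly_eval(f, x) + poly_eval(f, p - x)) * inv2 % p
    v_val = (poly_eval(f, x) - poly_eval(f, p - x)) * inv2 * pow(x, p - 2, p) % p
    assert poly_eval(f1, x * x % p) == (u_val + alpha * v_val) % p
```

The coefficient list of f1 is half the length of that of f, which is exactly the halving of the degree bound used by the recursion.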
The algorithm starts with the commit phase. Initially, Bob commits to a function f on domain S. Writing f_{0} = f and S_{0} = S, the following steps are performed in order i = 0, 1, 2, …. Alice chooses a uniformly random field value α_{i} and sends it to Bob, who commits to the folded function f_{i + 1} = u_{i} + α_{i}v_{i} on the domain S_{i + 1} = {x^{2} : x ∈ S_{i}}.
This continues until Bob declares that f_{m} is equal to a constant value c. Then, in the query phase, Alice does the following for each 0 ≤ i < m. She samples the committed values of f_{i} at pairs of points ±x, for a random selection of x, and computes the corresponding values of u_{i} and v_{i},

(6) u_{i}(x^{2}) = (f_{i}(x) + f_{i}(–x))/2, v_{i}(x^{2}) = (f_{i}(x) – f_{i}(–x))/(2x).

She then samples f_{i + 1} at the points x^{2} and verifies the folding identity,

(7) f_{i + 1}(x^{2}) = u_{i}(x^{2}) + α_{i}v_{i}(x^{2}).
This is the complete algorithm and, if Alice successfully verifies (7) for all of the points for each i, she declares that f is a low degree polynomial. A few points are in order though. If preferred, equations (6) and (7) can be combined, so that Alice needs to check,

f_{i + 1}(x^{2}) = (f_{i}(x) + f_{i}(–x))/2 + α_{i}(f_{i}(x) – f_{i}(–x))/(2x).
For the special value i = m – 1, she does not need to sample f_{i + 1} since she can just use the constant value c. The procedure just described assumes that –x is in the domain S_{i} whenever x is. Since S_{i} consists of the 2^{i}th powers of elements of S, this requires S to be closed under multiplying by 2^{i + 1}th roots of 1. Taking i = m – 1, this means that it is closed under multiplying by 2^{m}th roots of unity. In particular, the field F of size q must contain all such roots of unity or, equivalently, q – 1 must be a multiple of 2^{m}. This is not absolutely necessary, since Alice could instead ask Bob to commit to the functions u_{i} and v_{i}, and verify that they satisfy
(8) f_{i}(x) = u_{i}(x^{2}) + xv_{i}(x^{2})
instead of computing them herself using (6). For efficiency though, it is preferable if F contains 2^{m}th roots of unity, so that the procedure above can be used.
Let us also consider the degrees of f_{i}. By construction, f_{m} is constant, so has degree less than 1. If f_{i + 1} has degree less than a value L and (7) is satisfied on the domain, then u_{i} and v_{i} also have degree less than L. So, (8) shows that f_{i} has degree less than 2L. Hence, f_{m – 1} has degree less than 2, f_{m – 2} has degree less than 4, and so on, until we get to f = f_{0}, which has degree less than 2^{m}. This is exactly as desired in the case when N is a power of 2 and, even if it is not, it is usually sufficient to bound the degree by such a power. Alternatively, it is possible to modify the procedure slightly to handle non powers of 2, and I describe one method of doing this at the bottom of the post.
Recall that Alice does not verify (7) at every point of the domain but, instead, at some random sample of n points. This means that she does not really verify that f has degree bounded by N. Suppose we want to be sure that it is within a tolerance ε of such a polynomial in the relative Hamming metric, meaning that it is equal to the polynomial everywhere outside of a proportion ε of the domain. Then, n should be chosen large enough that the probability bound (1  ε)^{n} is negligible.
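Choosing the sample size amounts to solving (1 – ε)^{n} ≤ δ for n, where δ is the acceptable failure probability. A quick back-of-envelope sketch (the tolerance and target here are illustrative numbers, not taken from the post):

```python
import math

# Number of query points n needed so that the failure probability
# (1 - eps)^n drops below a chosen soundness target.
eps = 0.1            # tolerance in the relative Hamming metric
target = 2.0 ** -80  # desired bound on the verification error

n = math.ceil(math.log(target) / math.log(1 - eps))
# n is the smallest integer with (1 - eps)^n <= target:
assert (1 - eps) ** n <= target < (1 - eps) ** (n - 1)
```

Note that n depends only on ε and the target, not on N, which is why the query complexity of the verifier stays small even for very long computations.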
Finally, we can look at the complexity of the algorithm, for a fixed field F and tolerance ε. Since Bob has to commit to a polynomial on each of the domains S_{i} of size 2^{–i}|S|, this amounts to committing to a total number of evaluations,

|S| + |S|/2 + |S|/4 + ⋯ + |S|/2^{m} < 2|S|.
This is only linear in the size of the original domain, so is quite efficient. Alice just requires sampling the functions at a fixed number 2n of points in each of the domains S_{i} (the pair ±x for each of her n samples), so her work is bounded by 2nm, which is of the order O(log N) so, again, is quite efficient and, in particular, is orders of magnitude lower than the original calculation performed by Bob.
The outline of the FRI algorithm above makes use of the squaring map X^{2} to break the problem into two simpler ones. More general polynomial maps can be used in place of X^{2}, as we will see shortly. I only consider quadratic maps in this post, which halves the degree of f at each step. It is possible, though, to use maps of any degree d instead, which reduces the degree of f by a factor of d, although this requires splitting it into d terms instead of 2 at each step.
The algorithm above, based on squaring the elements of the domain, has some issues when the field F is binary, of size 2^{r}. This is because it has characteristic 2, meaning that adding an element of the field to itself (i.e., multiplying by 2) always gives zero. Hence, it is not possible to divide by 2, and Alice cannot apply (6) to compute the values of u_{i} and v_{i}. In addition, the fact that –x is equal to x everywhere means that the squaring map X^{2} is one-to-one rather than two-to-one. Hence, the domains S_{i} would all be the same size as S, increasing the complexity.
The method can be rescued for binary fields by using the maps

θ_{i}(X) = X(X + a_{i}) = X^{2} + a_{i}X

in place of just squaring, where a_{i} is a nonzero field element chosen such that S_{i} is closed under adding a_{i}.
As above, the polynomial f_{i} of degree less than M can be broken down as
(9) f_{i}(X) = u_{i}(θ_{i}(X)) + Xv_{i}(θ_{i}(X))
where u_{i} and v_{i} are polynomials of degree less than M/2. It can be seen that, in characteristic 2, θ_{i}(x + a_{i}) = θ_{i}(x) for all points in the domain, so θ_{i} is two-to-one. Alice can solve (9) at any point x by,
(10) v_{i}(θ_{i}(x)) = (f_{i}(x) + f_{i}(x + a_{i}))/a_{i}, u_{i}(θ_{i}(x)) = f_{i}(x) + xv_{i}(θ_{i}(x)).
We can ensure that there exist elements a_{i} with the required property by choosing the initial domain S to be an additive subspace of the field (i.e., closed under addition). Then, the same will be true of each of the domains S_{i}, and a_{i} can be taken as any nonzero element of this subspace. The algorithm is then the same as in the previous section, with the modifications described here.
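The binary-field map θ can be checked concretely in a toy field. The sketch below works in GF(16) = GF(2)[X]/(X^{4} + X + 1), with field elements encoded as 4-bit integers and field addition given by XOR; the reduction polynomial and the subspace chosen are illustrative, not canonical:

```python
# Toy check that theta(X) = X(X + a) is two-to-one on an additive
# subspace of the binary field GF(16).
def gf_mul(x, y, mod=0b10011, deg=4):
    # Carry-less multiplication with reduction by X^4 + X + 1.
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if (x >> deg) & 1:
            x ^= mod
    return r

S = list(range(8))      # the additive subspace spanned by 1, 2 and 4
a = 1                   # a nonzero element of S

def theta(x):
    # theta(X) = X^2 + aX = X(X + a); note + is XOR in characteristic 2.
    return gf_mul(x, x) ^ gf_mul(a, x)

image = sorted({theta(x) for x in S})
assert all(theta(x ^ a) == theta(x) for x in S)   # theta(x + a) = theta(x)
assert len(image) == len(S) // 2                  # so theta is two-to-one on S
assert all(u ^ v in image for u in image for v in image)  # image is a subspace again
```

Since squaring and multiplication by a are both GF(2)-linear, θ is a linear map with kernel {0, a}, so its image is again an additive subspace of half the size, exactly as the domains S_{i} require.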
Recall from the earlier post that, if Bob performs a calculation consisting of N steps, then the execution trace is represented by polynomials of degree less than N. Alice asks for the values of these polynomials at random points and checks that they satisfy certain identities which encode the calculation transition function as well as the initial and final states. This requires knowing that the values really do come from polynomials, in order to extrapolate the identities from the random sample chosen by Alice to the entire domain. Suppose that they are defined on a domain S in the field F, and Alice compares two polynomials of degree less than ρ|S|. If both sides of the identity are indeed equal, then she correctly verifies this. If they are different, then the probability of them being the same at a random point is bounded by ρ. By sampling at n random points, the probability of incorrectly verifying the identity can be reduced to ρ^{n}. So, to ensure that they are indeed polynomials, Alice can use an FRI algorithm as described above to bound their degree, and then verify the polynomial identities. She is, therefore, able to verify Bob’s calculation while keeping the probability of giving an incorrect positive verification below any threshold value.
There are, however, some remaining issues to consider. Even if the polynomials themselves all have degree bounded by the computation length N, the identities involve multiplying them together, increasing the degree and decreasing efficiency. Furthermore, we can remove some of the steps of the algorithm. As an example, let us consider a sequence of N Boolean values represented by a polynomial f of degree less than N. The fact that it only takes values 0 and 1 in the calculation is represented by the identity
(11) f(X)^{2} – f(X) = Z(X)p(X)
where Z is a prespecified polynomial of degree N vanishing at all field points used to represent the execution trace. The term p is referred to as the quotient polynomial, and also has degree bounded by N. I will suppose, for the purposes of this discussion, that Alice is able to compute Z herself. Then, according to the argument outlined above, she could separately verify that f and p have degree bounded by N and then verify that (11) holds at random points. Note, however, that both sides of (11) can have degree anything up to 2N – 2. In fact, once it is verified that f is a low degree polynomial, it is not necessary for Alice to separately check that p is low degree and that (11) holds. As p is not used anywhere else, instead of asking Bob to commit to it, she can define p by
(12) p(X) = (f(X)^{2} – f(X))/Z(X).
So, all that remains is to check that this is indeed a low degree polynomial using the FRI algorithm without separately checking (11).
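This trick of defining the quotient from values of f, with no separate commitment, can be demonstrated numerically. In the sketch below (field size, trace and sample points are all illustrative choices), the quotient of a Boolean trace is computed pointwise and then spot-checked to behave like a polynomial of degree less than N:

```python
# Sketch of (12): define p = (f^2 - f)/Z from values of f alone, then
# check that p agrees with a polynomial of degree < N.
q = 97
N = 4
trace = [1, 0, 1, 1]            # a Boolean execution trace on points 0..N-1

def interp_eval(xs, ys, x):
    # Lagrange-evaluate the unique degree-<len(xs) interpolant at x, mod q.
    total = 0
    for i, xi in enumerate(xs):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % q
                den = den * (xi - xj) % q
        total = (total + ys[i] * num * pow(den, q - 2, q)) % q
    return total

xs = list(range(N))
f = lambda x: interp_eval(xs, trace, x)
def Z(x):
    r = 1
    for i in xs:
        r = r * (x - i) % q
    return r

# Alice's definition of the quotient, with no commitment from Bob:
p_quot = lambda x: (f(x) * f(x) - f(x)) * pow(Z(x), q - 2, q) % q

# Since the trace is Boolean, f^2 - f vanishes on xs, so p_quot matches a
# polynomial of degree < N: interpolating it from N fresh points must
# reproduce its values everywhere else.
sx = [10, 20, 30, 40]
sy = [p_quot(x) for x in sx]
assert all(interp_eval(sx, sy, x) == p_quot(x) for x in range(50, 60))
```

If the trace were not Boolean, f^{2} – f would fail to vanish at some trace point, the pointwise ratio would no longer be a polynomial, and the FRI degree test on p would fail.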
This procedure can be made slightly more efficient still. Recall that, for the polynomials interpolating the execution trace, there are two kinds of polynomial relations to be checked. First, there are those which enforce properties of the entire execution trace, such as the transition function and other fixed properties like representing a Boolean variable. Second, there are those enforcing the initial and final condition, which are of the form
(13) f(X) – u(X) = Z_{0}(X)q(X)
where Z_{0} is a low degree polynomial vanishing only at the initial and final times, Z_{0} = (X – x_{0})(X – x_{N – 1}), and u is a similarly low degree polynomial giving the correct initial and final values. Using the ideas outlined above, this would be rearranged as

q(X) = (f(X) – u(X))/Z_{0}(X)
and we show that the quotient polynomial q is low degree. Alternatively, the procedure can be made a bit more efficient by using (13) to eliminate f entirely. Instead, the prover Bob just provides his commitment to q, which needs to be shown to be of low degree. Equation (13) is used to replace occurrences of f in expressions such as (12) with q.
As a final point, note that it will generally be required for Alice to verify that several polynomials all have low degree. This could be done by separately applying the FRI algorithm to each of them. Alternatively, they can be combined in much the same way as described for the ‘split-and-fold’ method above. If Alice wants to show that polynomials f_{0}, f_{1}, …, f_{n} all have low degree, then she can form a random linear combination,

g = f_{0} + α_{1}f_{1} + α_{2}f_{2} + ⋯ + α_{n}f_{n}.
Here, α_{1}, …, α_{n} are independent uniformly random points of the field. It is only necessary to show that this combination g is low degree to imply, up to a negligible probability, that each of the terms f_{i} are low degree.
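The reason this works is easy to see at the level of coefficients: any too-high-degree term in one of the f_{i} survives the random combination, except with negligible probability. A small Python sketch (field size, degrees and the deliberately cheating polynomial are all illustrative):

```python
import random

# A random linear combination of committed polynomials preserves any
# too-high-degree term, so a single degree test suffices.
random.seed(0)
q = 2**31 - 1     # prime field size; failure probability is about 1/q
bound = 8         # claimed bound: each polynomial has degree < 8

def rand_poly(n):
    return [random.randrange(q) for _ in range(n)]

polys = [rand_poly(bound) for _ in range(3)]
polys.append(rand_poly(bound) + [0, 0, 0, 5])   # a cheat of degree bound + 3

alphas = [random.randrange(1, q) for _ in polys]
g = [0] * max(len(f) for f in polys)
for f_i, a in zip(polys, alphas):
    for k, c in enumerate(f_i):
        g[k] = (g[k] + a * c) % q

# The top coefficient of g is alpha * 5, nonzero modulo the prime q, so the
# cheating polynomial is exposed by a degree test on g alone.
assert len(g) == bound + 4 and g[-1] != 0
```

In the real protocol the α_{i} must be chosen by the verifier after the commitments are fixed, so that Bob cannot craft his polynomials to cancel the high-degree terms.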
Here, ‘low degree’ means that it is a polynomial of degree less than that ensured by the FRI algorithm which, as described above, will generally be a power of two, 2^{m}. This can be generalized to prove that each f_{i} has degree less than a specified value N_{i} ≤ 2^{m}. We just note that this is equivalent to f_{i} and X^{2^{m} – N_{i}}f_{i} both being polynomials of degree less than 2^{m}. So, using the same ideas as above, we just need to choose a random field value β and apply the FRI algorithm to f_{i} + βX^{2^{m} – N_{i}}f_{i}.
Combining these ideas, in order to show that each f_{i} has degree less than N_{i}, we pick random field values α_{i}, β_{i} and apply the FRI algorithm once to the combination,

g = Σ_{i}(α_{i} + β_{i}X^{2^{m} – N_{i}})f_{i}.
Hence, no matter how many polynomials are used to represent Bob’s calculation, only a single FRI proof is required in a STARK algorithm. We could even combine multiple STARK proofs, giving the significant efficiency of combining all of their FRI proofs into one.
Suppose that Alice and Bob each have a data file of length N bits, and they want to check that they are identical. Of course, Bob could just transmit his information to Alice, and she can confirm if they are indeed the same. However, these are large files, and communication is slow. Instead, Bob could send only the bits from a randomly chosen set of places in his file. Alice can check that, at least, their data agrees in these locations. If a significant proportion of their data bits are different, then this has a high probability of catching the discrepancy. However, it is possible that they differ at only one place, in which case Bob would need to send all or almost all of the data to have a good chance of catching this.
How can they do this in an efficient manner? How many bits does Bob have to send to Alice in order for her to have a high probability of telling whether or not their data agrees at every bit? At first thought, you may think that he needs to send all or almost all of it, so about N bits. On the other hand, if you are familiar with hash functions, then you would probably answer that he could compute something like the SHA256 hash and send this instead, so only 256 bits need to be transmitted. While that is true, there is also an efficient approach using polynomials, where we can prove its efficacy with some simple maths and without assuming any of the properties of hash functions. Also, as we will see, polynomials allow us to do so much more than just comparing bits.
The idea is that, if two polynomials differ even slightly for one coefficient, then they evaluate to different values almost everywhere. Bob constructs the polynomial whose coefficients are the bits from his data,
(1) f(X) = b_{0} + b_{1}X + b_{2}X^{2} + ⋯ + b_{N – 1}X^{N – 1}
Fixing a prime p, he chooses a random integer 0 ≤ x < p, and sends it to Alice along with the polynomial value f(x), taking the result modulo p. This is 2log_{2}p binary bits of data. Alice also forms the polynomial with coefficients from her data, evaluates it at x, and compares the result with Bob’s value f(x). If they agree (modulo p) then she declares that her data is the same as Bob’s, otherwise they are different.
The point is, two distinct polynomials of degree less than N can only agree at fewer than N points. If Alice and Bob’s data are not identical, then the probability of them getting the same value for their polynomials (modulo p) is less than N/p. Let us fix a tiny value ε. If the prime p is chosen greater than N/ε then the probability of Alice giving an incorrect statement is less than ε. Taking p close to N/ε, the number of bits sent to Alice is

2log_{2}p ≈ 2log_{2}N + 2log_{2}(1/ε).
This is logarithmic in N, which is much less than sending all of the bits. By sending polynomial values rather than the original raw data, the number of bits that Bob needs to transmit has been drastically reduced. This idea is also used in Reed–Solomon error-correcting codes.
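The whole protocol fits in a few lines of Python. In the sketch below, the prime, data length and seed are illustrative, and `fingerprint` is just my name for the polynomial evaluation:

```python
import random

# Polynomial fingerprinting: compare two bit-strings by evaluating their
# coefficient polynomials at a single random point modulo a large prime.
def fingerprint(bits, x, p):
    # Horner evaluation of f(x) = b_0 + b_1*x + ... + b_{N-1}*x^{N-1} mod p.
    acc = 0
    for b in reversed(bits):
        acc = (acc * x + b) % p
    return acc

N = 1000
p = 2**31 - 1        # a prime much larger than N/eps
random.seed(3)
alice = [random.randrange(2) for _ in range(N)]
bob = list(alice)
bob[123] ^= 1        # flip a single bit

x = random.randrange(1, p)
# Identical data always agrees; a single-bit difference is missed only if x
# is a root of the difference polynomial, probability less than N/p.
assert fingerprint(alice, x, p) == fingerprint(list(alice), x, p)
assert fingerprint(alice, x, p) != fingerprint(bob, x, p)
```

Here the difference polynomial is a single monomial x^{123}, whose only root is zero, so any nonzero random point catches the discrepancy.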
Using polynomials also enables us to represent various relationships between the data bits, unlike hash functions which just completely scramble it. For example, maybe Bob’s data is not arbitrary, but consists of a sequence of calculations, with each term depending on the previous ones. In this situation, Bob is fulfilling the role of a prover, who is attempting to demonstrate some knowledge or calculation results to the verifier, Alice, and the data represents the execution trace. The calculations are represented by some polynomial identities so that, by finding polynomials satisfying these identities, he demonstrates that the calculation has been carried out. He can prove this to Alice by sending her the values of the polynomials at a randomly chosen point. This is a key idea behind SNARKs (Succinct Non-Interactive Arguments of Knowledge) and STARKs (Scalable Transparent Arguments of Knowledge).
Before proceeding, I will briefly recap the setup for the problem being addressed in this post. There are two parties, Bob (the prover) and Alice (the verifier). Bob performs a computation involving a large number N of calculation steps. Each calculation state consists of a fixed number of variables y_{0}, y_{1}, …, y_{m}, the values of which lie in some set F. Using y_{i,j} for the value of variable j at execution time i, the calculation state is y_{i} = (y_{i,0}, …, y_{i,m}). The initial state y_{0} is specified. Then, at each step, the state is updated according to a transition function y_{i + 1} = f(y_{i}). The calculation result is given by the final state y_{N – 1}.
Bob sends the result to Alice. However, she wants to verify that the calculation was performed correctly, but does not have the resources to perform the full computation herself. Therefore, Bob needs to also send some proof that Alice can use to convince herself of its correctness.
It is required that Alice be able to perform a check of correctness in much less than N steps. We also require the data transferred between the two to be much less than size N. For example, Alice could request the states (y_{i}, y_{i + 1}) at a random sample of calculation times i and check that the transition function was applied correctly. At best, this will only give a statistical upper bound on the number of computation errors. Unfortunately, even a single error in the calculation completely invalidates the result. Alice needs to be able to verify every part of the calculation in much fewer than N steps. We really have no right to expect that this is possible, so it is quite amazing that it can be done at all.
In a bit more detail, the state variables lie in a finite field F. The execution trace of variable y_{j} is interpolated by polynomial g_{j}, so that y_{i,j} = g_{j}(i). The transition function is represented by multivariate polynomials p_{0}, p_{1}, …, p_{r} so that the calculation step y_{i + 1} = f(y_{i}) is equivalent to

p_{k}(y_{i}, y_{i + 1}) = 0, for each k = 0, 1, …, r.
This is known as the algebraic intermediate representation for the calculation. In terms of the interpolation polynomials,

p_{k}(g_{0}(i), …, g_{m}(i), g_{0}(i + 1), …, g_{m}(i + 1)) = 0.
This has to hold for all 0 ≤ i < N – 1, but the equalities can be converted to identities which hold if i is replaced by any element of the field. Alice chooses some random points in the field and asks Bob for the corresponding values of the polynomials g_{j}. She just needs to verify the identities at these points, as well as check that the initial and final states are as claimed. This supposes that Alice and Bob can transfer information in both directions, via a two-way communication channel.
Of course, Bob could be lying. What he claims are the polynomial values could be any numbers that he has just selected to satisfy the identities that Alice is checking. We cannot ask him to send the entire polynomial, since this would be the same as sending the entire calculation trace. For now, we will just trust him on this point but, for practical application, we will need a method of also verifying that he is really sending values consistent with a polynomial.
In blockchains, particularly those designed to represent general calculations (smart chains) such as Ethereum, we want to represent a calculation and its result on the chain. For anything beyond very simple calculations, this could take up a lot of valuable space on the blockchain and require nodes to spend a lot of computation validating it. Such methods of representing calculations can instead move the heavy work off-chain, with the on-chain data being much smaller and requiring much less computation to validate. The on-chain information is just the result of the calculation together with a proof that this is indeed the correct result, rather than the full computation itself.
We can think of Bob as sending a transaction to the chain containing the calculation result and proof. Alice is a validator, who is verifying correctness of the blockchain. Communication is only one-way, from Bob to Alice, but this is outside of the scope of the current post. However, the ideas here can be built on by including some additional steps to eliminate the trust in Bob that he is correctly computing polynomial values, remove any communication from Alice to Bob, and hide any private information used in the calculation. Then, we arrive at zkSTARKs and zkSNARKs.
Above, we assumed that Alice takes the results of her polynomial evaluation modulo a prime number p. This is exactly the same as evaluating it in the finite field of integers modulo p. Throughout the remainder of this post, I will assume that F is a finite field of size q = p^{r} for a fixed prime number p and positive integer exponent r. It is known that fields exist of such prime-power sizes and, in the case q = p, it is just the integers modulo p. This case is enough to follow the discussion below, so it is only really necessary to understand modulo-p arithmetic rather than general finite fields.
I write n.x for the product of an integer n with field element x. For the field with p^{r} elements, p.x always evaluates to zero. That is, adding together p copies of the same element gives zero or, equivalently, the field has characteristic p. Then, two such products m.x and n.x for nonzero field element x are equal if and only if m and n are equal modulo p.
I will also optionally interpret an integer n as the field element n.1 given by adding together n copies of the field unit ‘1’. In this way, integer arithmetic interpreted in the field F is the same thing as modulo p arithmetic.
Sometimes, we will want to look at powers of a nonzero field element γ. Its order is the smallest positive integer s satisfying γ^{s} = 1. Then, two powers γ^{i} and γ^{j} are equal if and only if i and j are equal modulo s. Multiplication of powers of γ corresponds to integer addition modulo s. The nonzero field elements under multiplication form a group of size q – 1 and, it is known to be cyclic. That is, there exist cyclic generators, which are elements of order q – 1. Any other nonzero field element can be expressed as a power of any such generator, and it also follows that there are elements of any order which is a factor of q – 1.
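These group-theoretic facts are easy to verify concretely in a tiny field. The sketch below works in F_17, whose multiplicative group is cyclic of order 16 (the prime and orders are illustrative choices):

```python
# Orders and generators in the multiplicative group of F_17.
p = 17

def order(g):
    # Multiplicative order of g modulo p.
    x, n = g % p, 1
    while x != 1:
        x = x * g % p
        n += 1
    return n

generators = [g for g in range(1, p) if order(g) == p - 1]
assert 3 in generators            # e.g. 3 generates all 16 nonzero elements

# Powers of a generator give elements of any order dividing p - 1,
# such as an 8th root of unity:
gamma = pow(3, (p - 1) // 8, p)
assert order(gamma) == 8
```

The same construction, with a large prime and an order s dividing q – 1, is how the geometrically spaced interpolation points γ^{i} used later in the post are obtained.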
All polynomials will be assumed to have coefficients in F. The symbol X will be used for the indeterminate in polynomial expressions, as in (1) above. A polynomial can be written as f(X) or, more succinctly, as f with the indeterminate X being implicit. In the remainder of this post, I will state several simple results on polynomials which are very useful for the procedures considered here.
As in the argument above with Alice and Bob, the fact that distinct polynomials of degree N can agree at no more than N places provides a bound on the probability that they take the same value at a random point.
Theorem 1 Let f and g be distinct polynomials of degree no more than N. Then, the probability that they take the same value at a uniformly random point x of the field F is bounded by,

P(f(x) = g(x)) ≤ N/q.
So long as N/q is negligible, we can therefore check for equality of the two polynomials by just comparing their values at a single randomly selected point.
This does raise a slight notational point that I glossed over above. What does it even mean for two polynomials to be equal? Regarding them as functions, this could be taken to mean that they take the same values at all points. Or, it could mean that they have the same coefficients. For polynomials over an infinite field such as the rational or real numbers, these two statements are equivalent. Over finite fields, however, they are not. For example, all elements x of the field F of size q satisfy the identity

x^{q} = x,
which is Fermat’s little theorem. The polynomials X^{q} and X take the same values at all points, but do not have the same coefficients. In keeping with convention, I consider equal polynomials to, by definition, have the same coefficients. So, X^{q} ≠ X. For polynomials of degree less than q, the distinction is moot, since equality of coefficients is the same as evaluating to the same values everywhere.
In the situation at the top of the post Bob used the bits of his data as the coefficients of his polynomial. Instead, he could have used them as the values when evaluated at certain preselected points. Polynomial values can be used to represent an arbitrary sequence y_{0}, y_{1}, …, y_{N  1} of elements of the field F of length N ≤ p. In fact, there exists a unique polynomial f of degree less than N and satisfying
(2) f(i) = y_{i}
for each i = 0, 1, …, N – 1.
An explicit construction of the polynomial in (2) can be given by Lagrange’s method. Consider representing the sequence at an arbitrary set of distinct points x_{0}, x_{1}, …, x_{N – 1} of the field F. For each 0 ≤ i < N, we start by constructing a polynomial which is nonzero at x_{i} but is zero at each x_{j} for j ≠ i,

∏_{j ≠ i}(X – x_{j}).
Now, normalize this so that it takes the value 1 at x_{i},

ℓ_{i}(X) = ∏_{j ≠ i}(X – x_{j})/(x_{i} – x_{j}).
These are degree N – 1 polynomials satisfying,
(3) ℓ_{i}(x_{j}) = 1 if i = j, and 0 otherwise.
Taking linear combinations of these Lagrange polynomials allows us to represent an arbitrary sequence.
Theorem 2 Let x_{0}, x_{1}, …, x_{N – 1} be a distinct sequence of elements of the field F. Then, for each sequence y_{0}, y_{1}, …, y_{N – 1} in F, there exists a unique polynomial f of degree less than N satisfying
(4) f(x_{i}) = y_{i} for each i = 0, 1, …, N – 1. This can be written explicitly as,
(5) f(X) = y_{0}ℓ_{0}(X) + y_{1}ℓ_{1}(X) + ⋯ + y_{N – 1}ℓ_{N – 1}(X).
From the values (3) of the Lagrange polynomials, it is immediate that the polynomial defined by (5) satisfies f(x_{i}) = y_{i}. Also, as ℓ_{i} have degree N – 1, it follows that f has degree less than N. As distinct polynomials of degree less than N cannot coincide at N points, it is the unique such polynomial satisfying (4).
Equation (2) is just an application of theorem 2 for the points x_{i} = i, and the condition N ≤ p is only required so that these points are all distinct modulo p.
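Theorem 2 translates directly into code. A short sketch over F_97 (the prime, points and values below are made up for illustration):

```python
# Lagrange interpolation over the prime field F_97.
p = 97

def ell(xs, i, x):
    # The Lagrange basis polynomial ell_i: 1 at xs[i], 0 at the other points.
    num, den = 1, 1
    for j, xj in enumerate(xs):
        if j != i:
            num = num * (x - xj) % p
            den = den * (xs[i] - xj) % p
    return num * pow(den, p - 2, p) % p

def interpolate(xs, ys, x):
    # f(x) = sum_i y_i * ell_i(x), the unique interpolant of degree < N.
    return sum(y * ell(xs, i, x) for i, y in enumerate(ys)) % p

xs = [0, 1, 2, 3]
ys = [5, 11, 3, 40]
assert all(ell(xs, i, xs[j]) == (1 if i == j else 0)
           for i in range(4) for j in range(4))                  # property (3)
assert all(interpolate(xs, ys, x) == y for x, y in zip(xs, ys))  # property (4)
```

Note the division in ℓ_{i} is performed as multiplication by a modular inverse, computed here via Fermat’s little theorem as pow(den, p – 2, p).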
There is one issue with representing the sequence y_{i} at the linearly spaced points x_{i} = i. Specifically, we have the restriction that the maximum length N of the sequence is bounded by the characteristic p of the field. Otherwise, the points x_{i} would not all be distinct when interpreted in F. This may not be much of a problem when the field is the integers modulo a large prime p, but sometimes we might want to work in a field of size q = p^{r} for a small prime p raised to a relatively large power r. In fact, power-of-two field sizes are often used, which would make (2) almost useless since it couldn’t handle a sequence longer than two. So, instead, we can use geometrically spaced points x_{i} = γ^{i}, for a nonzero field element γ of some integer order s. As mentioned above, this order can be chosen to be the size, q – 1, of the full set of nonzero field elements. Alternatively, it can be any factor of this. The points x_{i} are pairwise distinct if and only if N is no greater than s, allowing us to represent any sequence up to length q – 1. Applying theorem 2 gives a unique polynomial f of degree less than N satisfying
(6) f(γ^{i}) = y_{i}
for each i = 0, 1, …, N – 1. For the reasons just explained, as well as some efficiency considerations, using the interpolating polynomial (6) is often preferred over (2).
For any subset S of the field F, I will use Z_{S} to denote the lowest order nontrivial polynomial vanishing on S,

Z_{S}(X) = ∏_{x ∈ S}(X – x).
A polynomial f vanishes at a point x if and only if it has the factor X – x. Applying this for all points x in S shows that f vanishes on S if and only if it has Z_{S} as a factor or, equivalently, there exists a polynomial g such that,

f(X) = Z_{S}(X)g(X).
Next, suppose that we have two polynomials f and g, and want to verify that they take the same values on the set S. We could, of course, just evaluate them both at each of these points and check. It would be preferable to instead have a polynomial identity which, as in the situation at the top of the post, can be validated by checking at a single random point. Equality f = g is clearly sufficient, but not necessary, since it is possible that they may not be equal at points outside of the given set. Instead, using the fact that their difference f – g must vanish on S and, hence, have Z_{S} as a factor, we obtain the following.
Theorem 3 Let S be a subset of the field F and f,g be polynomials. Then, f(x) = g(x) for each x in S if and only if there exists a polynomial h satisfying,
(7) f(X) – g(X) = Z_{S}(X)h(X).
So, if Bob wants to prove to Alice that f and g are equal on S, he can achieve this through the quotient polynomial h. Then, if they satisfy (7) at a random point, Alice can conclude that they do indeed coincide on S.
As an example application of theorem 3, suppose that Bob has a Boolean sequence, interpolated by a polynomial f, each element of which can only take the values 0 or 1. How can he prove that it is indeed Boolean? It is a straightforward fact that a field element y is equal to 0 or 1 if and only if y^{2} = y.
Example 1 (Boolean function) A polynomial f takes only the values 0 and 1 on the set S if and only if there exists a polynomial g satisfying,

f(X)^{2} – f(X) = Z_{S}(X)g(X).
Next, suppose that Bob claims that his polynomial f takes some small number of stated values y_{i} at a sequence of points x_{i}. He may, for example, be stating that his execution trace has the required initial and final values. Without evaluating it at these specific points, how can this be verified? Theorem 2 allows us to construct a polynomial g of low degree with the required property and, then, we apply theorem 3.
Corollary 4 Let S = {x_{0}, x_{1}, …, x_{N – 1}} be a set of distinct elements of the field F, and y_{0}, y_{1}, …, y_{N – 1} be any elements of F. Using theorem 2, choose a polynomial g to satisfy g(x_{i}) = y_{i}.
Then a polynomial f satisfies f(x_{i}) = y_{i} for each 0 ≤ i < N if and only if there exists a polynomial h satisfying,

f(X) – g(X) = Z_{S}(X)h(X).
We need a polynomial which vanishes at a large number of consecutive points in order to make statements such as that the calculation is consistent with the state transition function at every single step. Specifically, for a nonnegative integer N ≤ p,

Z_{N}(X) = X(X – 1)(X – 2)⋯(X – N + 1)
vanishes at each 0 ≤ i < N. As, typically, N will be a large number, it would be very inefficient to require the verifier to compute the values of Z_{N} by multiplying together each of the factors. This could potentially be millions of terms. Fortunately, for the special value N = p, it reduces to an especially simple form which can be readily evaluated.
Theorem 5
(8) Z_{p}(X) = X^{p} – X.
The proof of equation (8) follows from the observation that both sides are degree p polynomials whose roots are the integers 0 ≤ i < p (using Fermat’s little theorem for the right hand side).
Even in the case where N is strictly less than p, theorem 5 can still be useful since it allows us to re-express Z_{N} as

Z_{N}(X) = (X^{p} – X)/((X – N)(X – N – 1)⋯(X – p + 1)).
This has p – N terms in the denominator, so can give an efficient method of evaluation if p is greater than, but close to, N.
An alternative approach is to make use of the simple identity,

(9) (X – N + 1)Z_{N}(X + 1) = (X + 1)Z_{N}(X).
This follows straight from the definition but, the important point here, is that this identity is sufficient to characterize Z_{N}. The idea is that the prover Bob can do the work of evaluating Z_{N} and send the result to Alice. She just needs to check the identity above, which is sufficient to guarantee that Bob’s polynomial is, at least, a multiple of Z_{N}.
Theorem 6 If a polynomial Z satisfies the identity

(10) (X – N + 1)Z(X + 1) = (X + 1)Z(X),

then it is a multiple of Z_{N}.
Theorem 6 can be proven by induction on N. The case N = 0 is immediate since, here, Z_{N} is constant. For positive N, we assume the induction hypothesis that the result holds with N replaced by N – 1. Evaluating identity (10) at X = N – 1, the left hand side vanishes, giving N.Z(N – 1) = 0 and, hence, Z(N – 1) = 0. It follows that X – N + 1 is a factor of Z,

Z(X) = (X – N + 1)Z′(X).

Substituting into (10) gives the same identity for Z′, but with N replaced by N – 1. So, by the induction hypothesis, Z′ is a multiple of Z_{N – 1}. Then, Z is a multiple of (X – N + 1)Z_{N – 1} = Z_{N}, completing the proof.
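The recurrence satisfied by Z_{N} can also be checked numerically. A small spot-check over an illustrative prime field (p and N below are arbitrary small choices):

```python
# Spot-check of the recurrence (x - N + 1) Z_N(x+1) = (x + 1) Z_N(x)
# at every point of a small prime field.
p = 101
N = 7

def Z(x):
    # Z_N(x) = x (x - 1) ... (x - N + 1), reduced mod p.
    r = 1
    for i in range(N):
        r = r * (x - i) % p
    return r

assert all((x - N + 1) * Z(x + 1) % p == (x + 1) * Z(x) % p for x in range(p))
```

Since both sides are polynomials of degree N + 1 agreeing at more than N + 1 points here, this check is strong evidence of the identity, though of course the algebraic proof above is what actually establishes it.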
As discussed above, it can be convenient to represent a sequence at points γ^{i} for a nonzero field element γ of order s. With that in mind, for each nonnegative integer N ≤ s, let us also define the polynomial

Z̃_{N}(X) = (X – 1)(X – γ)(X – γ^{2})⋯(X – γ^{N – 1}),

which vanishes at each γ^{i} for 0 ≤ i < N. As above, this has a particularly simple form for the special case N = s which significantly reduces the calculation cost.
Theorem 7
(11) Z̃_{s}(X) = X^{s} – 1.
The proof of equation (11) follows from observing that both sides are polynomials of degree s vanishing at the s distinct points γ^{i}. Even in the case where N is strictly less than s, equation (11) can be useful since it allows us to re-express Z̃_{N} as,

Z̃_{N}(X) = (X^{s} – 1)/((X – γ^{N})(X – γ^{N + 1})⋯(X – γ^{s – 1})).
The denominator here has s – N terms so, choosing the field element γ to have order close to, but greater than or equal to, the number N of calculation steps can significantly reduce the computation required to verify the proof.
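This shortcut is easy to confirm numerically. The sketch below works over F_97 with an element of order 8 (the prime, the generator 5 and the value N = 5 are all illustrative):

```python
# Check Z~_N(x) = (x^s - 1) / prod_{i=N}^{s-1} (x - gamma^i) over F_97,
# with gamma of order s = 8, at every x that is not a power of gamma.
p = 97
gamma = pow(5, (p - 1) // 8, p)   # 5 generates F_97^*, so gamma has order 8
N = 5
powers = [pow(gamma, i, p) for i in range(8)]
assert len(set(powers)) == 8      # the 8 distinct 8th roots of unity

def Zt_direct(x):
    r = 1
    for g in powers[:N]:          # (x - 1)(x - gamma)...(x - gamma^{N-1})
        r = r * (x - g) % p
    return r

def Zt_shortcut(x):
    den = 1
    for g in powers[N:]:          # the s - N factors in the denominator
        den = den * (x - g) % p
    return (pow(x, 8, p) - 1) * pow(den, p - 2, p) % p

assert all(Zt_direct(x) == Zt_shortcut(x) for x in range(p) if x not in powers)
```

The direct product costs N multiplications while the shortcut costs s – N plus one exponentiation, which is the saving referred to above when s is close to N.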
Alternatively, it can be checked that Z̃_{N} satisfies the identity

(12) (X – γ^{N – 1})Z̃_{N}(γX) = γ^{N – 1}(γX – 1)Z̃_{N}(X).
As already explained for Z_{N}, Bob could compute the values of Z̃_{N} and send the result to Alice, who verifies that it satisfies this identity. We have the analogous result to theorem 6.
Theorem 8 If a polynomial Z satisfies the identity

(X – γ^{N – 1})Z(γX) = γ^{N – 1}(γX – 1)Z(X),
then it is a multiple of Z̃_{N}.
This result can be proven inductively just as for the proof of theorem 6 above, so I do not give it here.
For a simple example applying the results above, consider the Fibonacci Sequence
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, … 
It is defined by the initial conditions y_{0} = 0, y_{1} = 1 and the transition function y_{n + 2} = y_{n + 1} + y_{n}. We suppose that Bob claims that the Nth Fibonacci number y_{N} is equal to some value a (modulo a prime p ≥ N), for a large value of N. Although it may not be the best use of the ideas described here, since there are already fast methods of computing terms in the Fibonacci sequence, let us show how it can be verified.
Bob uses theorem 2 to construct a polynomial f of degree no more than N satisfying f(i) = y_{i} over 0 ≤ i ≤ N. Theorem 3 gives a polynomial g satisfying
(13) 
By inspection, it can be seen that g is of degree at most 1. We also construct a polynomial satisfying the initial conditions u(0) = 0, u(1) = 1 and the claimed final state u(N) = a,
Theorem 3 gives a polynomial h satisfying
(14) 
Again, by inspection, it can be seen that h has degree no more than N – 3 which is large, so we would not expect Alice to evaluate it.
Now, Alice can ask Bob for the values of f, g and h (and possibly Z_{N – 1}) at a random point of the field and verify identities (13) and (14). By theorem 1, this validates the claimed result with a probability of error of no more than N/p. By repeating this process a number n times, this probability can be reduced to (N/p)^{n} if necessary.
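To make the Fibonacci example concrete, here is a toy run in Python. It interpolates the trace, evaluates the left-hand side of identity (13) at a few out-of-range points, and checks that the quotient g really is a polynomial of degree at most 1 (its second difference vanishes at equally spaced points). This is a sketch of the arithmetic only, not the actual protocol: the prime and trace length are small placeholder values.

```python
p = 2**31 - 1  # placeholder prime
N = 10

# Execution trace: Fibonacci numbers mod p, y_0 .. y_N.
y = [0, 1]
for _ in range(N - 1):
    y.append((y[-1] + y[-2]) % p)

def interp_eval(xs, ys, x):
    """Evaluate the Lagrange interpolant through (xs, ys) at x, mod p."""
    total = 0
    for xi, yi in zip(xs, ys):
        num, den = 1, 1
        for xj in xs:
            if xj != xi:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, p - 2, p)) % p
    return total

def Z(n, x):
    """Z_n(x) = prod_{i<n} (x - i) mod p."""
    out = 1
    for i in range(n):
        out = out * (x - i) % p
    return out

xs = list(range(N + 1))
f = lambda x: interp_eval(xs, y, x)

# Quotient values g(a) = (f(a+2) - f(a+1) - f(a)) / Z_{N-1}(a)
# at three points where Z_{N-1} does not vanish.
q = []
for a in (N, N + 1, N + 2):
    lhs = (f(a + 2) - f(a + 1) - f(a)) % p
    q.append(lhs * pow(Z(N - 1, a), p - 2, p) % p)

# deg g <= 1, so its second difference at unit-spaced points is zero.
assert (q[2] - 2 * q[1] + q[0]) % p == 0
```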
Alternatively, by representing the calculation at field points γ^{i} for nonzero field element γ of order s > N, identities (13) and (14) are replaced by,
where u is the polynomial of degree 2 or less satisfying u(1) = 0, u(γ) = 1 and u(γ^{N}) = a,
Note that this example does not exactly fit into the framework described in the setup above, since the calculation state y_{i} depends on both y_{i – 1} and y_{i – 2}, not just the previous state. This is not important: it can easily be rewritten in that setup by using pairs of consecutive Fibonacci numbers as the calculation state, but this just complicates things.
Consider the sequence y_{i} given by the initial condition y_{0} = 0 and transition function y_{n + 1} = y_{n}^{2} + n.
0, 0, 1, 3, 12, 148, 21909, 480004287, … 
Bob claims that the Nth value of this series, y_{N}, is equal to some value a modulo a large prime p > N. Using theorem 2 he constructs an interpolating polynomial f(i) = y_{i} (mod p) for 0 ≤ i ≤ N. By theorem 3, there exist quotient polynomials g and h such that
The first of these enforces the transition function y_{n + 1} = y_{n}^{2} + n, and the second enforces the initial and final conditions f(0) = 0, f(N) = a.
All Alice needs to do is ask for the values of f, g and h at some random points and verify that the identities hold.
The Collatz sequence with starting value a positive integer y_{0} is defined by the transition function
(15) 
For example, starting at 12, the sequence is
12, 6, 3, 10, 5, 16, 8, 4, 2, 1, 4, 2, 1, 4, 2, 1, … 
The unsolved Collatz conjecture states that for any starting value, this sequence will always hit 1 eventually. We suppose that Bob computes the sequence starting from 97, and sees that y_{118} = 1. He wants to prove this to Alice, using the method outlined above. How can this be done?
97, 292, 146, 73, 220, 110, …, 5, 16, 8, 4, 2, 1 
First, it should be noted that 118 steps is not many, but it is just for illustration and the idea extends to much longer calculations. The first issue in creating an algebraic intermediate representation is that (15) depends on whether y_{i} is even or odd. One solution is to represent the binary digits at each stage of the computation, so that we can just check the least significant bit to determine if it is even.
Bob computes the terms of the sequence, and notes that its maximum value is 9232, which can be represented by a 14 digit binary number. Hence, the execution state will be represented by 14 binary digits.
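These numbers are easy to reproduce. A few lines of Python recompute Bob’s trace and confirm the claims above:

```python
def collatz(y0, steps):
    """Return the Collatz sequence y_0, ..., y_steps."""
    seq = [y0]
    for _ in range(steps):
        y = seq[-1]
        seq.append(y // 2 if y % 2 == 0 else 3 * y + 1)
    return seq

seq = collatz(97, 118)
assert seq[118] == 1                 # y_118 = 1, as Bob claims
assert max(seq) == 9232              # maximum value of the trace
assert max(seq).bit_length() == 14   # fits in 14 binary digits
```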
We work modulo a prime p ≥ 2^{16}, which is large enough to represent 3y + 1 for all 14 bit binary numbers y. Using theorem 2, Bob creates 14 polynomials f_{j} for 0 ≤ j ≤ 13 such that f_{j}(i) is the jth binary digit of y_{i}. As in example 1 above, the fact that these take the values 0 and 1 is expressed by the identity
(16) 
for some quotient polynomials p_{j}. While we could work with the binary digits, it will be convenient here to also introduce a polynomial g representing the numbers in the Collatz sequence themselves. Since g(i) = y_{i} is given by summing 2^{j}f_{j}(i), theorem 3 gives the identity
(17) 
for quotient polynomial q_{1}. Next, the transition function (15) is given by the identity
(18) 
for a quotient polynomial q_{2}. Note that if y_{i} is even then f_{0}(i) = 0, so (18) reduces to 2g(i + 1) = g(i). If, on the other hand, y_{i} is odd then it reduces to g(i + 1) = 3g(i) + 1. The initial condition g(0) = 97 and final state g(118) = 1 give the identity
(19) 
To verify Bob’s statement that y_{118} = 1, Alice just needs to verify that identities (16,17,18,19) hold at some randomly selected values for x.
Note I: The example of a Collatz sequence shows how we can represent a state as a set of binary digits, and update it by a transition function. All of the standard logic gates can be represented algebraically with, for example, a NOR gate given by
It follows that any Turing machine can be simulated using these methods. The difficulty is doing it in an efficient way.
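For concreteness, the standard polynomial encodings of gates on {0, 1}-valued inputs can be checked exhaustively. The NOR formula elided above is presumably (1 − a)(1 − b); the remaining gates are my own additions for illustration:

```python
# Polynomial encodings of logic gates, valid when a, b take values in {0, 1}.
NOR = lambda a, b: (1 - a) * (1 - b)
AND = lambda a, b: a * b
OR  = lambda a, b: a + b - a * b
NOT = lambda a: 1 - a
XOR = lambda a, b: a + b - 2 * a * b

# Exhaustive check against the Boolean truth tables.
for a in (0, 1):
    for b in (0, 1):
        assert NOR(a, b) == int(not (a or b))
        assert AND(a, b) == (a and b)
        assert OR(a, b) == (a or b)
        assert XOR(a, b) == a ^ b
```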
Note II: Above, we used a binary expansion to represent the computation state, which was helpful to be able to algebraically express whether the number is odd or even at each step. Really, though, the important effect of using a binary representation is to bound the terms in the sequence. Since we used 14 binary digits, each term was guaranteed to be less than 2^{14}. Algebraically expressing whether an integer y is odd or even is straightforward. There will always exist unique integer values u and v satisfying
(20) 
Then, u is equal to 0 if y is even and 1 if it is odd. However, modulo an odd prime p, (20) always has two solutions, one with u equal to 0 and one with u equal to 1, so it does not help. For 0 ≤ y < p – 1, equation (20) does have a unique solution satisfying the bound 0 ≤ v < p/2 – 1, and we can use the value of u to determine if y is even or odd. However, how can we restrict to this solution? We need to be able to express the bound on v algebraically. One way to bound v is to express its binary expansion with a fixed number of digits, restricting its range of possible values. If another efficient method of expressing such a bound via algebraic equalities exists, then it could be used to remove the need for the binary expansion above.
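The ambiguity can be seen in a couple of lines. Modulo an odd prime, taking (20) to be y = 2v + u, either parity bit u admits a solution for v, and only the extra bound on v singles out the true parity (the prime 101 and value 40 are arbitrary illustrative choices):

```python
p = 101
y = 40  # an even number, well below p
inv2 = pow(2, p - 2, p)  # inverse of 2 mod p

v_even = y * inv2 % p        # solution with u = 0
v_odd = (y - 1) * inv2 % p   # solution with u = 1
assert (2 * v_even) % p == y
assert (2 * v_odd + 1) % p == y

# Both parities are representable modulo p; only the bound v < p/2 picks
# out the true one (v_even = 20 satisfies it, v_odd = 70 does not).
assert v_even < p // 2 < v_odd
```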
In the scenarios above, the prover Bob has an execution trace of some large length N. These are sequences of length N, for which he needs to construct interpolation polynomials. If he does this by naively applying equation (5) of theorem 2 as it is, then it would be very inefficient. Each term ℓ_{i}(X) is a product of N – 1 factors, so directly computing these for each value of i would take a time of order O(N^{2}), which could be infeasible. Since the sequence is of length N, Bob must take at least order O(N) computing time, but going all the way up to N^{2} is a bit much.
We start with a set of points S = {x_{0}, x_{1}, …, x_{N  1}} and sequence y_{0}, y_{1}, …, y_{N  1}, all of which are elements of F. Equation 4 for the interpolation polynomial can be rearranged as,
(21) 
where,
(22) 
The overall multiplicative factor Z_{S}(X) in (21) is a product of N terms, which only needs to be computed once, but we still have the issue that each of the individual ℓ_{i}^{0}(x_{i}) is a product of O(N) terms. Fortunately, it can be simplified for the cases of interest.
In the case where x_{i} = i, (22) simplifies to,
(23) 
The factorials from 0! up to (N – 1)! can be computed and stored with a single iteration of length N, so (23) enables all of the ℓ_{i}^{0}(x_{i}) terms to be computed at once in O(N) time. Putting these back into (21) gives the value of the polynomial interpolant f(x) at any point x in O(N) time.
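A sketch of this O(N) evaluation for the nodes x_i = i, reconstructed from the discussion above: the weights ℓ_i^0(i) = (−1)^{N−1−i} i! (N − 1 − i)! come from precomputed factorials, and modular inverses are taken via Fermat’s little theorem.

```python
p = 2**31 - 1  # placeholder prime
N = 8
y = [pow(i, 3, p) for i in range(N)]  # sample trace: y_i = i^3

# Precompute factorials 0! .. (N-1)! in a single pass.
fact = [1] * N
for i in range(1, N):
    fact[i] = fact[i - 1] * i % p

def f(x):
    """Evaluate the interpolant through (i, y_i) at a non-node point x."""
    Zx = 1  # Z_S(x) = prod_i (x - i)
    for i in range(N):
        Zx = Zx * (x - i) % p
    total = 0
    for i in range(N):
        w = fact[i] * fact[N - 1 - i] % p  # |l_i^0(i)|
        if (N - 1 - i) % 2:
            w = (-w) % p                   # sign (-1)^{N-1-i}
        total = (total + y[i] * pow(w * (x - i) % p, p - 2, p)) % p
    return Zx * total % p

# Interpolating i^3 on 8 points reproduces x^3 away from the nodes.
assert f(13) == pow(13, 3, p)
```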
Alternatively, if x_{i} = γ^{i}, then (22) gives the iterative equation,
This again enables all of the ℓ_{i}^{0}(x_{i}) terms to be computed in an iteration of length N, giving f(x) in O(N) time.
The full algorithm used by STARKs actually requires the prover to initially compute the values of f at a large number M ≥ N of points. If we were to apply the above O(N) algorithm at all of these, the total algorithm would be of order MN, which is at least N^{2}. Consider the case where x_{i} = i and we want to evaluate f at N consecutive points x + j over 0 ≤ j < N. Putting this into (21) gives,
The Z_{N} multiplying factors can be computed by an iteration of length N using the recursive formula (9), so takes time O(N). The summation can be expressed as a convolution,
There exist convolution algorithms of order O(N log N), such as by the use of fast Fourier transforms (FFT). This allows f to be computed on the entire sequence of N points with a complexity of O(N log N).
So, if Bob is required to compute the values of f at a number M ≥ N of points then, so long as they can be split into M/N sets of N consecutive values, the entire calculation can be done in time O(M log N).
The case with x_{i} = γ^{i} can be handled in a similar way. Using (21) to compute f at a sequence of N points xγ^{j} over 0 ≤ j < N,
Again, the Z̃_{N} factors can be computed using iterative formula (12) and the sum is a convolution,
So, the values of f on the entire sequence of length N can be computed in O(N log N). Therefore, if a set of M ≥ N points can be arranged into M/N sequences of the form xγ^{i}, then f can be computed on the entire set in time O(M log N).
I note that there are various different ways in which the polynomials can be evaluated while still leading to an O(M log N) algorithm. For example, if we have a polynomial of degree less than N
then its values at the points xγ^{i} are,
If γ has order N, then this is just the Fourier transform of the coefficients c_{j}x^{j}. An inverse FFT can be used to back out the coefficients from its values at the points γ^{i}, followed by an FFT to obtain its values at the above points.
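This correspondence is easy to check in a toy field, using a naive O(N²) number-theoretic DFT in place of an FFT (p = 17 and γ = 4, of order 4, are illustrative choices):

```python
p, N, g = 17, 4, 4  # γ = 4 has multiplicative order 4 modulo 17
c = [3, 1, 4, 1]    # arbitrary coefficients c_j, degree < N
x = 5

def poly_eval(c, t):
    """Evaluate the polynomial with coefficients c at the point t, mod p."""
    return sum(cj * pow(t, j, p) for j, cj in enumerate(c)) % p

# The DFT of the scaled coefficients c_j x^j ...
scaled = [c[j] * pow(x, j, p) % p for j in range(N)]
dft = [sum(scaled[j] * pow(g, i * j, p) for j in range(N)) % p
       for i in range(N)]

# ... equals the values of the polynomial at the points x γ^i.
for i in range(N):
    assert dft[i] == poly_eval(c, x * pow(g, i, p) % p)
```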
I include some useful links for further information on STARKs and the ideas discussed in this post.
I will discuss an interesting blockchain protocol — or consensus mechanism — in which blocks are constructed on one blockchain by transferring assets on an entirely separate one. This is used by Stacks, where bitcoin needs to be spent in order to add blocks to the Stacks chain. Benefits include recycling the considerable proof-of-work of Bitcoin to secure additional chains, as well as extending its functionality by introducing features such as smart contracts closely linked with Bitcoin.
As described in previous posts, decentralized cryptocurrencies such as Bitcoin require a protocol in order to regulate construction of the blockchain and to ensure immutability. Blocks of transactions are appended, one by one, to the end of the chain. The protocol helps decide who gets to assemble each block and receive the associated reward, as well as ensuring immutability so that confirmed blocks remain unchanged in the chain in perpetuity. The focus of this post will be on the consensus mechanism itself, rather than any additional features of the blockchain in question such as support of smart contracts.
Most well-known is the proof-of-work (PoW) protocol used by Bitcoin as well as by many other leading cryptocurrencies. This requires miners to compete by expending computational power in order to gain the chance to create a block. Currently, the main competitor to proof-of-work is proof-of-stake (PoS), which requires validators to lock up units of the underlying chain asset in order to be selected to create blocks. Examples include Cardano and Solana. Both kinds of consensus mechanism function by requiring the prospective block-builders to spend some resource in order to win the chance of building a block and to receive a block reward paid on-chain. These approaches gain their security from the idea that an attacker would need to gain control of more than half of the global resource in order to be able to control the network, known as a 51% attack.
For proof-of-work, the resource in question is hash rate or computational work, which boils down to using sufficient energy. This is external to the blockchain, since the energy exists independently of the blockchain. For proof-of-stake, the resource is the blockchain asset itself or, more precisely, its opportunity cost, because it is only required to lock up the asset for a period of time. This is internal to the blockchain. Since the resource used for security itself depends on the security of the network, it can introduce some circularity or difficulties when trying to analyse properties of the blockchain such as immutability and possible attack vectors.
There is a third kind of protocol, which is the focus of this post. Specifically, a blockchain can be secured by requiring validators to spend a resource existing on a separate blockchain. Since this approach gains its security from a ‘base’ blockchain to which it refers, it makes sense to use what is considered the most secure and decentralized chain. Namely, Bitcoin. The idea is quite general, and other chains such as Ethereum could be used in exactly the same way. As of writing this article, there is one blockchain using such a consensus mechanism. This is Stacks, which gains its security by validators spending bitcoin in order to build Stacks blocks. As such, I will use this as the canonical example demonstrating the approach. The name ‘proof-of-transfer’ (PoX) is used by the Stacks team. This makes sense, since validators are effectively transferring bitcoin in exchange for native Stacks coins when constructing blocks. The important point, though, is that they are spending a resource on the Bitcoin blockchain in order to build blocks on the Stacks one.
Figure 1 shows the idea graphically. The top row is the Bitcoin blockchain, consisting of a sequence of blocks, each containing a collection of transactions and a header referencing the previous block. Individual commitment transactions can refer to a Stacks block by containing its hash and, in the figure, are shown in green. These both transfer some bitcoin and anchor the Stacks block to a specific location in the Bitcoin blockchain. Since Bitcoin transactions can hold arbitrary data inside of OP_RETURN outputs, these are used to store the Stacks block hash in addition to a reference to the transaction anchoring the previous Stacks block. Referencing the previous block commitment has the major benefit of allowing us to see the entire chain of Stacks block hashes and how they link together, including any branches, just by looking at the Bitcoin blockchain. Of course, we still need to have access to the actual Stacks blocks to be able to see their contents. Finally, in figure 1, the Stacks blocks are shown along the bottom row. They are drawn considerably bigger than the Bitcoin blocks, showing one of the benefits of this approach. It enables us to leverage the security implicit in Bitcoin’s proof-of-work without being limited by the 1MB maximum size of Bitcoin blocks.
Since miners of a PoX chain are spending assets on the base (Bitcoin) blockchain via their commitment transactions, they need to be rewarded to provide the incentive to do this. As with all decentralized blockchains, this is achieved by having a native asset on the PoX chain, which is used to pay the miner a block reward and/or transaction fees. Hence, Stacks has a native asset, called STX.
The PoX protocol used by Stacks is, in many ways, more similar to proof-of-work than to proof-of-stake. This is because the resource (bitcoin) spent by miners is external to the Stacks blockchain, and exists entirely independently of it. Bitcoin existed before Stacks and, if anything ever went wrong with the Stacks chain, Bitcoin would be unaffected. In some other ways, though, PoX has similarities to PoS. Whereas the proof-of-work problem is essentially random, so that we do not know who will be the first to construct the next valid Bitcoin block and have it accepted by the network, with PoX and PoS this randomness is not automatic. Instead, it needs to be introduced explicitly in the protocol by the use of pseudorandom numbers or similar methods.
Occasionally, the Stacks chain is confused with the idea of side chains used to extend functionality or scalability of a main blockchain. Whereas a side chain is characterised by the existence of a two-way peg allowing assets to be moved between it and the main chain, PoX is the underlying protocol controlling the production of blocks and ensuring immutability. A side chain could potentially use a PoX protocol, just as it could use PoW, PoS, a federated approach, or something else. So, the two concepts are orthogonal.
There are various reasons why it can be beneficial for a blockchain such as Stacks to use the proof-of-transfer consensus mechanism. For more information, see the article What kind of blockchain is Stacks?
On the other hand, proof-of-transfer does have some limitations of which we should be aware,
One of the great things about the PoX protocol is that all block hashes and forks of the chain are recorded on the base chain for everyone to see. So, let’s get a bit more hands-on and have a look at some actual commitment transactions used to anchor Stacks blocks.
I took the four Stacks blocks with heights 29966 through 29969 and checked their commitment transactions on the Bitcoin blockchain. The first output of each commitment transaction is of type OP_RETURN with an 80 byte data string. This data contains the hash of the Stacks block and a reference to the commitment transaction of the previous block, as discussed above, along with some additional information. Figure 2 shows the results of this investigation. Clicking on the first column entries will take you to the Stacks explorer entry for the relevant block. The second column links to the commitment transaction in a Bitcoin block explorer, where the OP_RETURN data of the first output is displayed as a 160 digit hexadecimal string.
To explain what this is showing:
Each of the commitment transactions pays an equal amount in the second and third outputs, which is the transfer to the stackers (holders of STX). The recipients are determined by the Stacks chain state, so it is not possible to determine if the commitment is paying the correct addresses by just looking at the Bitcoin blockchain. It can also be seen that the commitment transactions have a single input, using the same Bitcoin address as the second output of the corresponding key transaction. Hence, when a miner submits a key transaction, they are fixing the address to be used to pay for any future commitments.
For a more detailed description, see SIP001 and SIP007 on Github.
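As an aside, the raw layout of such an 80-byte OP_RETURN output script can be sketched. Data pushes longer than 75 bytes use the OP_PUSHDATA1 opcode, which is standard Bitcoin script encoding; the payload here is a zero-filled placeholder, not a real Stacks commitment.

```python
OP_RETURN = 0x6a
OP_PUSHDATA1 = 0x4c  # next byte gives the push length (for 76-255 bytes)

payload = bytes(80)  # placeholder for the 80-byte commitment data
script = bytes([OP_RETURN, OP_PUSHDATA1, len(payload)]) + payload

assert len(script) == 83
assert script.hex().startswith("6a4c50")  # 0x50 = 80, the payload length
```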
To add a block to a PoX chain, a miner needs to construct a commitment transaction containing its hash and submit it for inclusion in a block of the base chain. However, many different miners could do the same, so that a single block of the base chain can contain many commitment transactions. The protocol used by Stacks selects a single one of these, together with its associated Stacks block. Regardless of how many different commitments are contained in a Bitcoin block and how many different branches (or chain tips) of the Stacks blockchain they build off, only one commitment is regarded as valid.
Consider Bitcoin block height 700631, for example. From figure 2 above, we see that it contains a single commitment at transaction number 352, corresponding to Stacks block 29967. If we look through all of the transactions in this Bitcoin block, we actually see six commitments, listed in figure 3 below. This includes the total amount transferred and the fee for each commitment (in Sats). We also note that the commitments do not all build off of the same chain tip (transaction 271 has a different parent from the rest) but, even so, only the one in position 352 is valid.
The miner whose commitment transaction is selected is referred to as the leader, as in PoS protocols. The choice of leader — or leadership election — is made using a verifiable random function (VRF), and requires each miner to include a 32 byte proving key in his key transaction. This function selects the leader in a pseudorandom fashion depending on the following data.
For the details of this calculation, see SIP001. The important properties for our discussion are:
The first property is vital, and corresponds to the idea in PoW that the probability of a miner producing the next block is in direct proportion to their hashpower. Suppose that an attacker tries to gain control of the Stacks block production by creating the majority of the blocks themselves. They would need a greater than 50% chance of winning each leadership election, which requires transferring more than 50% of the total bitcoin across all participants. That is, a 51% attack. Economically, the total bitcoin transferred should be about the same as, or slightly less than, the total value of STX paid to miners in compensation. This will equal the Stacks block reward plus transaction fees, multiplied by the value of STX. So, the security against such attacks will be in proportion to the value of STX.
The second property listed is required since, if they could predict the election outcome, miners could create lots of alternative blocks in private and only submit the commitment for the one that they predict will win. This would severely reduce the amount transferred on-chain, reducing security and possibly even devolving into a de facto proof-of-work where the miner who can generate the most commitment transactions has the highest chance of finding a winning one.
The third property means that it is possible to construct the entire chain of Stacks block hashes, including any forks that may exist, by only looking at the Bitcoin blockchain. It also means that the winner of the leadership election is known as soon as the Bitcoin block is produced, and only the winner needs to transmit their Stacks block to the network.
Note that this method of selecting a commitment transaction does not guarantee a valid Stacks block. The election winner could submit an invalid block, or even refuse to submit any block at all. This should not be a problem. If such a situation occurred, then subsequent blocks in the chain would just be built off an earlier, valid, Stacks block. Consequently, the missing or invalid block would not appear in the canonical Stacks chain, and the loser in this situation is the offending miner, who paid bitcoin with his commitment transaction but does not receive the reward on the Stacks blockchain.
As with other protocols such as proof-of-work, chains using proof-of-transfer can fork. This occurs when more than one block builds off of the same tip, causing the chain to split into two alternative branches. The protocol needs to provide a rule for nodes to determine which of these is the canonical chain. If nodes were to disagree on the canonical branch, or switch from one to the other, then there would be a chain reorg, and transactions in one branch could effectively be lost. This is perfectly fine and is how the blockchain is intended to function, so long as disagreements between nodes or reorganizations only impact a small number of blocks near the tip of the chain. This is why we only consider a transaction to be final when it is confirmed by having sufficiently many blocks on top of it.
One way in which a PoX chain can fork, is if the base chain to which it is anchored forks. This is rather straightforward, and we leave it up to the base chain nodes to determine the canonical branch. For example, with Bitcoin, transactions are usually considered final after they have 6 confirmations, meaning that it is in a block with at least 5 further blocks already added on top. So, for a Stacks block to be considered final, we should at least require that its commitment transaction has 6 confirmations in the Bitcoin blockchain.
There is another way in which a PoX chain can fork. Even without any fork occurring in the base chain, it is possible for more than one block to be built off the same chain tip. This occurs when the leader for a particular base chain block does not build off of the most recent PoX chain block. Such a (hypothetical) situation for Stacks is demonstrated in figure 4 below.
We see that the Stacks commitment transaction in the second displayed Bitcoin block builds off of the one in the first as expected. However, the third commitment does not build off of the most recent available anchored Stacks block. Instead, it again builds off of the first one. This creates a fork, and leaders in each of the subsequent Bitcoin blocks choose one or other of these two forks to build on.
This property, where the PoX chain can fork even when the base chain doesn’t, is not easily eliminated. You might think that we could require miners to always build off of the most recent PoX block, which would eliminate forks. However, this could be fatal. For example, a leader could submit an invalid block, could withhold submitting his block for a period of time, or could even fail to submit it at all. If we were forced to build off of his block then the PoX chain would come to a halt. We must allow the flexibility for miners to build off of earlier blocks if necessary.
Considering figure 4 again, maybe the second leader was slow in sending his block to the network. So, the third leader could not see his Stacks block and was forced to build off the previous one. If the first leader then, belatedly, submits his block, there are now two valid forks for subsequent leaders to build off. Alternatively, maybe there was a network split where half of the Stacks miners are not able to communicate with the other half. Then, each half starts working on its own branch, forking the Stacks chain.
In the event of a fork, the Stacks protocol needs to select one branch as the canonical one. The solution is simple. The canonical chain is the longest out of all the available valid branches. From looking only at the commitment transactions on the Bitcoin blockchain, it is possible to build the entire tree containing all existing branches. We then need the Stacks blocks themselves to determine which branches are valid, so that the canonical one can be selected. The idea is that in the event of a fork, once one branch becomes longer than the other, this becomes the accepted one. Miners are economically incentivized to build on this branch since, otherwise, they would be paying bitcoin but not receiving any reward in the accepted chain. The effect is to further increase the length advantage of the canonical chain. The longer the canonical branch grows, the harder it becomes for any other to catch up and, before long, all other branches are abandoned. This is called the Nakamoto consensus, since it is the idea introduced by Satoshi Nakamoto with Bitcoin. Once a block has sufficiently many built on top of it, so that any branch splitting from the canonical chain before this point has negligible chance of overtaking it, the block is considered to be final.
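The fork-choice rule itself is simple to sketch. Given the tree of parent pointers recoverable from the commitment transactions, the canonical tip is the one of greatest height (validity checking of the blocks themselves is omitted, and all names here are illustrative rather than actual Stacks code):

```python
def canonical_tip(parents):
    """parents maps block id -> parent id (None for the root block)."""
    heights = {}
    def height(b):
        if b not in heights:
            heights[b] = 0 if parents[b] is None else height(parents[b]) + 1
        return heights[b]
    # Longest-chain rule: the tip of greatest height wins.
    return max(parents, key=height)

# The fork of figure 5: two blocks at height 29966 share a parent, and the
# branch extended by 29967 becomes canonical.
parents = {"29965": None, "29966a": "29965", "29966b": "29965",
           "29967": "29966a"}
assert canonical_tip(parents) == "29967"
```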
Since the Stacks PoX protocol records all branches in the Bitcoin blockchain via commitment transactions, we can look for actual examples of forks that have occurred. In fact, look at the example blocks in figure 2 above. These are consecutive Stacks block heights, but the associated Bitcoin blocks are not consecutive. Bitcoin block heights 700629 and 700630 were skipped. Also, looking at the commitment transactions in Bitcoin block 700631 shown in figure 3, we see that they do not all build off of the same chain tip. What happened here? Figure 5 below shows all valid Stacks blocks anchored to these Bitcoin block heights, and how they are linked.
The number above each Bitcoin block shows its height, and the anchored Stacks block height is displayed underneath. First off, Stacks blocks of height 29965 and 29966 are anchored to Bitcoin blocks 700627 and 700628 respectively. Next, Bitcoin block 700629 was empty, containing no transactions at all other than the coinbase. In particular, it does not contain any Stacks commitments. Following this empty block, the leader for Bitcoin block 700630 did not build off of the most recent Stacks block. This created a fork, with two Stacks blocks of height 29966, one in each branch, anchored to Bitcoin blocks 700628 and 700630 respectively. The leader for Bitcoin block 700631 could have built on either of these branches. In fact, from figure 3 above, we see that there are commitment transactions extending both branches. The winning leader, however, built off the Stacks block 29966 anchored to Bitcoin height 700628, resulting in this becoming part of the canonical chain. The alternative Stacks block 29966 was consequently abandoned, or orphaned.
We can see Stacks block 29966 in the Stacks explorer, anchored at Bitcoin height 700628. The orphaned block 29966 can also be seen in the explorer anchored to Bitcoin block 700630.
One drawback of the PoX protocol as described above is that we can only have one block for each block of the base chain. Using Bitcoin as the base chain, this means that there are, on average, 10 minutes between blocks, during which no new PoX block can be produced. If a user submits a transaction, then it cannot be added to the PoX chain until a new Bitcoin block is added.
Stacks addresses this issue with what are called microblocks. These are blocks which are not anchored to Bitcoin, so require an extension to the protocol. Mixing protocols like this should not be an issue, since the chain is still anchored to Bitcoin, just not at every block.
The way Stacks approaches this is to view these microblocks as being part of the previous anchored block that they build off. They are not treated completely as independent blocks in their own right and do not get assigned a block height, for example. When the leader creates a block, and sends its hash in a commitment transaction to be included in a Bitcoin block, this cannot include any subsequent transactions which have not even been sent to the mempool yet. Instead, it just contains the transactions that he commits to including. If he wins the election, he can add new microblocks containing additional transactions as they arrive in the mempool.
When the next Bitcoin block comes in, the new leader can choose to build their block off of the most recent microblock, an earlier one, or ignore these microblocks altogether. To encourage leaders to add these microblocks, and for the following leader to build off of the most recent ones, the transaction fees are shared. A portion goes to the leader of the block they are built on, and a portion goes to the next leader who builds on top of these microblocks. See SIP001 for more details on this.
As discussed above, the underlying idea is to secure one blockchain by requiring miners to spend an asset on an entirely separate chain. Keeping with the example of mining Stacks by spending bitcoin, there are two main ways that this can be done. They both involve anchoring Stacks blocks with commitment transactions on the Bitcoin blockchain, and the procedure is identical in almost all ways other than what these commitment transactions do with the bitcoin.
In some ways, the first method, proof-of-burn (PoB), is closest to PoW, where miners consume energy to solve the proof-of-work problem. We are just replacing ‘energy’ with ‘bitcoin’. This ensures that a 51% attack would be expensive to perform, since the attacker would need a lot of bitcoin to burn. PoB is actually the approach originally used by Stacks, described in SIP001.
In the case of burning bitcoin, we are not really destroying total value, since any lost or burned coins reduce the supply and increase the value of the remaining bitcoin. However, there are some drawbacks. The idea of intentionally destroying bitcoin could be controversial, even if it does increase the value of the remaining coins. Furthermore, it transfers value from Stacks to holders of bitcoin. This manifests itself in the process whereby miners need to keep buying bitcoin to burn, pushing up its price, and then sell the earned STX, depressing its price. As the security of the protocol against 51% attacks depends on the value of STX, this can be regarded as an undesirable property of PoB.
The second method (PoX) avoids having to destroy bitcoin by instead transferring it to holders of STX. This is the method currently employed by Stacks, and described in more detail in SIP007. Unlike with PoB, this has the effect of keeping the value within the Stacks ecosystem, with the downwards pressure of miners selling STX being offset by the ability of STX holders to earn bitcoin.
There is one potential problem with PoX. Imagine the extreme situation where one person held all the STX, or at least, held all of the STX participating in stacking. Then, he could mine Stacks blocks almost for free, since the transferred bitcoin would just come back to himself. There would only be a small transaction cost incurred. Although this situation is rather extreme, it is still the case that a holder of a significant proportion of STX could mine Stacks more cheaply than otherwise. For this reason, Stacks has the ability to revert to PoB with the approval of a minority of STX holders.