# Building a STARK

This post will step through how to build a simple zk-STARK. While I have already outlined the ideas in a couple of posts (STARKs I and II), those took a very general approach and were consequently rather abstract. This post will be more concrete, detailing how a STARK can be built for a specific calculation. For this, I pick a ‘toy’ example for ease of exposition, but one which is still very interesting. The calculation will be based on the rule 30 elementary cellular automaton. There are many ways of actually building a STARK, some of which will perform better than others. I make my choices aiming for simplicity, avoiding complex constructions.

First, let’s recap what a zk-STARK is. It is an acronym for zero knowledge Scalable Transparent Argument of Knowledge. While they are often phrased as proofs of existence of some quantity (e.g., existence of 3-colorings of a graph), more properly, they are a proof that the person constructing the STARK has knowledge of this quantity. We can be a bit more precise. The setup must include some algorithm or computable function F and a value b which it may or may not return. This information will be publicly known, so that anyone can verify what F(a) does evaluate to on given input a. Anyone can also make a statement such as:

I know of a value a such that F(a) evaluates to b.

However, such a statement does not itself prove that they do really know of such a value. An argument of knowledge is a method of proving the truth of this statement. As an example, consider digital signatures. In this scenario, the value a is a private key and F is the calculation deriving the associated public key b. A digital signature is an argument of knowledge of a private key. For elliptic curve based schemes as used in ECDSA and Schnorr signatures, it can be shown mathematically that for any public key there does exist an associated private key, although it can be practically impossible to actually find it. Hence, a digital signature is not a proof that a private key exists, which we know already. Instead, it is a proof that it was known to the person constructing the signature.

One method of proving a statement like the one given above is to simply display the value of a, so that anyone can evaluate F(a) and verify that it does indeed equal b. There are two reasons why this may not be a desired or viable approach, corresponding to the other words making up the acronym zk-STARK. Firstly, the value of a may be a secret, such as with the digital signature example. It should be zero knowledge. Ideally, the proof does not reveal any information beyond the fact that the prover knows of a valid value for a. Secondly, it should be scalable.

By scalable, we mean that the size of a zk-STARK should remain small and reasonably quick to validate, even if F is very computationally expensive to compute. For example, we could take the value a and hash it billions of times to get the result b. If we want to convince others of the result, then expecting them to go through the whole tedious calculation might not be realistic. Instead, a STARK can be constructed which anyone can check, proving that the result is b as claimed without having to redo the entire calculation. This is especially useful for blockchains, where the use of STARKs can greatly reduce the work required by validators and avoid using up a lot of the limited block space. A long list of transactions with their associated digital signatures could be replaced by a much shorter proof of the combined effect of these transactions.

The scalability property is really the magic part of STARKs and the closely related SNARKs. A very long calculation will take a long time to perform. However, once one person has done this, they are able to prove the result to others without requiring them to also go through the long calculation. We really do not have any right to expect that this is even possible, but luckily it is!

Next, STARKs are transparent, which means that there is no trusted setup. This is in contrast to the closely related SNARKs (succinct non-interactive arguments of knowledge), where a procedure is required to initialize the construction, involving data which must be kept secret by those involved. Revealing this data would allow ‘proofs’ to be constructed even by a prover who does not have the knowledge being claimed. STARKs do not have this issue.

Finally, I mention that STARKs are non-interactive. Any argument of knowledge involves two parties, the prover and the verifier, with the former trying to convince the latter of the truth of a statement. Interactive proofs involve messages being passed back-and-forth between the two. Essentially, the verifier asks questions of the prover, and checks his responses, until she is convinced of the proof of the original statement. A non-interactive proof just consists of a single message sent from the prover to the verifier. This is what is needed in many blockchain applications, since the proof can simply be constructed and added to the chain. Anyone can be the verifier by reading the data from the blockchain, such as done by validators.

As with any argument of knowledge, STARKs are sound, meaning that it is practically impossible for someone without the claimed knowledge to construct a valid proof. However, I should point out that, in theory, it is always possible to just guess a valid argument by blind luck. For this reason, any such construction comes with a soundness parameter. This is an upper bound on the probability that, without the claimed knowledge, the parameter choices made in the construction lead to a valid proof by chance. The idea is that this should be tiny to avoid such false positives. It is true that an untrustworthy prover could try over and over, choosing different parameters each time, to try and brute-force a solution. As long as the soundness parameter is small enough — say, about 2^(-100) or lower — then it is practically impossible to even brute-force a solution.

#### Automaton Rule 30

The toy calculation used in this post is the elementary cellular automaton Rule 30. The idea is that we have a one dimensional array of cells, indexed by the integers. Each cell can be in one of two states, either set or unset, which I will label by the numbers 0 and 1. We then iteratively update this array, one step at a time. At every step of the calculation, each cell is updated according to the value of it and its immediate neighbours.

Using si(t) to represent the state of cell i at time t, I will also denote this by just si, suppressing the time variable for brevity. Its value si(t + 1) at time t + 1, which I denote by s′i, is some function of si - 1, si, si + 1.

 s′i = S(si - 1, si, si + 1).

The rule is defined by the function S and, given any initial state, this can be applied to compute the state at every future time. There are many different choices for S giving different cellular automata with different properties. I choose rule 30, for which the new state s′i is determined from si - 1sisi + 1 by looking up its value in the following table:

| si - 1sisi + 1 | 111 | 110 | 101 | 100 | 011 | 010 | 001 | 000 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| s′i | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 |

As there are 8 possible values which si - 1sisi + 1 can take, and each of these can lead to two possible states for s′i, there are 2^8 = 256 distinct elementary cellular automata. This particular one is called ‘rule 30’ since the second row of the table above is the binary expansion of the number 30.

To avoid dealing with infinite arrays, I will use a fixed window width of 200. That is, rather than an infinite array of cells, we just have 200 of them labelled as si for 0 ≤ i < 200. At the edges of the window, we allow the rule to wrap around, so that, on the left, s-1 is defined to equal s199 and, on the right, s200 is defined to be s0.
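One iteration of this update rule on the fixed window can be sketched in a few lines of Python (the names `WIDTH` and `rule30_step` are my own; the XOR/OR form of the rule is derived later in the post):

```python
# A sketch of one rule 30 update on the fixed 200-cell window with wrap-around.
WIDTH = 200

def rule30_step(state):
    """Apply one iteration of rule 30 to a list of 0/1 cells."""
    n = len(state)
    return [
        # s'_i = s_{i-1} XOR (s_i OR s_{i+1}), with indices wrapping at the edges
        state[i - 1] ^ (state[i] | state[(i + 1) % n])
        for i in range(n)
    ]

# Start with only cell number 50 set, as in figure 2.
state = [0] * WIDTH
state[50] = 1
state = rule30_step(state)
```

Note that Python's negative indexing conveniently handles the left edge, since `state[-1]` is the last cell, matching the wrap-around convention above.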

Then, if we start with the single cell number 50 set, and the others unset, repeated applications of the rule give figure 2 below. Here, the time starts from the top of the figure and increases towards the bottom. Each row of pixels represents the state at the corresponding time, with black pixels representing cells which are set and white those which are unset.

The pattern of 0s and 1s expands to fill the whole window, and becomes rather chaotic with randomly appearing triangular gaps below any consecutive sequence of 1s. Let’s try again with the following starting state:

0101101001100101011100100110111100100000010010110110111001101111011101110110110001100101011001000110011101100101

This is the ASCII data for the string “Zero Knowledge”, which I pad with zeros to be 200 cells wide. Starting with this state, the first couple hundred iterations are displayed in figure 3.
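As a quick sketch (variable names my own), the initial state can be derived from the message like this:

```python
# Building the initial state: each character of "Zero Knowledge" becomes its
# 8-bit ASCII code, padded on the right with zeros to the 200-cell window width.
message = "Zero Knowledge"
bits = "".join(format(ord(ch), "08b") for ch in message)
initial_state = [int(b) for b in bits.ljust(200, "0")]
```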

Now, let’s keep going. Starting with the same state as used in figure 3, but continuing for a billion iterations, the final couple hundred iterations are shown in figure 4.

The first hundred cells of the final row are:

 100011000110011011101110101110000110101101100111101100010001100011100100001100110001001110000101100 (1)

So, if we start with the “Zero Knowledge” message and apply rule 30 one billion times, the first 100 cells are above. We consider building a zk-STARK to represent the statement that we know of a message such that, if rule 30 is applied one billion times, then the above sequence is obtained.

This is a kind of hash function, where we start with an initial message and, after applying the rule, we obtain the output string above. Going in the opposite direction from the final string to the initial message is effectively impossible. This is very much like applying the SHA256 hash function. In fact, the ideas used here could in principle also be used for applications of SHA256. I use rule 30 since it is very simple, and we have the added complexity of using a large number of iterations so that, even if someone knew the initial message, it would take a fair amount of computing power to directly verify.

#### Algebraic Representation

The first step in building a STARK is to convert the execution rules of the calculation into algebraic equalities, known as the Algebraic Intermediate Representation (AIR). That is, they should only involve the operations of addition, subtraction and multiplication. Rule 30 can be expressed using the logical OR and XOR operations.

 s′i = si - 1 XOR (si OR si + 1)

We will interpret the variables si as lying in a finite field 𝔽. I will take this to be the integers modulo a prime number p. For the moment, the important facts are that we can perform addition, subtraction and multiplication in 𝔽. That the cells si only take the values 0 or 1 in the field can be expressed by the algebraic identity

 $\displaystyle s_i^2=s_i.$ (2)

Next, in any field, the logical operations OR and XOR can be written algebraically as

 a OR b = a + b – ab, a XOR b = a + b – 2ab.

Let’s apply this to rule 30. It turns out to be a bit simpler if we rearrange it so that the XOR operation is on the left hand side, avoiding iterated logical operations in a single step:

 s′i XOR si - 1 = si OR si + 1.

Writing the logical operators using the algebraic representations above expresses rule 30 as the second order expression

 $\displaystyle s'_i + s_{i-1} - 2s_{i-1}s'_i = s_i + s_{i+1} - s_is_{i+1}.$ (3)
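This identity can be checked exhaustively over all eight bit patterns; a minimal sketch in Python (helper names are mine):

```python
# Exhaustive check that the algebraic identity (3) agrees with rule 30 on
# all eight possible bit patterns.
from itertools import product

def rule30(l, c, r):
    # s' = s_{i-1} XOR (s_i OR s_{i+1})
    return l ^ (c | r)

def identity_holds(l, c, r):
    # (3): s' + s_{i-1} - 2 s_{i-1} s'  ==  s_i + s_{i+1} - s_i s_{i+1}
    sp = rule30(l, c, r)
    return sp + l - 2 * l * sp == c + r - c * r

assert all(identity_holds(l, c, r) for l, c, r in product((0, 1), repeat=3))
```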

The final condition is that the first hundred values of the final state are as claimed. Letting ai be the 0-1 values in (1), we simply state the equality

 si(N) = ai (4)

for 0 ≤ i < 100.

#### Polynomial Representation

The idea is to represent the evolution of each cell over the iterations by a polynomial function. Assume that N iterations of rule 30 are applied so that, in our case, N is equal to one billion. Specifically, for each index i, cell number i will be represented by a polynomial fi. Fix an element ω of the field such that the powers ω^n are all distinct over the range 0 ≤ n ≤ N. We will take ω to be a root of unity of some order 2^s > N. Then, let fi be the polynomial over our field 𝔽 which traces out the values of cell i after n iterations

 fi(ω^n) = si(n)

as n runs through the integer values from 0 up to N. By Lagrange interpolation, there exists exactly one such polynomial of degree less than or equal to N. Of course, this is a very high degree, and storing the coefficients of fi would take up a lot of space. Since N is a billion, this means storing a billion coefficients lying in the field 𝔽, which is even more space than is required just to store the original binary digits si(n). However, the polynomial will not be stored in the STARK. Instead a hash — or Merkle root — of its values is stored. Using SHA256, this is 256 bits or 32 bytes, regardless of the degree N. In addition, its values at a pseudo-randomly selected set of points will be stored — just enough points to statistically verify the claimed properties, which will be orders of magnitude fewer than N.
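As a toy illustration of interpolating a trace on powers of ω, here is a sketch using the small field 𝔽17 and a made-up 8-step trace, rather than the full-size parameters:

```python
# In F_17 the element ω = 2 has order 8 (2^4 = 16 ≡ -1 mod 17); we interpolate
# a hypothetical 8-step trace through the points ω^0, ..., ω^7.
p, w = 17, 2
trace = [1, 0, 1, 1, 0, 0, 1, 0]           # made-up cell values s(0), ..., s(7)
xs = [pow(w, n, p) for n in range(8)]

def lagrange_eval(xs, ys, x, p):
    """Evaluate the unique interpolating polynomial of degree < len(xs) at x."""
    total = 0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for m, xm in enumerate(xs):
            if m != j:
                num = num * (x - xm) % p
                den = den * (xj - xm) % p
        # divide by den via Fermat's little theorem: den^(p-2) = den^(-1) mod p
        total = (total + yj * num * pow(den, p - 2, p)) % p
    return total
```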

It will be convenient to represent fi on the set of all powers of ω,

 W = {1, ω, ω^2, ω^3, …}. (5)

As ω is chosen to be of order 2^s, these powers will start to repeat once W is of size 2^s. As this is a bit bigger than N, it leaves some elements of W on which fi is not specified, where it can be set to whatever we want. To keep the redundancy to a minimum, 2^s should be taken to be the smallest power of two greater than N, which is 2^30 when N is one billion. The result is that fi is defined on W and can be extrapolated as a polynomial of degree less than 2^s.

We define ZN to be the polynomial vanishing over the execution trace, $\displaystyle Z_N(X)=\prod_{n=0}^{N-1}(X-\omega^n).$ (6)

The fact that si(n) are 0-1 valued over the range 0 ≤ n < N means that fi2 – fi vanishes at the points ωn by identity (2). Using polynomial factorization, this is equivalent to the identity

 $\displaystyle f_i^2 - f_i = g_{1i}Z_N$ (7)

for some polynomial g1i of degree less than 2^s.

Similarly, the algebraic representation (3) of rule 30 can be expressed by a polynomial identity. Using fi○ω to represent the polynomial fi(ωX), it is equivalent to

 $\displaystyle f_{i\circ\omega} + f_{i-1} - 2f_{i-1}f_{i\circ\omega} - f_i - f_{i+1} + f_if_{i+1} = g_{2i}Z_N$ (8)

for some polynomial g2i of degree less than 2^s.

The final state (4) says that fi – ai vanishes at the point ω^N or, equivalently,

 $\displaystyle f_i - a_i = (X-\omega^N)g_{3i}$ (9)

over 0 ≤ i < 100 for some polynomials g3i of degree less than 2^s.

The original claim that we know of an initial state for which N applications of rule 30 gives the values ai is equivalent to the existence of polynomials satisfying identities (7,8,9). It is only really necessary for the prover to reveal values of fi, since the g polynomials can be computed from these.

 $\displaystyle g_{1i} \equiv (f_i^2-f_i)/Z_N,\qquad g_{2i} \equiv (f_{i\circ\omega}+f_{i-1}-2f_{i-1}f_{i\circ\omega}-f_i-f_{i+1}+f_if_{i+1})/Z_N,\qquad g_{3i} \equiv (f_i-a_i)/(X-\omega^N).$ (10)

The claim that the prover needs to show is that these are also polynomials, of degree less than 2^s. It is enough to show this on any subset S of the field 𝔽 of size at least twice the claimed degree. Then, by polynomial extrapolation, (7,8,9) hold everywhere and, in particular, hold on the execution trace.

#### Polynomial Commitments

Using the ideas above, the argument of knowledge will require polynomials with degree up to 2^s, which can be of the order of billions. Although, by any usual standards, these are very big polynomials, they are often referred to as ‘low degree’ in the literature! This is because an arbitrary function defined on a subset S of a finite field 𝔽 would usually have degree just one less than the size of S, so anything less than this places a big restriction on the set of allowable functions. In fact, there is a special name for them. They are Reed-Solomon codes, used for error correction by projecting a received message onto this relatively small subset of allowed polynomials.

Revealing all of the polynomial coefficients would take up a lot of space and require the verifier to compute a lot of terms, which is not what we want. It would also not be zero knowledge. Instead, the prover will just reveal its values at a small number of randomly chosen points. However, this also raises the problem that the prover could be just making up the values as they go, rather than being consistent with a pre-constructed polynomial.

These problems are addressed by having the prover first send a commitment to the polynomial. This is a Merkle root of the function values on the domain. Then, when he sends the function values at the selected points, a Merkle proof is also sent to guarantee that the values are indeed those of the function to which he has committed.

As an example, consider a function defined on a set S of size 2^10 = 1024. Label the points in order x0, x1, …, x1023. In the STARK, these points will follow a geometric sequence xn = cω^n. We suppose that the function values are 128 bit integers, so can be represented by 16 byte strings. This gives a binary Merkle tree of depth 10. To achieve zero knowledge, the function values can be hidden by concatenating with a nonce, which I take to be a random 16 byte string, before taking its hash. This ensures that, even if you were to guess the value of the function at any point, revealing the hash does not provide any information on whether the guess is correct, since it is effectively impossible to also guess the nonce. So, the only information that the prover reveals is the function values at the specified points.
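A minimal sketch of such a nonce-blinded commitment, assuming SHA256 leaves hashed as value‖nonce and a power-of-two number of leaves (function names are mine):

```python
# Nonce-blinded Merkle commitment: each leaf is SHA256(value || nonce) with a
# random 16-byte nonce, and internal nodes hash the concatenation of children.
import hashlib, os

def sha256(data):
    return hashlib.sha256(data).digest()

def commit(values):
    """Return the Merkle root and nonces for a power-of-two list of values."""
    nonces = [os.urandom(16) for _ in values]
    layer = [sha256(v + nc) for v, nc in zip(values, nonces)]
    while len(layer) > 1:
        layer = [sha256(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0], nonces

values = [n.to_bytes(16, "big") for n in range(1024)]   # stand-in function values
root, nonces = commit(values)
```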

I use the SHA256 hash function, so that the hash values and Merkle root are all 32 byte strings, which I show as 64 character hexadecimal numbers. Suppose that the prover reveals the Merkle root as:

06e893aec8533e367ebadf5da0cfe17ce7b90d01c7bb014f97b8a43e0f71e5e7

Consider selecting a random point. Say, x469. I chose this by taking the right-most 10 binary digits of the SHA256 hash of the Merkle root, giving the binary expansion 0111010101, or i = 469 in decimal. This ensures that it was not cherry-picked. The prover should reveal not only the value of the function at this point, but also its Merkle proof and nonce. In this case:

 Value = 775c8c7091445e6dcd26d45c6a525ff5
 Nonce = bb8ea60a3e38ed487f3f5794fbff155e
 Merkle proof = [
  57c6a608ae818aa1227bbd274b31b87aa681e2726a8a72540699c0c7d2ae5ca7,
  62adae7f75814e50f06e9516a370e06cb35ea5fbd02900ed37ec2f4124254521,
  0368314612e835c54e1d5e2367fdd9debdc5d68c5f5708fc37eb16c2c661c65c,
  af8e55b2a734b42082928e57dac5d46259cff6433968632b106c1cff8e0d8fcd,
  279de58f7b64e5287a7df215176f832883b008bc9d8bc73c469c6667f6207ac1,
  9ccd17e7215de3386dd405c430f29b53fa63fdf365a10927abe155a6af43f42c,
  95eeecb30f68037d23ddc0849aaa2f5a430205590f0f8fbb7d32f6cb6499902a,
  cc2c3eb3959c05890d9dd80088c63f2552417e7b4a6609dd097886e7cb94aff2,
  a716419a1b9095874dafab2b5f8c6e42c4aa99aac360e9bd2fd5ca97d4686b3b,
  40a947966de220a4a6a5537f576e18dde4bd5ae41b8e28a8ca4e78ae1a6acb01 ]

The verifier can confirm that the provided value is indeed from the function with the specified commitment by applying the Merkle proof. This means performing the following calculations, where ‘mp’ denotes the Merkle proof and ‘h’ is the hash being computed. The concatenation of arguments in the call to the hash function is denoted ‖, with the order of arguments being determined by the path up the Merkle tree — when the corresponding digit of the index 0111010101 is 0, the hash h appears on the left, otherwise it is on the right.

 h = SHA256(value ‖ nonce)
 h = SHA256(mp ‖ h)
 h = SHA256(h ‖ mp)
 h = SHA256(mp ‖ h)
 h = SHA256(h ‖ mp)
 h = SHA256(mp ‖ h)
 h = SHA256(h ‖ mp)
 h = SHA256(mp ‖ h)
 h = SHA256(mp ‖ h)
 h = SHA256(mp ‖ h)
 h = SHA256(h ‖ mp) = 06e893aec8533e367ebadf5da0cfe17ce7b90d01c7bb014f97b8a43e0f71e5e7

The result agrees with the Merkle root given above, showing that the prover did provide the correct function value.
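The verification walk can be sketched as follows, reading the index bits from least significant upwards to decide the hash ordering (the helper and the tiny 4-leaf example are my own illustration, not the post's actual data):

```python
# Verifier's walk up the tree: bit n of the leaf index (least significant
# first) says whether the running hash is the right child (1) or left (0).
import hashlib

def sha256(data):
    return hashlib.sha256(data).digest()

def verify_merkle_proof(root, index, value, nonce, proof):
    h = sha256(value + nonce)
    for sibling in proof:
        if index & 1:                     # our node is the right child
            h = sha256(sibling + h)
        else:                             # our node is the left child
            h = sha256(h + sibling)
        index >>= 1
    return h == root

# Tiny 4-leaf tree, built by hand.
leaf_values = [bytes([i]) * 16 for i in range(4)]
nonces = [bytes([100 + i]) * 16 for i in range(4)]
leaves = [sha256(v + nc) for v, nc in zip(leaf_values, nonces)]
l01, l23 = sha256(leaves[0] + leaves[1]), sha256(leaves[2] + leaves[3])
root = sha256(l01 + l23)

# Proof for leaf 2 (binary 10): sibling leaf 3, then the node covering leaves 0-1.
ok = verify_merkle_proof(root, 2, leaf_values[2], nonces[2], [leaves[3], l01])
```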

The example given here was only for a domain of size about a thousand, just to keep it reasonably short for display here. We can easily go up to a size of a billion — a million-fold increase — while only tripling the length of the Merkle proof.

So far, so good. If the prover has computed a function on a finite domain, then Merkle trees provide an efficient way to commit to this function, and to reveal its values at an arbitrary set of points. For the STARK we need to construct here, the functions will be polynomials with values specified on some large number N of points, where N can be of the order of billions. We could try computing the polynomial coefficients directly using Lagrange polynomials or something similar, then evaluate separately at every point of the domain S. Each polynomial evaluation, and each coefficient, takes of the order of N operations to compute. So, the total calculation would take the prover a time of order N², which can be huge. Do we really expect the prover to go through such a long-winded and, possibly, infeasibly long calculation? Fortunately, we do not need to, as Fast Fourier Transform methods can be applied to reduce the complexity to N log N.

Consider a function f specified at a number d of points, which will be taken to be a geometric sequence c, cω, cω^2, …, cω^(d - 1) in the field 𝔽. Representing f as a polynomial of degree less than d,

$\displaystyle \begin{aligned} f&=b_0+b_1X+b_2X^2+\cdots+b_{d-1}X^{d-1}\\ &=\sum_{n=0}^{d-1}b_nX^n. \end{aligned}$

Now consider evaluating at the points of the geometric sequence, $\displaystyle f(c\omega^n)=\sum_{m=0}^{d-1}b_mc^m\omega^{mn}.$

If ω is a primitive d’th root of unity, then this is just the Fourier transform of the terms bmcm. We would usually choose d to be a power of 2 so that efficient Fast Fourier Transforms (FFT) can be used. If the values of f are specified on the d points of the domain, an inverse FFT generates all of the coefficients at once. Then, for any other field element c, we can scale by powers of c and apply an FFT to compute f at the points {c, cω, cω^2, …, cω^(d - 1)}, which we denote as the set cW.
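A sketch of this scale-then-FFT evaluation, using the toy parameters p = 97, d = 8 and ω = 64 (a primitive 8th root of unity mod 97, since 64^4 ≡ -1) rather than the full-size field:

```python
# Evaluate f on the coset cW by scaling coefficient b_m by c^m and taking
# an FFT of the scaled coefficients; compare against naive evaluation.
p, d, w, c = 97, 8, 64, 5

def fft(coeffs, w, p):
    """Evaluate the polynomial with these coefficients at 1, w, w^2, ..."""
    n = len(coeffs)
    if n == 1:
        return list(coeffs)
    even = fft(coeffs[0::2], w * w % p, p)
    odd = fft(coeffs[1::2], w * w % p, p)
    out, wk = [0] * n, 1
    for k in range(n // 2):
        t = wk * odd[k] % p
        out[k] = (even[k] + t) % p
        out[k + n // 2] = (even[k] - t) % p     # w^(k + n/2) = -w^k
        wk = wk * w % p
    return out

coeffs = [3, 1, 4, 1, 5, 9, 2, 6]               # arbitrary b_0, ..., b_7
scaled = [b * pow(c, m, p) % p for m, b in enumerate(coeffs)]
fast = fft(scaled, w, p)                        # f(c), f(cω), ..., f(cω^7)
slow = [sum(b * pow(c * pow(w, n, p) % p, m, p) for m, b in enumerate(coeffs)) % p
        for n in range(d)]
```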

All this means is that the prover can use the FFT algorithm to evaluate f at all points of a subset S of our field, as long as this domain is closed under multiplication by ω. Equivalently, S is the collection of points ciω^n for a number M of field points ci and over the range 0 ≤ n < d. This gives a domain of size Md and, by performing M applications of the FFT, the prover computes the function on S in a time of order Md log d.

While this is much faster than the Md² time required to separately perform polynomial evaluation at every point, it will still be a rather long calculation for large degrees d. It is therefore required that the prover has significant computing power at their disposal — at least, when compared to the verifier, who only has to verify the polynomial identities at a comparatively small number of points.

#### The Setup

To build a STARK style argument of knowledge for an execution trace of length N, we start by choosing a power of two, 2^s > N. The idea is that our polynomials representing the execution will have degree less than this, and be represented on roots of unity of this order. For N around 1 billion, we take s = 30.

The next step is to choose the finite field 𝔽, which we will take to be the integers modulo a large prime p. The restrictions are that p – 1 should be a multiple of 2^s in order that the required roots of unity exist, and also that 1/p should be a negligible probability. This last condition ensures that there is a negligible chance of the verifier randomly choosing one of a small number of field values for which the prover could construct an argument without actually having the claimed knowledge. I take p to be slightly less than 2^128, since numbers less than this can be represented conveniently in 16 byte blocks, although larger values can be used for more security. So, we can take p of the form 2^128 – m·2^30 + 1, and searching for the smallest value of m making this prime gives,

 p = 2^128 – 36·2^30 + 1.
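This search can be reproduced with a Miller-Rabin primality test; the sketch below does not assume the value m = 36 quoted above, it simply searches upwards from m = 1:

```python
# Search for the smallest m making 2^128 - m*2^30 + 1 prime, using a
# probabilistic Miller-Rabin test.
import random

def is_probable_prime(n, rounds=40):
    if n < 2:
        return False
    for q in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % q == 0:
            return n == q
    d, r = n - 1, 0
    while d % 2 == 0:
        d, r = d // 2, r + 1
    for _ in range(rounds):
        x = pow(random.randrange(2, n - 1), d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

m = 1
while not is_probable_prime(2**128 - m * 2**30 + 1):
    m += 1
p = 2**128 - m * 2**30 + 1
```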

Next, we need to find a root of unity ω of order 2^s. To do this, just choose an integer x and set ω = x^((p - 1)/2^s) (mod p). Fermat’s little theorem guarantees that this is a 2^s‘th root of unity but, to ensure that it has exactly this order (rather than a fraction of it), we check that ω raised to the power of 2^(s - 1) is -1 (mod p). If it is not, we just try again with a different x. In our case, this succeeds for the choice x = 3, giving,

The powers of ω define the set W (see equation 5) on which to represent the execution trace.
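The search for ω can be sketched as follows, illustrated on the small prime p = 97 with s = 5 (so that ω has order 2^5 = 32); the same procedure applies to the 128-bit prime with s = 30:

```python
# Find ω = x^((p-1)/2^s) mod p with exact order 2^s, by checking that
# ω^(2^(s-1)) = -1 mod p.
p, s = 97, 5

def find_root_of_unity(p, s):
    for x in range(2, p):
        w = pow(x, (p - 1) // 2**s, p)
        # w^(2^s) = 1 by Fermat; w has exact order 2^s iff w^(2^(s-1)) = -1
        if pow(w, 2**(s - 1), p) == p - 1:
            return x, w
    raise ValueError("no suitable x found")

x, w = find_root_of_unity(p, s)
```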

Next is to choose a domain S in our field on which the prover will commit to the polynomial values. In order that the execution values are not revealed, this should be disjoint from the powers of ω. We will want its size to be a small multiple M (say, 4) of 2^s and, in order to be able to apply Fourier transforms as discussed above, S should be the union of scaled copies cW of W. This can be achieved by choosing a sequence c1, …, cM in the field, and letting S consist of elements of the form ciω^n,

 S = c1W ∪ c2W ∪ ⋯ ∪ cMW. (11)

We can set,

 {c1, c2, c3, c4} = {2, 3, 4, 5}

The sets ciW will be disjoint, which can be checked by raising each ci to the power of 2^s (mod p) and seeing that the results are distinct. Also, none of them raised to the power of 2^s equals 1 modulo p, so S is disjoint from W.

#### Building an Argument of Knowledge

The setup above consists of the choice of prime p defining the field, the root of unity ω of order 2^s, the scaling factors c1, c2, c3, c4 defining the evaluation domain, and the polynomial identities (10) to be verified. These are all fixed before moving on to the argument of knowledge itself. This will be scalable in the sense that, even for statements which require a very long computation to directly verify, the procedure described here should not require very many calculations by the verifier, or require a lot of information to be processed. The statement regarding the result of a billion iterations of Rule 30 above should involve much less computation by the verifier to convince herself of the truth of the result.

I will first describe the interactive version, consisting of messages between the prover 𝒫 and verifier 𝒱. The non-interactive version will be derived from this later.

Step 1: Prover 𝒫 constructs the polynomials fi, and sends their commitments on S to verifier 𝒱.

The polynomials here are computed from the execution trace of the ‘rule 30’ algorithm, taking values

 fi(ω^n) = si(n)

over 0 ≤ n ≤ N. The prover needs to extrapolate these to the set S as polynomials of degree less than 2^s, which can be done using Fourier transforms as described above. Using FFT algorithms, this takes of order s·2^s mathematical operations which, using s = 30, is about 30 billion. This is the most computationally heavy part of the whole proof for the prover. After this, the prover needs to build up the Merkle tree and, finally, sends just the Merkle root to the verifier. At a later stage of the procedure, when the prover sends function values, he will also send Merkle proofs in order to guarantee that he is indeed sending values of the function committed to at this step.

The verifier needs to be able to trust that the fi are all polynomials of degree less than 2^s, as are the functions gji defined by (10). If this is true, then they would indeed represent an execution trace for the Rule 30 automaton with the claimed final states.

Note that we have hundreds of functions here. Fortunately, there is a trick to reduce it to a single one. Simply take a random linear combination of them. If the prover can show this to be of the required degree, then almost certainly the original functions are too.

Theorem 1 Let 𝔽 be a finite field of size p and f0, f1, f2, …, fn be functions from a subset S to 𝔽. Let λ1, λ2, …, λn be independent and uniformly distributed random field elements.

Suppose that fi are not all polynomials of degree less than d. Then the probability of the linear combination $\displaystyle f_0+\lambda_1f_1+\lambda_2f_2+\cdots+\lambda_nf_n$

being a polynomial of degree less than d is at most 1/p.

Assuming that 1/p is a negligible probability, this theorem says that showing the linear combination to be of degree less than 2^s is sufficient to imply that the same is true of all of the fi. So, the first thing the verifier does is to choose random coefficients in order to reduce the proof to a single polynomial.

Step 2: Verifier 𝒱 chooses field elements uniformly at random; λi and λ1i, λ2i for 0 ≤ i < 200, and λ3i for 0 ≤ i < 100. These are sent to prover 𝒫.

The idea is that these coefficients define a new function on the domain, $\displaystyle h_0=\sum_i\lambda_if_i+\sum_i\lambda_{1i}g_{1i}+\sum_i\lambda_{2i}g_{2i}+\sum_i\lambda_{3i}g_{3i}.$

We could ask the prover to also send a commitment to this linear combination. I will not do this and, instead, the verifier can compute it directly as soon as the prover reveals values of fi.

It remains for the prover to convince the verifier that the values of h0 are indeed chosen according to a polynomial of degree less than 2^s or, at least, are chosen in this way on most of the domain. By polynomial interpolation, any function can be matched by such a polynomial on up to 2^s points, so it sounds like we would need to sample h0 at more values than this, which defeats the whole scalability idea behind STARKs. Fortunately, there is a trick which can be used to efficiently prove that h0 satisfies the claimed property with a high degree of certainty. This is known as a Fast RS IOPP, an FRI or, in full, a Fast Reed-Solomon Interactive Oracle Proof of Proximity.

The idea is to first break h0 up into two polynomials of half the degree. I use S² to denote the set of squares x² of elements x of S.

Theorem 2 Let S be a set of nonzero elements of a field 𝔽 such that for every x in S, -x is also in S. Then, any function h from S to 𝔽 can be uniquely decomposed as

 h(x) = u(x2) + xv(x2) (12)

for functions u,v from S2 to 𝔽. Furthermore, h is a polynomial of degree less than d if and only if u,v are both polynomials of degree less than d/2.

Equation (12) is easily inverted to calculate u, v from h,

$\displaystyle \begin{aligned} u(x^2)&=\frac12\left(h(x)+h(-x)\right),\\ v(x^2)&=\frac1{2x}\left(h(x)-h(-x)\right). \end{aligned}$
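These formulas are easy to sanity-check: in coefficient form, u and v are just the even- and odd-index coefficients of h. A sketch in 𝔽97 (toy parameters, helper names mine):

```python
# Check h(x) = u(x^2) + x v(x^2) in F_97: u and v are the even- and
# odd-index coefficients of h, recovered from h(x) and h(-x).
p = 97
h = [3, 1, 4, 1, 5, 9, 2, 6]                    # coefficients of h, degree < 8
u, v = h[0::2], h[1::2]                         # each of degree < 4

def ev(coeffs, x, p):
    return sum(b * pow(x, m, p) for m, b in enumerate(coeffs)) % p

inv2 = pow(2, p - 2, p)
for x in range(1, p):
    hx, hmx = ev(h, x, p), ev(h, p - x, p)
    ux = (hx + hmx) * inv2 % p                          # (h(x) + h(-x))/2
    vx = (hx - hmx) * inv2 * pow(x, p - 2, p) % p       # (h(x) - h(-x))/(2x)
    assert ux == ev(u, x * x % p, p)
    assert vx == ev(v, x * x % p, p)
```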

So, we have replaced our polynomial by two of half the degree. On its own, this has not simplified matters. However, the same trick as above can be used by exploiting theorem 1 again. If the verifier chooses a random field element μ, consider the new function

$\displaystyle h_1(x)=u(x)+\mu v(x)$

or, equivalently in terms of h0,

$\displaystyle h_1(x^2)=\frac12(1+\mu/x)h_0(x)+\frac12(1-\mu/x)h_0(-x).$

If the prover can show that this has degree less than 2^(s - 1) then, using theorem 2, we can be confident (up to a probability of 1/p) that h0 has degree less than 2^s.

This process can be repeated all the way until it reduces to showing that a function is constant, for which the prover just has to send the constant value rather than a Merkle root commitment.

So, let us define new sets S0, S1, …, Ss iteratively by setting S0 = S and Sk + 1 = Sk². Recalling that the domain S was defined by equation (11),

 $\displaystyle S_k = c_1^{2^k}W^{2^k}\cup c_2^{2^k}W^{2^k}\cup\cdots\cup c_M^{2^k}W^{2^k}$

and W^(2^k) is just the set of powers of the 2^(s - k)‘th root of unity ω^(2^k). With these domains defined, we iteratively reduce the problem to polynomials of lower and lower degree by the following steps, run in order from k = 1 up to k = s.

Step 3k: 𝒱 chooses a field element μk uniformly at random and sends it to 𝒫.
Step 4k: 𝒫 constructs function hk on Sk and sends its commitment to 𝒱.

The claim from the prover is that hk is related to hk - 1 by,

 $\displaystyle h_k(x^2)=\frac12(1+\mu_k/x)h_{k-1}(x)+\frac12(1-\mu_k/x)h_{k-1}(-x)$ (13)

for all x in Sk - 1. So long as the prover does really construct the functions hk in this way, then they are guaranteed to be polynomials of degree less than 2^(s - k). So hs will be constant. This can be enforced by having him, on step 4k above with k = s, return the constant value of hs instead of a Merkle root commitment. Then, if equation (13) is really satisfied, we can be confident (to within a negligible probability) that each function hk really does have degree less than 2^(s - k) and, in particular, that h0 has degree less than 2^s.
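The folding step and the consistency check (13) can be sketched with toy parameters in 𝔽97 (a polynomial of degree bound 8 reduces to a constant after three folds; names are mine):

```python
# FRI folding in F_97: h_k = u + μ v halves the degree bound each round.
import random

p = 97

def ev(coeffs, x, p):
    """Evaluate a polynomial given by its coefficient list at x."""
    return sum(b * pow(x, m, p) for m, b in enumerate(coeffs)) % p

def fold(coeffs, mu, p):
    """One FRI fold: combine even (u) and odd (v) coefficients as u + μ v."""
    return [(a + mu * b) % p for a, b in zip(coeffs[0::2], coeffs[1::2])]

h0 = [3, 1, 4, 1, 5, 9, 2, 6]       # h_0, degree < 2^3

# Check (13) at a sample point x, cleared of denominators:
# 2x h_1(x^2) = (x + μ) h_0(x) + (x - μ) h_0(-x).
mu, x = 7, 5
h1 = fold(h0, mu, p)
lhs = 2 * x * ev(h1, x * x % p, p) % p
rhs = ((x + mu) * ev(h0, x, p) + (x - mu) * ev(h0, p - x, p)) % p
assert lhs == rhs

# Three folds with random μ reduce h_0 to a constant h_3.
h = h0
for _ in range(3):
    h = fold(h, random.randrange(p), p)
assert len(h) == 1
```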

So far, all that the verifier has received is commitments for the functions fi and hk on their respective domains. These are Merkle roots, which will just appear as random 32 byte strings, so the verifier has learned nothing yet. The next stage is to actually ask for the values of the functions at selected points on their domains, so that the required identities can be checked.

We must decide how many points at which to evaluate each of the functions. This is the iteration number n and, the larger its value, the more secure the argument; that is, the less likely it is that a prover can fool the verifier into believing a false statement. The choice of n can be decided by the verifier in the interactive procedure; otherwise, it is decided up-front and is part of the initial setup for the argument. The following steps are performed for each value of k from 1 to s.

Step 5k: 𝒱 selects elements x_{k1}, x_{k2}, …, x_{kn} uniformly at random on domain S_{k-1} and sends them to 𝒫.
Step 6k: 𝒫 evaluates h_{k-1}(x), h_{k-1}(-x), h_k(x^2) for all values of x in x_{k1}, x_{k2}, …, x_{kn} and sends their values and Merkle proofs to 𝒱.

For the first iteration with k = 1, by saying that the prover sends the values of h_0, we really mean that he sends the values of the f_i, and the verifier computes h_0 from these using its defining linear combination. At the final iteration k = s, the prover does not really send values of h_s, since this is a constant function and its value was already sent as the commitment.

This is all of the messages which need to be transmitted between prover and verifier. All that remains is for the verifier to perform some checks that the information received from the prover is as claimed. If these succeed, the argument of knowledge is validated, otherwise it is invalid.

Step 7: 𝒱 checks that the Merkle proofs received from 𝒫 are all valid.
Step 8: 𝒱 checks the identity (13), in the form 2h_k(x^2) = (1 + μ_k/x)h_{k-1}(x) + (1 - μ_k/x)h_{k-1}(-x), for all k = 1, 2, …, s and values of x in x_{k1}, x_{k2}, …, x_{kn}.
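The Merkle proof checks of step 7 can be sketched with SHA256 as follows. This is a minimal toy (the helper names are my own; a production implementation would also add leaf/node domain separation and handle odd-sized levels):

```python
# Minimal SHA256 Merkle tree: commit to leaves, open one leaf with a proof
# consisting of the sibling hashes on the path to the root.
import hashlib

def H(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def merkle_root(leaves):
    level = [H(x) for x in leaves]
    while len(level) > 1:
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, i):
    level = [H(x) for x in leaves]
    proof = []
    while len(level) > 1:
        proof.append(level[i ^ 1])          # sibling hash at this level
        level = [H(level[j] + level[j + 1]) for j in range(0, len(level), 2)]
        i //= 2
    return proof

def verify(root, leaf, i, proof):
    h = H(leaf)
    for sib in proof:
        h = H(h + sib) if i % 2 == 0 else H(sib + h)
        i //= 2
    return h == root

leaves = [bytes([j]) for j in range(8)]     # 8 toy leaves
root = merkle_root(leaves)
assert verify(root, leaves[3], 3, merkle_proof(leaves, 3))
assert not verify(root, b"bogus", 3, merkle_proof(leaves, 3))
```

Note that the sibling index is i ^ 1, so values at ±x stored in neighbouring leaves share all but the first entry of their proofs, an observation used in the size optimizations below.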

Note that as the verifier can only check the identity (13) at the finite randomly chosen set of n points, she cannot validate that it holds everywhere. The best that can be said is that she statistically verifies that it holds at most points, so she cannot be certain that h_0 agrees everywhere with a polynomial of the correct degree. Actually, it is sufficient that she determines that it agrees with a polynomial of degree less than 2^s on at least 2^{s+1} points. As both sides of equations (8,9,10) have degree less than 2^{s+1}, if they agree on these points then they agree on the entire domain and, in particular, hold on the execution trace.

Since we chose a domain S of size four times that of the domain W of the execution, we only need to show that the identities (13) hold on about a proportion ρ = 1/2 of the domain. The probability of this holding by chance, if the prover had not chosen a polynomial, is no more than ρ^n. So long as this is a negligible probability, the proof is sound. Taking n to be about 100 to 200 should be sufficient. This argument for the probability bound is very rough and not rigorous; for more details, see a paper on STARKs, such as Scalable, transparent, and post-quantum secure computational integrity.
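The arithmetic here is simple but worth making explicit; a quick sketch with exact fractions:

```python
# With rho = 1/2, the chance of n independent checks all passing by luck
# is rho**n; n = 128 already gives a negligible 2**-128.
from fractions import Fraction

rho = Fraction(1, 2)
n = 128
cheat_prob = rho ** n
assert cheat_prob == Fraction(1, 2 ** 128)
assert float(cheat_prob) < 1e-30
```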

Note the importance of the order of the interactions between prover and verifier. The verifier chooses the points at which to verify equations (13) only after receiving all of the polynomial commitments. This prevents the prover from cheating by selecting function values which satisfy the identities at the verified points but not elsewhere. Similarly, steps 3k and 4k are run in order from k = 1 to k = s so that, again, the prover cannot know the later values of μ_k in advance, and so cannot cherry-pick functions whose values satisfy the identities for these μ_k but for no other choices.

Looking at scalability, steps 3k, 4k, 5k and 6k need to be repeated s times, which is of the order of the logarithm of the number of iterations of Rule 30. Each of the function values revealed by 𝒫 requires 𝒱 to verify a Merkle proof of length about s, so of the order of (log_2 N)^2 ≈ 900 calculations, mainly consisting of computing SHA256 hashes for the Merkle proofs. These are repeated for each of the n iterations, but this factor remains roughly constant as N becomes large.

#### Making it Zero Knowledge

We have described an interactive scalable argument of knowledge, but the procedure above is not zero-knowledge. The prover does not directly reveal the secret message used to initialize the calculation, nor does he reveal the execution trace. However, the functions f_i and h_k are computed from the execution trace, and are revealed at a number of points. Although these points are on a domain which does not include the execution trace, so do not directly reveal any hidden values, they do depend on it and so potentially leak some partial information to the verifier. It is better if the procedure can be made provably zero knowledge, so that it reveals nothing beyond the fact that the prover has a secret message with the required property.

First, the prover can hide any information that h_0 may contain by adding a polynomial blinding factor. This is a polynomial g of degree less than 2^s chosen uniformly at random, so it has independent and uniformly distributed coefficients. It is added to the linear combination when constructing h_0, $\displaystyle h_0=g+\sum_i\lambda_if_i+\sum_i\lambda_{1i}g_{1i}+\sum_i\lambda_{2i}g_{2i}+\sum_i\lambda_{3i}g_{3i}.$ (14)

As a result, h_0 will be a random polynomial independent of the f_i. As the h_k are derived from it, these are also independent of the f_i, so they do not leak any information. Hence, step 1 is replaced by:

Step 1′: Prover 𝒫 constructs the polynomials g and f_i, and sends their commitments on S to verifier 𝒱.

Also, the verifier uses equation (14), incorporating the blinding factor, when evaluating values of h_0.

Since the prover is required to reveal values of the f_i at points of the domain S, it is still not quite zero-knowledge. This can be remedied without changing any calculation from the verifier's perspective. All the prover needs to do is to add a random factor to the f_i before computing their commitments. He can define new polynomials f_i′ by $\displaystyle f_i' = f_i + u(X)Z_{N+1}(X)$ (15)

where u(X) is a random polynomial of low degree d.

As Z_{N+1} vanishes on the execution trace, f_i′ still satisfies all of the required properties of f_i, except that its degree can now be up to N + d + 1. Sampling f_i′ at up to d points gives entirely random values, and leaks no information. As the procedure above results in the prover revealing the values of f_i at 4n points, we just need to ensure that d is at least this large.

So long as 2^s is greater than N + d + 1, the resulting polynomial f_i′ will still have degree less than 2^s, so it can be used in place of f_i without impacting the algorithm at all. This may require increasing s by one if we are unlucky, but that is minor and most likely unnecessary, since d will be orders of magnitude less than N. In our case, with N equal to 1 billion and s = 30, there is room for the number of iterations n to be anywhere up to 18 million without having to increase s. Using of the order of 100 to 200 iterations is no problem.

Actually, the prover does not even have to explicitly compute the random factor u(X) or apply equation (15). All that is necessary is that, for at least d points of the domain W on which f_i is constructed but outside of the execution trace of size N + 1, the values of f_i are set to uniformly random values. This automatically adds a random factor u(X) while hardly changing the calculations performed by the prover. The format of the messages sent and the procedure carried out by the verifier are not affected at all.
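The blinding (15) can be sketched in Python over a toy field F_97 (stand-in values throughout; the fixed "random" mask u is hard-coded only for reproducibility):

```python
# Toy check of the blinding f' = f + u * Z_{N+1} over F_97: the blinded
# polynomial matches f on the trace and is masked everywhere else.
p = 97
w = pow(5, 3, p)                      # primitive 32nd root of unity
N = 10                                # toy trace: the points w^0, ..., w^N

def ev(coeffs, x):
    """Evaluate a polynomial (coefficients, lowest degree first) at x."""
    r = 0
    for c in reversed(coeffs):
        r = (r * x + c) % p
    return r

def Z(x):
    """Z_{N+1}(x) = prod over i = 0..N of (x - w^i), vanishing on the trace."""
    r = 1
    for i in range(N + 1):
        r = r * (x - pow(w, i, p)) % p
    return r

f = list(range(1, N + 2))             # some degree-N polynomial
u = [7, 3, 1, 4]                      # low-degree random mask (fixed here)

def f_blinded(x):
    return (ev(f, x) + ev(u, x) * Z(x)) % p

# On the trace Z_{N+1} vanishes, so the blinded values are unchanged...
assert all(f_blinded(pow(w, i, p)) == ev(f, pow(w, i, p)) for i in range(N + 1))
# ...while off the trace Z_{N+1} is non-zero, so the mask u takes effect.
assert Z(pow(w, 20, p)) != 0
```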

Making these simple modifications turns the argument of knowledge described above into a zero knowledge proof.

#### Making it Non-Interactive

Finally, to build our zk-STARK, the above interactive argument of knowledge needs to be expressed in a non-interactive format. This is just a block of data which anyone can check to verify that it was constructed with the claimed knowledge. In our case, this is knowledge of an initial message from which, after 1 billion iterations of Rule 30, the claimed result (1) is obtained.

The Fiat–Shamir heuristic will be used, and the zk-STARK itself will just consist of a list of all of the messages sent by the prover according to the interactive protocol described above. This is:

▪  Commitments for the polynomials f_i and g.
▪  Commitments for the polynomials h_k for k = 1, …, s.
▪  Values of f_i(±x), f_i(±ωx), g(±x), together with their Merkle proofs, for x in x_{11}, x_{12}, …, x_{1n}.
▪  Values of h_{k-1}(±x), h_k(x^2), together with Merkle proofs, for x in x_{k1}, x_{k2}, …, x_{kn} and k = 1, 2, …, s.

For the edge cases, the values of h_0(±x) do not need to be stored, since they are computed from the values of the f_i and g. Similarly, values of h_s(x^2) do not need to be stored, since h_s has the constant value given by its commitment.

To create the zk-STARK, the prover just needs to go through the steps of the interactive protocol above, recording his messages. While this also involves a verifier 𝒱, her role can instead be simulated by a pseudo-random method. Recall that the verifier makes various random choices, and that the point in the protocol at which they are transmitted to the prover matters. This is so that the prover cannot use knowledge of these choices when constructing commitments for the various polynomials. Specifically, 𝒱 makes the following choices.

▪  The random field elements λ_{1i}, λ_{2i} for 0 ≤ i < 200 and λ_{3i} for 0 ≤ i < 100, after 𝒫 has chosen commitments for the f_i and g.
▪  For each k = 1, …, s, the random field element μ_k, after 𝒫 has chosen commitments for the f_i, g and h_i for 1 ≤ i < k.
▪  For each k = 1, …, s and i = 1, 2, …, n, the random elements x_{ki} of S_{k-1}, after 𝒫 has chosen commitments for the f_j for 0 ≤ j < 200, g, and h_l for 1 ≤ l ≤ s.

The Fiat–Shamir method uses a cryptographically secure random number generator for these choices. Theoretically, this is done with a random oracle and, in practice, a hash function h() such as SHA256 is used. This gives outputs which are effectively uniformly random 256 bit numbers and can be used to define the verifier's choices of field elements. Specifically, each choice made by 𝒱 is constructed from the bits of h(uv), where u is the concatenated list of commitments made by 𝒫 before the verifier's choice, and v is just a number incremented after each choice.
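A minimal sketch of this challenge derivation, assuming a hash-then-reduce construction with SHA256 and a stand-in prime (function and variable names are mine; a careful implementation would also account for the small modulo bias):

```python
# Fiat-Shamir style challenge derivation: each verifier choice is read from
# SHA256(u || v), where u is the transcript of commitments so far and v is
# an incrementing counter. The prime here is a stand-in (2^127 - 1, a
# Mersenne prime), not the field from the post.
import hashlib

p = (1 << 127) - 1

def challenge(transcript: bytes, counter: int) -> int:
    digest = hashlib.sha256(transcript + counter.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") % p

commitments = b"\x11" * 32 + b"\x22" * 32   # dummy Merkle roots
c0 = challenge(commitments, 0)
c1 = challenge(commitments, 1)
assert 0 <= c0 < p and c0 != c1            # deterministic, distinct choices
assert c0 == challenge(commitments, 0)     # anyone can recompute the choice
```

Because the hash depends on all commitments sent so far, the prover cannot see a challenge before committing, mirroring the message ordering of the interactive protocol.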

This defines the zk-STARK! To construct it, the prover steps through the protocol above using the described heuristic for the verifier’s choices, and records his messages. To validate, we would use the stored values for the prover’s messages and the heuristic for the verifier’s. Then, steps 7 and 8 of the protocol above are performed to check that the STARK is valid.

#### Reducing the size of the zk-STARK

As it stands, constructing a zk-STARK exactly as above would be rather inefficient. Although it will be much shorter and faster to process than storing or checking the entire execution trace of a billion Rule 30 iterations, it is still much bigger than necessary. We have 200 polynomials f_i and about 30 polynomials h_k, all evaluated at of the order of 100-300 points, with each value taking up around 16 bytes. These all come with Merkle proofs, each of which contains around 32 hashes of 32 bytes each. This adds up very quickly. Fortunately, much of it can be eliminated without affecting security. I didn't do this above for simplicity, but will finish off by outlining some ways in which the size of the zk-STARK can be significantly reduced.

1. Each of the polynomials f_i has a separate commitment, with its values at the leaves of a Merkle tree. As there are 200 of them, this is very wasteful. Instead, the functions f_i and g can be combined into a single Merkle tree, with a single root as commitment. The leaves of this tree will each contain an array of values, one for each function at that point. This reduces the number of Merkle proofs for evaluating the f_i by at least a factor of 200.
2. There will be some overlap between the Merkle proofs of the polynomials evaluated at different points. This results in some repetition of the internal hash values stored, and of evaluating the hashes by the verifier. Stripping out these repeated values would save around 30% from the Merkle proof space and computations.
3. Each iteration of Step 6k requires evaluating h_{k-1} at pairs of points ±x. The Merkle tree can be arranged so that these pairs appear at neighbouring leaves of the tree, meaning that they share a single Merkle proof of length one less than their individual proofs.
4. Each iteration of Step 6k requires evaluating h_k at points x_{ki}^2 and, for the next value of k, it is also evaluated at points x_{k+1,i}. These can be combined by taking x_{k+1,i} = x_{ki}^2. This effectively eliminates the computation of h_k(x_{ki}^2) and its Merkle proof. Together with the previous point, this means that only one Merkle proof is evaluated for each value of k and each iteration. Combining all the Steps 6k, this gives a total of s multiplied by the number n of iterations.
5. The Merkle proofs for evaluations of h_k can be reduced further by reducing the polynomial degree by a factor of 4 at each step. Instead of using theorem 2, use the fact that a polynomial h of degree less than 2^s can be written as

 $\displaystyle h(x)=u_0(x^4)+xu_1(x^4)+x^2u_2(x^4)+x^3u_3(x^4)$

for polynomials u_i of degree less than 2^{s-2}. These can be computed using

 $\displaystyle 4x^iu_i(x^4)=h(x)+\alpha^{3i}h(\alpha x)+\alpha^{2i}h(-x)+\alpha^{i}h(-\alpha x)$

where α is a primitive 4'th root of unity (i.e., a square root of -1) in the field. This is just an inverse Fourier transform of size 4.

Using this would require four evaluations of h_{k-1} per iteration of step 6k, but we are taking steps twice as big, so it works out the same. However, if the Merkle tree is set up so that x, αx, -x, -αx are stored in neighbouring leaves, they can share the same Merkle proof. Hence, we halve the total number of Merkle proofs of the functions h_k.
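The degree-4 split can be checked numerically; here is a sketch over the toy field F_97 (my stand-in choice), where α = 22 squares to -1:

```python
# Numerical check of the radix-4 split in F_97: alpha = 22 has order 4,
# since 22**2 = 484 = 5*97 - 1, i.e. a square root of -1.
p = 97
alpha = 22
assert alpha * alpha % p == p - 1

h = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]   # degree < 16

def ev(coeffs, x):
    """Evaluate a polynomial (coefficients, lowest degree first) at x."""
    r = 0
    for c in reversed(coeffs):
        r = (r * x + c) % p
    return r

# u_i takes every 4th coefficient of h, so h(x) = sum_i x^i * u_i(x^4).
u = [h[i::4] for i in range(4)]

def rhs(i, x):
    """h(x) + a^{3i} h(ax) + a^{2i} h(-x) + a^{i} h(-ax), mod p."""
    s = ev(h, x)
    s += pow(alpha, 3 * i, p) * ev(h, alpha * x % p)
    s += pow(alpha, 2 * i, p) * ev(h, -x % p)
    s += pow(alpha, i, p) * ev(h, -alpha * x % p)
    return s % p

for x in [2, 5, 11]:                     # arbitrary sample points
    for i in range(4):
        assert 4 * pow(x, i, p) * ev(u[i], pow(x, 4, p)) % p == rhs(i, x)
```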

6. Recall that the polynomials h_k are of degree less than 2^{s-k}. Once this is small enough, the prover can simply send the polynomial coefficients rather than a commitment. No further values of this h_k need to be sent and, since it is by definition a polynomial of the required degree, the FRI algorithm need go no further.

In our example, there are s = 30 of these polynomials to commit to. If we stop the algorithm at k = s - 8 and, instead of its commitment, return the 2^8 = 256 polynomial coefficients, this reduces us to 22 polynomials.

7. We worked in a finite field of size a 128 bit prime number p, to ensure that 1/p is tiny and hence have a small soundness parameter. However, for the first step of the algorithm, involving committing to the functions f_i, g, it was only necessary that the field contains a 2^30'th root of unity, and hence contains the domains W and S.

For this to be the case, it is only necessary for p - 1 to be a small multiple of 2^30. For example, we could use p = 49·2^30 + 1. This leads to fewer bits (36 instead of 128) per evaluation, which would significantly reduce the work done by the prover (specifically, when doing the Fourier transforms) and save a small amount of space in the zk-STARK.

From step 2 onwards, the field containing the random verifier choices λ_{ji} and μ_k does need to be large in order to keep the soundness parameter small. These can be chosen in a finite extension of 𝔽_p, which will be of size p^r for extension degree r. Taking r = 4 should be sufficient.

8. In the protocol described here, the verifier has to evaluate the polynomial ${Z_N}$ defined by (6) at a random selection of points, $\displaystyle Z_N(X)=\prod_{i=0}^{N-1}(X-\omega^i).$

While these terms do not take up space in the zk-STARK itself, the verifier is required to compute its values, involving multiplying N terms for each evaluation. As N is of the order of a billion, this would take significant resources. It is possible to make use of the identity

 $\displaystyle Z_N(\omega X)(X-\omega^{N-1})=\omega^NZ_N(X)(X-\omega^{-1})$ (16)

enabling its values at consecutive points to be computed in constant time per successive evaluation. This is very useful for the prover, who needs its values on the entire domain S. For the verifier it is not so much help, as she only requires its values at a small number of non-consecutive random points.
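Identity (16) is easy to sanity-check numerically; a toy sketch over F_97 with N = 30 and ω of order 32 (stand-in values, far smaller than the post's):

```python
# Check identity (16): Z_N(w*x)*(x - w^(N-1)) == w^N * Z_N(x) * (x - w^(-1))
# in the toy field F_97, with w a primitive 32nd root of unity and N = 30.
p = 97
w = pow(5, 3, p)
N = 30

def Z(x):
    """Z_N(x) = prod over i = 0..N-1 of (x - w^i)."""
    r = 1
    for i in range(N):
        r = r * (x - pow(w, i, p)) % p
    return r

w_inv = pow(w, p - 2, p)
for x in [2, 5, 11, 43]:
    lhs = Z(w * x % p) * (x - pow(w, N - 1, p)) % p
    rhs = pow(w, N, p) * Z(x) % p * (x - w_inv) % p
    assert lhs == rhs
```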

One method is for the prover to commit to and send the values of ZN along with those of fi, g. This adds a tiny bit of extra data to the zk-STARK. A random multiple of ZN is added to the linear combination (14) to verify that ZN is indeed a polynomial of degree less than 2s. The verifier checks that identity (16) holds at the sampled values stored in the zk-STARK. This removes the computational complexity but, if we go through the maths, it will result in a slight increase in the soundness parameter.

An alternative method is to make use of the identity $\displaystyle \prod_{i=0}^{2^s-1}(X-\omega^i)=X^{2^s}-1$

to write ZN as $\displaystyle Z_N(X)=(X^{2^s}-1)/\prod_{i=N}^{2^s-1}(X-\omega^i).$

If N is very close to 2^s, this can be used by the verifier to quickly compute the values of Z_N. For this reason, it can be useful to extend the execution trace length to be as close to 2^s as possible, by adding in extra iterations of the rule, even though they may not be used in the original calculation.
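A toy check of this shortcut over F_97 (stand-in parameters 2^s = 32, N = 30): the verifier divides x^{2^s} - 1 by just the two missing factors instead of multiplying N terms.

```python
# Fast evaluation of Z_N when N is close to 2^s = 32, using
# Z_N(x) = (x^32 - 1) / ((x - w^30)(x - w^31)) in the toy field F_97.
p = 97
w = pow(5, 3, p)                 # primitive 32nd root of unity
N, two_s = 30, 32

def Z_direct(x):
    """N-term product definition of Z_N."""
    r = 1
    for i in range(N):
        r = r * (x - pow(w, i, p)) % p
    return r

def Z_fast(x):
    """Divide x^{2^s} - 1 by the few factors with i >= N (x outside W)."""
    num = (pow(x, two_s, p) - 1) % p
    den = 1
    for i in range(N, two_s):
        den = den * (x - pow(w, i, p)) % p
    return num * pow(den, p - 2, p) % p

assert all(Z_fast(x) == Z_direct(x) for x in [2, 5, 11, 43])
```

Here the fast evaluation uses 2^s - N = 2 field multiplications and one inversion per point, in place of 30, and the saving scales with N in the real parameters.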