This post will step through how to build a simple zkSTARK. While I have already outlined the ideas in a couple of posts (STARKs I and II), those took a very general approach and were consequently rather abstract. This post will be more concrete, detailing how one can be built for a specific calculation. For this, I pick a ‘toy’ example for ease of exposition, but one which is still very interesting. The calculation will be based on the rule 30 elementary cellular automaton. There are many ways of actually building a STARK, some of which will perform better than others. I make choices aimed at simplicity, avoiding complex constructions.
First, let’s recap what a zkSTARK is. It is an acronym for zero knowledge Scalable Transparent Argument of Knowledge. While they are often phrased as proofs of existence of some quantity (e.g., existence of 3-colorings of a graph), more properly, they are a proof that the person constructing the STARK has knowledge of this quantity. We can be a bit more precise. The setup must include some algorithm or computable function F and a value b which it may or may not return. This information will be publicly known, so that anyone can verify what F(a) does evaluate to on given input a. Anyone can also make a statement such as:
I know of a value a such that F(a) evaluates to b.
However, such a statement does not itself prove that they do really know of such a value. An argument of knowledge is a method of proving the truth of this statement. As an example, consider digital signatures. In this scenario, the value a is a private key and F is the calculation deriving the associated public key b. A digital signature is an argument of knowledge of a private key. For elliptic curve based schemes as used in ECDSA and Schnorr signatures, it can be shown mathematically that for any public key there does exist an associated private key, although it can be practically impossible to actually find it. Hence, a digital signature is not a proof that a private key exists, which we know already. Instead, it is a proof that it was known to the person constructing the signature.
One method of proving a statement like the one given above is to simply display the value of a, so that anyone can evaluate F(a) and verify that it does indeed equal b. There are two reasons why this may not be a desired or viable approach, corresponding to the other words making up the acronym zkSTARK. Firstly, the value of a may be a secret, such as with the digital signature example. It should be zero knowledge. Ideally, the proof does not reveal any information beyond the fact that the prover knows of a valid value for a. Secondly, it should be scalable.
By scalable, we mean that the size of a zkSTARK should remain small and reasonably quick to validate, even if F is very computationally expensive to compute. For example, we could take the value a and hash it billions of times to get the result b. If we want to convince others of the result, then expecting them to go through the whole tedious calculation might not be realistic. Instead, a STARK can be constructed which proves that the result is b as claimed, and which anyone can check without having to redo the entire calculation. This is especially useful for blockchains, where the use of STARKs can greatly reduce the work required by validators and avoid using up a lot of the limited block space. A long list of transactions with their associated digital signatures could be replaced by a much shorter proof of the combined effect of these transactions.
The scalability property is really the magic part of STARKs and the closely related SNARKs. A very long calculation will take a long time to perform. However, once one person has done this, they are able to prove the result to others without requiring them to also go through the long calculation. We really do not have any right to expect that this is even possible, but luckily it is!
Next, STARKs are transparent, which means that there is no trusted setup. This is in contrast to the closely related SNARKs (succinct non-interactive arguments of knowledge), where a procedure is required to initialize the construction, involving data which must be kept secret by those involved. Revealing this data would allow ‘proofs’ to be constructed even though the prover does not have the knowledge being claimed. STARKs do not have this issue.
Finally, I mention that STARKs are non-interactive. Any argument of knowledge involves two parties, the prover and the verifier, with the former trying to convince the latter of the truth of a statement. Interactive proofs involve messages being passed back and forth between the two. Essentially, the verifier asks questions of the prover, and checks his responses, until she is convinced of the proof of the original statement. A non-interactive proof just consists of a single message sent from the prover to the verifier. This is what is needed in many blockchain applications, since the proof can simply be constructed and added to the chain. Anyone can be the verifier by reading the data from the blockchain, as done by validators.
As with any argument of knowledge, STARKs are sound, meaning that it is practically impossible for someone without the claimed knowledge to construct a valid proof. However, I should point out that, in theory, it is always possible to just guess at a valid argument by blind luck. For this reason, any such construction will come with a soundness parameter. This is an upper bound on the probability that, without the claimed knowledge, any parameter choices made in its construction lead to a valid proof by chance. The idea is that this should be tiny to avoid such false positives. It is true that an untrustworthy prover could try over and over, choosing different parameters each time, to try and brute-force a solution. As long as the soundness parameter is small enough (say, about 2^{−100} or lower), then it becomes practically impossible to even brute-force a solution.
Automaton Rule 30
The toy calculation used in this post is the elementary cellular automaton rule 30. The idea is that we have a one dimensional array of cells, indexed by the integers. Each cell can be in one of two states, either unset or set, which I will label by the numbers 0 and 1. We then iteratively update this array, one step at a time. At every step of the calculation, each cell is updated according to its own value and those of its immediate neighbours.
Using s_{i}(t) to represent the state of cell i at time t, I will also denote this by just s_{i}, suppressing the time variable for brevity. Its value s_{i}(t + 1) at time t + 1, which I denote by s′_{i}, is some function of s_{i−1}, s_{i}, s_{i+1}:
s′_{i} = S(s_{i−1}, s_{i}, s_{i+1}).
The rule is defined by the function S and, given any initial state, this can be applied to compute the state at every future time. There are many different choices for S, giving different cellular automata with different properties. I choose rule 30, for which the state s′_{i} is determined from s_{i−1}s_{i}s_{i+1} by looking up its value in the following table:

s_{i−1}s_{i}s_{i+1}:  111  110  101  100  011  010  001  000
s′_{i}:  0  0  0  1  1  1  1  0

Figure 1: The rule 30 lookup table
As there are 8 possible values which s_{i  1}s_{i}s_{i + 1} can take, and each of these can lead to two possible states for s′_{i}, there are 2^{8} = 256 distinct elementary cellular automata. This particular one is called ‘rule 30’ since the second row of figure 1 is the binary expansion of the number 30.
To avoid dealing with infinite arrays, I will use a fixed window width of 200. That is, rather than an infinite array of cells, we just have 200 of them labelled as s_{i} for 0 ≤ i < 200. At the edges of the window, we allow the rule to wrap around so that, on the left, s_{−1} is defined to equal s_{199} and, on the right, s_{200} is defined to be s_{0}.
Then, if we start with the single cell number 50 set, and the others unset, repeated applications of the rule give figure 2 below. Here, the time starts from the top of the figure and increases towards the bottom. Each row of pixels represents the state at the corresponding time, with black pixels representing cells which are set and white those which are unset.
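To make the update concrete, here is a minimal Python sketch (the names are my own, not from the post) of one wrapped rule 30 step over a window of cells:

```python
# Rule 30 lookup: (left, center, right) -> new state.
# The outputs 0,0,0,1,1,1,1,0 read in this order are the binary digits of 30.
RULE30 = {
    (1, 1, 1): 0, (1, 1, 0): 0, (1, 0, 1): 0, (1, 0, 0): 1,
    (0, 1, 1): 1, (0, 1, 0): 1, (0, 0, 1): 1, (0, 0, 0): 0,
}

def rule30_step(state):
    """Apply one step of rule 30, wrapping around at the window edges."""
    n = len(state)
    return [RULE30[(state[i - 1], state[i], state[(i + 1) % n])]
            for i in range(n)]

# Width-200 window with only cell number 50 set, as in figure 2.
state = [0] * 200
state[50] = 1
for _ in range(100):
    state = rule30_step(state)
```

Iterating `rule30_step` and printing each row (say, ‘█’ for 1 and space for 0) reproduces the triangular patterns of the figures.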
The pattern of 0s and 1s expands to fill the whole window, and becomes rather chaotic with randomly appearing triangular gaps below any consecutive sequence of 1s. Let’s try again with the following starting state:
0101101001100101011100100110111100100000010010110110111001101111011101110110110001100101011001000110011101100101
This is the ASCII data for the string “Zero Knowledge”, which I pad with zeros to be 200 cells wide. Starting with this state, the first couple hundred iterations are displayed in figure 3.
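The encoding of the message into the starting state can be sketched as follows, using a small Python helper of my own which packs each character as 8 bits (most significant first) and pads with zeros:

```python
def message_to_cells(msg, width=200):
    """ASCII-encode msg as 0/1 cells, 8 bits per character, zero-padded."""
    bits = [int(b) for ch in msg for b in format(ord(ch), "08b")]
    assert len(bits) <= width
    return bits + [0] * (width - len(bits))

cells = message_to_cells("Zero Knowledge")
```

The 14 characters give 112 bits, with the remaining 88 cells set to zero.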
Now, let’s keep going. Starting with the same state as used in figure 3, but continuing for a billion iterations, the final couple hundred iterations are shown in figure 4.
The first hundred cells of the final row are:

(1) 
So, if we start with the “Zero Knowledge” message and apply rule 30 one billion times, the first 100 cells are above. We consider building a zkSTARK to represent the statement that we know of a message such that, if rule 30 is applied one billion times, then the above sequence is obtained.
This is a kind of hash function, where we start with an initial message and, after applying the rule, we obtain the output string above. Going in the opposite direction from the final string to the initial message is effectively impossible. This is very much like applying the SHA256 hash function. In fact, the ideas used here could in principle also be used for applications of SHA256. I use rule 30 since it is very simple, and we have the added complexity of using a large number of iterations so that, even if someone knew the initial message, it would take a fair amount of computing power to directly verify.
Algebraic Representation
The first step in building a STARK is to convert the execution rules for the calculation into algebraic equalities, known as the Algebraic Intermediate Representation (AIR). That is, they should only involve the operations of addition, subtraction and multiplication. Rule 30 can be expressed using the logical OR and XOR operations.
s′_{i} = s_{i−1} XOR (s_{i} OR s_{i+1})
We will interpret the variables s_{i} as lying in a finite field 𝔽. I will take this to be the integers modulo a prime number p. For the moment, the important facts are that we can perform addition, subtraction and multiplication in 𝔽. That the cells s_{i} only take the values 0 or 1 in the field can be expressed by the algebraic identity
s_{i}^{2} = s_{i}.  (2) 
Next, in any field, the logical operations OR and XOR on 0–1 values can be written algebraically by
x OR y = x + y − xy,
x XOR y = x + y − 2xy.
Let’s apply this to rule 30. It turns out to be a bit simpler if we rearrange it so that the XOR operation is on the left hand side, avoiding iterated logical operations in a single step
s′_{i} XOR s_{i−1} = s_{i} OR s_{i+1}.
Writing the logical operators using the algebraic representations above expresses rule 30 as the second order expression
s′_{i} + s_{i−1} − 2s_{i−1}s′_{i} = s_{i} + s_{i+1} − s_{i}s_{i+1}.  (3)
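These algebraic forms are easy to check exhaustively. A minimal Python sketch (my own illustration) verifying that identity (3) holds for all eight 0–1 input combinations:

```python
def or_(x, y):
    return x + y - x * y      # algebraic OR on {0, 1}

def xor(x, y):
    return x + y - 2 * x * y  # algebraic XOR on {0, 1}

# Rule 30: s' = left XOR (center OR right).
# Identity (3): s' + l - 2*l*s' == c + r - c*r for all 0-1 inputs.
for l in (0, 1):
    for c in (0, 1):
        for r in (0, 1):
            s_new = xor(l, or_(c, r))
            assert s_new + l - 2 * l * s_new == c + r - c * r
```

The check works because XORing both sides of the update rule with s_{i−1} moves it to the left hand side, exactly as in the rearrangement above.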
The final condition is that the first hundred values of the final state are as claimed. Letting a_{i} be the 0–1 values in (1), we simply state the equality
s_{i}(N) = a_{i}  (4) 
for 0 ≤ i < 100.
Polynomial Representation
The idea is to represent the evolution of each cell over the iterations by a polynomial function. Assume that N iterations of rule 30 are applied so that, in our case, N is equal to one billion. Specifically, for each index i, cell number i will be represented by a polynomial f_{i}. Fix an element ω of the field such that the powers ω^{n} are all distinct over the range 0 ≤ n ≤ N. We will take ω to be a root of unity of some order 2^{s} > N. Then, let f_{i} be the polynomial over our field 𝔽 which traces out the values of cell i after n iterations
f_{i}(ω^{n}) = s_{i}(n) 
as n runs through the integer values from 0 up to N. By Lagrange interpolation there does exist exactly one such polynomial of degree less than or equal to N. Of course, this is a very high degree and storing the coefficients of f_{i} would take up a lot of space. Since N is a billion, this means storing a billion coefficients lying in the field 𝔽, which is even more space than is required just to store the original binary digits s_{i}(n). However, the polynomial will not be stored in the STARK. Instead a hash — or Merkle root — of its values is stored. Using SHA256, this is 256 bits or 32 bytes, regardless of the degree N. In addition, its values at a pseudorandomly selected set of point will be stored. Just enough points to statistically verify the claimed properties, which will be orders of magnitude lower than N.
It will be convenient to represent f_{i} on the set of all powers of ω,
W = {1, ω, ω^{2}, ω^{3}, …} .  (5) 
As ω is chosen to be of order 2^{s}, these powers will start to repeat once W is of size 2^{s}. As this is a bit bigger than N, it does leave some elements of W on which f_{i} is not specified, where it can be set to whatever we want. To keep the redundancy to a minimum, 2^{s} should be taken to be the smallest power of two greater than N, which is 2^{30} when N is one billion. The result is that f_{i} is defined on W and can be extrapolated as a polynomial of degree less than 2^{s}.
We define Z_{N} to be the polynomial vanishing over the execution trace,
Z_{N}(X) = (X − 1)(X − ω)(X − ω^{2})⋯(X − ω^{N−1}).  (6)
The fact that the s_{i}(n) are 0–1 valued over the range 0 ≤ n < N means that f_{i}^{2} − f_{i} vanishes at the points ω^{n} by identity (2). Using polynomial factorization, this is equivalent to the identity
f_{i}^{2} – f_{i} = g_{1i}Z_{N}.  (7) 
for some polynomial g_{1i} of degree less than 2^{s}.
Similarly, the algebraic representation (3) of rule 30 can be expressed by a polynomial identity. Using f_{i}○ω to represent the polynomial f_{i}(ωX) it is equivalent to
f_{i}○ω + f_{i−1} − 2f_{i−1}(f_{i}○ω) − f_{i} − f_{i+1} + f_{i}f_{i+1} = g_{2i}Z_{N}  (8)
for some polynomial g_{2i} of degree less than 2^{s}.
The final state condition (4) says that f_{i} − a_{i} vanishes at the point ω^{N} or, equivalently,
f_{i} − a_{i} = (X − ω^{N})g_{3i}  (9)
over 0 ≤ i < 100 for some polynomials g_{3i} of degree less than 2^{s}.
The original claim that we know of an initial state for which N applications of rule 30 gives the values a_{i} is equivalent to the existence of polynomials satisfying identities (7,8,9). It is only really necessary for the prover to reveal values of f_{i}, since the g polynomials can be computed from these,

g_{1i} = (f_{i}^{2} − f_{i})/Z_{N},
g_{2i} = (f_{i}○ω + f_{i−1} − 2f_{i−1}(f_{i}○ω) − f_{i} − f_{i+1} + f_{i}f_{i+1})/Z_{N},
g_{3i} = (f_{i} − a_{i})/(X − ω^{N}).  (10)
What the prover needs to show is that these are also polynomials of degree less than 2^{s}. It is enough to show this on any subset S of the field 𝔽 of size at least twice the claimed degree. Then, by polynomial extrapolation, (7,8,9) hold everywhere and, in particular, hold on the execution trace.
Polynomial Commitments
Using the ideas above, the argument of knowledge will require polynomials with degree up to 2^{s}, which can be of the order of billions. Although, by any usual standards, these are very big polynomials, they are often referred to as ‘small degree’ in the literature! This is because an arbitrary function defined on a subset S of a finite field 𝔽 would usually have degree just one less than the size of S, so anything less than this places a big restriction on the set of allowable functions. In fact, there is a special name for them. They are Reed-Solomon codes, used for error correction by projecting a received message onto this relatively small subset of allowed polynomials.
Revealing all of the polynomial coefficients would take up a lot of space and require the verifier to compute a lot of terms, which is not what we want. It would also not be zero knowledge. Instead, the prover will just reveal its values at a small number of randomly chosen points. However, this also raises the problem that the prover could be just making up the values as they go, rather than being consistent with a preconstructed polynomial.
These problems are addressed by having the prover first send a commitment to the polynomial. This is a Merkle root of the function values on the domain. Then, when he sends the function values at the selected points, a Merkle proof is also sent to guarantee that the values are indeed those of the function to which he has committed.
As an example, consider a function defined on a set S of size 2^{10} = 1024. Label the points in order x_{0}, x_{1}, …, x_{1023}. In the STARK, these points will follow a geometric sequence x_{n} = cω^{n}. We suppose that the function values are 128 bit integers, so can be represented by 16 byte strings. This gives a binary Merkle tree of depth 10. To achieve zero knowledge, the function values can be hidden by concatenating with a nonce, which I take to be a random 16 byte string, before taking its hash. This ensures that, even if you were to guess the value of the function at any point, revealing the hash does not provide any information on whether the guess is correct, since it is effectively impossible to also guess the nonce. So, the only information that the prover reveals is the function values at the specified points.
I use the SHA256 hash function, so that the hash values and Merkle root are all 32 byte strings, which I show as 64 character hexadecimal numbers. Suppose that the prover reveals the Merkle root as:
Merkle root = 06e893aec8533e367ebadf5da0cfe17ce7b90d01c7bb014f97b8a43e0f71e5e7 
Consider selecting a random point. Say, x_{469}. I chose this by taking the rightmost 10 binary digits of the SHA256 hash of the Merkle root, which are 0111010101, or i = 469 in decimal. This ensures that it was not cherry-picked. The prover should reveal not only the value of the function at this point, but also its Merkle proof and nonce. In this case:
Value = 775c8c7091445e6dcd26d45c6a525ff5 
Nonce = bb8ea60a3e38ed487f3f5794fbff155e 
Merkle proof = [ 
57c6a608ae818aa1227bbd274b31b87aa681e2726a8a72540699c0c7d2ae5ca7,
62adae7f75814e50f06e9516a370e06cb35ea5fbd02900ed37ec2f4124254521,
0368314612e835c54e1d5e2367fdd9debdc5d68c5f5708fc37eb16c2c661c65c,
af8e55b2a734b42082928e57dac5d46259cff6433968632b106c1cff8e0d8fcd,
279de58f7b64e5287a7df215176f832883b008bc9d8bc73c469c6667f6207ac1,
9ccd17e7215de3386dd405c430f29b53fa63fdf365a10927abe155a6af43f42c,
95eeecb30f68037d23ddc0849aaa2f5a430205590f0f8fbb7d32f6cb6499902a,
cc2c3eb3959c05890d9dd80088c63f2552417e7b4a6609dd097886e7cb94aff2,
a716419a1b9095874dafab2b5f8c6e42c4aa99aac360e9bd2fd5ca97d4686b3b,
40a947966de220a4a6a5537f576e18dde4bd5ae41b8e28a8ca4e78ae1a6acb01
] 
The verifier can confirm that the provided value is indeed from the function with the specified commitment by applying the Merkle proof. This means performing the following calculations, where ‘mp’ denotes the Merkle proof and ‘h’ is the hash being computed. The concatenation of arguments in the call to the hash function is denoted ‖, with the order of arguments being determined by the path up the Merkle tree — when the corresponding digit of the index 0111010101 is 0, the hash h appears on the left, otherwise it is on the right.
h = SHA256(value ‖ nonce)
h = SHA256(mp[0] ‖ h)
h = SHA256(h ‖ mp[1])
h = SHA256(mp[2] ‖ h)
h = SHA256(h ‖ mp[3])
h = SHA256(mp[4] ‖ h)
h = SHA256(h ‖ mp[5])
h = SHA256(mp[6] ‖ h)
h = SHA256(mp[7] ‖ h)
h = SHA256(mp[8] ‖ h)
h = SHA256(h ‖ mp[9])
  = 06e893aec8533e367ebadf5da0cfe17ce7b90d01c7bb014f97b8a43e0f71e5e7
The result agrees with the Merkle root given above, showing that the prover did provide the correct function value.
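The whole commit/reveal/verify cycle can be sketched in a few lines of Python. This is my own illustrative implementation (toy 16-byte values, 8 leaves rather than 1024), using random 16-byte nonces and the same ordering convention as above, where a set index bit puts the running hash on the right:

```python
import hashlib, os

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def commit(values):
    """Merkle-commit to byte strings, hiding each behind a random nonce."""
    nonces = [os.urandom(16) for _ in values]
    layer = [sha256(v + r) for v, r in zip(values, nonces)]
    layers = [layer]
    while len(layer) > 1:
        layer = [sha256(layer[i] + layer[i + 1])
                 for i in range(0, len(layer), 2)]
        layers.append(layer)
    return layers, nonces

def prove(layers, index):
    """Sibling hashes along the path from leaf `index` to the root."""
    proof, i = [], index
    for layer in layers[:-1]:
        proof.append(layer[i ^ 1])
        i //= 2
    return proof

def verify(root, index, value, nonce, proof):
    h = sha256(value + nonce)
    for sib in proof:
        h = sha256(sib + h) if index & 1 else sha256(h + sib)
        index //= 2
    return h == root

values = [bytes([i]) * 16 for i in range(8)]  # toy 16-byte "function values"
layers, nonces = commit(values)
root = layers[-1][0]
assert verify(root, 5, values[5], nonces[5], prove(layers, 5))
```

Note that the proof length grows only with the tree depth, which is the logarithm of the domain size.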
The example given here was only for a domain of size about 1000. This was just to keep it reasonably short for the purpose of displaying here. We can easily go up to a size of a billion, a millionfold increase, while only tripling the length of the Merkle proof.
So far, so good. If the prover has computed a function on a finite domain, then Merkle trees provide an efficient way to commit to this function and reveal its values at an arbitrary set of points. For the STARK we need to construct here, the functions will be polynomials with values specified on some large number N of points, where N can be of the order of billions. We could try computing the polynomial coefficients directly using Lagrange polynomials or something similar, then evaluating separately at every point of the domain S. Each polynomial evaluation, and each coefficient, takes of the order of N operations to compute, so the total calculation would take the prover of order N^{2} time, which can be huge. Do we really expect the prover to go through such a long-winded and, possibly, infeasibly long calculation? Fortunately, we do not need to, as Fast Fourier Transform methods can be applied to reduce the complexity to N log N.
Consider a function f specified at a number d of points, which will be taken to be a geometric sequence c, cω, cω^{2}, …, cω^{d−1} in the field 𝔽. Representing f as a polynomial of degree less than d,

f(X) = b_{0} + b_{1}X + b_{2}X^{2} + ⋯ + b_{d−1}X^{d−1}.

Now consider evaluating at the points of the geometric sequence,

f(cω^{n}) = Σ_{m} (b_{m}c^{m})ω^{nm}.
If ω is a primitive d’th root of unity, then this is just the Fourier transform of the terms b_{m}c^{m}. We would usually choose d to be a power of 2 so that efficient Fast Fourier Transforms (FFT) can be used. If the values of f are specified on the d points of the domain, an inverse FFT generates all of the coefficients at once. Then, for any other field element c, we can scale by powers of c and apply an FFT to compute f at the points {c, cω, cω^{2}, …, cω^{d−1}}, which we denote as the set cW.
All this means is that the prover can use the FFT algorithm to evaluate f at all points of a subset S of our field, as long as this domain is closed under multiplication by ω. Equivalently, S is the collection of points c_{i}ω^{n} for a number M of field points c_{i} and over the range 0 ≤ n < d. This gives a domain of size Md and, by performing M applications of FFT, computes the function on S in a time of order Mdlogd.
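As a sketch of this evaluation procedure, here is a radix-2 number-theoretic transform evaluating a polynomial on a scaled copy cW of the root-of-unity domain. The parameters are toy choices of my own (the prime p = 97 with an 8th root of unity, in place of the full-size field):

```python
P = 97   # toy prime: 97 - 1 = 96 is divisible by 8
W8 = 64  # an 8th root of unity mod 97 (64^4 = -1 mod 97)

def ntt(coeffs, w, p):
    """Evaluate the polynomial with the given coefficients at 1, w, w^2, ...
    using a recursive radix-2 FFT over the integers mod p."""
    d = len(coeffs)
    if d == 1:
        return coeffs[:]
    even = ntt(coeffs[0::2], w * w % p, p)
    odd = ntt(coeffs[1::2], w * w % p, p)
    out, wk = [0] * d, 1
    for k in range(d // 2):
        out[k] = (even[k] + wk * odd[k]) % p
        out[k + d // 2] = (even[k] - wk * odd[k]) % p
        wk = wk * w % p
    return out

def evaluate_on_coset(coeffs, c, w, p):
    """Values of f on {c, cw, cw^2, ...}: scale coefficient m by c^m, NTT."""
    scaled = [b * pow(c, m, p) % p for m, b in enumerate(coeffs)]
    return ntt(scaled, w, p)

coeffs = [3, 1, 4, 1, 5, 9, 2, 6]  # an arbitrary degree-7 polynomial
vals = evaluate_on_coset(coeffs, 2, W8, P)
```

The d log d cost of the transform, repeated once per coset, gives the Md log d total quoted below.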
While this is much faster than the Md^{2} time required to separately perform polynomial evaluation at every point, it will still be a rather long calculation for large degrees d. It is therefore required that the prover has significant computing power at their disposal — at least, when compared to the verifier who only has to verify the polynomial identities at a comparatively small number of points.
The Setup
To build a STARK style argument of knowledge for an execution trace of length N, we start by choosing a power of two, 2^{s} > N. The idea is that our polynomials representing the execution will have degree less than this and be represented on roots of unity of this order. For N around 1 billion, we take s = 30.
The next step is to choose the finite field 𝔽, which we will take to be the integers modulo a large prime p. The restrictions are that p – 1 should be a multiple of 2^{s} in order that the required roots of unity exist, and also that 1/p should be a negligible probability. This last condition is so that there is a negligible chance of the verifier randomly choosing one of a small number of field values for which the prover could construct an argument without actually having the claimed knowledge. I take p to be slightly less than 2^{128}, since numbers less than this can be represented conveniently in 16 byte blocks, although larger values can be used for more security. So, we can take p of the form 2^{128} – 2^{30}m + 1, and searching for the smallest value of m making this prime gives,
p = 2^{128} − 36·2^{30} + 1.
Next, we need to find a root of unity ω of order 2^{s}. To do this, just choose an integer x and set ω = x^{(p − 1)/2^{s}} (mod p). Fermat’s little theorem guarantees that this is a 2^{s}‘th root of unity but, to ensure that it has exactly this order (rather than a fraction of it), we check that ω raised to the power of 2^{s − 1} is −1 (mod p). If not, we just try again with a different x. In our case, this succeeds for the choice x = 3 giving,
ω = 0x17ead9889fdb09b21c85d0cfd3bdee85. 
The powers of ω define the set W (see equation 5) on which to represent the execution trace.
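The same search can be run at toy scale. A Python sketch with illustrative parameters of my own, p = 97 and s = 5 (so that p − 1 = 96 is divisible by 2^{5}):

```python
p = 97  # toy prime: p - 1 = 96 = 2**5 * 3
s = 5

def find_root_of_unity(p, s):
    """Search for an element of exact order 2**s in the integers mod p."""
    for x in range(2, p):
        w = pow(x, (p - 1) // 2**s, p)      # Fermat: w^(2^s) = 1 (mod p)
        if pow(w, 2**(s - 1), p) == p - 1:  # exact order iff w^(2^(s-1)) = -1
            return w
    raise ValueError("no root of order 2**s found")

w = find_root_of_unity(p, s)
assert pow(w, 2**s, p) == 1 and pow(w, 2**(s - 1), p) == p - 1
```

Here the first few candidates x = 2, 3, 4 produce elements of too small an order and are rejected, exactly as described above.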
Next is to choose a domain S in our field on which the prover will commit to the polynomial values which, in order that the execution values are not revealed, should be disjoint from the powers of ω. We will want its size to be a small multiple M (say, 4) of 2^{s} and, in order to be able to apply Fourier transforms as discussed above, S should be the union of scaled copies cW of W. This can be achieved by choosing a sequence c_{1}, …, c_{M} in the field, and letting S consist of elements of the form c_{i}ω^{n},
S = c_{1}W ∪ c_{2}W ∪ ⋯ ∪ c_{M}W.  (11) 
We can set,
{c_{1}, c_{2}, c_{3}, c_{4}} = {2, 3, 4, 5} 
The sets c_{i}W will be disjoint, which can be checked by raising c_{i} to the power of 2^{s} (mod p), and seeing that they are distinct. Also, none of them raised to the power of 2^{s} equals 1 modulo p, so S is disjoint from W.
Building an Argument of Knowledge
The setup above consists of the choice of prime p defining the field, the root of unity ω of order 2^{s}, the scaling factors c_{1}, c_{2}, c_{3}, c_{4} defining the evaluation domain, and the polynomial identities (10) to be verified. These are all fixed before moving on to the argument of knowledge. This will be scalable in the sense that, even for statements which require a very long computation to directly verify, the procedure described here should not require very many calculations by the verifier or require a lot of information to be processed. The statement regarding the result of a billion iterations of Rule 30 above should involve much less computation by the verifier to convince herself of the truth of the result.
I will first describe the interactive version, consisting of messages between the prover 𝒫 and verifier 𝒱. The noninteractive version will be derived from this later.
The polynomials here are computed from the execution trace of the ‘rule 30’ algorithm, taking values
f_{i}(ω^{n}) = s_{i}(n) 
over 0 ≤ n ≤ N. The prover needs to extrapolate these to the set S as polynomials of degree less than 2^{s}, which can be done using Fourier transforms as described above. Using FFT algorithms, this takes of order 2^{s}s mathematical operations which, using s = 30, is about 30 billion. This is the most computationally heavy part of the whole proof for the prover. After this, the prover needs to build up the Merkle tree and, finally, sends just the Merkle root to the verifier. At a later stage of the procedure, when the prover sends function values, he will also send Merkle proofs in order to guarantee that he is indeed sending values of the function committed to at this step.
The verifier needs to be able to trust that the f_{i} are all polynomials of degree less than 2^{s}, as are the functions g_{ji} defined by (10). If that is true, then they do indeed represent an execution trace for the Rule 30 automaton with the claimed final states.
Note that we have hundreds of functions here. Fortunately, there is a trick to reduce them to a single one: simply take a random linear combination. If the prover can show this to be of the required degree, then almost certainly the original functions are too.
Theorem 1 Let 𝔽 be a finite field of size p and f_{0}, f_{1}, f_{2}, …, f_{n} be functions from a subset S to 𝔽. Let λ_{1}, λ_{2}, …, λ_{n} be independent and uniformly distributed random field elements.

Suppose that the f_{i} are not all polynomials of degree less than d. Then the probability of the linear combination

f_{0} + λ_{1}f_{1} + λ_{2}f_{2} + ⋯ + λ_{n}f_{n}

being a polynomial of degree less than d is at most 1/p.
Assuming that 1/p is a negligible probability, this theorem says that showing the linear combination to be of degree less than 2^{s} is sufficient to imply that the same is true of all of the f_{i}. So, the first thing the verifier does is to choose random coefficients in order to reduce the proof to a single polynomial.
The idea is that these coefficients define a new function on the domain,

h_{0} = f_{0} + λ_{1}f_{1} + λ_{2}f_{2} + ⋯ + λ_{n}f_{n},

where the functions in the combination run over all of the f_{i} together with the quotients g_{ji} of (10).
We could ask the prover to also send a commitment to this linear combination. I will not do this and, instead, the verifier can compute it directly as soon as the prover reveals values of f_{i}.
It remains for the prover to convince the verifier that the values of h_{0} are indeed chosen according to a polynomial of degree less than 2^{s} or, at least, are chosen in this way on most of the domain. By polynomial interpolation, any function can be matched by such a polynomial on up to 2^{s} points, so it sounds like we would need to sample h_{0} at more values than this, which defeats the whole scalability idea behind STARKs. Fortunately, there is a trick which can be used to efficiently prove that h_{0} satisfies the claimed property with a high degree of certainty. This is known as a Fast RS IOPP, an FRI or, in full, a Fast Reed-Solomon Interactive Oracle Proof of Proximity.
The idea is to first break h_{0} up into two polynomials of half the degree. I use S^{2} to denote the set of squares x^{2} of elements x of S.
Theorem 2 Let S be a set of nonzero elements of a field 𝔽 such that, for every x in S, −x is also in S. Then, any function h from S to 𝔽 can be uniquely decomposed as

h(x) = u(x^{2}) + xv(x^{2})  (12)

for functions u, v from S^{2} to 𝔽. Furthermore, h is a polynomial of degree less than d if and only if u, v are both polynomials of degree less than d/2.
Equation (12) is easily inverted to calculate u, v from h,

u(x^{2}) = (h(x) + h(−x))/2,
v(x^{2}) = (h(x) − h(−x))/(2x).
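The decomposition, via the inverse formulas u(x²) = (h(x) + h(−x))/2 and v(x²) = (h(x) − h(−x))/(2x), can be checked numerically. A Python sketch over a toy prime field of my own choosing (p = 97):

```python
p = 97  # toy prime field (illustrative)

def split(h, S, p):
    """Decompose h on S (closed under negation) as h(x) = u(x^2) + x*v(x^2)."""
    inv2 = pow(2, p - 2, p)  # 1/2 mod p via Fermat's little theorem
    u, v = {}, {}
    for x in S:
        hx, hmx = h(x), h((-x) % p)
        u[x * x % p] = (hx + hmx) * inv2 % p
        v[x * x % p] = (hx - hmx) * inv2 % p * pow(x, p - 2, p) % p
    return u, v

# A degree-5 polynomial: its even-index and odd-index coefficients
# should reappear as u and v, each of degree 2.
coeffs = [7, 3, 0, 1, 9, 4]
h = lambda x: sum(b * pow(x, m, p) for m, b in enumerate(coeffs)) % p
S = list(range(1, p))  # all nonzero elements, closed under negation
u, v = split(h, S, p)
for x in S:
    assert h(x) == (u[x * x % p] + x * v[x * x % p]) % p
```

Note that x and −x produce the same values of u and v at x², so the decomposition is well defined on S².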
So, we have replaced our polynomial by two of half the degree. On its own, this has not simplified matters. However, the same trick as above can be used by exploiting theorem 1 again. If the verifier chooses a random field element μ, consider the new function

h_{1}(x^{2}) = u(x^{2}) + μv(x^{2}).

If the prover can show that this has degree less than 2^{s − 1} then, using theorem 2, we can be confident (up to a probability of 1/p) that h_{0} has degree less than 2^{s}.
This process can be repeated all the way until it reduces to showing that a function is constant, for which the prover just has to send the constant value rather than a Merkle root commitment.
So, let us define new sets S_{0}, S_{1}, …, S_{s} iteratively by setting S_{0} = S and S_{k + 1} = S_{k}^{2}. Recalling that the domain S was defined by equation (11),
S_{k} = c_{1}^{2^k}W^{2^k} ∪ c_{2}^{2^k}W^{2^k} ∪ ⋯ ∪ c_{M}^{2^k}W^{2^k}
and W^{2^k} is just the powers of the 2^{s − k}‘th root of unity ω^{2^k}. With these domains defined, we iteratively reduce the problem to polynomials of lower and lower degree by the following steps, run in order from k = 1 up to k = s.
Step 4k: 𝒫 constructs function h_{k} on S_{k} and sends its commitment to 𝒱.
The claim from the prover is that h_{k} is related to h_{k − 1} by,

2h_{k}(x^{2}) = (1 + μ_{k}/x)h_{k − 1}(x) + (1 − μ_{k}/x)h_{k − 1}(−x)  (13)
for all x in S_{k − 1}. So long as the prover does really construct the functions h_{k} in this way, then they are guaranteed to be polynomials of degree less than 2^{s − k}. So h_{s} will be constant. This can be enforced by requiring that, on step 4k above with k = s, he returns the constant value of h_{s} instead of a Merkle root commitment. Then, if equation (13) is really satisfied, we can be confident (to within a negligible probability) that each function h_{k} really does have degree less than 2^{s − k} and, in particular, h_{0} has degree less than 2^{s}.
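The folding recursion can be exercised at toy scale. A Python sketch with illustrative parameters of my own (p = 97 and its order-32 root of unity 28), folding a degree-7 polynomial three times until it becomes constant:

```python
import random

p = 97  # toy prime field (illustrative)
w = 28  # root of unity of order 32 mod 97

def fold(h, mu, p):
    """One FRI round: 2*h_k(x^2) = (1 + mu/x)*h(x) + (1 - mu/x)*h(-x)."""
    inv2 = pow(2, p - 2, p)
    new = {}
    for x in h:
        hx, hmx = h[x], h[(-x) % p]
        u = (hx + hmx) * inv2 % p                         # even part
        v = (hx - hmx) * inv2 % p * pow(x, p - 2, p) % p  # odd part
        new[x * x % p] = (u + mu * v) % p
    return new

# A random polynomial of degree < 8, evaluated on the order-32 subgroup.
coeffs = [random.randrange(p) for _ in range(8)]
domain = [pow(w, n, p) for n in range(32)]
h = {x: sum(b * pow(x, m, p) for m, b in enumerate(coeffs)) % p
     for x in domain}

# Three folds: degree < 8 -> < 4 -> < 2 -> < 1 (constant).
for _ in range(3):
    h = fold(h, random.randrange(1, p), p)
assert len(set(h.values())) == 1
```

Each fold halves both the degree bound and the domain size, which is why s rounds always terminate in a constant.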
So far, all that the verifier has received is commitments for the functions f_{i} and h_{k} on their respective domains. These are Merkle roots, which will just appear as random 32 byte strings, so the verifier has learned nothing yet. The next stage is to actually ask for the values of the functions at selected points on their domains, so that the required identities can be checked.
We must decide the number of points at which to evaluate each of the functions. This number, n, determines how secure the argument is: the larger its value, the less likely it is that a prover can fool the verifier into believing a false statement. The choice of n can be decided by the verifier in the interactive procedure; otherwise, it is decided upfront and is part of the initial setup for the argument. The following steps are performed for each value of k from 1 to s.
Step 6k: 𝒫 evaluates h_{k − 1}(x), h_{k − 1}(−x), h_{k}(x^{2}) for all values of x in x_{k1}, x_{k2}, …, x_{kn} and sends their values and Merkle proofs to 𝒱.
For the first iteration with k = 1, by saying that the prover sends the values of h_{0}, we really mean that he sends the values of f_{i} and the verifier computes h_{0} from this using the linear combination (13). At the final iteration k = s, the prover does not really send values of h_{s}, since this is a constant function and its value was already sent as the commitment.
This is all of the messages which need to be transmitted between prover and verifier. All that remains is for the verifier to perform some checks that the information received from the prover is as claimed. If these succeed, the argument of knowledge is validated, otherwise it is invalid.
Step 8: 𝒱 checks the identities (13),
2h_{k}(x^{2}) = (1 + μ_{k}/x)h_{k − 1}(x) + (1 − μ_{k}/x)h_{k − 1}(−x)
for all k = 1, 2, …, s and values of x in x_{k1}, x_{k2}, …, x_{kn}.
Note that as the verifier can only check the identity (13) at the finite randomly chosen set of n points, she cannot validate that it holds everywhere. The best that can be said is that she statistically verifies that it holds at most points, so cannot be certain that h_{0} agrees everywhere with a polynomial of the correct degree. Actually, it is sufficient that she determines that it agrees with a polynomial of degree less than 2^{s} on at least 2^{s + 1} points. As both sides of equations (8,9,10) are of degree less than 2^{s + 1}, if they agree on these points then they extend to the entire domain and, in particular, hold on the execution trace.
Since we chose a domain S of size four times that of the domain W of the execution, we only need to show that identities (13) hold on about a proportion ρ = 1/2 of the domain. The probability of this holding by chance, if the prover had not chosen a polynomial, will be no more than ρ^{n}. So long as this is a negligible probability, the proof is sound. Taking n to be about 100 to 200 should be sufficient. This argument for the probability bound is very rough and not rigorous; for more details see a paper on STARKs, such as Scalable, transparent, and post-quantum secure computational integrity.
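As a rough numerical check of these figures (using the crude ρ^{n} bound from the text; real soundness analyses are more delicate):

```python
import math

rho = 0.5            # proportion of the domain on which (13) must hold
target_bits = 128    # aim for a cheating probability below 2**-128

# smallest n with rho**n <= 2**-target_bits
n = math.ceil(target_bits / -math.log2(rho))
print(n)  # 128, comfortably inside the suggested range of 100 to 200
```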
Note the importance of the order of the interactions between prover and verifier. The verifier chooses the points at which to verify equations (13) only after receiving all of the polynomial commitments. This avoids the prover cheating by selecting function values to satisfy the identities only at the verified points and not elsewhere. Similarly, steps 3k and 4k are run in order from k = 1 to k = s so that, again, the prover cannot know the later values of μ_{k} in advance, and so cannot cherry-pick functions whose values satisfy the identities for these μ_{k} but for no other choices.
Looking at scalability, steps 3k, 4k, 5k and 6k need to be repeated s times, which is of the order of the logarithm of the number of iterations of Rule 30. Each of the function values revealed by 𝒫 requires 𝒱 to verify a Merkle proof of length about s, so of the order of (log_{2}N)^{2} ≈ 900 calculations, mainly consisting of computing SHA256 hashes for the Merkle proofs. These are repeated for each of the n iterations, but this factor remains roughly constant as N becomes large.
Making it Zero Knowledge
We have described an interactive scalable argument of knowledge, although the procedure above is not zero-knowledge. The prover does not directly reveal the secret message used to initialize the calculation, nor does he reveal the execution trace. However, the functions f_{i} and h_{k} are computed from the execution trace, and are revealed at a number of points. Although these points are on a domain which does not include the execution trace, so do not directly reveal any hidden values, they do depend on it and so potentially leak some partial information to the verifier. It is better if the procedure can be made provably zero knowledge, revealing nothing other than the fact that the prover has a secret message with the required property, with everything else the verifier sees being indistinguishable from random values.
First, the prover can hide any information that h_{0} may contain by adding a polynomial blinding factor. This is a polynomial g of degree less than 2^{s} chosen uniformly at random, so it has independent and uniformly distributed coefficients. It is added to the linear combination (13) when constructing h_{0},
(14) 
As a result, h_{0} will be a random polynomial independent of the f_{i}. As the h_{k} are derived from this, they are also independent of the f_{i}, so do not leak any information. Hence, step 1 is replaced by:
Step 1′: 𝒫 constructs the functions f_{i} together with the uniformly random polynomial g, and sends their commitments to 𝒱.
Also, the verifier uses equation (14), incorporating the blinding factor, instead of (13) when evaluating values of h_{0}.
Since the prover is required to reveal values of f_{i} at points of the domain S, it is still not quite zero-knowledge. This can be remedied without changing any calculation from the verifier's perspective. All the prover needs to do is add a random factor to the f_{i} before computing their commitments. He can define new polynomials f′_{i} by
(15)   f′_{i}(X) = f_{i}(X) + Z_{N + 1}(X)u(X)
where u(X) is a random polynomial of low degree d.
As Z_{N + 1} vanishes on the execution trace, this still satisfies all of the required properties of f_{i}, except that its degree is now up to N + d. Sampling f′_{i} at up to d points will give entirely random values, leaking no information. As the procedure above results in the prover revealing the values of f_{i} at 4n points, we just need to ensure that d is at least this large.
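Here is a toy check of this blinding (my own sketch in the small field 𝔽₉₇, with a domain generator of order 8 standing in for the real 2^{30} sized domain): adding the multiple of Z_{N + 1} leaves the values on the execution trace unchanged.

```python
# Equation (15) in miniature: f' = f + Z_{N+1} * u agrees with f on the trace.
import random

P = 97
w = 64                      # an element of order 8 in F_97 (64**4 = -1 mod 97)
N = 4
trace = [pow(w, i, P) for i in range(N + 1)]     # the N + 1 trace points

def poly_eval(coeffs, x):
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def z_eval(x):
    """Z_{N+1}(x), the monic polynomial vanishing on the trace points."""
    acc = 1
    for t in trace:
        acc = acc * (x - t) % P
    return acc

random.seed(0)
f = [random.randrange(P) for _ in range(N + 1)]  # stand-in for some f_i
u = [random.randrange(P) for _ in range(3)]      # random low degree factor u

def f_blinded(x):
    return (poly_eval(f, x) + z_eval(x) * poly_eval(u, x)) % P

# f' agrees with f exactly on the execution trace
for t in trace:
    assert f_blinded(t) == poly_eval(f, t)
```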
So long as 2^{s} is greater than N + d, the resulting polynomial f′_{i} will still have degree less than 2^{s}, so can be used in place of f_{i} without impacting the algorithm at all. It may require increasing s by one if we are unlucky, but this is minor and most likely unnecessary, since d will be orders of magnitude less than N. In our case, with N equal to 1 billion and s = 30, there is room for the number of iterations n to be anywhere up to 18 million without having to increase s. Using on the order of 100 to 200 iterations is no problem.
Actually, the prover does not even have to explicitly compute the random factor u(X) or apply equation (15). All that is necessary is that, for at least d points of the domain W on which f_{i} is constructed but outside of the execution trace of size N + 1, the values of f_{i} are set to uniformly random values. This automatically adds a random factor u(X) while hardly changing the calculations performed by the prover. The format of the messages sent and the procedure carried out by the verifier are not affected at all.
Making these simple modifications turns the argument of knowledge described above into a zero knowledge proof.
Making it Non-Interactive
Finally, to build our zkSTARK, the above interactive argument of knowledge needs to be expressed in a non-interactive format. This is just a block of data which anyone can check to verify that it was constructed with the claimed knowledge. In our case, this is knowledge of an initial message from which, after 1 billion iterations of Rule 30, the claimed result (1) is obtained.
The Fiat–Shamir heuristic will be used, and the zkSTARK itself will just consist of a list of all of the messages sent by the prover according to the interactive protocol described above. This is:
▪ Commitments for the polynomials h_{k} for k = 1, …, s.
▪ Values of f_{i}(±x), f_{i}(±ωx), g(±x) together with the Merkle proofs, for x in x_{11}, x_{12}, …, x_{1n}.
▪ Values of h_{k − 1}(±x), h_{k}(x^{2}) together with Merkle proofs for x in x_{k1}, x_{k2}, …, x_{kn} and k = 1, 2, …, s.
For the edge cases, the values of h_{0}(±x) do not need to be stored, since they are computed from the f_{i} and g. Similarly, values of h_{s}(x) do not need to be stored, since it has constant value given by its commitment.
To create the zkSTARK, the prover just needs to go through the steps of the interactive protocol above recording his messages. While this also involves a verifier 𝒱, her role can instead be simulated by a pseudorandom method. Recall that the verifier makes various random choices, and that it is important when they are transmitted to the prover. This is so that the prover cannot use knowledge of these when constructing commitments for the various polynomials. Specifically, 𝒱 makes the following choices.
▪ For each k = 1, …, s, the random field element μ_{k} after 𝒫 has chosen commitments for f_{i}, g and h_{i} for 1 ≤ i < k.
▪ For each k = 1, …, s and i = 1, 2, …, n, the random elements x_{ki} of S_{k} after 𝒫 has chosen commitments for f_{j}, g, h_{l} for 0 ≤ j < 100 and 1 ≤ l ≤ s.
The Fiat–Shamir method uses a cryptographically secure random number generator for these choices. Theoretically, this is done with a random oracle and, in practice, a hash function h() such as SHA256 is used. This will give outputs which are effectively uniformly random 256 bit numbers and can be used to define the verifier's choice of field elements. Specifically, each choice made by 𝒱 is constructed from the bits of h(u‖v), where u is the concatenated list of commitments made by 𝒫 before the verifier's choice, and v is just a number incremented after each choice.
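A minimal sketch of how such a choice could be derived (the serialization and function name are my own assumptions; a real implementation must pin down an exact transcript encoding):

```python
import hashlib

def fs_challenge(commitments: list, counter: int, modulus: int) -> int:
    """Derive a verifier 'choice' as h(u || v): hash the concatenated
    commitments u together with the incrementing counter v."""
    u = b"".join(commitments)
    v = counter.to_bytes(8, "big")
    digest = hashlib.sha256(u + v).digest()
    # Reducing a 256 bit hash modulo p has negligible bias when p << 2**256.
    return int.from_bytes(digest, "big") % modulus

p = 2**127 - 1                               # stand-in for the 128 bit field prime
roots = [bytes([i]) * 32 for i in range(3)]  # dummy Merkle root commitments
mu_1 = fs_challenge(roots, 0, p)
mu_2 = fs_challenge(roots, 1, p)
assert 0 <= mu_1 < p and mu_1 != mu_2
```

The same derivation is then rerun by anyone validating the zkSTARK, so prover and validator agree on every "verifier" choice.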
This defines the zkSTARK! To construct it, the prover steps through the protocol above using the described heuristic for the verifier’s choices, and records his messages. To validate, we would use the stored values for the prover’s messages and the heuristic for the verifier’s. Then, steps 7 and 8 of the protocol above are performed to check that the STARK is valid.
Reducing the size of the zkSTARK
As it stands, constructing a zkSTARK exactly as above would be rather inefficient. Although it will be much shorter and faster to process than storing or checking the entire execution trace of a billion Rule 30 iterations, it is still much bigger than necessary. We have 200 polynomials f_{i} and about 30 polynomials h_{k}, all evaluated at on the order of 100–300 points, with each value taking up around 16 bytes. These all come with Merkle proofs, each of which will contain around 32 hashes of 32 bytes each. This adds up very quickly. Fortunately, much of it can be eliminated without affecting security. I didn't do this above for simplicity, but will finish off by outlining some ways in which the size of the zkSTARK can be significantly reduced.
▪ Each of the polynomials f_{i} has a separate commitment, with its values at the leaves of a Merkle tree. As there are 200 of them, this is very wasteful. Instead, the functions f_{i} and g can be combined into a single Merkle tree, with a single commitment. The leaves of this will contain an array of values, one for each function value at the point. This reduces the number of Merkle proofs for evaluating the f_{i} by at least a factor of 200.
▪ There will be some overlap between the Merkle proofs of the polynomials evaluated at different points. This results in some repetition of the internal hash values stored, and of evaluating the hashes by the verifier. Stripping out these repeated values would save around 30% of the Merkle proof space and computation.
▪ Each iteration of Step 6k requires evaluating h_{k − 1} at pairs of points ±x. The Merkle tree can be arranged so that these pairs appear at neighbouring leaves of the tree, meaning that they share a single Merkle proof of length one less than their individual proofs.
▪ Each iteration of Step 6k requires evaluating h_{k} at points x_{ki}^{2}, and for the next value of k, it is also evaluated at points x_{k + 1, i}. These can be combined by taking x_{k + 1, i} = x_{ki}^{2}. This effectively eliminates the computation of h_{k}(x_{ki}^{2}) and its Merkle proof. Together with the previous point, this means only one Merkle proof is evaluated for each value of k and for each iteration. Combining all the Steps 6k, this is a total of s times the number n of iterations.
▪ The Merkle proofs for evaluations of the h_{k} can be reduced further by reducing the polynomial degree by a factor of 4 at each step. Instead of using Theorem 2, use the fact that a polynomial h of degree less than 2^{s} can be written as
h(x) = u_{0}(x^{4}) + xu_{1}(x^{4}) + x^{2}u_{2}(x^{4}) + x^{3}u_{3}(x^{4})
for polynomials u_{i} of degree less than 2^{s − 2}. These can be computed using
4x^{i}u_{i}(x^{4}) = h(x) + α^{3i}h(αx) + α^{2i}h(−x) + α^{i}h(−αx)
where α is a primitive 4'th root of unity (i.e., a square root of −1) in the field. This is just an inverse Fourier transform of size 4.
Using this would require four evaluations of h_{k − 1} per iteration of step 6k, but we are taking twice as big steps, so it works out the same. However, if the Merkle tree is set up so that x, αx, −x, −αx are stored in neighbouring leaves, they can share the same Merkle proof. Hence, we halve the total number of Merkle proofs of the functions h_{k}.
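This size 4 inverse Fourier transform identity can be checked numerically in a toy field (𝔽₁₃, where α = 5 satisfies α² = −1; illustrative code only):

```python
# Check the size-4 inverse Fourier transform identity in F_13,
# where alpha = 5 is a primitive 4'th root of unity (5*5 = 25 = -1 mod 13).
P, alpha = 13, 5

def poly_eval(coeffs, x):
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

h = [7, 2, 9, 4, 1, 8, 3, 5]       # degree < 8
u = [h[i::4] for i in range(4)]    # so h(x) = sum_i x^i * u_i(x^4)

for i in range(4):
    for x in range(1, P):
        lhs = 4 * pow(x, i, P) * poly_eval(u[i], pow(x, 4, P)) % P
        rhs = (poly_eval(h, x)
               + pow(alpha, 3 * i, P) * poly_eval(h, alpha * x % P)
               + pow(alpha, 2 * i, P) * poly_eval(h, (-x) % P)
               + pow(alpha, i, P) * poly_eval(h, (-alpha * x) % P)) % P
        assert lhs == rhs
```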
▪ Recall that the polynomials h_{k} are of degree less than 2^{s − k}. Once this is small enough, the prover can simply send the polynomial coefficients rather than its commitment. No further values of this h_{k} need to be sent and, since it is by definition a polynomial of the required degree, the FRI algorithm need go no further.
In our example, there are s = 30 of these polynomials to commit to. If we stop the algorithm at k = s − 8 and, instead of its commitment, return the 2^{8} = 256 polynomial coefficients, this reduces us to 22 polynomials.
▪ We worked in a finite field of size a 128 bit prime number p, to ensure that 1/p is tiny and hence obtain a small soundness parameter. However, for the first step of the algorithm, involving committing to the functions f_{i}, g, it was only necessary that the field contains a 2^{30}'th root of unity, and hence contains the domains W and S.
For this to be the case, it is only necessary for p − 1 to be a small multiple of 2^{30}. For example, we could use p = 49·2^{30} + 1. This leads to fewer bits (36 instead of 128) per evaluation, which would significantly reduce the work done by the prover (specifically, when doing the Fourier transforms) and save a small amount of space in the zkSTARK.
From step 2 onwards, the field containing the random verifier choices λ_{ji} and μ_{k} does need to be large in order to keep the soundness parameter small. These can be chosen in a finite extension of 𝔽_{p}, which will be of size p^{r} for extension degree r. Taking r = 4 should be sufficient.
▪ In the protocol described here, the verifier has to evaluate the polynomial defined by (6) at a random selection of points,
Z_{N}(X) = (X − 1)(X − ω)(X − ω^{2})⋯(X − ω^{N − 1}).
While these terms do not take up space in the zkSTARK itself, it is required that the verifier computes their values, involving multiplying N terms for each evaluation. As N is of the order of a billion, this will take significant resources. It is possible to make use of the identity
(16)   Z_{N}(ωX)(X − ω^{N − 1}) = ω^{N}Z_{N}(X)(X − ω^{−1})
enabling its values to be computed at consecutive points xω^{i} in constant time for each successive evaluation. This is very useful for the prover, who needs its values on the entire domain S. For the verifier, it is not so much help, as she only requires its values at a small number of non-consecutive random points.
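The recurrence (16) is easy to sanity check numerically (a toy sketch of mine in 𝔽₉₇ with ω of order 8, standing in for the real domain):

```python
# Check identity (16) for Z_N in F_97, with w of order 8 standing in for 2^s.
P, w, N = 97, 64, 5

def z(x):
    """Direct evaluation of Z_N(x) = prod_{i<N} (x - w^i): N multiplications."""
    acc = 1
    for i in range(N):
        acc = acc * (x - pow(w, i, P)) % P
    return acc

w_inv = pow(w, P - 2, P)   # w^{-1}
for x in range(P):
    lhs = z(w * x % P) * (x - pow(w, N - 1, P)) % P
    rhs = pow(w, N, P) * z(x) % P * (x - w_inv) % P
    assert lhs == rhs      # (16) holds at every point of the field
```

Since the identity relates Z_{N}(ωx) to Z_{N}(x), the prover can walk along consecutive domain points updating the value with a constant amount of work per step.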
One method is for the prover to commit to and send the values of Z_{N} along with those of f_{i}, g. This adds a tiny bit of extra data to the zkSTARK. A random multiple of Z_{N} is added to the linear combination (14) to verify that Z_{N} is indeed a polynomial of degree less than 2^{s}. The verifier checks that identity (16) holds at the sampled values stored in the zkSTARK. This removes the computational complexity but, if we go through the maths, it will result in a slight increase in the soundness parameter.
An alternative method is to make use of the identity
(X − 1)(X − ω)⋯(X − ω^{2^{s} − 1}) = X^{2^{s}} − 1
to write Z_{N} as
Z_{N}(X) = (X^{2^{s}} − 1)/((X − ω^{N})(X − ω^{N + 1})⋯(X − ω^{2^{s} − 1})).
If N is very close to 2^{s}, this can be used by the verifier to quickly compute the values of Z_{N}. For this reason, it can be useful to extend the execution trace length to be as close to 2^{s} as possible, by adding in extra iterations of the rule even though they may not be used in the original calculation.
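Continuing the toy 𝔽₉₇ example, with domain size 8 standing in for 2^{s} and N = 6, one can check that dividing X^{2^{s}} − 1 by the few remaining factors reproduces Z_{N}:

```python
# Z_N(x) = (x^8 - 1) / prod_{i=N}^{7} (x - w^i) in F_97, with w of order 8.
P, w, N, S = 97, 64, 6, 8

def z_direct(x):
    """Z_N(x) as a product of N factors."""
    acc = 1
    for i in range(N):
        acc = acc * (x - pow(w, i, P)) % P
    return acc

def z_shortcut(x):
    """Z_N(x) via x^8 - 1, with only S - N = 2 factors to divide out."""
    num = (pow(x, S, P) - 1) % P
    den = 1
    for i in range(N, S):
        den = den * (x - pow(w, i, P)) % P
    return num * pow(den, P - 2, P) % P

for x in range(P):
    if any(x == pow(w, i, P) for i in range(N, S)):
        continue                          # denominator vanishes at these points
    assert z_direct(x) == z_shortcut(x)
```

With N close to 2^{s}, the verifier evaluates one power and a handful of factors instead of a billion-term product.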