STARKS I: Polynomials, everywhere!

Terms such as zk-SNARKs, zk-STARKs and zk-Rollups are becoming increasingly common when discussing scalability solutions for blockchains. The idea is that, we do not send long and complex calculations to be computed on-chain, since it is expensive and inefficient, and there is limited capacity for such contracts. Instead, the calculation is performed off-chain, and only the result together with a proof that it is the result of the calculation is sent for inclusion in the blockchain. It turns out that it is possible to create a proof that a calculation leads to a particular result, such that the proof is much shorter than the calculation itself. Of course, the process of constructing these proofs is rather complicated, but that is done off-chain. In-depth descriptions of zk-SNARKS and zk-STARKS are usually rather long, and consist of several different and complex ideas all brought together. Here, I look at one of these ideas, which is replacing the original computation by expressions involving polynomials.

Alice and Bob

Suppose that Alice and Bob each have a data file of length N bits, and they want to check that they are identical. Of course, Bob could just transmit his information to Alice, and she can confirm if they are indeed the same. However, these are large files, and communication is slow. Instead, Bob could send only the bits from a randomly chosen set of places in his file. Alice can check that, at least, their data agrees in these locations. If a significant proportion of their data bits are different, then this has a high probability catching the discrepancy. However, it is possible that they differ at only one place, in which case Bob would need to send all or almost all of the data to have a good chance of catching this.

How can they do this is an efficient manner? How many bits does Bob have to send to Alice in order for her to have a high probability of telling whether or not their data agrees for every bit. At first thought, you may think that he needs to send all or almost all of it, so about N bits. On the other hand, if you are familiar with hash functions, then you would probably answer that he could compute something like the SHA256 hash and send this instead, so only 256 bits need to be transmitted. While that is true, there is also an efficient approach using polynomials, where we can prove its efficacy with some simple maths and without assuming any of the properties of hash functions. Also, as we will see, polynomials allow us to do so much more than just comparing bits.

The idea is that, if two polynomials differ even slightly for one coefficient, then they evaluate to different values almost everywhere. Bob constructs the polynomial whose coefficients are the bits from his data,

\displaystyle  f(X)=c_0+c_1X+c_2X^2+\cdots+c_{N-1}X^{N-1}. (1)

Fixing a prime p, he chooses a random integer 0 ≤ x < p, and sends it to Alice along with the polynomial value f(x), taking the result modulo p. This is 2log2p binary bits of data. Alice also forms the polynomial with coefficients from her data, evaluates it at x, and compares the result with Bob’s value f(x). If they agree (modulo p) then she declares that her data is the same as Bob’s, otherwise they are different.

The point is, two distinct polynomials of degree less than N can only agree at fewer than N points. If Alice and Bob’s data are not identical, then the probability of them getting the same values for their polynomial value (modulo p) is less than N/p. Let us fix a tiny value ε. If the prime p is chosen greater than N/ε then the probability of Alice giving an incorrect statement is less than ε. Taking p close to N/ε, the number of bits sent to Alice is

\displaystyle  2\log_2p\approx2\log_2(N/\varepsilon).

This is logarithmic in N, which is much less than sending all of the bits. By sending polynomial values rather than the original raw data, the number of bits that Alice needs to transmit has been drastically reduced. This idea is also used in Reed–Soloman error-correcting codes.

Using polynomials also enables us to represent various relationships between the data bits, unlike hash functions which just completely scramble it. For example, maybe Bob’s data is not arbitrary, but consists of a sequence of calculations, with each term depending on the previous ones. In this situation, Bob is fulfilling the role of a prover, who is attempting to demonstrate some knowledge or calculation results to the verifier, Alice, and the data represents the execution trace. The calculations are represented by some polynomial identities so that, by finding polynomials satisfying these identities, she demonstrates that the calculation has been carried out. He can prove this to Alice by sending her the values of the polynomials at a randomly chosen point. This is a key idea behind SNARKs (Succinct Non-Interactive Argument of Knowledge) and STARKs (Scalable Transparent Argument of Knowledge).

The Setup

Before proceeding, I will briefly recap the setup for the problem being addressed in this post. There are two parties, Bob (the prover) and Alice (the verifier). Bob performs a computation involving a large number N of calculation steps. Each calculation state consists of a fixed number of variables y0, y1, …, ym, the values of which lie in some set F. Using yi,j for the value of variable j at execution time i, the calculation state is yi = (yi,0, …, yi,m). The initial state y0 is specified . Then, at each step, the state is updated according to a transition function yi + 1 = f(yi). The calculation result is given by the final state yN - 1.

y0,0 y0,1 y0,2 y0,3
y1,0 y1,1 y1,2 y1,3
y2,0 y2,1 y2,2 y2,3
yN - 2,0 yN - 2,1 yN - 2,2 yN - 2,3
yN - 1,0 yN - 1,1 yN - 1,2 yN - 1,3

Figure 1: Execution trace.

Bob sends the result to Alice. However, she wants to verify that the calculation was performed correctly, but does not have the resources to perform the full computation herself. Therefore, Bob needs to also send some proof that Alice can use to convince herself of its correctness.

It is required that Alice be able to perform a check of correctness in much less than N steps. We also require the data transferred between the two to be much less than size N. For example, Alice could request the states (yi, yi + 1) at a random sample of calculation times i and check that the transition function was applied correctly. At best, this will only give a statistical upper bound on the number of computation errors. Unfortunately, even a single error in the calculation completely invalidates the result. Alice needs to be able to verify every part of the calculation in much fewer than N steps. We really have no right to expect that this is possible, so it is quite amazing that it can be done at all.

In a bit more detail, the state variables lie in a finite field F. The execution trace of variable yj is interpolated by polynomial gj, so that yi,j = gj(i). The transition function is represented by multivariate polynomials p0, p1, …, pr so that the calculation step yi + 1 = f(yi) is equivalent to

\displaystyle  p_0({\bf y}_i,{\bf y}_{i+1})=p_1({\bf y}_i,{\bf y}_{i+1})=\cdots=p_r({\bf y}_i,{\bf y}_{i+1})=0.

This is known as the algebraic intermediate representation for the calculation. In terms of the interpolation polynomials,

\displaystyle  p_0({\bf g}(i),{\bf g}(i+1))=\cdots=p_r({\bf g}(i),{\bf g}(i+1))=0.

This has to hold for all 0 ≤ i <N - 1, but the equalities can be converted to identities which hold if i is replaced by any element of the field. Alice chooses some random points in the field and asks Bob for the corresponding polynomial values g. She just needs to verify the identities at these points, as well as check the initial and final states are as claimed. This supposes that Alice and Bob can transfer information in both directions, via a two-way communication channel.

Of course, Bob could be lying. What he claims are the polynomial values could be any numbers that he has just selected to satisfy the identities that Alice is checking. We cannot ask him to send the entire polynomial, since this would be the same as sending the entire calculation trace. For now, we will just trust him on this point but, for practical application, we will need a method of also verifying that he is really sending values consistent with a polynomial.

In blockchains, particularly those designed to represent general calculations (smart chains) such as Ethereum, we want to represent a calculation and its result on the chain. For anything beyond very simple calculations, this could take up a lot of valuable space on the blockchain and require nodes to spend a lot of computation validating it. Such methods of representing calculations can instead move the heavy work off-chain, with the on-chain data being much smaller and requiring much less computation to validate. The on-chain information is just the result of the calculation together with a proof that this is indeed the correct result, rather than the full computation itself.

We can think of Bob as sending a transaction to the chain containing the calculation result and proof. Alice is a validator, who is verifying correctness of the blockchain. Communication is only one-way, from Bob to Alice, but this is outside of the scope of the current post. However, the ideas here can be built on by including some additional steps to eliminate both the trust in Bob that he is correctly computing polynomial values, remove any communication from Alice to Bob and also hide any private information used in the calculation. Then, we arrive at zk-STARKS and zk-SNARKS.

Finite Fields

Above, we assumed that Alice takes the results of her polynomial evaluation modulo a prime number p. This is exactly the same as evaluating it in the finite field of integers modulo p. Throughout the remainder of this post, I will assume that F is a finite field of size q = pr for a fixed prime number p and positive integer exponent r. It is known that fields exist of such prime-power sizes and, in the case q = p, it is just the integers modulo p. This case is enough to follow the discussion below, so it is only really necessary to understand modulo-p arithmetic rather than general finite fields.

I write n.x for the product of an integer n with field element x. For the field with pr elements, p.x always evaluates to zero. That is, adding together p copies of the same element gives zero or, equivalently, the field has characteristic p. Then, two such products m.x and n.x for nonzero field element x are equal if and only if m and n are equal modulo p.

I will also optionally interpret an integer n as the field element n.1 given by adding together n copies of the field unit ‘1’. In this way, integer arithmetic interpreted in the field F is the same thing as modulo p arithmetic.

Sometimes, we will want to look at powers of a nonzero field element γ. Its order is the smallest nonnegative integer s satisfying γs = 1. Then, two powers γi and γj are equal if and only if i and j are equal modulo s. Multiplication of powers of γ corresponds to integer addition modulo s. The nonzero field elements under multiplication form a group of size q – 1 and, it is known to be cyclic. That is, there exists cyclic generators, which are elements of order q – 1. Any other nonzero field element can be expressed as a power of any such generator, and it also follows that there are elements of any order which is a factor of q – 1.

Polynomial Interpolation

All polynomials will be assumed to have coefficients in F. The symbol X will be used for the indeterminate in polynomial expressions, as in (1) above. A polynomial can be written as f(X) or, more succinctly, as f with the indeterminate X being implicit. In the remainder of this post, I will state several simple results on polynomials which are very useful for the procedures considered here.

As in the argument above with Alice and Bob, the fact that distinct polynomials of degree N can agree at no more than N places provides a bound on the probability that they take the same value at a random point.

Theorem 1 Let f and g be distinct polynomials of degree no more than N. Then, the probability that they take the same value at a uniformly random point x of the field F is bounded by,

\displaystyle  {\mathbb P}\left(f(x)=g(x)\right)\le \frac Nq.

So long as N/q is negligible, we can therefore check for equality of the two polynomials by just comparing their values at a single randomly selected point.

This does raise a slight notational convention that I glossed over above. What does it even mean for two polynomials to be equal? Regarding them as functions, this could be taken to mean that they take the same values at all points. Or, it could mean that they have the same coefficients. For polynomials over an infinite field such as the rational or real numbers, these two statements are equivalent. Over finite fields, however, they are not. For example, all elements of the field F of size q satisfy the identity

\displaystyle  x^q=x,

which is Fermat’s little theorem. The polynomials Xq and X take the same values at all points, but do not have the same coefficients. In keeping with convention, I consider equal polynomials to, by definition, have the same coefficients. So, Xq ≠ X. For polynomials of degree less than q, the distinction is moot, since equality of coefficients is the same as evaluating to the same values everywhere.

In the situation at the top of the post Bob used the bits of his data as the coefficients of his polynomial. Instead, he could have used them as the values when evaluated at certain pre-selected points. Polynomial values can be used to represent an arbitrary sequence y0, y1, …, yN - 1 of elements of the field F of length N ≤ p. In fact, there exists a unique polynomial f of degree less than N and satisfying

\displaystyle  f(i)=y_i (2)

for each i = 0, 1, …, N – 1.

An explicit construction of the polynomial in (2) can be given by Langrange’s method. Consider representing the sequence at an arbitrary distinct set of points x0, x1, …, xN - 1 of the field F. For each 0 ≤ i < N, we start by constructing a polynomial which is nonzero at xi but is zero at each xj for j ≠ i,

\displaystyle  \ell_i^0(X)=\prod_{j\not=i}(X-x_j).

Now, normalize this so that it takes the value 1 at xi,

\displaystyle  \begin{aligned} \ell_i(X)&=\ell_i^0(X)/\ell^0_i(x_i)\\ &=\prod_{j\not=i}\frac{X-x_j}{x_i-x_j}. \end{aligned}

These are degree N – 1 polynomials satisfying,

\displaystyle  \ell_i(x_j)=\begin{cases} 0,&\textrm{if }i\not=j,\\ 1,&\textrm{if }i=j. \end{cases} (3)

Taking linear combinations of these Lagrange polynomials allows us to represent an arbitrary sequence.

Theorem 2 Let x0, x1, …, xN - 1 be a distinct sequence of elements the field F. Then, for each sequence y0, y1, …, yN - 1 in F, there exists a unique polynomial f of degree less than N satisfying

\displaystyle  f(x_i)=y_i (4)

for each i = 0, 1, …, N – 1. This can be written explicitly as,

\displaystyle  f(X)=\sum_iy_i\ell_i(X). (5)

From the values (3) of the Lagrange polynomials, it is immediate that the polynomial defined by (5) satisfies f(xi) = yi. Also, as i have degree N – 1, it follows that f has degree less than N. As distinct polynomials of degree less than N cannot coincide at N points, it is the unique such polynomial satisfying (4).

Equation (2) is just an application of theorem 2 for the points xi = i, and the condition N ≤ p is only required so that these points are all distinct modulo p.

There is one issue with representing the sequence yi at the linearly spaced points xi = i. Specifically, we have the restriction that the maximum length N of the sequence is bounded by the characteristic p of the field. Otherwise, the points xi would not all be distinct when interpreted in F. This may not be much of a problem when the field is the integers modulo a large prime p, but sometimes we might want to work in a field of size q = pr for a small prime p raised to a relatively large power r. In fact, power-of-two field sizes are often used, which would make (2) almost useless since it couldn’t handle a sequence longer than two. So, instead, we can use geometrically spaced points xi = γi, for a nonzero field element γ of some integer order s. As mentioned above, this order can be chosen to be the size, q – 1 of the full set of nonzero field elements. Alternatively, it can be any factor of this. The points xi are pairwise distinct if and only if N is no greater than s, allowing us to represent any sequence up to length q – 1. Applying theorem 2 gives unique polynomial f of degree less than N satisfying

\displaystyle  f(\gamma^i)=y_i (6)

for each i = 0, 1, …, N – 1. For the reasons just explained, as well as some efficiency considerations, using the interpolating polynomial (6) is often preferred over (2).

Representing Zero

For any subset S of the field F, I will use ZS to denote the lowest order nontrivial polynomial vanishing on S,

\displaystyle  Z_S(X)=\prod_{x\in S}(X-x).

A polynomial f vanishes at a point x if and only if it has the factor X – x. Applying this for all points x in S shows that f vanishes on S if and only if it has ZS as a factor or, equivalently, there exists a polynomial g such that,

\displaystyle  f(X)=g(X)Z_S(X).

Next, suppose that we have two polynomials f and g, and want to verify that they take the same values on the set S. We could of course, just evaluate them both at each of these points and check. It would be preferable to instead have a polynomial identity which, as in the situation at the top of the post, can be validated by checking at a single random point. Equality f = g is clearly sufficient, but not necessary, since it is possible that they may not be equal at points outside of the given set. Instead, using the fact that their difference f – g must vanish on S and, hence, have ZS as a factor, we obtain the following.

Theorem 3 Let S be a subset of the field F and f,g be polynomials. Then, f(x) = g(x) for each x in S if and only if there exists a polynomial h satisfying,

\displaystyle  f(X)=g(X)+h(X)Z_S(X). (7)

So, if Bob wants to prove to Alice that f and g are equal on S, he can achieve this through the quotient polynomial h. Then, if they satisfy (7) at a random point, Alice can conclude that they do indeed coincide on S.

As an example application of theorem 3, suppose that Bob has a Boolean sequence, interpolated by a polynomial f, each element of which can only take the values 0 or 1. How can he prove that is indeed Boolean? It is a straightforward fact that a field element y is equal to 0 or 1 if and only if y2 = y.

Example 1 (Boolean function) A polynomial f takes only values 0 and 1 on the set S if and only if there exists a polynomial g satisfying,

\displaystyle  f(X)^2=f(X)+g(X)Z_S(X).

Next, suppose that Bob claims that his polynomial f takes some small number of stated values yi at a sequence of points xi. He may, for example, be stating that his execution trace has the required initial and final values. Without evaluating it these specific points, how can this be verified? Theorem 2 allows us to construct a polynomial g of low degree with the required property and, then, we apply theorem 3.

Corollary 4 Let S = {x0, x1, …, xN - 1} be a distinct sequence of elements of the field F, and y0, y1, …, yN - 1 be any elements of F. Using theorem 2 choose polynomial g to satisfy g(xi) = yi.

Then a polynomial f satisfies f(xi) = yi for each 0 ≤ i < N if and only if there exists a polynomial h satisfying,

\displaystyle  f(X)=g(X)+h(X)Z_S(X).

We need a polynomial which vanishes at a large number of consecutive points in order to make statements such as that the calculation is consistent with the state transition function at every single step. Specifically, for a nonnegative integer N ≤ p,

\displaystyle  \begin{aligned} Z_N(X) &\equiv Z_{\{0,1,\ldots,N-1\}}(X)\\ &=X(X-1)\ldots(X-N+1). \end{aligned}

vanishes at each 0 ≤ iN. As, typically, N will be a large number, it would be very inefficient to require the verifier to have to compute the values of ZN by multiplying together each of the factors. This could potentially be millions of terms. Fortunately, for the special value N = p, it reduces to an especially simple form which can be readily evaluated.

Theorem 5

\displaystyle  Z_p(X) = X^p-X (8)

The proof of equation (8) follows from the observation that both sides are degree p polynomials whose roots are the integers 0 ≤ i < p (using Fermat’s little theorem for the right hand side).

Even in the case where N is strictly less than p, theorem 5 can still be useful since it allows us to re-express ZN as

\displaystyle  Z_N(X)=\frac{X^p-X}{\prod_{i=N}^{p-1}(X-i)}.

This has p – N terms in the denominator, so can give an efficient method of evaluation if p is greater than, but close to, N.

An alternative approach is to make use of the simple identity,

\displaystyle  XZ_N(X-1)=(X-N)Z_N(X). (9)

This follows straight from the definition but, the important point here, is that this identity is sufficient to characterize ZN. The idea is that the prover Bob can do the work of evaluating ZN and send the result to Alice. She just needs to check the identity above, which is sufficient to guarantee that Bob’s polynomial is, at least, a multiple of ZN.

Theorem 6 If a polynomial Z satisfies the identity

\displaystyle  XZ(X-1)=(X-N)Z(X) (10)

then it is a multiple of ZN.

Theorem 6 can be proven by induction on N. The case N = 0 is immediate since, here, ZN is constant. For positive N, we assume the induction hypothesis that the result holds with N replaced by N – 1. Expressing identity (10) as

\displaystyle  (X+1)Z(X)=(X-N+1)Z(X+1)

it follows that X – N + 1 is a factor of Z,

\displaystyle  Z(X)=(X-N+1)Z^\prime(X).

Substituting into (10) gives the same identity for Z′, but with N replaced by N – 1. So, by the induction hypothesis, is a multiple of ZN - 1. Then, Z is a multiple of (X – N + 1)ZN - 1 = ZN, completing the proof.

As discussed above, it can be convenient to represent a sequence at points γi for a nonzero field element γ of order s. With that in mind, for each nonnegative integer N ≤ s, lets also define the polynomial

\displaystyle  \begin{aligned} \tilde Z_N(X) &\equiv Z_{\{1,\gamma,\ldots,\gamma^{N-1}\}}(X)\\ &=(X-1)(X-\gamma)\ldots(X-\gamma^{N-1}), \end{aligned}

which vanishes at each γi for 0 ≤ iN. As above, this has a particularly simple form for the special case N = s which significantly reduces the calculation cost.

Theorem 7

\displaystyle  \tilde Z_s(X)=X^s-1. (11)

The proof of equation (11) follows from observing that both sides are polynomials of degree s vanishing at the s distinct points γi. Even in the case where N is strictly less than s, equation (11) can be useful since it allows us to re-express N as,

\displaystyle  \tilde Z_N(X)=\frac{X^s-1}{\prod_{i=N}^{s-1}(X-\gamma^i)}.

The denominator here has s – N terms so, choosing the field element γ to have order close to, but greater than or equal to, the number N of calculation steps can significantly reduce the computation required to verify the proof.

Alternatively, it can be checked that N satisfies the identity

\displaystyle  (X-1)\gamma^N\tilde Z_N(\gamma^{-1}X)=(X-\gamma^N)\tilde Z(X). (12)

As already explained for ZN, Bob could compute the values of N and send the result to Alice, who verifies that it satisfies this identity. We have the analogous result to theorem 6.

Theorem 8 If a polynomial Z satisfies the identity

\displaystyle  (X-1)\gamma^NZ(\gamma^{-1}X)=(X-\gamma^N)Z(X)

then it is a multiple of Z̃N.

This result can be proven inductively just as for the proof of theorem 6 above, so I do not give it here.

Example: Fibonacci series

For a simple example applying the results above, consider the Fibonacci Sequence

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, …

It is defined by the initial conditions y0 = 0, y1 = 1 and the transition function yn + 2 = yn + 1 + yn. We suppose that Bob claims that the Nth Fibonacci number yN is equal to some value a (modulo a prime p ≥ N), for a large value of N. Although it may not be the best use of the ideas described here, since there are already fast methods of computing terms in the Fibonacci sequence, let us show how it can be verified.

Bob uses theorem 2 to construct a polynomial of degree no more than N satisfying f(i) = yi over 0 ≤ i ≤ N. Theorem 3 gives a polynomial g satisfying

\displaystyle  f(X+2)=f(X+1)+f(X)+Z_{N-1}(X)g(X). (13)

By inspection, it can be seen that g is of degree at most 1. We also construct a polynomial satisfying the initial conditions u(0) = 0, u(1) = 1 and the claimed final state u(N) = a,

\displaystyle  u(X)=X+(X^2-X)(a-N)/(N^2-N).

Theorem 3 gives a polynomial h satisfying

\displaystyle  f(X)=u(X)+X(X-1)(X-N)h(X). (14)

Again, by inspection, it can be seen that h has degree no more than N – 3 which is large, so we would not expect Alice to evaluate it.

Now, Alice can ask Bob for the values of f, h and possibly Z<N - 1 at a random point of the field and verifies identities (13) and (14). By theorem 1, this validates the claimed result with a probability of error of no more than N/p. By repeating this process a number n times, this probability can be reduced to (N/p)n if necessary.

Alternatively, by representing the calculation at field points γi for nonzero field element γ of order s > N, identities (13) and (14) are replaced by,

\displaystyle  \begin{aligned} &f(\gamma^2X)=f(\gamma X)+f(X)+\tilde Z_{N-1}(X)g(X),\\ &f(X)=u(X)+X(X-1)(X-N)h(X), \end{aligned}

where u is the polynomial of degree 2 or less satisfying u(1) = 0, u(γ) = 1 and u(γN) = a,

Note that this example does not exactly fit into the exact framework described in the setup described above, since the calculation state yi depends on both yi - 1 and yi - 2, not just the previous state. This is not important, and it can easily be written in the setup above by using pairs of consecutive Fibonacci numbers as the calculation state, but this just complicates things.

Example: An Iterated Quadratic

Consider the sequence yi given by the initial condition y0 = 0 and transition function yn + 1 = yn2 + n.

0, 0, 1, 3, 12, 148, 21909, 480004287, …

Bob claims that the Nth value of this series, yN, is equal to some value a modulo a large prime p > N. Using theorem 2 he constructs an interpolating polynomial f(i) = yi (mod p) for 0 ≤ i ≤ N. By theorem 3, there exists quotient polynomials g and h such that

\displaystyle  \begin{aligned} &f(X+1)=f(X)^2+X+Z_N(X)g(X),\\ &f(X)=aX/N+X(X-N)h(X). \end{aligned}

The first of these enforces the transition function yn + 1 = yn2 + n, and the second enforces the initial and final conditions f(0) = 0, f(N) = a.

All Alice needs to do is ask for the values of of f, g and h at some random points and verify that the identities hold.

Example: Collatz Sequence

The Collatz sequence with starting value a positive integer y0 is defined by the transition function

\displaystyle  y_{i+1}=\begin{cases}y_i/2,&{\rm if\ }y_i{\rm\ is\ even},\\ 3y_i+1,&{\rm if\ }y_i{\rm\ is\ odd}.\end{cases} (15)

For example, starting at 12, the sequence is

12, 6, 3, 10, 5, 16, 8, 4, 2, 1, 4, 2, 1, 4, 2, 1, …

The unsolved Collatz conjecture states that for any starting value, this sequence will always hit 1 eventually. We suppose that Bob computes the sequence starting from 97, and sees that y118 = 1. He wants to prove this to Alice, using the method outlined above. How can this be done?

97, 292, 146, 73, 220, 110, …, 5, 16, 8, 4, 2, 1

First, it should be noted that 118 steps is not many, but it is just for illustration and the idea extends to much longer calculations. The first issue in creating an algebraic intermediate representation is that (15) depends on whether yi is even or odd. One solution is to represent the binary digits at each stage of the computation, so that we can just check the least significant bit to determine if it is even.

Bob computes the terms of the sequence, and notes that its maximum value is 9232, which can be represented by a 14 digit binary number. Hence, the execution state will be represented by 14 binary digits.


Figure 2: Collatz sequence execution.

We work modulo a prime p ≥ 216 which is large enough to represent 3y + 1 for all 14 bit binary numbers y. Using theorem 2, Bob creates 14 polynomials fj for 0 ≤ i ≤ 13 such that fj(i) is the jth binary digit of yi. As in example 1 above, the fact that these take the values 0 and 1 is expressed by the identity

\displaystyle  f_j(X)^2=f_j(X)+Z_{119}(X)p_j(X) (16)

for some quotient polynomials pj. While we could work with the binary digits, it will be convenient here to also introduce a polynomial g representing the numbers in the Collatz sequence themselves. Since g(i) = yi is given by summing 2jfj(i), theorem 3 gives the identity

\displaystyle  g(X)=\sum_{j=0}^{13}2^jf_j(X)+Z_{119}(X)q_1(X) (17)

for quotient polynomial q1. Next, the transition function (15) is given by the identity

\displaystyle  (g(X+1)+2g(X)+1)f_0(X)-2g(X+1)+g(X)=Z_{118}(X)q_2(X) (18)

for a quotient polynomial q2. Note that if yi is even then f0(i) = 0, so (18) reduces to 2g(i + 1) = g(i). If, on the other hand, yi is odd then it reduces to g(i + 1) = 3g(i) + 1. The initial condition g(0) = 97 and final state g(118) = 1 give the identity

\displaystyle  g(X)=97-48X/59+X(X-118)q_3(X). (19)

To verify Bob’s statement that y118 = 1, Alice just needs to verify that identities (16,17,18,19) hold at some randomly selected values for x.

Note I: The example of a Collatz sequence shows how we can represent a state as a set of binary digits, and update it by a transition function. All of the standard logic gates can be represented algebraically with, for example, a NOR gate given by

\displaystyle  (1-y_1)(1-y_2).

It follows that any Turing machine can be simulated using these methods. The difficulty is doing it in an efficient way.

Note II: Above, we used a binary expansion to represent the computation state, which was helpful to be able to algebraically express whether the number is odd or even at each step. Really, though, the important effect of using a binary representation is to bound the terms in the sequence. Since we used 14 binary digits, each term was guaranteed to be less than 214. Algebraically expressing whether an integer y is odd or even is straightforward. There will always exist unique integer values u and v satisfying

\displaystyle  \begin{aligned} &y=u+2v,\\ &u^2=u. \end{aligned} (20)

Then, u is equal to 0 if y is even and 1 if it is odd. However, modulo an odd prime p, (20) always have two solutions, one with u equal to 0 and one with it equal to 1, so it does not help. For 0 ≤ y < p – 1, equation (20) does have a unique solution satisfying the bound 0 ≤ v < p/2 – 1, and we can use the value of u to determine if y is even or odd. However, how can we restrict to this solution? We need to be able to express the bound on v algebraically. One way to bound v is to express its binary expansion with a fixed number of digits, restricting its range of possible values. If another efficient method of expressing such a bound via algebraic equalities exists, then it could be used to remove the need for the binary expansion above.

Polynomial Evaluation

In the scenarios above, the prover Bob has an execution trace of length some large number N. These are sequences of length N, for which he needs to construct interpolation polynomials. If he does this by naively applying equation (5) of theorem 2 as it is, then it would be very inefficient. Each term i(X) is a product of n terms, so directly computing these for each value of i would take a time of order O(N2), which could be infeasible. Since the sequence is of length N, Bob must take at least order O(N) computing time, but going all the way up to N2 is a bit much.

We start with a set of points S = {x0, x1, …, xN - 1} and sequence y0, y1, …, yN - 1, all of which are elements of F. Equation 4 for the interpolation polynomial can be rearranged as,

\displaystyle  f(X)=Z_S(X)\sum_{i=0}^N\frac{y_i}{\ell^0_i(x_i)(X-x_i)} (21)


\displaystyle  \ell^0_i(x_i)=\prod_{j\not=i}(x_i-x_j). (22)

The overall multiplicative factor ZS(X) in (21) is a product of N terms, which only needs to be performed once, but we still have the issue that each of the individual i0(xi) is a product of O(N) terms. Fortunately, it can be simplified for the cases of interest.

In the case where xi = i, (22) simplifies to,

\displaystyle  \ell^0_i(x_i)=(-1)^{N-i-1}i!(N-i-1)! (23)

The factorials from 0! up to (N - 1)! can be computed and stored with a single iteration of length N, so (23) enables all of the i0(xi) terms to be computed at once in O(N) time. Putting these back into (23) gives the value of the polynomial interpolant f(x) for any point x in O(N) time.

Alternatively, if xi = γi, then (22) gives the iterative equation,

\displaystyle  \ell^0_{i+1}(x_{i+1})=\frac{\gamma^{-1}-\gamma^i}{1-\gamma^{i+1-N}}\ell^0_{i}(x_i)

This again enables all of the i0(xi) terms to be computed in an iteration of length N, giving f(x) in O(N) time.

The full algorithm used by STARKs actually require the prover to initially compute the values of f at a large number M ≥ N of points. If we were to apply the above O(N) algorithm at all of these, the total algorithm would be of order MN, which is at least N2. Consider the case where xi = i and we want to evaluate f at N consecutive points x + j over 0 ≤ j < N. Putting this into (21) gives,

\displaystyle  f(x+j)=Z_N(x+j)\sum_{i=0}^{N-1}\frac{y_i}{\ell^0_i(x_i)(x+j-i)}.

The ZN multiplying factors can be computed by an iteration of length N using the recursive formula (9), so takes time O(N). The summation can be expressed as a convolution,

\displaystyle  \begin{aligned} &\sum_{i=0}^{N-1}a_ib_{j-i},\\ &a_i\equiv y_i/\ell^0_i(x_i),\\ &b_i\equiv 1/(x+i). \end{aligned}

There exist convolution algorithms of order O(NlogN), such as by the use of fast Fourier transforms (FFT). This allows f to be computed on the entire sequence of N points with a complexity O(NlogN).

So, if Bob is required to compute the values of f at a number M ≥ N of points then, so long as they can be split into M/N sets of N consecutive values, the entire calculation can be done in time O(MlogN).

The case with xi = γi can be handled in a similar way. Using (21) to compute f at a sequence of N points j over 0 ≤ j < N,

\displaystyle  f(x\gamma^j)=\tilde Z_N(x\gamma^j)\sum_{i=0}^{N-1}\frac{y_i}{\ell^0_i(x_i)(x\gamma^j-\gamma^i)}.

Again, the N factors can be computed using iterative formula (12) and the sum is a convolution,

\displaystyle  \begin{aligned} &\sum_{i=0}^{N-1}a_ib_{j-i},\\ &a_i=\gamma^{-i}y_i/\gamma^i\ell^0_i(x_i),\\ &b_i=1/(x\gamma^i-1). \end{aligned}

So, the values of f on the entire sequence of length N can be computed in O(NlogN). Therefore, if a set of M ≥ N points can be arranged into M/N sequences of the form i, then f can be computed on the entire set in time O(MlogN).

I note that there are various different ways in which the polynomials can be evaluated while still leading to an O(MlogN) algorithm. For example, if we have a polynomial of degree less than N

\displaystyle  f(X)=c_0+c_1X+c_2X^2+\cdots+c_{N-1}X^{N-1},

then its values at the points i are,

\displaystyle  f(x\gamma^i)=\sum_{j=0}^{N-1}c_jx^j\gamma^{ij}.

If γ has order N, then this is just the Fourier transform of the coefficients cjxj. An inverse FFT can be used to back out the coefficients from its values at the points γi, followed by an FFT to obtain its values at the above points.

Further Reading

I include some useful links for further information on STARKS and the ideas discussed in this post.

  1. Alan Szepieniec. Anatomy of a STARK.
  2. StarkWare, STARK Math. Part 1, Part 2, Part 3, Part 4, Part 5.
  3. Vitalik Buterin, STARKs. Part 1, Part 2, Part 3.
  4. Eli Ben-Sasson, Iddo Bentov, Ynon Horesh, and Michael Riabzev. Scalable, transparent, and post-quantum secure computational integrity.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s