Schnorr Signatures

As part of the upcoming taproot upgrade, Bitcoin will support the Schnorr digital signature algorithm, in addition to the existing ECDSA implementation. In many ways, Schnorr signatures are superior to ECDSA. Not only are they simple to implement, they are also theoretically easier to understand, more amenable to mathematical proofs of security, have smaller signature strings, and are consistent with key and signature aggregation as well as batch verification. Probably, the main reason that Schnorr signatures were not used from the start, is likely due to the fact that it was under patent until 2008, so was not standard at the time of Bitcoin’s creation. Both the ECDSA and Schnorr implementations are based on the same secp256k1 elliptic curve. I give a description of Schnorr signatures, and how they are implemented by Bitcoin, in this post.


The Setup

The Schnorr signature method requires us to first agree on the following components, which define the implementation.

  • A cyclic group E of prime order p. This is usually taken to be an elliptic curve over a finite field which, for Bitcoin, is taken to be the secp256k1 curve. This is the curve defined by the equation y2 = x3 + 7, with components (x,y) belonging to the field of integers modulo the 256 bit prime number
    q = 2256 – 232 – 29 – 28 – 27 – 26 – 24 – 1.

    The group itself is not of size q. Rather, it is cyclic with order equal to another 256 bit prime number

    p=0xfffffffffffffffffffffffffffffffebaaedce6af48a03bbfd25e8cd0364141

    To be precise, we should also specify exactly how elements of the group E are represented as binary strings, so that they can be transmitted across the internet and stored in Bitcoin transactions. I will describe this further below.

    There is also a minor choice of notation to be made, although it does not impact the implementation at all. Groups can be written either multiplicatively or additively. In this post, I will use the additive notation. This is just a personal choice in writing out the formulas, but should be mentioned since some descriptions such as the Wikipedia page use multiplicative notation, meaning that the formulas look slightly different.

  • A generator G for the group. As the group is of prime order, a generator is just any element other than the identity. Although it is arbitrary, for completeness I state the choice used by Bitcoin, which is (x,y) with,
    x=0x79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798
    y=0x483ada7726a3c4655da4fbfc0e1108a8fd17b448a68554199c47d08ffb10d4b8
  • A cryptographic hash function h which, in the Bitcoin implementation, is SHA256.

With these choices made, we need to describe three things to have a complete digital signature algorithm. These are, the method of constructing a public/private key pair, the algorithm to verify the digital signature associated with a message, and an algorithm for the owner of the private key to construct such signatures.


Key Pairs

The first step in any digital signature algorithm is the selection of a public/private key pair. This is performed by the user who needs to be able to validate messages or transactions. They should choose a private key at random, and compute the public key from this. The public key is, as the name suggests, distribute publicly, while the private key must be kept secret. Anyone in possession of the private key would be able to sign messages, and spend the Bitcoin secured by the associated public key.

In the Schnorr algorithm, the private key is just an integer x modulo p. This is represented by a unique integer value in the range 0 ≤ x < p. For security, the private key should be chosen at random.

The associated public key is a point in the group E. It is computed by multiplying the base point G by the private key,

P = x.G. (1)

As long as x is chosen at random, then P will be a completely random point of E.

Recall that the private key should be kept secret, since anyone in possession of this would be able to sign transactions. However, looking at equation (1), it appears that the private key can simply be backed out from the public one. By construction, it is the unique integer (mod p) which, when multiplied by G, gives the public key. In principle this can indeed be done, but the idea is that this is a hard problem for the choice of group E. This is the discrete logarithm problem (in our choice of additive group notation, it looks more like a discrete division problem). For elliptic curves such as the secp256k1 used by Bitcoin, there is no known way to solve for x other than a brute-force search which, given the huge size of the group, is computationally infeasible.

It is the hardness of the discrete logarithm problem that gives this method its security. Reconstructing private keys from public ones or, alternatively, producing valid signatures without knowledge of the private key, requires solving this problem which, for the Bitcoin elliptic curve, has no known feasible method of solution. More specifically, given an element P of the group, chosen uniformly at random, it should be impossible to solve (1) for x with anything other than a vanishingly small probability of success.


Signature Verification

Once a public/private key pair has been selected, the next components of a digital signature algorithm are the processes of signing a message and verifying the signature. The signing is performed by the person in possession of the secret key, which is not made publicly available, so cannot form part of the validation procedure. Hence, the validators are only required to perform the verification step so this is the only bit that, strictly speaking, is part of the protocol. I describe the signatures and validation now, and will describe how valid signatures can be created using the private key in a moment.

So, suppose that we have a message m, and a public key P. Although it can be expressed in a few different ways, a signature essentially consists of three things, (R,e,s). These are a point R of the group E, and integers e and s (modulo p). The signature is valid if it satisfies the identities,

s.G = R + e.P,
e = h(RPm).
(2)

Here, I am using RPm to represent the string formed by concatenating the binary representation of the points R,P of the group and the message m. We then apply the hash function to this to obtain e. The idea is that e depends on the point R and the message in a way which is deterministic, but pseudorandom, which is provided by the use of a cryptographic hash function. Creating valid signatures without knowing the private key is very difficult, and practically impossible, since finding a number s satisfying (2) requires solving the discrete logarithm problem for the group.

There is not really any need to include the point P inside the hash function for computing e, in which case we instead would have e = h(Rm). I am describing the implementation used in Bitcoin, which includes the point P in the argument of the hash function as stated in (2). The reason for this is to do with security when combining keys in the MuSig scheme, although I will not discuss that here.

In practice, representing all three components (R,e,s) in the signature string would be a waste of space, since they are over-specified. Instead, we only need to specify one of R or e, and the other can easily be backed out. The following are three ways in which signatures can be represented, along with the associated verification process.

  • Given the public key P and signature (R,s).
    1. compute e = h(RPm).
    2. verify s.G = R + e.P.
  • Given the public key P and signature (e,s).
    1. compute R = s.Ge.P.
    2. verify e = h(RPm).
  • The following method applies to the implementation where the public key P is not included in the argument to the hash function. Given a hash of the public key H = h(P) and signature (R,s).
    1. compute e = h(Rm).
    2. compute P = e-1.(s.G – R).
    3. verify H = h(P).

    In the second step, e-1 refers to the inverse of e modulo p.

The Bitcoin taproot implementation has made the first choice above, so that a digital signature is a pair (R,s) consisting of a point on the curve E and an integer modulo p. The third method above is similar to the approach used by Ethereum, although there the ECDSA signature method is used rather than Schnorr, the idea is similar. Rather than directly distributing the public key itself, instead an address H is computed as a hash of this key. I note that the hash used to produce the address need not be the same as the hash function used in the rest of the algorithm. In fact, Ethereum uses the rightmost 20 bytes of the Keccak-256 hash for creating addresses. Then, when verifying a signature we do not need the public key itself. Instead, it is constructed by the verify procedure, and we check that its hash matches the associated address.


Signing

For the digital signatures described above to be of use, it should be possible for someone in possession of the private key x to produce a valid signature corresponding to any message m. The idea is that if R is chosen to be a known multiple of the generator G then, since P is also a known multiple of G, it is straightforward to solve (2) for s. Explicity, if R = k.G and P = x.G then (2) becomes,

s.G = k.G + ex.G

for which the solution is just s = k + ex (mod p).

The signing process generates the signature values (R,e,s) as follows.

  1. Choose a number k uniformly at random (modulo p), called the nonce.
  2. Set R = k.G.
  3. Compute e = h(RPm).
  4. Compute s = k + ex (mod p).

As long as k is chosen at random, then R = k.P will be an entirely random point on the curve, and the resulting signature will be secure. There are a couple of points that should be noted, and could lead to leaking the private key if care is not taken.

  • If someone knows the nonce chosen to sign a message, they can hack the private key. This is done by rearranging the equality s = k + ex to obtain,
    x = e-1(s – k).
  • If the same nonce is used to sign more than one message, a hacker can recover the private key. As the nonce is the same for two messages, the value of R will be the same, although the values of s and e will be different. So, (2) applied to the two signatures gives,
    s1.G = R + e1.P,
    s2.G = R + e2.P.
    (3)

    By subtracting these, we obtain

    (s1 – s2).G = (e1 – e2).P

    Hence, the private key can be backed out as

    x = (e1 – e2)-1(s1 – s2) (mod p). (4)

To avoid the vulnerabilities above, the nonce should be chosen uniformly at random and discarded after use, so that no third party can ever have access to it. If multiple messages are to be signed with the same key pair, the nonce should be chosen afresh each time. This is a potential weakness, since it requires the signer to have access to a secure source of randomness. It would also be easy for third party implementations to introduce bugs or vulnerabilities in this aspect of the procedure. For this reason, the nonce can alternatively be chosen in a deterministic way by applying the hash function to the private key x and message m,

k = h(xPm).

Due to the cryptographically secure pseudorandom properties of the hash function, this choice of nonce is effectively random to anyone not in possession of the private key. It will also be different, and effectively independent, for each different message that is signed.

It should be noted that considerations such as the choice of nonce is not part of the protocol of the Bitcoin blockchain. This is because all that matters for verification is that the signature is valid, and validation does not otherwise care about how the signature was produced.


Security

The question has to be asked, how do we know that the signatures described above are secure? That is, if someone knows the public key but not the private key, how can we be sure that they cannot create a valid signature for a message m? I will give a brief argument which, although it falls short of a rigorous mathematical proof, should be reasonably convincing, and the ideas can be made rigorous.

First, from (2), creating a valid signature requires finding a point R in the group E and an integer s satisfying,

s.G = R + h(RPm).P. (5)

Next, as it is a hash function, the value of h(RPm) is essentially random, given any value of R. This is the random oracle model for the hash function. So, any algorithm capable of solving (5) is effectively able to solve,

s.G = R + e.P

where e is random and independent of R. Hence, for some choice of R, it must be able to solve this for many different values of e. If we can solve for only two values,

s1.G = R + e1.P,
s2.G = R + e2.P

then, as these are just the same as equations (3), the private key can be backed out by (4) in the same way as above. This would be a solution to the discrete logarithm problem in the group E, which was assumed to be hard.

Next, what about if the owner of the private key has already signed some messages. Is it possible for a hacker to use this information to gain access to the private key or, at least, sign some other message? If the messages were not signed carefully, such as by not choosing the nonces at random, or if the nonce values were leaked, then it is possible. This was already described above and used to illustrate why the signer should always choose random nonces independently for each message that is signed.

We assume that the signer was careful and chose random (or, cryptographically secure pseudorandom) nonces, and did not leak the values. Then, all that the hacker knows is some solutions to (5) for random values of R. By the properties of the hash function, the right hand side of (5) is a random point in the group E effectively independent of h(RPm). So, all the information that the hacker has available is effectively just some solutions to

s.G = P

for some random points P′ and integer s. This is not useful at all, since it is easy for anyone to generate such random solutions simply by generating random values for s and computing P′ from these. Knowing valid signatures produced securely as described above does not provide any useful information to a hacker who wants to sign a different message.


Bitcoin Representation

When using the Schnorr signature method above, we should consider which information is actually required to be stored in a transaction in the blockchain, and then have a strict specification of how these terms are represented as binary strings. In fact, there are only two bits on information which need to be distributed on the blockchain. Namely, the public key P, which is a point on the elliptic curve E, and the signature (R,s), which is a point on the curve together with an integer in the range 0 ≤ s < q.

Let us first consider the representation of a point P on the elliptic curve E. This consists of a pair of integers (x,y) modulo the 256 bit prime number q. So, it can be represented by a pair of 256 bit integers. This is an uncompressed representation, taking up almost twice as much space as is necessary. Using 256 bits for y is a waste of space since, for the given value of x, there are only two possible values,

y = ±√x³ + 7.

Here, the square root is to be understood in the integers modulo q. With a bit of modulo arithmetic, it can be computed using

y = ±(x³ + 7)(q + 1)/4 (mod q).

As these two values of y are of the form y and qy, one will be odd and the other even. So, rather than a full 256 bit representation, it just requires 1 bit to specify which of the two possibilities for y is being used. This is 257 bits in total to represent the point P of the elliptic curve.

Since 257 bits is a rather inconvenient and odd size, in Bitcoin it is assumed that the value of y used is an even number, in order to uniquely specify the point P in terms of its x-coordinate only. This means that the private key must be such that the value of P given by (1) has an even y-coordinate. If not, we simply negate the private key (mod q) to ensure that this is the case. So, in the Bitcoin Schnorr implementation, a public key is a 256 bit (32 byte) string, representing the x-coordinate of the point P.

The signature format used by Bitcoin is a pair (R,s) consisting of a point R on the curve E and a 256 bit number s. As with the public key, it is assumed that the y-coordinate of R is even. It is always possible for the signer to ensure that this is the case, by reversing the sign of the nonce if necessary. Then, as before, only the x-coordinate is represented, which is a 256 bit number. Hence, in Bitcoin, a Schnorr signature is a 512 bit (or 64 byte) string.

It could be thought that restricting the public key P and signature point R to have even y-coordinates breaks some of the assumptions used above, which is true. However, this does not affect security in any significant way since, if anyone was able to break the security using this setup they would, with only a miniscule amount of extra effort, be able to break the security using completely random private key and nonce choices.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s