The highly anticipated Bitcoin upgrade Taproot is locked in and set to be activated in November 2021. This will have various advantages, such as supporting Schnorr signatures and incorporating multiple possible unlocking scripts in an efficient fashion. The aim is to increase scalability, privacy and security, especially with more complex ‘smart’ transactions. These updates are all about how bitcoin utxos (unspent transaction outputs) can be spent, by introducing new ways of parsing the scripts. I previously posted about Bitcoin Script, describing both the legacy and SegWit methods. The current post extends the description to include Taproot.
The first thing to note is that Taproot is a kind of SegWit implementation. This means that the scriptPubKey in the transaction output takes a standard form, and the witness field of the corresponding transaction input is used to validate the spend. As with other SegWit methods, the scriptSig of the transaction input is not used, so is left empty. The standard Taproot spend method is then as follows.
|P2TR — Pay to Taproot|
|scriptPubKey|OP_1 <pubkey>|
|witness|<signature>|
Bitcoin Core's script interpreter recognizes a scriptPubKey of this special form, consisting only of the version number OP_1 followed by a 256 bit (32 byte) public key, and interprets it as a Taproot output. Then, if the witness field consists of a single item, this is interpreted as the standard Taproot spend method, with the witness field containing just the signature. To be valid, this needs to be a valid Schnorr signature for the provided public key, with the spending transaction as the message.
Note that this is very similar to the standard SegWit pay-to-public-key-hash method (P2WPKH), with three main differences.
- The scriptPubKey contains the public key itself, rather than a hash of it.
- Schnorr signatures are used, instead of the legacy ECDSA signatures.
- The scriptPubKey has version number 1 instead of 0.
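To make the comparison concrete, here is a minimal Python sketch of the two scriptPubKey byte layouts. The function names are my own, chosen for illustration; the byte values (OP_0 = 0x00, OP_1 = 0x51, followed by a one-byte push length) are the standard encodings for version 0 and version 1 witness programs.

```python
OP_0 = 0x00  # witness version 0 (P2WPKH)
OP_1 = 0x51  # witness version 1 (Taproot)


def p2tr_script_pubkey(pubkey: bytes) -> bytes:
    """Build a P2TR scriptPubKey: OP_1, a 32-byte push, then the public key."""
    if len(pubkey) != 32:
        raise ValueError("a Taproot public key is 32 bytes")
    return bytes([OP_1, 32]) + pubkey


def classify_script_pubkey(script: bytes) -> str:
    """Tell a P2TR output from a P2WPKH output by its version and length."""
    if len(script) == 34 and script[0] == OP_1 and script[1] == 32:
        return "P2TR"      # contains the public key itself
    if len(script) == 22 and script[0] == OP_0 and script[1] == 20:
        return "P2WPKH"    # contains a 20-byte hash of the public key
    return "other"
```

So a P2TR scriptPubKey is always 34 bytes, against 22 bytes for P2WPKH, the difference being that the full 32 byte key is stored rather than a 20 byte hash.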
I note that Taproot is a soft fork, since any Bitcoin node which does not include the Taproot upgrade will see the P2TR scriptPubKey as being an ‘anyone can spend’ address. As it would be interpreted as putting the public key on the top of the stack, such spends will still be seen as valid. This avoids a fork of the blockchain where non-upgraded nodes would potentially disagree with upgraded nodes on which is the valid chain. However, non-upgraded nodes would not be completely validating Taproot spends (potentially accepting invalid spends from Taproot addresses), reducing their effectiveness in securing the integrity of the blockchain.
Already, the simple spend method described above has some advantages due to the use of Schnorr signatures, which support key aggregation (allowing multisig to be done with a single key) as well as more efficient batch validation of a collection of signatures in one go. However, the real power of Taproot lies in its ability to incorporate alternative spend scripts, described below.
An important feature of the Taproot upgrade is the possibility to incorporate any number of alternative payment scripts (or witness scripts) in an efficient way. Regardless of whether or not the payment address has any such scripts associated to it, the scriptPubKey just contains the opcode OP_1 and a public key as described above. This means that it is always possible to spend the output by simply providing a valid signature as with the standard spend method. Alternatively, one of the possible alternative scripts can be provided by the spending transaction, together with the necessary signature data for this alternative spend method. If this is done, then only the specific spend script being used is transmitted to the network for inclusion in the blockchain. All of the potential but, ultimately, unused scripts never appear in any transactions. This can improve privacy but, importantly, it saves precious blockchain capacity. We can have an arbitrarily large number of alternative scripts which, combined, could take up a lot of space. But, at most one of these is actually used and, most likely, none of them will actually appear on the blockchain.
The idea is as follows. We take all of the witness scripts and combine them into a Merkle tree, represented by a single 256 bit Merkle root (or hash root). This is referred to as a Merkelized Alternative Script Tree (MAST). Here, I just give an outline of the idea and leave the details of the construction of these trees to later in this post. The Merkle root is merged into the public key to produce a new key which is used instead. We can still sign and validate messages with the new key but, alternatively, given the Merkle root there is a simple verification procedure to validate that the root is indeed associated with this new public key. The format of the scriptPubKey and witness fields for the alternative spend method from a Taproot address is as below.
|P2TR — pay to taproot|
|scriptPubKey|OP_1 <pubkey>|
|witness|<sig_1> … <sig_n> <script> <control>|
Here, <script> is a serialization of the specific spend script to be used, <control> is a string containing a Merkle proof for the script, and <sig_n> are just stack items required by the spend script in question. More details on this control string are given below, so I do not describe it in detail now. When validating the spend, Bitcoin Core will first check that the control string gives a valid proof that the serialized script is indeed associated to the public key. Next, the items <sig_n> are pushed onto the stack. Finally, <script> is deserialized and executed. As with legacy Bitcoin addresses, if the value at the top of the stack is True then the spend is valid, otherwise it is invalid.
The contents of <script> used in the Taproot spend is just Bitcoin Script code as explained in an earlier post. There are however some updates to Script used in Taproot, such as the commands OP_CHECKSIG and OP_CHECKSIGVERIFY now using Schnorr signatures rather than ECDSA. This modified Bitcoin Script is referred to as Tapscript.
Note that the scriptPubKey is exactly the same as for a standard Taproot spend described further above. The Bitcoin code is able to tell the difference between the standard and alternative script spends simply by the number of items in the witness field of the transaction input spending the utxo. Even if we have alternative scripts associated to the public key, it is not necessary to actually use these and, if desired, we can always revert to the standard spend type by providing a valid Schnorr signature. In that case, none of the alternative scripts will appear on the blockchain, and it is not even possible to distinguish the Taproot address from a standard one.
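As a rough illustration of how the two spend types are told apart, the following Python sketch classifies a witness by its item count. The function name is hypothetical. The one detail it includes that I have not discussed above is the optional 'annex' of BIP 341: if there are at least two witness items and the last one starts with the byte 0x50, it is removed before counting.

```python
from typing import List

ANNEX_TAG = 0x50  # first byte of the optional annex item (BIP 341)


def classify_taproot_spend(witness: List[bytes]) -> str:
    """Classify a Taproot witness as a standard or alternative script spend."""
    items = list(witness)
    # Drop the optional annex, if present, before counting items.
    if len(items) >= 2 and items[-1] and items[-1][0] == ANNEX_TAG:
        items.pop()
    if not items:
        return "invalid"      # an empty witness can never be valid
    if len(items) == 1:
        return "key-path"     # standard spend: a single Schnorr signature
    return "script-path"      # alternative spend: ..., <script>, <control>
```

Note that this only decides which validation rules apply. Whether the spend is actually valid depends on the signature check or on the script and control string checks described in this post.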
This completes the high-level outline of Taproot. Even though we just have the two spending methods given above, it is actually extremely flexible and opens up many possibilities for Bitcoin. This is in combination with the ability of the Schnorr signature algorithm to combine multiple keys into a single one, so that the standard spend method described above can be used for an ‘n of n’ multisig account. To demonstrate, I repeat a couple of ideas from the post on key and signature aggregation:
- m of n multisig: Consider 2 of 3 multisig, where any two of Alice, Bob and Carol are required to sign a transaction. There will usually be an expected way in which signing will occur. For example, Alice and Bob will sign but, in the event that either of these cannot be contacted, Carol will take their place as the second signatory. Then, the standard Taproot spend can be used for an aggregated ‘2 of 2’ public key where Alice and Bob can sign transactions. However, we would also create additional scripts where Alice and Carol, or Bob and Carol, can sign. In the event that Carol is required, one of these alternative scripts would be used. This would mean transmitting the additional script along with the signatures, requiring some additional space on the blockchain and increasing transaction costs. As long as Alice and Bob are available to sign, the expected 2 of 2 standard Taproot spend is used, which will be efficient and not require transmission of alternative scripts.
- Smart contracts: A smart contract will generally involve multiple parties, which I assume to be Alice, Bob and Carol. There will then be a set of conditions, or clauses, under which the different parties are able to spend the coins. For example, maybe Alice and Bob can jointly claim the coins as long as they do this within a set time period, otherwise Carol can claim. This could be done by a Bitcoin script, although creating a long script covering all features of the contract would take up a lot of space on the blockchain. The alternative approach, using Taproot, is to split these up into separate scripts with one for each way in which the coins can be spent. Only the script corresponding to the clause in the contract that is being applied needs to be submitted when spending the coins, reducing the transaction size. However, the important point is that, so long as all parties agree to a transaction, they should be able to validate it without any explicit recourse to the contract itself. In reality, the alternative scripts describing the contract only exist to enforce compliance. We therefore lock up the coins using a 3 of 3 aggregated Schnorr signature with the expected method being that all of Alice, Bob and Carol sign when the coins are spent. Only in the event that there is a dispute, or when one of the parties cannot be contacted, do the hardcoded alternative scripts need to be used. This has the advantage that smart contracts on the Bitcoin blockchain are indistinguishable from standard single-signature Taproot accounts, except when there is a dispute or, for whatever reason, the contract needs to be explicitly enforced on-chain.
Merkelized Alternative Script Tree
As mentioned above, the Taproot alternative spend scripts associated with a public key are arranged in a Merkle tree, or Merkelized Alternative Script Tree (MAST). I describe how these work now. I will not give full details on how it is programmed since, for that, you can refer to the BIP 341 specification on GitHub.
I start by noting that, originally, the proposal for Taproot was to use special scripts which incorporate a Merkle tree so that the full script does not need to be revealed on-chain. This had the acronym MAST, for Merkelized Abstract Syntax Tree. The proposal was, however, simplified to use more standard scripts but, instead, arrange different alternative scripts into a Merkle tree. The name ‘Merkelized Alternative Script Tree’ allows the MAST acronym to be kept. Also, consider the name for the whole upgrade, ‘Taproot’. According to Wikipedia a taproot is “a large, central, and dominant root from which other roots sprout laterally”. Examples include carrots, beetroot, radishes, parsnips, and many other plants. Presumably, the name was chosen for the Bitcoin upgrade since we have the main public key, with the alternative scripts ‘sprouting’ off this.
Now, suppose that we have alternative scripts labelled scriptA through scriptH. These are arranged into a Merkle tree as in figure 1 below.
Each leaf node hash is the hash of the associated script, and each internal node hash is the hash of the concatenation of its two children. The Bitcoin Taproot implementation uses SHA256 hashes, with some hardcoded prefactors added before taking the hash in order to distinguish leaf nodes from internal nodes (hence, avoiding second preimage attacks).
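The construction can be sketched in a few lines of Python. This is only an illustration under some assumptions of mine: the tree is balanced with a power-of-two number of leaves (BIP 341 allows any binary tree shape), every leaf uses the Tapscript leaf version 0xc0, and the helper names are my own. The 'prefactors' are the tagged hashes of BIP 340/341, with tags "TapLeaf" and "TapBranch", and child hashes are sorted before being concatenated.

```python
import hashlib
from typing import List

LEAF_VERSION = 0xC0  # Tapscript leaf version from BIP 342


def tagged_hash(tag: str, msg: bytes) -> bytes:
    """SHA256 with the tag hash prepended twice (BIP 340 tagged hash)."""
    t = hashlib.sha256(tag.encode()).digest()
    return hashlib.sha256(t + t + msg).digest()


def compact_size(n: int) -> bytes:
    """Bitcoin's variable-length integer encoding for the script length."""
    if n < 0xFD:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + n.to_bytes(2, "little")
    if n <= 0xFFFFFFFF:
        return b"\xfe" + n.to_bytes(4, "little")
    return b"\xff" + n.to_bytes(8, "little")


def leaf_hash(script: bytes) -> bytes:
    """Hash of a leaf: version byte, then the length-prefixed script."""
    return tagged_hash("TapLeaf", bytes([LEAF_VERSION]) + compact_size(len(script)) + script)


def branch_hash(a: bytes, b: bytes) -> bytes:
    """Hash of an internal node; children are sorted before concatenation."""
    return tagged_hash("TapBranch", min(a, b) + max(a, b))


def merkle_root(scripts: List[bytes]) -> bytes:
    """Merkle root of a balanced tree over 2^k scripts (sketch assumption)."""
    if not scripts or len(scripts) & (len(scripts) - 1):
        raise ValueError("this sketch needs a power-of-two number of scripts")
    level = [leaf_hash(s) for s in scripts]
    while len(level) > 1:
        level = [branch_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

For the eight scripts scriptA through scriptH, `merkle_root` performs three rounds of pairwise hashing and returns the 32 byte root h.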
Suppose that we start with the public key P for our Taproot address. This is modified to create a new public key Q which incorporates both the original key and the Merkle root h.
Q = P + H(P‖h)·G.    (1)
Here, H represents a hash function (actually, a SHA256 hash with a hardcoded prefactor applied to the argument). I am using the notation of the post on Schnorr signatures, so that the public keys lie in an abelian group E (actually, the secp256k1 elliptic curve) with generator G.
This modified public key Q is used in the transaction output scriptPubKey instead of P. For a standard Taproot spend, the signature algorithm is easily modified to produce signatures valid for this new key. In fact, it is just the same as adding H(P‖h) to the original private key.
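To check the claim that tweaking the public key matches adding H(P‖h) to the private key, here is a small, unoptimized Python sketch of the secp256k1 arithmetic. It is for illustration only and should not be used with real keys. The function names are mine. Two details come from BIP 340/341 rather than from the description above: P is represented by its 32 byte x-coordinate with the even y-coordinate chosen, so the private key is negated first if needed, and H is the tagged hash with tag "TapTweak".

```python
import hashlib

# secp256k1 parameters: field prime, group order and generator G.
FIELD_P = 2**256 - 2**32 - 977
ORDER_N = 0xFFFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFE_BAAEDCE6_AF48A03B_BFD25E8C_D0364141
G = (0x79BE667E_F9DCBBAC_55A06295_CE870B07_029BFCDB_2DCE28D9_59F2815B_16F81798,
     0x483ADA77_26A3C465_5DA4FBFC_0E1108A8_FD17B448_A6855419_9C47D08F_FB10D4B8)


def point_add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    if a[0] == b[0] and (a[1] + b[1]) % FIELD_P == 0:
        return None
    if a == b:
        lam = 3 * a[0] * a[0] * pow(2 * a[1], FIELD_P - 2, FIELD_P)
    else:
        lam = (b[1] - a[1]) * pow(b[0] - a[0], FIELD_P - 2, FIELD_P)
    x = (lam * lam - a[0] - b[0]) % FIELD_P
    return (x, (lam * (a[0] - x) - a[1]) % FIELD_P)


def point_mul(k, a):
    """Scalar multiplication k.a by double-and-add."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, a)
        a = point_add(a, a)
        k >>= 1
    return result


def lift_x(x):
    """Return the curve point with the given x and an even y-coordinate."""
    c = (pow(x, 3, FIELD_P) + 7) % FIELD_P
    y = pow(c, (FIELD_P + 1) // 4, FIELD_P)
    if y * y % FIELD_P != c:
        raise ValueError("x is not the x-coordinate of a curve point")
    return (x, y if y % 2 == 0 else FIELD_P - y)


def tweak_scalar(px, h):
    """The integer H(P||h), using the 'TapTweak' tagged hash."""
    tag = hashlib.sha256(b"TapTweak").digest()
    t = int.from_bytes(hashlib.sha256(tag + tag + px.to_bytes(32, "big") + h).digest(), "big")
    if t >= ORDER_N:
        raise ValueError("tweak out of range")
    return t


def tweak_pubkey(px, h):
    """Equation (1): Q = P + H(P||h).G, with P given by its x-coordinate."""
    return point_add(lift_x(px), point_mul(tweak_scalar(px, h), G))


def tweak_seckey(d, h):
    """Private key for Q: the (possibly negated) key d plus H(P||h)."""
    P = point_mul(d, G)
    if P[1] % 2:
        d = ORDER_N - d  # switch to the private key of the even-y point
    return (d + tweak_scalar(P[0], h)) % ORDER_N
```

With these definitions, `point_mul(tweak_seckey(d, h), G)` equals `tweak_pubkey(P[0], h)` for `P = point_mul(d, G)`, which is exactly the statement that signing for Q only needs the tweaked private key.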
Alternatively, suppose that we want to spend using scriptF, for example. As described above, the witness field of the spending transaction input contains the stack items <sig_n> required by scriptF, followed by <script> (the serialization of scriptF) and <control>.
All that needs explaining is the control string. In fact, this consists of a leaf version byte v, followed by the original public key P and the Merkle proof (h100, h11, h0) of the script, concatenated in that order: v‖P‖h100‖h11‖h0.
Verifying that the script is indeed valid for this control string is a matter of performing the following steps:
- Use the Merkle proof to compute the Merkle root. As described in the post on Merkle trees, this consists of performing the following computations.
  h101 = H(scriptF)
  h10 = H(h100‖h101)
  h1 = H(h10‖h11)
  h = H(h0‖h1)
Here, I am using the symbol H for the hash function on each line although, in practice, different prefactors are applied to the arguments for leaf and internal node hashes. See the specification for BIP 341 on GitHub for a precise description.
- Using the public key Q from the scriptPubKey, the key P from the control string, and the Merkle root h computed in the step above, verify that equality (1) holds.
According to the post on Merkle trees, the binary string 101 should be included as part of the Merkle proof denoting the position of the leaf node in the tree, and telling us in which order the child hashes should be concatenated in the argument of the hash function when applying the Merkle proof. For brevity, this is not included in the Taproot Merkle proof. Instead, it is assumed that the child hashes of each internal node are arranged in increasing lexicographic order so that, for example, h100 ≤ h101. Doing this effectively reorders the leaf nodes of the tree but, as the order in which the alternative scripts are listed is unimportant, this does not matter.
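The first verification step, including the sorting convention, can be sketched in Python as follows. As before, the helper names are mine, the tagged hashes "TapLeaf" and "TapBranch" and the leaf version 0xc0 are taken from BIP 341/342, and the second step (checking equality (1)) is not covered since it needs elliptic curve arithmetic.

```python
import hashlib
from typing import List


def tagged_hash(tag: str, msg: bytes) -> bytes:
    """SHA256 with the tag hash prepended twice (BIP 340 tagged hash)."""
    t = hashlib.sha256(tag.encode()).digest()
    return hashlib.sha256(t + t + msg).digest()


def compact_size(n: int) -> bytes:
    """Bitcoin's variable-length integer encoding for the script length."""
    if n < 0xFD:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + n.to_bytes(2, "little")
    if n <= 0xFFFFFFFF:
        return b"\xfe" + n.to_bytes(4, "little")
    return b"\xff" + n.to_bytes(8, "little")


def leaf_hash(script: bytes, leaf_version: int = 0xC0) -> bytes:
    """Hash of a leaf: version byte, then the length-prefixed script."""
    return tagged_hash("TapLeaf", bytes([leaf_version]) + compact_size(len(script)) + script)


def branch_hash(a: bytes, b: bytes) -> bytes:
    """Hash of an internal node, smaller child hash first."""
    return tagged_hash("TapBranch", min(a, b) + max(a, b))


def root_from_proof(script: bytes, path: List[bytes]) -> bytes:
    """Recompute the Merkle root from a script and its sibling hashes.

    `path` lists the siblings from the leaf upwards, e.g. (h100, h11, h0)
    for scriptF. No position bits are needed because of the sorting.
    """
    node = leaf_hash(script)
    for sibling in path:
        node = branch_hash(node, sibling)
    return node
```

Because `branch_hash` sorts its arguments, the same loop works whether the script sits on the left or the right at each level, which is why the position string 101 can be dropped from the proof.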
If the validation of the script and control string fails, the spending transaction is invalid. Otherwise, as explained further above, the items <sig_n> are pushed onto the stack. Finally, <script> is deserialized and executed. If the value at the top of the stack is True then the spend is valid, otherwise it is invalid.
This completes my description of MAST Taproot spends. Recall that, under usual circumstances, we would expect a standard spend to be used, which does not require any of this. If the MAST is used, then it has the overhead of including the control string in the witness field of the spending transaction, which is of size 33 + 32n bytes, with n being the depth of the Merkle tree (n = 3 in the example above).
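The 33 + 32n layout can be seen by splitting a control string into its parts, as in the Python sketch below. The function name is hypothetical. Two details are taken from BIP 341 and go slightly beyond the description above: the lowest bit of the first byte records the parity of the y-coordinate of Q, with the remaining bits giving the leaf version, and the tree depth n is at most 128.

```python
from typing import List, Tuple


def parse_control_block(control: bytes) -> Tuple[int, int, bytes, List[bytes]]:
    """Split a control string into (leaf version, parity bit, P, Merkle path)."""
    path_len = len(control) - 33
    if path_len < 0 or path_len % 32 != 0 or path_len // 32 > 128:
        raise ValueError("control string must be 33 + 32n bytes with 0 <= n <= 128")
    leaf_version = control[0] & 0xFE  # upper seven bits of the first byte
    parity = control[0] & 0x01        # parity of the y-coordinate of Q
    internal_key = control[1:33]      # the original public key P
    path = [control[i:i + 32] for i in range(33, len(control), 32)]
    return leaf_version, parity, internal_key, path
```

For the scriptF example, the path has the three entries h100, h11 and h0, so the control string is 33 + 32 × 3 = 129 bytes long.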