Learn about Utreexo, an improved MIT Bitcoin solution in 5 minutes

Write in front:

Utreexo was proposed by Tadge Dryja, one of the authors of the Lightning Network (LN) paper. It is a hash accumulator scheme suitable for Bitcoin. At present, the scheme is mainly funded by the MIT digital currency program. The author of this article Calvin Kim is one of the active developers of the project. In this article, he briefly explained the principle of Utreexo, and summarized the 4 major advantages of the program and 2 negative effects.

5分钟了解MIT比特币改进方案Utreexo,手机运行全节点不再是梦 (Picture from: tucheng.com)

Let me talk about the advantages first:

  1. It can achieve a full node of several KB, and the synchronization speed of the hard disk drive (hdd) can be as fast as the solid state drive (ssd)
  2. Allow parallel processing of initial block downloads;
  3. Enhance the security of Bitcoin by allowing consensus to be implemented independently of the database;
  4. Bring Utreexo to Bitcoin without forking;

Then there are disadvantages:

  1. The bandwidth usage of the initial block download has increased by about 20%;
  2. Additional storage requirements for Utreexo archive nodes;

How Utreexo works

Quick overview: In Utreexo, a full node can retain only one hash for each block, while the traditional pruned full node must retain all UTXO for each block .

In order to understand Utreexo, we must first understand how hash trees work. What is to be studied here is the Merkle tree, which is also the hash tree used by Bitcoin.

A Merkel tree with 8 cotyledons will look like this:

5分钟了解MIT比特币改进方案Utreexo,手机运行全节点不再是梦

Figure 1: Typical Merkel tree

Each number in the tree represents a hash.

In Bitcoin, all the numbers in the bottom row are the transaction ID (TXID), the connection hash result of 00 and 01 is 08, and 13 is the connection hash result of 10 and 11.

Currently in Bitcoin, Merkel trees are used to generate merkle roots in block headers. Utreexo adopted the Merkel tree concept and applied it to UTXO. It should be noted that Utreexo will not replace the Merkel tree in the block header .

Currently, to run a full Bitcoin node, you must store all existing UTXOs, while in Utreexo, the full node you run only needs to store the root of UTXO. Then the tree will look like this:

5分钟了解MIT比特币改进方案Utreexo,手机运行全节点不再是梦

Figure 2: Utreexo tree storing only root data. Note that all other hashes have been deleted

All other hashes from 00 to 13 are discarded after verification is completed, and only the root 14 is retained.

If the user wants to spend UTXO 07, they must prove to you the existence of the transaction. This will be done by providing ( 06, 07, 10, 12 ). The verification node will then use the received hash to create a separate tree:

5分钟了解MIT比特币改进方案Utreexo,手机运行全节点不再是梦

Figure 3: The tree used for verification, pay attention to how to calculate the root from the tree

The blank areas 11, 13, and 14 can be calculated by verifying the node. If the root of this tree is 14, it matches the root we stored, and we can say that the transaction exists.

Utreexo is a bit more complicated, and it works differently than this example, but this is a simple version of its concept, which explains how to store only one hash (not all hashes), but still a Full node.

Advantages of Utreexo

Advantages 1, can achieve a full node of several kilobytes, and the synchronization speed of the hard disk drive (hdd) can be as fast as the solid state drive (ssd)

At present, there are two types of full nodes: Archival and pruned. In pruned full nodes, users only keep unused transaction output (also known as UTXO). Utreexo allows another full node mode called dense state node (or CSN), in which we only store root information and wallet information. This allows the data of a full node to be less than 1 KB, while the current full node used by Bitcoin, users need to store data measured in GB units.

As bitcoin applications increase, the ability to store only UTXO belonging to users becomes more and more important. Since a user needs at least one UTXO (and for privacy reasons, more UTXO is needed), this will lead to an increase in UTXO. From the table below, we can clearly see the growth experienced by UTXO.

5分钟了解MIT比特币改进方案Utreexo,手机运行全节点不再是梦

Figure 4: UTXO count, exclude OP_RETURN

Therefore, the storage requirements of pruning nodes will increase, thereby increasing the minimum storage requirements of all Bitcoin nodes. Utreexo prevents this by allowing users to trim UTXOs that do not belong to them.

In addition, because the dense state node (or CSN) can represent the entire Utreexo state in a size of less than kilobytes, there is no need to query the disk during the initial block download. This allows the initial block download to occur only on RAM, allowing hdd nodes to synchronize as fast as ssd nodes.

Advantage 2: Allow parallel download of initial blocks

UTXO snapshot means saving all UTXO state at a specific block height. One potential application in this regard is the assumeUTXO project, which allows highly synchronized blockchains from snapshots. The main obstacle to the snapshot is that its size is quite large, currently about 5GB, and its size will increase as the status in "Advantage 1" grows. For Utreexo, in the worst case [1] , less than 1 KB is possible (the best case is about 100 bytes). Using Utreexo to take UTXO snapshots is very simple. The ZKvM project has implemented Utreexo and is using it to save the blockchain state by including the Utreexo root in the block header.

Since the implementation of snapshots has become very economical, it becomes feasible to have a snapshot at each block height. In this way, we can complete the implementation of blockchain synchronization in parallel, which means that one computer (or CPU core) can be synchronized from block height 0 to 300,000, while another computer can be synchronized from block height 300,001 to 600,000. With the optimization of CPU and the rise of GPGPU, this asynchronous block synchronization will help to further reduce the time required to start a Bitcoin full node.

Advantage 3, Utreexo can enhance the security of Bitcoin (allowing the separation of the consensus code from the database)

The libconsensus project aims to separate the consensus code from Bitcoin Core in order to:

  1. Non-consensus code can be changed without worrying about breaking consensus;
  2. Allow a consensus API to be used in different Bitcoin implementations;

But because it is difficult to separate the database (leveldb) from the consensus code, it was eventually abandoned.

This is a very important issue, because in 2013, Bitcoin Core moved from Berkeley DB to levelDB and encountered an unexpected temporary hard fork, an unexpected temporary soft fork, and a hard fork (BIP50).

Currently, Bitcoin's consensus relies on the normal operation of levelDB, which means that if levelDB does not work properly, there may be a fork using another database.

After using Utreexo, you can verify the incoming transaction or block based on the Utreexo tree without the database. This checks the existence of the UTXO that the incoming transaction is spending by using the attached certificate.

Advantage 4, deployment does not require a fork

In terms of reducing the size of the blockchain, the RSA accumulator proposed by Boneh et al. Is indeed more effective than Utreexo. However, a soft fork must be used to implement this scheme. For a conservative system like Bitcoin, even a soft fork needs to be very careful. Therefore, these types of accumulators are difficult to apply to Bitcoin. The deployment of Utreexo does not require any forks. Users only need to use Utreexo by running Utreexo nodes .

Disadvantages of Utreexo

Disadvantage 1: bandwidth requirements will increase by an additional 20%

Assuming that someone lives in a very remote and very small area, and they use a powerful computer to synchronize a Bitcoin node, then Utreexo is not a help but a harm. The aforementioned proof must be sent with the TXO, which results in about 20% more data being downloaded from the peer node.

In this sense, Utreexo can be seen as a trade-off between bandwidth and storage requirements. If you think that storage price (hdd, ssd) is a bigger obstacle than Internet speed (and cost), then Utreexo can help decentralization. And if you think that Internet speed is a bigger obstacle, then Utreexo will hurt decentralization.

Disadvantage 2: Additional storage requirements for Utreexo archive nodes

The so-called Utreexo archive node refers to the existing Bitcoin archive node that stores the above-mentioned proofs required by the Utreexo node.

Such nodes will store:

  1. All blocks starting from the founding block;
  2. All proofs starting from the founding block;

Because of # 2, this will bring additional storage burden to the archive node. If a Utreexo archive node stores all proofs for each block, this will be approximately 100% additional data storage.

However, this can be improved by not storing proofs for each block. For example, it is possible to store proofs for odd blocks, and if a node requests proofs from even blocks, a "resync" is performed. For example, if proof of block 566 is requested, the Utreexo archive node will:

  1. Retrieve block 566;
  2. Retrieve the Utreexo tree formed at block 565;
  3. Apply the transaction from block 566 to the Utreexo tree, and then regenerate the proof;
  4. Send the generated certificate to the node requesting it;

Then, this can be further optimized, such as storing proofs every 10 blocks, and so on, thereby further reducing the storage burden.

Finally, users can make trade-offs between CPU usage and storage. If users have access to cheap data storage, they can choose to store all proofs for all blocks. If the user's storage space is limited, but it has idle CPU time, you can choose to reduce the amount of storage and perform more calculations.

in conclusion

In summary, Utreexo, like other solutions, represents a trade-off. I believe that Utreexo will help users achieve a higher degree of decentralization by allowing users to choose the most suitable compromise.

Developers are currently actively developing Utreexo (github.com/mit-dci/utreexo), for any contribution, we would appreciate it 🙂

Many thanks to Tadge Dryja, Ruben Somsen, Paul Grau and Janus Troelsen for reviewing this article.

footnote:

1. In Utreexo, you sometimes have to keep multiple roots, and the number of roots reserved for different block heights is different. The Utreexo paper and the MIT Bitcoin Expo Utreexo introduction video have more detailed explanations on this issue. ↵