Beam Sync: a new way to synchronize Ethereum nodes

Author: Jason Carver

Translation & Proofreading: Chen Liang & A Jian

Source: Ethereum enthusiasts

Why improve the node synchronization experience?

Whenever I think of how many people are still using Infura (via Metamask, Gnosis-Safe, etc.) to interact with on-chain applications, I feel a bit uncomfortable. Infura's service is great, but if most users don't run their own nodes, it's obviously not quite right. Even very capable and motivated developers cannot completely get rid of their dependence on Infura. From this point of view, we have not completed an important part of Ethereum's "autonomous verification" vision.

Our team wants to do our part to reverse this trend. Our mission is to maximize the number of nodes on the network, especially those run by hobbyists, researchers, and developers. When we asked people why they did not run their own nodes, the answer was almost always the same: "I installed the client software and tried to sync the blockchain, but it never seemed to finish. I had other things to do, so I stopped."

Therefore, if we want more people to run nodes, we need to make nodes sync faster and report progress while they sync. Many teams are working in this area, and dedicated hardware may be one important approach. In this article, though, we want to talk about how the "Beam Sync" method can greatly increase sync speed.

Existing synchronization methods

In order to better understand the principle of Beam Sync, let's take a look at several existing synchronization methods.

Full Sync (full synchronization method)

Full sync executes every block since the genesis block. The genesis block defines an initial state (state includes account balances, contract bytecode, contract storage, and so on). "Executing a block" means that each time a block is downloaded, the node reads the previous state, applies the block's contents to produce a new state, and checks that new state against the state root in the block header to confirm that the block is valid. Full sync on the Ethereum mainnet is very slow, and as the chain grows, syncing to the latest block this way takes longer and longer. So people developed a "fast sync" method.
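The loop described above can be sketched in Python. This is a toy model under loud assumptions, not Ethereum's real state transition: the state is a dict of balances, a block is a list of transfers, and `state_root` is a plain SHA-256 over the serialized state rather than a Merkle tree root. All function and field names are hypothetical.

```python
import hashlib
import json


def state_root(state):
    """Toy stand-in for a state root: hash of the canonically serialized state."""
    encoded = json.dumps(state, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def execute_block(state, block):
    """Read the previous state and produce a new state from the block's transfers."""
    new_state = dict(state)
    for sender, recipient, amount in block["transfers"]:
        if new_state.get(sender, 0) < amount:
            raise ValueError("insufficient balance")
        new_state[sender] = new_state.get(sender, 0) - amount
        new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state


def full_sync(genesis_state, blocks):
    """Execute every block since genesis, checking each header's state root."""
    state = genesis_state
    for block in blocks:
        state = execute_block(state, block)
        if state_root(state) != block["state_root"]:
            raise ValueError("state root mismatch: invalid block")
    return state


genesis = {"alice": 10, "bob": 0}
chain = [{
    "transfers": [("alice", "bob", 3)],
    "state_root": state_root({"alice": 7, "bob": 3}),
}]
final_state = full_sync(genesis, chain)
```

The cost of this approach is visible in the loop: the work grows with the total number of blocks ever produced.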

Fast Sync

Fast sync downloads all past blocks and block headers, and selects a recent block as the "startup block". Blocks before the startup block are not executed; execution begins at the startup block. This approach assumes that all EVM rules were followed correctly from the genesis block up to the startup block. The assumption is reasonable because miners have an incentive to build on valid blocks and reject invalid ones.

Before fast sync can execute the startup block, it needs the state that the block may read: contract bytecode, accounts, and contract storage. Executing a transaction may require any of these values. Fast sync therefore requests from its peers a snapshot of the state as of the block just before the startup block. A snapshot is identified by a state root hash, which is the root of the Merkle tree built over all state content. The node uses this state root hash to verify that the state data downloaded from peers matches the state that the miner declared in the block.

Once fast sync has downloaded all the required state, the node has everything it needs to execute transactions. At that point it can switch to full sync mode and execute blocks one by one from the startup block onward, just like a node that had fully synced every block before the startup block.
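As a rough illustration of the fast sync steps just described, here is a minimal Python sketch. It assumes a toy chain in which a block is a dict of balance changes and the state root is a SHA-256 over the serialized state; the names `fast_sync`, `apply_block`, and `peer_snapshot` are hypothetical and do not come from any client.

```python
import hashlib
import json


def state_root(state):
    """Toy state root: hash of the serialized state (not a real Merkle root)."""
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


def apply_block(state, block):
    """Toy execution: apply the block's balance changes to a copy of the state."""
    new_state = dict(state)
    for account, delta in block["balance_changes"].items():
        new_state[account] = new_state.get(account, 0) + delta
    return new_state


def fast_sync(headers, blocks, startup_index, peer_snapshot):
    """Skip execution before the startup block, then execute from there."""
    if startup_index < 1:
        raise ValueError("the startup block must come after genesis")
    # Verify the snapshot against the state root declared just before startup.
    if state_root(peer_snapshot) != headers[startup_index - 1]["state_root"]:
        raise ValueError("snapshot does not match the declared state root")
    # From here on, behave exactly like full sync.
    state = peer_snapshot
    for header, block in zip(headers[startup_index:], blocks[startup_index:]):
        state = apply_block(state, block)
        if state_root(state) != header["state_root"]:
            raise ValueError("state root mismatch: invalid block")
    return state


# Build a small example chain (index 0 is genesis).
states = [{"alice": 10}]
blocks = [{"balance_changes": {}}]
for changes in ({"alice": -3, "bob": 3}, {"bob": -1, "carol": 1}, {"alice": -2, "carol": 2}):
    blocks.append({"balance_changes": changes})
    states.append(apply_block(states[-1], blocks[-1]))
headers = [{"state_root": state_root(s)} for s in states]

synced = fast_sync(headers, blocks, startup_index=2, peer_snapshot=states[1])
```

Note that block 1 is never executed here; the node trusts the state root that block 1's header declares and only checks the downloaded snapshot against it.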

The simplified process looks like the following animation:

Other methods

Other sync methods include Warp Sync and some that have not yet been proven in production. In abstract terms, they are all variants of fast sync. Understanding how they work is not needed to understand Beam Sync, so we will not cover them in this article.

How fast is the fast sync method?

Fast sync faces challenges on today's mainnet because it has to download a lot of data, more than 100GB, so it can stay stuck for a long time in the second step shown in the figure above, "Get All State".

What's worse, in fast sync mode peers will not serve you state data for just any block. They only serve state for a window of recent blocks, for example the most recent 100 blocks (roughly 30 minutes). The Geth client's default is the most recent 120 blocks.

If you can't download all the state data within 30 minutes (spoiler: you really can't), you need to pivot, that is, switch to a new startup block and resume the sync. You do not start again from zero, but each pivot adds time for downloading and verifying blocks.
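A quick back-of-the-envelope calculation shows why a pivot is practically unavoidable. The sketch below assumes a 15-second block time, a 120-block serving window, and that the full 100GB mentioned above would have to arrive within one window; the last point is a simplifying assumption.

```python
BLOCK_TIME_SECONDS = 15      # approximate mainnet block time
SERVED_BLOCKS = 120          # recent blocks for which peers serve state
STATE_BYTES = 100 * 10**9    # assumed download size: 100 GB

# How long a given startup block stays inside the serving window.
window_seconds = SERVED_BLOCKS * BLOCK_TIME_SECONDS

# Sustained download rate needed to finish before the window closes.
required_bytes_per_second = STATE_BYTES / window_seconds

print(f"window: {window_seconds / 60:.0f} minutes")
print(f"required rate: {required_bytes_per_second / 10**6:.1f} MB/s")
```

Under these assumptions the node would need to sustain about 55.6 MB/s from its peers for the whole 30 minutes, with no time spent on verification.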

The Geth client has made remarkable progress on sync speed. Both fast sync and full sync have improved a great deal, and each Geth release pushes further. Even so, with excellent hardware the sync still takes at least 4 hours. For a first-time sync, that is a real hurdle.

Our team is developing a client written in Python called "Trinity". Python is not faster than Go. If Geth's performance-focused code doesn't sync as fast as we would like, what chance does Trinity have? It is reasonable to expect a fast sync in Trinity to take a few weeks. A client that cannot sync the mainnet is pointless, and so is one that takes weeks to do it. To address this, we came up with a new sync strategy, which we now call Beam Sync.

Beam synchronization method

Overview

Beam Sync is a direct improvement on fast sync. The difference is that Beam Sync executes the startup block right away, requests only the state data that is missing from the local database, and saves the input state and output state locally. After executing one block, it moves on to the next block and repeats the process, requesting missing data as needed.

Over time, less and less data is missing. Note that if a piece of state is never accessed, the client never requests it (and therefore never obtains it), so we run another process in the background to fill these gaps. Through this backfill process, Beam Sync eventually obtains all state data and saves it locally, and the node can then switch to full sync.
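The request-on-demand loop can be sketched as follows. This is a minimal toy model, not Trinity's implementation: state is a dict of balances, the remote peer is a plain dict, and `MissingState`, `LocalDB`, and `beam_import` are hypothetical names chosen for illustration.

```python
class MissingState(Exception):
    """Raised when execution reads state that is not in the local database."""

    def __init__(self, key):
        super().__init__(key)
        self.key = key


class LocalDB:
    def __init__(self):
        self.data = {}

    def get(self, key):
        if key not in self.data:
            raise MissingState(key)
        return self.data[key]


def execute_block(db, block):
    """Toy execution: return the balances written by the block's transfers."""
    writes = {}

    def read(account):
        return writes[account] if account in writes else db.get(account)

    for sender, recipient, amount in block:
        writes[sender] = read(sender) - amount
        writes[recipient] = read(recipient) + amount
    return writes


def beam_import(db, block, peer_state, requests):
    """Execute a block, fetching only the state that turns out to be missing."""
    while True:
        try:
            writes = execute_block(db, block)
        except MissingState as missing:
            # Request just the one missing item, save it, and retry the block.
            db.data[missing.key] = peer_state[missing.key]
            requests.append(missing.key)
            continue
        db.data.update(writes)  # save the output state locally
        return


peer_state = {"alice": 10, "bob": 5, "carol": 1, "dave": 8}
db, requests = LocalDB(), []
beam_import(db, [("alice", "bob", 3)], peer_state, requests)
beam_import(db, [("bob", "carol", 2)], peer_state, requests)
```

After two blocks the node has requested `alice`, `bob`, and `carol`, and the second block needed only one request because `bob` was already local. `dave` was never touched, which is exactly the gap the background backfill process has to close.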

We refer to the set of data required to execute a block as the "block witness". Thanks to the structure of the Merkle tree, we don't need to download the entire state to prove that the witness data really belongs to that state.
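To illustrate why a Merkle structure makes this possible, here is a minimal sketch of a Merkle proof. It uses a simple binary tree with SHA-256, which is an assumption made for brevity; Ethereum's state uses a hexary Merkle Patricia trie with Keccak-256, and all function names here are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return all tree levels, from the hashed leaves up to the single root."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:  # pad odd-sized levels by repeating the last node
            level = level + [level[-1]]
            levels[-1] = level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def make_proof(levels, index):
    """Collect the sibling hash at each level on the path from leaf to root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(leaf, proof, root):
    """Recompute the root from one leaf and its siblings only."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


leaves = [b"alice:10", b"bob:5", b"carol:1", b"dave:8"]
levels = build_levels(leaves)
root = levels[-1][0]
proof = make_proof(levels, 1)  # prove "bob:5" without revealing other leaves
```

Verifying `bob:5` takes two sibling hashes rather than all four leaves; in a large tree the proof grows with the depth of the tree, not with the size of the state.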

Block witness data size

For simplicity, we use "block witness size" to mean the number of data elements required to execute a block. A data element may be a node in the main account state tree, a node in a contract storage tree, or the complete bytecode of a contract. (Translator's note: the "state tree" and "storage tree" are both Merkle tree structures.)

Analyzing the block witness size is the key to understanding Beam Sync's performance. Fast sync must download all state data before executing the first block (the startup block), while Beam Sync only needs to download the witness for that one block. If the block witness contained one-third of the complete state, Beam Sync would reach that first block about three times faster than fast sync.

The obvious next step is to find out how big block witnesses actually are on mainnet. It may be too early to draw firm conclusions, but early experiments suggest that 3,000 state tree nodes is a reasonable estimate (90% confidence). The full mainnet state has more than 300 million tree nodes.

Beam sync speed increase

Let's define a new standard: "from launch to execution" time. This is the time from starting the node with an empty database to completing the full import of the most recent block.

If Beam Sync only needs to download 3,000 state tree nodes while fast sync needs to download 300 million, we can put an upper bound on Beam Sync's speedup: when syncing the mainnet, the "launch to execution" time can improve by up to 100,000 times!
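The bound is simple arithmetic, worked through below. The helper name `speedup_bound` is hypothetical, and the calculation assumes download time is proportional to the number of tree nodes and that nothing else contributes to the "launch to execution" time.

```python
def speedup_bound(total_state_nodes, witness_nodes):
    """Upper bound on the speedup if time scales with nodes downloaded."""
    return total_state_nodes / witness_nodes


# Figures quoted in the text: ~300 million state nodes, ~3,000 per witness.
mainnet_bound = speedup_bound(300_000_000, 3_000)

# The earlier hypothetical: a witness holding one-third of the state.
one_third_bound = speedup_bound(3, 1)

print(mainnet_bound, one_third_bound)
```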

However, the Beam Sync method often cannot really achieve a 100,000-fold improvement. Reasons include, but are not limited to:

  1. Downloading state data is not the only work involved in setting up a full node. For example, we also need to download the block headers to verify that the chain we are syncing is the longest chain.
  2. Block witness data is discovered on demand, which means we cannot predict in advance which state data will be needed (we only learn what to request next after receiving the previous item). So when we request data from peers, we can only ask for one item at a time. In contrast, fast sync can request up to 384 tree nodes at a time. This makes Beam Sync more sensitive to peer network latency.
  3. Finding high-quality, low-latency peers takes time. To be honest, which peers you end up with is truly a matter of chance.
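The second point above can be made concrete with a rough latency model. The sketch assumes a single peer, a fixed 100 ms round trip, strictly sequential requests, and no bandwidth limit; these are illustrative assumptions, and real clients talk to several peers at once.

```python
def download_ms(nodes, nodes_per_request, round_trip_ms):
    """Time spent waiting on round trips when requests are sent one by one."""
    request_count = -(-nodes // nodes_per_request)  # ceiling division
    return request_count * round_trip_ms


ROUND_TRIP_MS = 100  # assumed peer latency

# Beam Sync: ~3,000 witness nodes, one node per request.
beam_ms = download_ms(3_000, 1, ROUND_TRIP_MS)

# Fast sync: ~300 million state nodes, up to 384 nodes per request.
fast_ms = download_ms(300_000_000, 384, ROUND_TRIP_MS)

print(f"Beam Sync: {beam_ms / 1000:.0f} s of round trips")
print(f"Fast sync: {fast_ms / 3_600_000:.1f} h of round trips")
print(f"ratio: about {fast_ms / beam_ms:.0f}x")
```

Under this model the advantage shrinks from 100,000 times to roughly 260 times, because fast sync amortizes each round trip over 384 nodes while Beam Sync pays one round trip per node.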

Unlike fast sync, Beam Sync keeps downloading state after the startup block, which slows down block import. You may already sense the problem: if collecting a block's witness takes longer on average than producing a block, the node falls further and further behind.

Beam synchronization lag

The time required to obtain the first block witness is almost certainly longer than the time the network takes to produce a block (about 15 seconds). So we can expect several new blocks to arrive while the witness data is being collected. We call this situation "lag", which is roughly the time between the latest imported block and the block at the head of the chain.

Delays in collecting block witnesses can accumulate, and you may find that your Beam Sync node is lagging by 5 minutes. In other words, the latest block you have locally was produced by the network 5 minutes ago. When an RPC call asks your node for an account balance, the node returns the balance as of 5 minutes ago.

In field tests, it is very common for the lag to vary widely, from 1 minute to 20 minutes. Fortunately, we have some techniques that help the node recover from lag. In general, the further behind the node is, the better it recovers, which makes the lag fluctuate a great deal: the node falls behind, then quickly catches up, and the cycle repeats.

One reason we can catch up faster when we are behind is that we can collect witness data for several upcoming blocks in advance and in parallel. After all, those "future" blocks are only available to you when you are behind. Of course, if collecting a block's data always takes longer than the block time, you will fall further and further behind the head of the chain. We hope this never happens, but we need to plan for it.
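The relationship between witness collection time and lag can be sketched with a very simple model. It assumes a constant block time of 15 seconds and a constant collection time per block, and it ignores the parallel prefetching described above; `simulate_lag` is a hypothetical helper.

```python
def simulate_lag(collect_seconds, block_seconds=15, blocks=10, start_lag=0):
    """Track lag (in seconds) after each imported block.

    Importing one block advances the local head by one block (block_seconds
    of chain time) while collect_seconds of wall-clock time pass.
    """
    lag = start_lag
    history = []
    for _ in range(blocks):
        # Lag cannot go below zero: there is no block to import past the head.
        lag = max(0, lag + collect_seconds - block_seconds)
        history.append(lag)
    return history


falling_behind = simulate_lag(collect_seconds=20, blocks=3)
catching_up = simulate_lag(collect_seconds=10, blocks=5, start_lag=20)
```

With 20 seconds per witness the lag grows by 5 seconds per block; with 10 seconds per witness a 20-second lag is gone after four blocks.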

Beam Sync Pivot

Just like fast synchronization, if you are too far behind, peers may be reluctant to provide you with the data you need. The pivoting mechanism is the key to solving this problem.

Pivoting in Beam Sync works just as it does in fast sync. Your node skips a range of blocks and selects a new startup block near the head of the chain, then resumes from there. It does not sync from scratch; it keeps all the data from before the pivot.

Whether you use Beam Sync or fast sync, every pivot has a cost. A pivot means you need to download more data, and it leaves some blocks whose execution your node has not verified itself. The good news is that as long as you are not more than 30 minutes behind, Beam Sync does not need to pivot. With fast sync, by contrast, you have to pivot several times.
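The pivot decision can be sketched as a small function. The 120-block window (about 30 minutes at 15 seconds per block) and the choice of the chain head as the new startup block are assumptions for illustration, and `next_block_to_import` is a hypothetical name.

```python
SERVED_BLOCKS = 120  # peers are assumed to serve state only this far back


def next_block_to_import(local_head, chain_head, served_blocks=SERVED_BLOCKS):
    """Return (block_number, pivoted) for the next block the node should import."""
    lag_blocks = chain_head - local_head
    if lag_blocks > served_blocks:
        # Too far behind for peers to serve state: skip ahead to a new
        # startup block at the head. The skipped blocks are never executed.
        return chain_head, True
    # Close enough: keep importing the next block in order.
    return local_head + 1, False


in_order = next_block_to_import(local_head=1000, chain_head=1050)
pivoted = next_block_to_import(local_head=1000, chain_head=1200)
```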

OK, let's see what the real situation looks like.

Beam sync on Trinity client  

Prototype announced

A new Alpha version of the Trinity client was announced last week. This version includes a prototype Beam synchronization method that runs on high-end hardware.

We have been testing the synchronization process for the mainnet. The first block can usually be executed in the first hour, and most of the time it can be done in 5 minutes! This does not include the time to download the block headers and the occasional delay due to the lack of good peer nodes.

Note: Raising the block gas limit from 8M to 10M seems to have increased the average lag. After the Istanbul upgrade, the lag may come down because the gas cost of writing state data on the blockchain has increased.

Trinity is currently still an Alpha release, and the latest version still has plenty of problems. For example, the sync may stall abruptly after a day or two. Even when it does not stall, it can fall far enough behind to trigger a pivot. Installing the Trinity client takes extra work, and the command line output is messy. For now, Trinity is only suitable for developers and researchers who are curious and don't mind getting their hands dirty.

The outstanding problems are mostly the kind that surface in day-to-day development, that is, "bugs in the backlog". At this point, none of them makes us worry that Beam Sync is an unattainable dream. This confidence is brand new. Still, unknown problems may come up that need solving, just as they did a month ago!

Remaining work

Beyond basic debugging and implementation work, we still have a lot to do. Trinity has yet to implement the backfill mechanism (that is, downloading old events, transactions, and data from before the startup block). Currently the only way to trigger a pivot is to restart Trinity, and the minimum hardware required for Beam Sync is still unclear (we welcome everyone's help in collecting relevant data).

All of the above issues are under active development, and they are just some of the many tasks on Trinity. Thanks to the Ethereum Foundation for their sponsorship and support.

What's innovative about Beam Sync?

Beam Sync, like most ideas, builds on previous research. Speeding up sync by downloading a recent state and skipping the execution of old blocks (fast sync) was not invented by us. Speeding up execution by relying on witness data instead of the complete state was not invented by us either. For details, see the work on stateless clients.

The real step forward is combining the two: bootstrap fast sync by acting like a stateless client, then gradually transition to full sync mode. We give up one advantage of stateless clients, low disk usage, but we keep the advantage of quickly executing the latest block. By saving state inputs and outputs locally, we mitigate an important concern about stateless clients, namely the risk of a DoS attack through very large witnesses. The longer Beam Sync runs, the lower the risk of such a DoS attack.

When running a local Ethereum client, Beam Sync can provide better feedback and faster results. We think this is an important step toward users happily running their own local nodes.

Thanks to Piper Merriam, Brian Cloutier, Alex Stokes, Voith Mascarenhas, and Noel Maersk.

(End)

Original link: https://medium.com/@jason.carver/intro-to-beam-sync-a0fd168be14a

The author has released this article under a CC 4.0 license. You may use the article freely as long as you credit the author and keep the same license terms. The copyright of this translation follows the same rules (credit the author, and keep any subsequent use under the CC 4.0 license).