In Depth | Tackling Blockchain Scalability from Its Three Bottlenecks

Author: Ittai Abraham

Translation & Proofreading: ViloaH & A Jian

Source: Ethereum enthusiasts

If someone asks you point-blank: how do you scale a state machine replication (blockchain) system?

You should ask back: what is your system's bottleneck? Data? Consensus? Or execution?

1. Data: data is the carrier that delivers every command to every state machine replica. For example, if a block contains 1 MB of commands, then that 1 MB must be sent to every validating replica. In this case the system's channel capacity (bandwidth) is clearly the scalability bottleneck.
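To make the bandwidth bottleneck concrete, here is a back-of-the-envelope sketch with illustrative numbers (the bandwidth and replica count are assumptions, not figures from the article):

```python
import math

BLOCK_BYTES = 1_000_000        # a 1 MB block, as in the example above
UPLINK_BPS = 10_000_000 / 8    # assume a 10 Mbit/s uplink, in bytes per second
REPLICAS = 100                 # assumed number of validating replicas

def naive_broadcast_seconds(block_bytes, uplink_bytes_per_s, replicas):
    """The block producer uploads the full block to each replica itself."""
    return block_bytes * replicas / uplink_bytes_per_s

def gossip_broadcast_seconds(block_bytes, uplink_bytes_per_s, replicas):
    """Replicas re-forward the block; rough log2 fan-out model."""
    return block_bytes / uplink_bytes_per_s * math.ceil(math.log2(replicas))

print(naive_broadcast_seconds(BLOCK_BYTES, UPLINK_BPS, REPLICAS))   # 80.0 seconds
print(gossip_broadcast_seconds(BLOCK_BYTES, UPLINK_BPS, REPLICAS))  # roughly 5.6 seconds
```

Even with re-forwarding, every byte still crosses every replica's link, which is why bandwidth caps throughput no matter how fast consensus and execution are.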

2. Consensus: once commands arrive locally, the state machines run a consensus protocol (such as the partially synchronous or synchronous protocols discussed here). For example, if the protocol needs two message round trips and the validating state machines are spread around the world, then the bottleneck is clearly the latency imposed by the speed of light and the size of the Earth.
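A quick sketch of that physical latency floor (the speed-in-fiber and distance figures are rough assumptions for illustration):

```python
C_FIBER_KM_S = 200_000    # light in optical fiber travels at roughly 2/3 of c
HALF_GIRTH_KM = 20_000    # roughly half the Earth's circumference

one_way_s = HALF_GIRTH_KM / C_FIBER_KM_S     # ~0.1 s for one antipodal hop
round_trips = 2                              # e.g. a two-round-trip protocol
latency_floor_ms = round_trips * 2 * one_way_s * 1000

print(latency_floor_ms)  # ~400 ms: no protocol tweak can beat geography
```

This is why consensus-layer work focuses on reducing round trips and message complexity rather than raw message speed.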

3. Execution: after the commands arrive and consensus has fixed their order, the replicas must execute them. The execution engine is a function that takes the old state, applies the commands, and computes the new state (and any outputs). If, say, execution involves heavy cryptographic computation, then the bottleneck is clearly that cryptographic work, repeated at every replica.

Note that these three bottlenecks are not a trade-off, a dilemma, or a trilemma: they are independent of one another. The scalability of every state machine replication system is limited by all three (and, like a barrel by its shortest stave, by whichever is in the worst shape). This article surveys solutions to each of them.

1. Improving scalability: data

Better network solutions

For cryptocurrencies such as Bitcoin, scaling throughput depends on reducing latency, because a block mined by one miner takes time to propagate to all the other miners. Systems like FIBRE, Falcon, and bloXroute cut this latency with dedicated relay networks and use forward error correction codes to propagate blocks. Another way to scale on the data side is to discover peers and retrieve content over a content-addressable network; see Kademlia, which not only inspired Ethereum's RLPx specification but was also popularized by libp2p.
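The core of Kademlia mentioned above is its XOR distance metric. A minimal sketch (real implementations use 160- or 256-bit node IDs and k-bucket routing tables; the tiny IDs here are purely illustrative):

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia measures closeness between two node/content IDs as their bitwise XOR."""
    return a ^ b

def closest_peers(target: int, peers: list[int], k: int = 3) -> list[int]:
    """Return the k peer IDs that are XOR-closest to the target ID."""
    return sorted(peers, key=lambda p: xor_distance(target, p))[:k]

peers = [0b0001, 0b0100, 0b0111, 0b1000, 0b1110]
print(closest_peers(0b0110, peers))  # [7, 4, 1] -> XOR distances 1, 2, 7
```

Because XOR is symmetric and obeys the triangle inequality, iteratively querying the closest known peers converges on the node responsible for a piece of content in O(log n) hops.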

Migrate data to layer-2

Another idea: since the bottleneck stems from having to replicate every command to every state machine, let's simply not replicate everything! This is what Lightning, Plasma, and other Layer-2 solutions do: they propagate intermediate commands only within a smaller, semi-public group to reduce replication, and report to the whole system periodically. The natural drawback is that not replicating all the data creates data availability problems, and security then depends on the timely response of at least one honest participant in each semi-public group that holds the data.
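The reporting pattern can be sketched as follows (a hypothetical toy, not the actual Lightning or Plasma protocol): many off-chain updates are seen only by the group, and only a compact hash commitment reaches the full system.

```python
import hashlib
import json

def commit(state: dict) -> str:
    """The hash commitment the group periodically posts to the full system."""
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()

# Off-chain: 1000 micro-payments seen only by the channel participants.
state = {"alice": 1000, "bob": 0}
for _ in range(1000):
    state["alice"] -= 1
    state["bob"] += 1

# On-chain: one commitment per reporting period instead of 1000 transactions.
print(commit(state))
print(state)  # {'alice': 0, 'bob': 1000}
```

The caveat in the paragraph above is visible here: the commitment alone cannot reconstruct `state`, so if every group member who holds the data goes offline or withholds it, the rest of the system is stuck. That is the data availability problem.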

2. Improving scalability: consensus

Trade-off between throughput and latency

Some people use transactions per second (TPS) as the measure of a protocol's scalability. But TPS measures only throughput, and this invites a misconception: that consensus can be scaled by optimizing throughput alone. A consensus scalability solution must address both throughput and confirmation latency.

Raising consensus throughput (at the cost of latency) through batching is easy: instead of reaching consensus every few seconds, reach consensus once a day on the hash of all the batched data. Since consensus then runs only once a day, its cost is amortized, and measured by throughput alone the consensus step is no longer the scalability bottleneck. But while batching improves a consensus protocol's throughput, it also increases transaction confirmation latency; it is no panacea for scaling consensus.
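The trade-off above is just arithmetic; here it is spelled out with an assumed fixed per-instance consensus cost (the numbers are illustrative, not from the article):

```python
CONSENSUS_OVERHEAD_S = 5.0   # assumed fixed cost of one consensus instance

def amortized_cost_s(batch_size: int) -> float:
    """Consensus cost charged to each command in the batch."""
    return CONSENSUS_OVERHEAD_S / batch_size

def worst_case_latency_s(batch_interval_s: float) -> float:
    """A command may wait out a full interval, then pay one consensus round."""
    return batch_interval_s + CONSENSUS_OVERHEAD_S

print(amortized_cost_s(10))          # 0.5 s of consensus cost per command
print(amortized_cost_s(1_000_000))   # with huge batches, consensus cost vanishes...
print(worst_case_latency_s(86_400))  # ...but daily batching means ~a day of latency
```

Throughput improves without bound as the batch grows, while confirmation latency degrades linearly with the batch interval, which is exactly why TPS alone is a misleading metric.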

The journal version of the PBFT paper discusses the latency and throughput of BFT state machine replication in depth.

For Nakamoto-consensus-based systems, many protocols attempt to increase throughput and reduce latency, for example Bitcoin-NG, FruitChains, and Prism.

A trade-off between performance and security

It has been suggested that consensus be run among a smaller group of state machine replicas to improve the performance of the consensus step. Shrinking the validating group does improve performance, but at the cost of reduced security. The real challenge, then, is to improve consensus performance without reducing the number of participating state machines.

A more sophisticated consensus protocol can hope to have it both ways, for example by reducing the number of rounds, or by lowering message complexity from quadratic to linear. This article discusses some protocol improvements in the partially synchronous setting, and these in the synchronous setting.
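To make the quadratic-versus-linear point concrete, here is a toy message count per voting round (not modeling any specific protocol; the leader-based scheme assumes votes can be collected and an aggregate rebroadcast):

```python
def all_to_all_msgs(n: int) -> int:
    """Every replica sends its vote directly to every other replica."""
    return n * (n - 1)

def leader_based_msgs(n: int) -> int:
    """Replicas send votes to a leader, which rebroadcasts the aggregate."""
    return 2 * (n - 1)

for n in (10, 100, 1000):
    print(n, all_to_all_msgs(n), leader_based_msgs(n))
# 10 90 18
# 100 9900 198
# 1000 999000 1998
```

At 1000 replicas the all-to-all pattern needs roughly a million messages per round versus about two thousand, which is why message complexity dominates scalability long before bandwidth per message does.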

The trade-off between scalability and adaptability

Consensus protocols based on the PBFT view-based paradigm are vulnerable to adaptive attackers. A consensus protocol's security depends not only on the attacker's size (relative to the total number of state machine replicas) but also on the adversary's adaptivity.

Protocols that handle adaptive adversaries usually pay higher costs and face greater obstacles to scalability. Algorand proposes round-by-round cryptographic sampling to scale Byzantine consensus while protecting it from adaptive attackers; its simulation results look very good. An adaptive adversary can also mount denial-of-service attacks to stall the system's progress. HoneyBadger proposed the first practical asynchronous BFT protocol, which guarantees liveness without any timing assumptions.

Avoid full ordering of all commands

If all commands depend on each other, there is no alternative to totally ordering them. But in many workloads, commands neither depend on nor interfere with one another. For example, a command in which A pays B and one in which C pays D often do not interfere; in that case there is no need to spend expensive consensus resources ordering the two relative to each other, and no reason to let that ordering become a bottleneck. EPaxos takes this approach in the non-Byzantine model (not everything is totally ordered all the time). Avalanche and other DAG-based protocols likewise raise consensus throughput by letting non-interfering commands be proposed concurrently.
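The commutativity test behind this idea can be sketched in a few lines (a hypothetical helper for a toy payment model, not EPaxos or Avalanche code): two payments commute exactly when they touch disjoint accounts.

```python
def touched(cmd: tuple) -> set:
    """Accounts read or written by a payment command ('pay', sender, receiver, amount)."""
    _, sender, receiver, _ = cmd
    return {sender, receiver}

def commute(cmd1: tuple, cmd2: tuple) -> bool:
    """Commands with disjoint account sets need no relative ordering."""
    return touched(cmd1).isdisjoint(touched(cmd2))

a_pays_b = ("pay", "A", "B", 10)
c_pays_d = ("pay", "C", "D", 7)
b_pays_c = ("pay", "B", "C", 3)

print(commute(a_pays_b, c_pays_d))  # True  -> no total order needed
print(commute(a_pays_b, b_pays_c))  # False -> these two must be ordered
```

A protocol only has to pay for consensus on the ordering of the second pair; the first pair can be committed concurrently on independent paths.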


Sharding

In abstract terms, sharding partitions both the state and the set of state machine replicas: each shard controls part of the state, and consensus over it is carried out by part of the overall replica set. This of course also requires some mechanism for cross-shard interaction. Ethereum's "Sharding FAQ" (editor's note: the Chinese translation is linked at the end of the article) is a very comprehensive resource.

Sharding is a way to parallelize all three bottlenecks: data, consensus, and execution. The key to parallelizing data and execution is low state contention in the workload. From the consensus perspective, sharding is essentially a performance-for-security trade: instead of every state machine replica securing all of the state, sharding creates multiple partitions, and each replica secures only its own partition.
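A minimal sketch of that double partitioning (illustrative only; this is not Ethereum 2.0's or OmniLedger's actual assignment scheme): state is mapped to shards by hash, and the replica set is split so each shard is secured by only a fraction of it.

```python
import hashlib

NUM_SHARDS = 4

def shard_of(account: str) -> int:
    """Deterministically map a piece of state (an account) to a shard."""
    return int(hashlib.sha256(account.encode()).hexdigest(), 16) % NUM_SHARDS

def assign_replicas(replicas: list) -> dict:
    """Split the replica set so each shard is secured by only a fraction of it."""
    groups = {s: [] for s in range(NUM_SHARDS)}
    for i, r in enumerate(replicas):
        groups[i % NUM_SHARDS].append(r)
    return groups

replicas = [f"replica-{i}" for i in range(12)]
print(assign_replicas(replicas))           # 3 replicas per shard instead of all 12
print(shard_of("alice"), shard_of("bob"))  # which shard orders each account
```

If "alice" and "bob" land on different shards, a payment between them needs the cross-shard interaction mechanism mentioned above; and each shard's safety now rests on only 3 replicas rather than 12, which is exactly the security cost described next.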

(When state contention is low) splitting into many partitions can improve performance dramatically. But with fewer validating state machines per partition, security naturally drops.

For systems that use sharding, see OmniLedger and Ethereum 2.0. Ethereum 2.0 plans to combine the lower security of individual shards with the high security of a global chain: much like a Layer-2 solution, a low-security shard periodically uploads its state to the high-security global chain to finalize its state updates. This, too, trades latency for security: to obtain high security, you must wait for the global chain's periodic finalization.

3. Improving scalability: execution

Separating consensus from execution is one of the basic architectural decisions of a state machine replication system (see BASE 2003). The benefits of the separation are discussed in Yin et al. 2003. In a traditional state machine replication (SMR) system, commands must not only be replicated and propagated to all replicas, but also executed on all of them.

In many systems, the scalability bottleneck is the cost of executing commands. A major denial-of-service vector against an SMR system is to issue legitimate commands that waste the whole system's execution resources (for details, see Example 1 and Example 2). Many systems defend against such attacks with a domain-specific language: Bitcoin uses Bitcoin Script to carefully limit the computational complexity of each transaction, and Ethereum uses its gas mechanism to bound execution complexity and gas pricing to incentivize efficient execution.
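The gas idea can be sketched with a toy metered interpreter (nothing like the real EVM; the opcodes and prices are invented for illustration): every operation is charged against a budget, so a would-be resource-exhaustion command halts early instead of consuming unbounded execution.

```python
GAS_COST = {"add": 1, "store": 5, "hash": 30}   # assumed per-operation prices

class OutOfGas(Exception):
    pass

def execute(program: list, gas_limit: int) -> int:
    """Run ops while charging gas; abort as soon as the budget is exhausted."""
    gas = gas_limit
    for op in program:
        gas -= GAS_COST[op]
        if gas < 0:
            raise OutOfGas(f"halted at {op!r}")
    return gas_limit - gas   # gas actually consumed (and paid for)

print(execute(["add", "store", "add"], gas_limit=10))   # 7
try:
    execute(["hash"] * 1000, gas_limit=100)             # would-be DoS program
except OutOfGas as e:
    print("rejected:", e)                               # stops after a few ops
```

The key property is that the cost bound is enforced before execution completes, so a malicious command can only waste an amount of computation it has paid for up front.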

Parallel execution

Parallelizing the state machine's execution is another way to scale execution. It works when most commands in a block have no state contention (they are independent, or their order is interchangeable). The main idea is a protocol that executes commands in parallel when there is no contention, remains safe when there is, and always reproduces the result of sequential execution. For details see Eve 2012; Dickerson, Gazzillo, Herlihy, Koskinen 2017; and Saraph and Herlihy 2019.
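One simple scheduling scheme in that spirit (a sketch over an assumed toy payment model, not the algorithm of any of the cited papers): pack commands into batches with pairwise-disjoint key sets, placing each command one level after the last batch it conflicts with, so conflicting commands keep their sequential order while each batch can run concurrently.

```python
def keys(cmd: tuple) -> set:
    """State touched by a payment command ('pay', sender, receiver, amount)."""
    _, sender, receiver, _ = cmd
    return {sender, receiver}

def parallel_batches(cmds: list) -> list:
    """Group commands into concurrently executable batches that preserve
    the sequential order of any two conflicting commands."""
    batches = []   # each entry: (commands, union of touched keys)
    for cmd in cmds:
        level = 0
        for i, (_, used) in enumerate(batches):
            if not keys(cmd).isdisjoint(used):
                level = i + 1          # must run after this conflicting batch
        if level == len(batches):
            batches.append(([], set()))
        batches[level][0].append(cmd)
        batches[level][1].update(keys(cmd))
    return [b for b, _ in batches]

cmds = [("pay", "A", "B", 1), ("pay", "C", "D", 2), ("pay", "B", "E", 3)]
print(parallel_batches(cmds))
# batch 1: A->B and C->D run concurrently; batch 2: B->E waits on A->B
```

Executing batch by batch is equivalent to sequential execution, because any two commands that touch the same state end up in distinct batches in their original order.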

Do not execute inside the SMR: verify with economic incentives and fraud proofs

(Optimistic rollup style)

In this family of solutions, commands are submitted to the SMR as data, but execution is not done by the validating state machine replicas; the replicas serve only as a data availability layer.

Instead of having the replicas execute commands, use economic incentives: anyone can become an executor by posting a bond. A bonded executor can submit execution results, and anyone else can dispute an incorrect result by submitting a fraud proof. If the fraud proof is valid, the executor is punished and the submitter receives part of the penalty as a reward. If the challenger's fraud proof is bogus, the challenger's own bond is heavily slashed.
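The incentive game above can be sketched as a toy settlement rule (the amounts, reward fraction, and the re-execution "referee" are invented for illustration, not any real rollup's parameters):

```python
BOND = 100
REWARD_SHARE = 0.5   # assumed fraction of the slashed bond paid to the challenger

class Party(dict):
    pass

def settle(claimed_result, true_result, executor: dict, challenger: dict) -> None:
    """Resolve a challenge; `true_result` stands in for on-chain re-execution."""
    if claimed_result != true_result:              # fraud proof is valid
        executor["bond"] -= BOND                   # executor is slashed...
        challenger["balance"] += BOND * REWARD_SHARE  # ...challenger rewarded
    else:                                          # challenge was bogus
        challenger["bond"] -= BOND                 # challenger is slashed instead

executor = {"bond": BOND, "balance": 0}
challenger = {"bond": BOND, "balance": 0}
settle(claimed_result=41, true_result=42, executor=executor, challenger=challenger)
print(executor, challenger)   # executor slashed, challenger rewarded
```

The design point is that honest behavior is the only profitable strategy: lying costs the liar their bond, so as long as one honest party watches the results, the system can skip execution on every replica.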

Protocols for efficient challenges originate with Feige and Kilian 2000; Canetti, Riva, and Rothblum 2011 continued down this path, which eventually evolved into TrueBit (Teutsch and Reitwießner 2017) and Buterin's off-chain oracles, both of which adopt on-chain incentives. Today this approach is developed further in schemes called optimistic rollups (for details see merged consensus; Adler, Mikerah, Quintyne-Collins; Al-Bassam, Sonnino, Buterin; and LazyLedger).

Do not execute inside the SMR: verify with succinct proofs

(zk-rollup style)

In this scheme, too, commands are submitted to the SMR as data, and execution is again not done by the validating state machine replicas; the replicas serve only as a data availability layer for the commands.

Rather than checking execution results with challenge games and fraud proofs, one can use succinct non-interactive proofs (PCPs; Groth 10; Groth 16; Ben-Sasson, Chiesa, Tromer, Virza 2013-2019; and this survey). These cryptographic techniques let a prover generate a very short proof whose verification is cryptographically sound and complete. Execution (and proof generation) need be carried out by only a single entity; with succinct proofs, a validating replica only has to check the short proof rather than re-execute a long computation. Zexe uses this approach to build a nano-kernel-based proof system, enabling private transactions in a UTXO model.

Buterin's article on zk-rollups and Ben-Sasson's podcast highlight this approach to scaling transaction throughput. See Buterin's video for more on how to add privacy (zero knowledge) to succinct proofs (zk zk-rollups).

Succinct proofs have a major benefit: verifying a proof's correctness is very cheap. The drawback is that constructing a proof of a command's execution usually costs far more than simply executing the command. Another drawback is the considerable complexity these protocols add; moreover, some of them require an elaborate trusted setup ceremony.

Note that the methods above aim to overcome the execution scalability bottleneck; they do nothing to relieve the data scalability bottleneck.


We would like to take this opportunity to thank Ling Ren, Kartik Nayak, Alin Tomescu, Pratyush Mishra, Louis Guthmann, and John Adler for their informative feedback on this article.