Babbitt Column | The Essence of Blockchain: The Essence of Performance

The Essence of Blockchain

Blockchain is a good thing, but it is painful to watch all kinds of blockchain projects launched blindly, wasting social resources of every sort. Blockchain technology itself is still at the development stage, with many core technical problems yet to be solved, and people still hold very different understandings of its nature. Over the next few days, we will share with you the computational nature of blockchain, its technical difficulties, its business implications, and its social impact.

(3) Performance: Throughput, Confirmation Delay, and State Capacity

When you buy a computer, everyone knows that how fast it runs depends on the processor (CPU), that is, how many instructions it can execute per second; it also depends on memory capacity, which largely determines how many applications and files can be open at the same time without the machine slowing down. A blockchain system is similar. Throughput is analogous to a computer's CPU speed: it determines how many transactions the system can process per second. State capacity is analogous to memory capacity: it determines how many users (addresses) and application states the whole system can carry.

There is an additional metric, confirmation delay, which stems from a counterintuitive phenomenon: a blockchain system can accept a transaction and, before that transaction has been processed and confirmed, go on to accept the next one. So a processing throughput of, say, 100 transactions per second does not mean that a transaction you submit is processed and confirmed 1/100 of a second later. In practice, each transaction usually takes much longer before it is finally confirmed.
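A back-of-envelope sketch makes the distinction concrete. The parameters below are hypothetical illustrations in the style of a Bitcoin-like chain (the six-confirmation convention is an assumption, not from this article): even modest throughput coexists with a confirmation delay of roughly an hour.

```python
# Throughput and confirmation delay are independent metrics.
# Hypothetical parameters for a Bitcoin-like chain.
block_interval_s = 600        # one block every 10 minutes
txs_per_block = 4200          # 7 tx/s * 600 s
confirmations = 6             # blocks often waited before treating a tx as final

throughput_tps = txs_per_block / block_interval_s

# A new transaction waits on average half a block interval to be
# included, then several more full blocks to accumulate confirmations.
avg_delay_s = block_interval_s / 2 + confirmations * block_interval_s

print(f"throughput: {throughput_tps:.0f} tx/s")                  # 7 tx/s
print(f"avg confirmation delay: {avg_delay_s / 60:.0f} minutes")  # 65 minutes
```

The point is that raising throughput (more transactions per block) does nothing to shrink the delay term, which is dominated by the block interval.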

Bitcoin's performance is notoriously weak: as everyone knows, it processes about 7 transactions per second on average, and each transaction involves only additions and subtractions of a few large integers. If you simply executed those computations, an ordinary laptop could handle millions of transactions per second. Why, then, is Bitcoin so slow? Let me state the conclusion first: the blame does not lie with the consensus algorithm; this is not a PoW problem. We can set the consensus algorithm aside for now.

Some say Bitcoin is slow because PoW is hard and it takes ten minutes of computation to produce a block; others say it is because Bitcoin demands enormous computing power. Both are misunderstandings that reverse cause and effect. A block takes ten minutes because the protocol fixes it that way and does not allow blocks to come any faster: if more computing power joins and blocks start appearing more quickly, the Bitcoin network's difficulty-adjustment algorithm automatically raises the PoW difficulty, so the block interval always stays around ten minutes. And the enormous computing power exists because everyone rushes to participate in Bitcoin's consensus process: each time you win the chance to produce a block, you are rewarded 25 bitcoins, each worth more than 9,000 US dollars at today's price.
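The retargeting rule described above can be sketched in a few lines. This is a simplified model of Bitcoin's actual rule (every 2016 blocks, compare the elapsed time against a two-week target and rescale difficulty, clamped to a 4x change per period), not a full implementation:

```python
# Simplified sketch of Bitcoin's difficulty retargeting.
TARGET_TIMESPAN = 2016 * 600  # seconds: 2016 blocks at 10 minutes each

def retarget(old_difficulty: float, actual_timespan_s: float) -> float:
    # Bitcoin clamps the adjustment to at most a factor of 4 per period.
    actual = min(max(actual_timespan_s, TARGET_TIMESPAN / 4),
                 TARGET_TIMESPAN * 4)
    # Blocks came too fast -> actual < target -> difficulty rises.
    return old_difficulty * TARGET_TIMESPAN / actual

# If miners found 2016 blocks in one week instead of two, difficulty
# doubles, restoring the ten-minute average interval.
print(retarget(1.0, TARGET_TIMESPAN / 2))  # 2.0
```

This is exactly why adding hashpower never speeds the chain up: the feedback loop absorbs it by raising difficulty.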

Therefore, more and more computing power joins Bitcoin's consensus and the PoW difficulty keeps rising, which is what we see today. But this computing power only competes for the opportunity to produce a block; no matter how large it grows, it does not make Bitcoin process transactions any faster. The real reason for Bitcoin's low throughput is that, when the system launched, its parameters were set to match the communication capability of the underlying Internet at the time: one block every ten minutes, each block 1 MB in size. With each transaction taking 200 to 300 bytes, this works out to a throughput of about 7 transactions per second.
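The arithmetic behind the famous "about 7 transactions per second" figure uses only the parameters just listed (250 bytes is taken as the midpoint of the 200-300 byte range):

```python
# Deriving Bitcoin's ~7 tx/s from its protocol parameters.
block_size_bytes = 1_000_000   # 1 MB block size limit
block_interval_s = 600         # one block every 10 minutes
avg_tx_bytes = 250             # a typical transaction is 200-300 bytes

txs_per_block = block_size_bytes // avg_tx_bytes
throughput_tps = txs_per_block / block_interval_s
print(f"{throughput_tps:.1f} tx/s")  # 6.7 tx/s, i.e. "about 7"
```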

Why is the Bitcoin system designed this way? Couldn't the block interval be smaller, or each block larger? What is the actual constraint? The answer is:

Network bandwidth fundamentally limits the throughput of the blockchain

That is why, when I was looking at projects last year that claimed hundreds of thousands of TPS, I usually asked one question: how much bandwidth does your system need to run? With bandwidth requirements that high, the only option is to put all the nodes in one machine room connected by a LAN. That is not a blockchain. That is called a cloud service…

As I mentioned in the first part of this series, a blockchain carries out computation across loosely coupled distributed nodes. The computation proceeds in relay, which means each node needs the latest context: the latest data and state of the computation. Concretely, after one node produces a block, there must be enough time for most other participants to synchronize the new block before the next block is produced. It is for this reason that throughput is limited by network bandwidth and cannot be too high; otherwise the blockchain network cannot reach consensus.
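A rough calculation shows how quickly bandwidth becomes the ceiling. The 10 Mbit/s figure below is a hypothetical stand-in for ordinary home Internet access, not a number from the article:

```python
# Why bandwidth caps throughput: every participating node must download
# each block within the block interval, or it falls behind consensus.
bandwidth_bps = 10_000_000     # 10 Mbit/s: assumed ordinary Internet access
avg_tx_bytes = 250             # typical Bitcoin-style transaction size

# Ceiling if a node spends its entire link just keeping up with blocks:
max_tps = bandwidth_bps / 8 / avg_tx_bytes
print(f"ceiling: {max_tps:.0f} tx/s")          # 5000 tx/s

# A claimed 100,000 tx/s would require, before any protocol overhead:
needed_mbps = 100_000 * avg_tx_bytes * 8 / 1e6
print(f"required: {needed_mbps:.0f} Mbit/s")   # 200 Mbit/s per node
```

Bandwidth of that order is realistic inside a data-center LAN but not for nodes run over ordinary consumer connections, which is exactly the decentralization trade-off discussed below.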

That said, today's Internet is much faster than it was ten years ago. Speeding up the Bitcoin network by several tens of times therefore requires no algorithmic improvement at all: simply enlarging the block size or shortening the block interval easily yields at least several times the throughput. But to improve further while preserving the degree of decentralization poses a major theoretical challenge.

It is important to emphasize here that decentralization and performance are in tension. Decentralization requires that as many people as possible can participate in the network, each able to run a node on their own; the bandwidth requirement therefore cannot exceed the average bandwidth of ordinary Internet access. Why does DPoS (e.g., EOS) achieve such high throughput? Because it is essentially not decentralized and is no different from a cloud service: it can use very high bandwidth, thousands of times that of ordinary Internet access, so its throughput is of course easily raised.

As throughput rises, beyond bandwidth, each node's CPU processing power and disk I/O also face higher demands. Bandwidth is emphasized here not only because it is the primary bottleneck, but also because high bandwidth is extremely detrimental to decentralization: high-bandwidth access is constrained by geography and is basically available only in data centers, whereas upgrading CPU and disk is entirely independent of location.

(4) Performance: State Capacity

In a blockchain, state capacity plays the role of a computer's memory size. Unlike TPS throughput, however, it attracts little attention: projects rarely boast about state capacity, because capacity is hard to measure and extremely difficult to expand.

The state specifically refers to the status of each address (i.e., user) and each application on the blockchain; the sum of all the information that must be at hand to verify incoming transactions is the state of the blockchain. Typically, this state includes the account balance of each address. As applications on the chain grow richer, each address carries more and more information representing its status in each application.

State capacity refers to how much effective memory space a blockchain system has to represent the state of the entire chain. The on-chain state must be resident in memory at all times, ready to verify incoming transactions. This information cannot be kept on the hard disk; otherwise transaction-verification throughput would drop sharply, severely constraining the overall throughput of the blockchain. So what determines state capacity?

In the first article, on the computational essence, we explained that each participating node in a blockchain network must be ready to verify and apply the next incoming block, which means each node fully stores the state of every address and every application on the chain. State capacity is therefore fundamentally limited by the memory capacity of each participating node. In this sense, increasing each node's memory increases the effective state capacity of the blockchain network.
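A back-of-envelope estimate shows what "limited by node memory" means in practice. Both numbers below are assumptions for illustration (real per-account state varies by chain and by how the state index is organized):

```python
# How many address states fit in a single node's RAM.
node_ram_bytes = 16 * 2**30    # assumed: a node with 16 GiB of memory
state_per_address = 128        # assumed: bytes for balance, nonce, index

max_addresses = node_ram_bytes // state_per_address
print(f"{max_addresses:,} addresses")  # 134,217,728 (~134 million)
```

Note that the ceiling grows only linearly with node RAM, which is exactly why raising it by requiring bigger machines quickly collides with decentralization.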

However, first, the room for increasing single-node memory is very limited. The bigger problem is that doing so raises the barrier to participation and seriously damages the decentralization of the blockchain network. For a consortium-chain system this may be acceptable, but for a public chain, raising state capacity this way is undesirable. To fundamentally increase state capacity without increasing the memory pressure on individual nodes, the only way out is full sharding, or at least state sharding. This remains a very forward-looking academic research direction; interested readers can read further in the article "How can a public blockchain become fast?"

Incidentally, there is another kind of stored information: transaction history. The transaction history is the sum of every confirmed transaction from the genesis block to the present. This information only accumulates; it grows and never shrinks. In the Bitcoin system, for example, the history already exceeds 200 GB. Once a block and its transactions have been confirmed and executed, this information is no longer involved in verifying and confirming subsequent blocks, so it can be kept entirely on the hard disk without occupying memory.

Of course, this 200 GB is still a concern, since hard-disk capacity is not unlimited. For Bitcoin, the growth of consumer-grade hard drives may not keep pace with the growth of its transaction history (up to about 50 GB per year). But two relatively mature classes of techniques address this challenge: checkpointing, which lets a node discard old transaction history, and RSA accumulators, which allow transaction records to be stored in a distributed fashion so that no single node needs to keep the complete network-wide history.
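The "about 50 GB per year" figure follows directly from the block parameters given earlier, assuming consistently full blocks:

```python
# Worst-case yearly growth of Bitcoin's transaction history.
block_size_mb = 1                 # 1 MB block size limit
blocks_per_year = 365 * 24 * 6    # one block every 10 minutes

growth_gb_per_year = block_size_mb * blocks_per_year / 1000
print(f"{growth_gb_per_year:.1f} GB/year")  # 52.6 GB/year at full blocks
```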

Author: Wang Jiaping

About the author: Dr. Wang Jiaping was a researcher at Microsoft Research, focusing on distributed systems, computer graphics and vision, and GPU clusters for machine learning. He has published dozens of research results in ACM SIGGRAPH/ToG and other international journals and holds more than ten US patents. He studied under Dr. Shen Xiangyang (now Microsoft's global Executive Vice President) and received his Ph.D. from the Institute of Computing Technology, Chinese Academy of Sciences. His doctoral thesis won the 2009 National Outstanding Doctoral Thesis Award and was the only winner in computer science that year.