The voters of Baihuacun Block Chain Mountain are super fun!

There is a mountain next to Baihua Village called the block chain mountain, which is owned by the villagers collectively. Company A outside the village is preparing to develop tourism resources for the block chain mountain. Company A and the villagers committee jointly established Baihua Tourism Development Co., Ltd. and signed a joint-stock cooperation agreement. The following is a dialogue between the villagers Li Da and Liu Wu during the Spring Festival holiday:

Li Da: Regarding the tourism development block chain mountain, the village committee and the company A signed the contract.

Liu Wu: What are the benefits of us?

Li Da: We are all shareholders of Blockchain Travel Co., Ltd.

Liu Wu: What kind of work do shareholders have to do?

Li Da: As for the important decision on the development of the blockchain, shareholders must vote.

Liu Wu: That can't be done. I have to go out to work after the Spring Festival, and it is not necessarily where. How can I come back to vote?

Li Da: It doesn't matter. We can choose a few representatives, such as Teacher Wang. He will stay in the village primary school to teach, will not go, and people are reliable and creditworthy.

Liu Wu: I also elected Teacher Wang to vote on our major resolutions.

……

In the above dialogue, there is a consensus that in the blockchain world, we call it the Entrusted Person Proof Mechanism (DPoS). It is important to understand the consensus mechanism because it solves the problem of how the blockchain achieves consistency in a distributed scenario and is the key to ensuring the continuous operation of the blockchain system.

From PBFT to PoW, to PoS, to DPoS, what do they represent? What kind of application scenarios are applicable? What are the advantages and disadvantages? This article tells you all about it.

This article is long, mainly contains the following 8 key content, recommended collection!

1. Consistency of distributed systems

2, Paxos algorithm

3, Raft algorithm

4, PBFT algorithm

5, proof of workload – PoW

6. Equity Equity Proof – PoS

7. Entrusted Equity Proof – DPoS

8. Sociological discussion of consensus algorithms

Distributed system consistency

The so-called consistency means that the data is complete and needs to be synchronized.

For example, we placed an order online and paid for it . The mall system will record the data. After that, no matter where we access our orders, we can see the same result. Even if the mall system fails, it is generally Will return an error instead of giving us a different result.

Another example is the bank. We have saved a sum of money . No matter where we go, the amount will not change when we check the bank account. That is to say, in these systems, the results of the data are always consistent and synchronized. Even if there are more than one server in these systems, they all belong to the same central cluster, and they can be synchronized efficiently and consistently.

The blockchain system is essentially a distributed application. The primary problem with distributed systems is how to solve the problem of consistency, that is, how to reach consensus among multiple independent nodes. It should be noted that what is said here is to reach an agreement, but does not say that the guarantee must be correct. For example, all nodes fail to achieve a state of failure. To put it bluntly, it is to be a concerted person. If it is a centralized scenario, there is almost no problem in reaching agreement, but the distributed environment is not so ideal. For example, the communication between nodes may be unreliable, there will be delay, there will be a failure, and even the node will be directly down.

And between a myriad of nodes, if you use synchronous methods to call, it is equal to losing the advantages of distribution, because the absolute ideal consistency, the cost is usually very large. Such a model usually requires no failure, and there is no time difference in communication between all nodes. This is almost equivalent to a computer. Such strong consistency often means weak performance.

Historical experience has shown that consistency issues in multiprocessors and distributed systems are very difficult to resolve. The difficulty lies in the following aspects.

  • The distributed system itself may fail.
  • Communication between distributed systems can be faulty or have huge delays.
  • Distributed systems can run at very different speeds, with some systems running fast and others slow.

However, consistency is an issue that must be considered when designing a distributed system. Consistency issues have a long history and are notorious. Traditional methods for dealing with consistency problems include two-stage commit, token ring, timestamp, etc., and should be heard by computer professional readers. This article will focus on consistency issues and algorithms related to blockchain.

1, consistency issues

We use state machines to explain the problem of consistency. A state machine is a mathematical model that represents a finite number of states and behaviors such as transitions and actions between them. A state can refer to a data variable of a system at a certain time. It has the following characteristics :

  • The total number of states is limited;
  • At any one moment, only in one state;
  • Under certain conditions, it will change from one state to another.

For example, the inventory status at a certain moment in the invoicing system, the account status of the banking system at a certain time, etc., for a distributed system, refers to the state of data owned by each node. Suppose we have n machines, machines in different locations work together through the network, and the initial state of all machines is exactly the same. Give them a set of identical instructions, we want to see the same output, and we want to see the same change in state. For example, the state of machine A is from state A to B to C, and if the state of machine B is from A to C, this situation is inconsistent.

In summary, consistency requires that each node in a distributed system produce the same result or have the same state, which looks like a machine, provided that there is no central server as a dispatcher, which is distributed over the Internet, not The difficulty of a distributed system in the same equipment room that does not belong to the same manager is very large. For the sake of system availability, we generally want the following capabilities for distributed systems.

  • A distributed system as a logical whole should not return incorrect results.
  • As long as most of the machines in the system work properly, the entire distributed system can operate effectively, which is an advantage of distributed system applications against single points of failure.
  • The performance of the system can be scaled out. For distributed systems, the bucket principle does not work.
  • Distributed systems must be asynchronous, meaning that each node can work independently at its own pace, without a full sequence of time.

To meet these requirements, it is not easy! From life, we can also find that even if there is a unified command and command, it may not be completely uniform, let alone such a commander! In the scene of the Internet, we can't control the state of any node, such as bitcoin nodes, who can control those nodes in the network! It may be closed, disconnected, or even a maliciously disguised node. Everything seems to have no solution.

However, in fact, many times our requirements for consistency are not so urgent. Under certain constraints, the so-called ultimate consistency can be achieved, that is to say, the system reaches a consistent state at a certain moment. This node is now disconnected from the network, no problem, wait for recovery, then follow up, synchronize other data through other nodes; that node is down, no problem, resume and follow. As long as most of the nodes in the entire network are working properly, the entire system can always achieve the same data state at some time in the future.

2, two principles: FLP and CAP

FLP theorem

It is called FLP because the paper that proposed the theorem was published in 1985 by the three authors Fischer, Lynch, and Patterson, taking the first letter of each name as the theorem name.

Look at the definition: In a minimized asynchronous model system with reliable network and no node failure (even if there is one), there is no deterministic algorithm that can solve the consistency problem. Under the premise of this principle, it also tells people: Don't waste time designing an algorithm for asynchronous distributed systems that can achieve consensus in any scenario. In the case of allowing node failure, purely asynchronous systems cannot ensure consistency for a limited time. Completed inside.

This is actually very well understood. For example, three people answer questions in different rooms. Although three people can communicate with each other by telephone, there are often people who open the gap from time to time. For example, Alice and Bob both answer a question, Lily receives When the answer to both is answered, then the game is gone, and the reply is forgotten, the three people will never get a final and consistent response within a limited time. This theorem theoretically proves that this road is not feasible, which saves the research time of the latecomers.

CAP theorem

The CAP theorem was first conjectured by Eric Brewer at a seminar organized by ACM in 2000, and was later proved by Lynch et al. Let us first look at the definition: distributed computing systems can not ensure consistency, availability and partition fault tolerance at the same time, these three can not have both. By definition we can know that this is a typical impossible triangle. So what do these three terms mean? The meaning is as follows.

  • Consistency : All nodes can see the same data at the same time, that is, "strong consistency."
  • Availability : Ensure that each request receives a response that determines its success and is for a limited time.
  • Partition tolerance : The system partition caused by network failure does not affect the normal operation of the system. For example, Module 1 and Module 2 cannot be used, but No. 3 and No. 4 can still provide services.

The intuitive argument is simple : if the network is split in half, I send 10 coins to A in half of the network and 10 coins to B in the other half of the network, then the system is not available because A transaction or all two will not be processed, or the system will become inconsistent because half of the network will complete the first transaction and the other half will complete the second transaction.

Since it can't be satisfied at the same time, if you weaken the support for a feature?

Weak consistency

For example, after the software is upgraded to a new version, other talents are updated successfully after a while; after the website updates the content, the browser also refreshes the content after the refresh. Many times there is no high requirement for strong consistency in real time. There are also examples in life : if things are not so urgent, they will send a short message or send an email, and wait for the other party to see it again; if it is urgent, Just call directly, so the phone is a strong connection, but many friends certainly do not like to call in from time to time.

Weaken availability

In some cases, it is very sensitive to consistency. For example, a bank teller machine will refuse service if the system fails, and if it is a control system on the plane, if it cannot be processed in real time at this time, it is terrible. Friends who are familiar with computer use know that many server operating systems do not have a graphical interface. They can only be handled by the command line. Because the server needs to improve performance and maintain reliability, it will try to avoid loading unnecessary modules. An example of sacrificing usability.

Weakened partition fault tolerance

For distributed systems, partition tolerance is inevitable. Fortunately, the probability of this happening is not very large. If a large-scale service is really unavailable, no matter what kind of system is not working properly.

The design of a computer system is sometimes very similar to the scene in life, and you have to make some compromises to ensure that you most want the results. In the blockchain system, especially the public chain system, various consensus algorithms are used, and the priority is to ensure the fault tolerance of the entire system, which is also one of the purposes of designing a distributed or decentralized structure.

3. Byzantine General

The classic description of the Byzantine general question is:

 

The Byzantine army was composed of small units, each of which was commanded by a general, who planned a series of actions through the commanders. Some generals are traitors who deliberately prevent the loyal generals from reaching a consensus plan. The goal of this problem is to make the loyal generals agree on a plan, even if the betrayed generals have been tempting them to adopt bad plans.

It has been proved that if the betrayed general exceeds one third of the total number of generals, it is impossible to achieve the above goal. It is important to note that the issue of General Byzantine and the issues of the two armies should be distinguished. The model of the two military issues is simpler than the Byzantine general, and the premise scenarios are different. Let us look at a schematic diagram:

As shown in the above figure, in this problem model, it is assumed that there are two opposing troops (one is the A army and the other is the B army), which is also the so-called two armies. The A army was separated into two parts by the B army, namely the left A army and the right A army. In terms of combat effectiveness, the two parts of the A-Army must simultaneously attack together to defeat the B-Army. This requires that the left and right teams of the A-Army must negotiate the offensive time and other offensive agreements. Negotiation means communication. Keep the consistency of the offensive instructions by communicating with each other.

Then the problem is coming. The A-Army on both sides must communicate with each other, and they must pass through the area of ​​the B-Army. It is difficult to ensure that the communication is smooth. Both sides must constantly send a receipt to confirm whether the other party has received the message. Obviously, from In theory, any receipt can not really confirm that the message reception of the two parties is consistent.

For example, the left A army sent a message to the right A army, the right A army received and sent a confirmation receipt to the left A army, but the confirmation receipt was blocked by the B army, at this time the left A army could not know that the right A army received the message. No, even if the right A army's receipt successfully reached the left A army, but if there is no left A army's receipt (the left A army's receipt may also be blocked by the B army), the right A army can not confirm the left A army received a receipt. No. According to this confirmation mode, as long as there is the interception of the B-Army, the A-Army on both sides can't theoretically guarantee that a consistent message can be confirmed.

We can see that the key point of the two military issues is that the channel transmission between the two points is not reliable . Most of the network communication software we use every day (payment, chat, email, etc.) will face such a problem. The communication process takes place on the Internet. No one can guarantee that the "B Army" passing through is reliable. Generally, only It is also a last resort compromise to be able to confirm the arrival of a message with a limited number of two-way receipts.

It is worth noting that in this problem model, it is not assumed that there is a vandal in the middle, that is, in the communication process between the two armies, regardless of the possibility that one party may be rebellious. Returning to the question of General Byzantine, the main issue to consider is the question of how to maintain correct consistency in the case of parties (not necessarily the two armies, or more than the army) who have intentional vandals or traitors.

The complexity of the Byzantine general problem can be expressed in terms of computer fault tolerance.

  • Byzantine fault tolerance : This is the most difficult situation to deal with, meaning that there is a node that does not follow the program logic, and calls to it return arbitrary or confusing results. To solve a Byzantine fault, a synchronous network is required, and the faulty node must be less than 1/3. Usually, this situation is only considered in certain specific areas, and the fault is eliminated by high redundancy.
  • Crash Fault Tolerance : It has one more limit than Byzantine fault tolerance. That is, the node always executes according to the program logic, and the result is correct, but does not guarantee the time when the message is returned. The reason for not being able to return a message in time may be a reboot after a node crash, a network outage, or a high latency in an asynchronous network.
  • Missing fault tolerance : It has a limit more than the fault tolerance, that is, it must be "not forgetful." Non-forgetfulness means that the state can be completely saved on the persistent storage before the node crashes. After startup, it can continue to execute and communicate according to the previous state. For example, the most basic version of Paxos requires that the node must record the number of the vote into the persistent storage. Once it crashes, it must continue to remember the previous voting number after the repair.
  • Crash Stops Fault Tolerance : It has more than one fault tolerance requirement to stop responding after a failure occurs. Simply put, in the event of a failure, the node will no longer interact with other nodes, as its name suggests, crashing and stopping.

4. The purpose of the consensus algorithm

In the case of a faulty process and the possibility of a network partition, the FLP theorem blocks the possibility of suggesting a solution under the traditional computer algorithm architecture. Computer scientists think, if we relax the setting of the FLP theorem, is there a solution?

Inspired by sociology and game theory, scientists have attempted to introduce the following mechanisms.

  • Incentive . For example, rewarding loyal generals in the case of General Byzantine. When the betrayed generals found that there was no gain in betrayal, did they still have the motive for betrayal? Here we introduce the concept of game theory: we no longer divide nodes or generals into fair/malicious (loyalty/betrayal), and believe that the behavior of each node is determined by the incentive mechanism. Just as the topic of ardent debates among Chinese philosophers two thousand years ago: at the beginning of the human race, the nature is good, the sex is evil? We believe that at the beginning of human beings, sex is not good or evil. Sexual good and evil are determined by the inspirational mechanism of the day after tomorrow. If the incentives are set up properly, considering that each node has a tendency to maximize its own interests, most nodes will follow the rules and become fair nodes.
  • Randomness . In the case of General Byzantine, the decision to take the next step requires the generals to coordinate and determine a unified next step. In the presence of betrayal generals, the judgment of a loyal general may be misled. In a traditional centralized system, decisions are made by authoritative generals. In a decentralized system, the researchers came up with the idea of ​​being able to randomly assign a general to make decisions in all the generals. This somewhat whimsical idea opens a door to solving the problem of General Byzantine.

According to what rules do you specify the generals who make the decision? Corresponding to the financial system, it is how to decide who has the right to write.

  • It is determined according to the computing power of each node (general) . Whoever has the strong computing power and unlocks a certain puzzle can get the bookkeeping right (in the case of General Byzantine command). This is the PoW consensus protocol used in Bitcoin.
  • It is determined according to the resources (stakes) that each node (general) has . The resources used cannot be monopolized, and whoever invests more resources can get the billing rights. This is the PoS Consensus Agreement.

For the above considerations, scientists introduced a consensus algorithm to try to solve the problem of General Byzantine. The distributed consensus protocol has the following two attributes:

  • If all the fair nodes reach a consensus, the consensus process is terminated;
  • The final consensus must be fair.

Let's talk about the scope of application of the consensus algorithm. The blockchain is generally organized in the following three ways.

  • Private chain : A closed ecological storage network where all nodes are trusted, such as most companies within a large group.
  • Alliance chain : semi-closed ecological trading network, there are peers with untrusted nodes, such as industry internal companies A, B, C.
  • Public chain : An open and ecological trading network where everyone can participate in transactions without any restrictions or qualifications.

Since the private chain is a closed ecological storage network, the traditional distributed consistency model should be optimal; due to the semi-closed and semi-open nature of the alliance industry chain, the use of Delegated Proof of ××× is optimal; Chain, PoW should be the best choice.

List of common consensus algorithms:

 

Paxos algorithm

First, the Paxos algorithm solves the problem of non-Byzantine generals , that is to say only the nodes in the distributed system are faulty, but there are no scenarios of malicious nodes, and how to reach a consensus in this case.

In 1998, Lamport proposed the Paxos algorithm, and subsequently added several improved versions of Paxos to form the Paxos protocol family. A common feature of the Paxos protocol family is that it is not easy to implement. Google's distributed lock system, Chubby, has encountered many pits as a Paxos implementation.

In addition to the classic Paxos (aka Basic Paxos), the following are variants of Paxos, based on the CAP law, focusing on different directions.

  • Cheap Paxos
  • Egalitarian Paxos
  • Fast Paxos
  • Multi-Paxos
  • Byzanetine Paxos

The Paxos algorithm is too obscure, and the Paxos algorithm branch listed above is not covered in detail. If you want to understand the original description of the classic Paxos algorithm, you can take a look at Lamport's paper "Paxos Made Simple", in which the following roles are used:

Seeing these roles, does it feel like the modern parliamentary system? Paxos is such a model. Of course, these so-called "people" in computers generally refer to nodes. These roles can be different service nodes or the same service node . After the proposal is issued, it is necessary to obtain the majority of voting support. When more than half of the support is given, half of the results are sent to everyone for confirmation, which means that Paxos can guarantee that the system will reach consensus when more than half of the normal nodes exist. The proposal process can also be divided into different scenarios as follows:

  • Single sponsor + multiple recipients : In this case, consistency is easy to achieve, or it can be achieved, because there is only one proposal that either reaches, veto or fails. However, in this case, if the only proponent fails, the entire system will be invalid.
  • Multiple proposers + single recipients : In this case, it is easy to reach a consensus. For the receiver, choose one as the resolution. Of course, this case also belongs to the single point failure structure.
  • Multiple sponsors + multiple recipients

Many-to-many situations, first of all, avoid single point of failure, but the problem has become complicated . Since there are multiple proposals and receivers, which one is the standard? There is no particularly mysterious way. Since multiple problems are not solved together, it is still necessary to go back to a single proponent. Just add a rule to select a single proposal. There are roughly two options as follows:

  • Close to the first case, that is, I want to find a proposal to come out, agree that only one proposal can be allowed to pass within a certain period of time, and some competition rules can be set or arranged according to a time series. In short, Pick a sponsor.
  • Close to the second case, multiple sponsors are allowed, but when the node receives multiple proposals, a proposal is selected by a certain rule, that is, it still keeps only one copy, and the rules can have various kinds, such as According to the serial number of the proposal or according to the time of the proposal.

In fact, in the network, like Bitcoin, it must be a many-to-many situation. There are more than one node that sends the transfer transaction. There are more than one miner. Of course, there are more than one node that receives the block for verification. In Paxos, In order to solve such a problem, a scheme called "two-phase commit" has been introduced.

The so-called two-stage is the two stages of “preparation” and “submission” : the preparation stage solves the question of which proposal to vote for, and the submission stage resolves the problem of confirming the final value. In the above process, there may be new proposals, so it is similar to Bitcoin, separating the time, for example, every 10 minutes, and the packager can only have one.

In the submission phase, if a proponent receives a reply from most nodes in the preparation phase, a confirmation message will be sent. If most of the responses are received again, the original proposal number and content will be maintained; if the received message has The updated proposal is replaced with the updated proposal content; if most of the responses are not received, the request is sent again, waiting for confirmation from other nodes. When the recipient finds that the proposal number is consistent with what he currently holds, the proposal is confirmed.

As far as personal understanding is concerned, if the practice is in a relatively private environment or the network environment is better, the effect will be more obvious. In fact, the so-called majority response is received, which is also an evaluation of the node itself. Because the node does not have a better way to judge, it is not the most, especially if the total number of nodes is not fixed.

Raft algorithm

Since Paxos is too difficult to understand and too difficult to implement, the Raft algorithm came into being. The goal is to be as simple as possible, without sacrificing reliability. Diego Ongaro and John Ousterhout of Stanford University redesigned a distributed consistency algorithm Raft with an easy to understand goal and released it publicly at the end of 2013. Raft clearly defines the details of each link in the algorithm, and also considers the simplicity and completeness of the entire algorithm.

Compared with Paxos, Raft is more suitable for learning and engineering implementation. Below, the author will describe this process in an easy-to-understand way.

One village manager of Baihua Village is responsible for external affairs. For example, official documents at the county and township levels, public grain collection, work assignments, taxes, etc.

Raft is a strong Leader's consensus protocol . We imagine that Baihua Village is a server cluster, and the leader of this cluster is the village head. Each household in the village corresponds to a server, and each household saves a copy of the data. All copies of the data must be consistent. That is, when the superior officials go to the village to inspect, the information obtained from each household should be the same.

The village head of Baihua Village was elected by the villagers. Whoever has more votes (simple majority) will be elected as the village head. The village head has a term term. The term of office is always increasing upwards: 1, 2, 3, ⋯, n, n +1, ⋯.

What is going to be handled here is the case of a split vote. In the case of a flat ticket, the village chief election failed, and each household was assigned a different sleep value. Villagers during sleep cannot initiate elections, but can vote. And only the right to vote, but not the right to vote. The first village to go out of sleep to initiate a new term of election.

Since each household has a different length of sleep period, this guarantees that the election will definitely elect a village head, and will not be deadlocked, and there will be no case of a vote for each election. Once the village head is produced, any “writing” (such as government policy declarations, legal education) for Baihua Village must pass through the village head.

The village head has to make a circle every day in the village so that everyone can see it. It shows that the village head is healthy enough to handle official duties.

After the village chief is elected, it is necessary to prevent the village chief from "failure" and must periodically check whether the village head is invalid. Once the village chief is found to have "failure", it will be re-elected.

The village chief receives the superior order, the data of the order is uncommitted, and then the village head concurrently sends a command to all the villagers, copying the data and waiting to receive the response, ensuring that at least half of the villagers receive the data and then go to the superior. Confirm that the data has been received (command executed). Once the data is sent to the superior to receive the Ack response, it indicates that the data status has entered "committed" at this time, and the village chief sends a notification to the villager to inform that the data status has been submitted (that is, the command has been executed).

Let's test various anomalies.

  • Abnormal situation 1 : Before the superior order arrived, the village chief hanged. This is very simple, re-elect the village head. The superior commands and requests from outside will automatically expire and they will resend commands and requests.
  • Abnormal situation 2 : The village head received the order from the superior, and had not yet had time to communicate to the villagers. This is similar to the abnormal situation 1, and the village head is re-elected. Superior commands and requests from outside will automatically expire. They will resend commands and requests.
  • Abnormal situation 3 : The village chief received the order from the superior, and it was communicated to each village, but the villagers had not executed the order and the village chief hanged. In this abnormal situation, the village head is re-elected. After the new village chief is elected, they can wait for the villagers to execute the order (that is, the Commit data) because they have received the order. Superior commands and requests from outside will automatically expire. It is possible that they will resend commands and requests. Raft requires external requests to automatically remove duplicates.
  • Abnormal situation 4 : The village chief received the order from the superior, and it was communicated to each village. The villagers executed the order, but the village chief did not receive the notice. At this time, the village chief hangs. This situation is similar to the previous situation. After the new village chief is elected, he can wait for the notice and complete the remaining tasks. The external will also receive a notification command (completed).
  • Abnormal situation 5 : During the execution of the order, the village head is unwell and cannot handle official duties. Because Baihua Village did not receive the “heartbeat” of the village head, the villagers in Baihua Village will automatically elect (current term +1) as the village head. Two village chiefs appeared at this time. At this time, the new village head will take over the role of the old village chief and continue to execute orders. Even if the original village chief recovers, it will become an ordinary villager.

PBFT algorithm

In 1999, Castro and Liskov proposed PBFT (Practical Byzantine Fault Tolerance), which is the first widely used BFT algorithm. In the PBFT algorithm, at most, the Byzantine node that does not exceed 1/3 of the total number of nodes in the system can be tolerated, that is, if more than 2/3 of the nodes are normal, the entire system can work normally. Early Byzantine fault-tolerant algorithms or assumptions based on synchronous systems, or because performance is too low to operate in real systems.

The PBFT algorithm solves the problem that the original Byzantine fault-tolerant algorithm is not efficient, and reduces the complexity of the algorithm from exponential to polynomial, making the Byzantine fault-tolerant algorithm feasible in practical system applications. Perhaps because of efficiency considerations, the central bank's blockchain digital ticket trading platform uses the optimized PBFT algorithm. Tencent's blockchain also uses PBFT.

In the PBFT algorithm, each copy has three states: pre-prepare, prepared, and committed. There are also three types of messages: pre-prepare, prepare, and committed. When the pre-prepare message is received and accepted, it enters the prepared state. Upon receipt of the commit message and acceptance, it enters the Committed state. The following example shows a 4-node/copy. Within this network, only one Byzantine node is allowed (where f=1 is set).

Baihuacun Primary School held a 100m race, and the first group of the third grade had only four players: Alice, Bob, Cathy and David (A, B, C, D). In order to save money, the game did not ask the referee, but randomly selected one of the four players to make a referee, assuming Alice. As everyone knows, the password for running 100 meters is: "Everyone, prepare, run!"

Here, "everyone" is a pre-prepare message. When the player accepts the command, he will step on the running gear. When this action is seen by other players, the player will be considered to be in the prepared state. Equivalent to sending a prepare message to other players. In the same way, the preparation is the prepare message. The player accepts that both hands are propped up and the body is arched. When this action is seen by other players, the player is considered to be in the committed state.

  • Suppose A is fair . Alice got the teacher's instructions and the first group in the third grade was preparing for the competition. Alice yelled: "Everyone!" The teacher's gesture is equivalent to an external message request. Alice receives this message and writes a message to the message, for example, number 030101. Must be numbered, because the game has a rule (imaginary), four consecutive starts fail, the entire group is eliminated. After receiving the password, students B, C, and D will step on the running gear if they think the order is correct (except for the Byzantine).

And this action is equivalent to broadcasting a prepare message to each other. Players A, B, C, and D see each other's actions. If more than f people (because f=1, so at least 2 people) have the same status as they should, they think that everyone is entering. status. The player will record the pre-prepare and the received prepare information he received.

  • Suppose A is fair . Alice sees at least 2 people entering the prepare state, and Alice then yells: "Prepare, run!".
  • What happened next is similar to the previous one : After the B, C, and D students receive the password (equivalent to receiving the commit message), if they think that the order is correct, they both hold up their hands and are arched (except for the Byzantine). And this action is equivalent to broadcasting a commit message to each other.
  • Players A, B, C, and D see each other's actions. If more than f individuals (since f=1, so at least 2 people) are in the same state as they should, they are considered to be committed. status. When everyone confirms that they are in the Committed state, they can start running!

  • Suppose A is unfair . A will be replaced and a player B will be re-elected. At this time, all players recorded their status and accepted/sent information. Those who have been Committed before the change will start broadcasting the commit message. If more than f people (because f=1, so at least 2 people) have the same status as they should, they think that everyone is entering. Committed state. For the player who is in the prepared and pre-prepare state before the change, the previous command and status are completely abolished and restarted.
  • The PBFT algorithm has the following three main advantages:

     

    • The PBFT algorithm consensus nodes are composed of the participants or supervisors of the service, and the security and stability are guaranteed by the business related parties.
    • The consensus delay is about 2 to 5 seconds, which basically meets the requirements of commercial real-time processing.
    • The consensus is highly efficient and can meet the demand for high-frequency trading volume.

    Because it is very suitable for the application scenario of the alliance chain, PBFT and its improved algorithm become the most commonly used alliance chain consensus algorithm. The improvements are mainly focused on:

    • Modify the requirements of the underlying network topology, using a P2P network;
    • The number of nodes can be dynamically adjusted;
    • Reduce the number of messages used by the protocol, etc.

    However, PBFT still relies on the statutory majority (quorum), one node and one vote, and the minority obeys the majority to achieve Byzantine fault tolerance. For the alliance chain, this premise is no problem, even the advantage. But in the public chain, there are big problems.

    Workload proof PoW

    The Proof of Work (PoW) mechanism is widely known as the popularity of Bitcoin. The PoW protocol is briefly described as follows :

    • Broadcast new transactions to all nodes;
    • Each node puts the received transaction into the block;
    • In each round, a randomly selected node broadcasts the block it holds;
    • The other nodes accept the block after all the transactions in the verification block are correct;
    • The other nodes put the hash values ​​of the block into the next block they created, indicating that they acknowledge the correctness of the block.

    Nodes always think that the longest chain is a legal chain and try to expand it. If two nodes simultaneously broadcast the blocks they have dug, the other nodes start mining based on the block they first received, but at the same time they reserve another block. Therefore, some nodes will receive the block of A first and start mining on it, while retaining the block of B to prevent the branch where B is located from becoming a longer branch later. Until one of the branches becomes longer in the next proof of work, the nodes that previously worked on the other branch will turn to this longer chain.

    On average, one node finds a block every 10 minutes. If two nodes find a block at the same time, the network will determine which block to build the general ledger based on the decision of the subsequent node. From a statistical point of view, a transaction is considered to be clearly confirmed and irreversible after 6 blocks (about 1 hour). However, core developers believe that 120 blocks (about one day) are needed to fully protect the network from the threat of blockchains from potentially longer, newly generated coins.

    There is a biological principle called the "handicap principle", which helps us explain the process of proof of workload. This principle says that when two animals have a motive for cooperation, they must convincingly express goodwill to each other. In order to dispel the other's doubts, they must attach their own costs when expressing their friendship to each other, so that they have to pay a high price when they betray each other. In other words, the expression itself must be unfavorable to oneself.

    The definition may be very sloppy, but this is something that has happened frequently in history: in the history of China, the signing of a covenant between the state and the state, in order to express their sincerity towards the covenant, often reciprocates. That is to send a son to each other (sometimes even send the Prince, the heir to the throne) to the host country to hostage. In this case, the price paid for trust is the family of the monarch and the son, and the upbringing of more than ten years.

    Bitcoin's workload proved to be a good use of the unfavorable principle to solve a social problem in its own network: the creation of a new block is based on the huge cost of time and effort , so when a new block is born, a miner Either ignore it, continue to search for its new block, accept it, and continue mining in its own block after the block. Obviously the former is unwise, because in the bitcoin network, with the longest chain as the legal chain, the miner chose to ignore and start a new stove, he had to convince enough miners to follow his route.

    If he chooses to accept, not only will he not pay extra hard work, but he can continue to mine in his own updated block. It will not appear again. You will take me away. It is a benign construction of the whole network. Bitcoin constrains node behavior through unfavorable principles, which is great because it can be used in many aspects of Internet construction today, such as anti-spam and anti-DDoS attacks.

    The advantage of the PoW consensus protocol is that it is completely decentralized and nodes are free to enter and exit. However, relying on the machine to perform mathematical operations to obtain the accounting rights, the consumption of resources is higher than other consensus mechanisms, and the supervision is weak. At the same time, each time a consensus is reached, the entire network needs to participate in the operation, the performance efficiency is relatively low, and the fault tolerance allows the whole network. 50% node error .

    • At present, Bitcoin has attracted most of the world's computing power, and other blockchain applications that use the PoW consensus mechanism are difficult to obtain the same computing power to protect their own security.
    • Mining causes a lot of waste of resources.
    • The consensus reached a longer cycle.

    Equity interest certificate PoS

    There are many variants of Proof of Stack (PoS). The most basic concept is that the opportunity to choose to generate new blocks should be proportional to the size of the equity. Equity can be invested or it can be other resources that are invested in advance.

    The PoS algorithm is an improvement over the shortcomings of the PoW algorithm. PoS was first proposed by Bitumintalk in Bitcointalk in 2011, and later implemented in different ways by Peercoin and NXT. PoS is not like PoW. No matter who you buy, you can buy a mining machine and download software . PoS requires participants to pre-place some tokens (interests) on the blockchain, similar to storing the property in the bank. This model assigns you the corresponding interest based on the amount and time of the digital currency you hold. The user only puts some benefits into the chain, which is equivalent to the deposit. The user will pay more attention and the decision will be more rational. At the same time, the reward and punishment mechanism can also be introduced to make the operation of the node more controllable and better to prevent attacks.

    The mechanism under which PoS operates is as follows:

    • The person who joins the PoS mechanism is the holder of the currency and becomes the validator (validator);
    • The PoS algorithm picks one of these verifiers to give the right to generate new blocks. The order of selection depends on the amount of money held;
    • If no block is generated within a certain period of time, PoS picks the next verifier and gives the right to generate a new block;
    • By analogy, the longest chain in the blockchain is subject to change.

    PoS and PoW have one big difference : under the PoS mechanism, the currency is interest-bearing. As we all know, bitcoin is limited in number. Due to the bitcoin loss problem, bitcoin is reduced overall, which means that bitcoin is a deflation system. In PoS mode, the concept of currency age is introduced, and each coin generates 1 currency per day. For example, if you hold 100 coins for a total of 10 days, then your currency is 1000. At this time, if you find a PoS block, your currency will be cleared to 0. Every time you are emptied by 365, you will receive a certain amount of interest from the block. Therefore, there is no deflation under the PoS mechanism.

    Compared with PoW, PoS does not need to consume a lot of power in order to generate new blocks, and it also shortens the consensus time. But the disadvantage is: PoS still needs mining .

    Entrusted equity certification mechanism DPoS

    The Delegated Proof of Stake (DPoS) mechanism is an improvement of the PoS algorithm. The author tries to explain this algorithm in an easy-to-understand way.

    Assume the following scenario : There is a mountain next to Baihua Village called Block Chain Mountain, which is owned by the villagers collectively. Company A outside the village is preparing to develop tourism resources for the block chain mountain. Company A and the villagers committee jointly established Baihua Tourism Development Co., Ltd. and signed a joint-stock cooperation agreement. The following is a dialogue between the villagers Li Da and Liu Wu during the Spring Festival holiday:

    Li Da: Regarding the tourism development block chain mountain, the village committee and the company A signed the contract.

    Liu Wu: What are the benefits of us?

    Li Da: We are all shareholders of Blockchain Travel Co., Ltd.

    Since the villagers are all shareholders, all the villagers are the owners of the block chain.

    Liu Wu: What kind of work do shareholders have to do?

    Li Da: As for the important decision on the development of the blockchain, shareholders must vote.

    Liu Wu: That can't be done. I have to go out to work after the Spring Festival, and it is not necessarily where. How can I come back to vote?

    Li Da: It doesn't matter. We can choose a few representatives, such as Teacher Wang. He will stay in the village primary school to teach, will not go, and people are reliable and creditworthy.

    Liu Wu: I also elected Teacher Wang to vote on our major resolutions.

    Teacher Wang is here to entrust an equity person (also called a witness). The witnessing mechanism is used in the DPoS algorithm to solve the centralization problem. A total of N witnesses signed the block. DPoS eliminates the time spent waiting for a transaction to wait for a certain number of blocks to be verified by an untrusted node. By reducing the need for validation, the DPoS algorithm greatly increases the speed of transactions. By trusting a small number of good faith nodes, unnecessary steps in the block signature process can be removed. DPoS blocks can accommodate more transactions than PoW or PoS, allowing cryptocurrency to be traded closer to centralized clearing systems like Visa and Mastercard.

    Li Da: We collectively recommend the teacher of Wang, and give Mr. Wang a little compensation every year, because it is very tiring to participate in the board of director of Company A.

    Liu Wu: Cheng!

    The owner of the equity must pay the witness a certain amount of compensation in order to be as long as possible for the witness.

    Liu Wu: I am also going to recommend Tao Aunt. High culture, good people, will always stay in the village.

    Li Da: Tao Aunt is not good, or do not do this errand.

    Witnesses must be guaranteed to be online as much as possible. If the witness misses signing the blockchain, he will be kicked out of the board. Can't serve as a witness.

    After the villagers elected several witnesses,

    Liu Wu: How did you choose Lai Da this time? This guy has always done good things. I quit!

    If the equity owner does not like the selected witness, he or she may choose to sell the equity to exit.

    DPoS enables the blockchain network to retain some of the key advantages of a centralized system while maintaining a certain degree of decentralization. The witness mechanism allows the transaction to wait only for a small number of good faith nodes (witnesses) without having to wait for responses from other untrusted nodes. The Witness Mechanism has the following characteristics :

    • The number of witnesses is determined by the equity owner and at least 11 witnesses need to be secured.
    • Witnesses must be online as long as possible in order to respond.
    • Witnesses sign and broadcast new blockchains on behalf of equity owners.
    • Witnesses who lose the blockchain will lose their eligibility and will lose this portion of their income.
    • Witnesses cannot sign invalid transactions because the transaction requires confirmation from all witnesses.

    A Sociological Discussion on Consensus Algorithm

    For the Byzantine problem of distributed systems, from the perspective of computer science, the FLP and CAP theorems have told us that there is no solution. Researchers and scientists can only find inspiration from other places. In fact, it doesn't take too much time, they will find that the real human world is a distributed system. If the world of the best-selling book "Three-body" really exists, then the solar system and the trio of the Centaurus will explode at the same time. For us on earth, it must be the first explosion of the solar system, because light must be the first Arrive to the earth.

    In the eyes of the trio, they will first observe the explosion of Centaur. For the same event, the order in which different systems receive events is different. Different systems run at different speeds. Plus the channel of communication is problematic. In the above three-person example, we assume that the transmission of light is unimpeded. But what if the light is swallowed up by the black hole in the way, and the message will never be received?

    The genius of Bitcoin lies in the introduction of a consensus mechanism with reference to the way in which human society is organized and operated. The establishment of a transaction, that is, the bookkeeping right of a distributed ledger, is determined by a consensus reached by a specific consensus mechanism. Consensus is a typical sociological concept. The readers should have a feeling of deja vu in the various consensus algorithms described in this article.

    PoW, we can call it "Fan Jinzhong". Fan Jin used most of his life to learn a useless eight-part writing, just as bitcoin miners use the power to calculate the problem. The key is that the question is meaningless. One day, with good luck, you have the right to package all the transactions he approves.

    PoS is a user who wants to put in some benefits in advance, which is not very similar to our real-world shareholding system. People convert real money into shares and start a business. Who has more shares and who has the right to speak.

    The DPoS mechanism is especially like our board of directors. Elect representatives to represent the interests of shareholders. The elected representatives are generally mature and experienced. Not only can it handle daily affairs quickly, but it also protects the interests of shareholders.

    Paxos, Raft, and PBFT are very similar to the drill queues in our lives, and we agree on each other's messages and passwords. Each row of the head is used as a leader, and the rest of each row is targeting the head and adjusting its actions. The Ripple Consensus algorithm has a list of special nodes in the initial state, just like a club. To accept a new member, 51% of the club members must vote. Consensus is determined by a core member's 51% power vote, while outsiders have no influence.

    Since the club started with “centralization,” it will always be “centralized,” and if it starts to corrupt, shareholders can't do anything. Like Bitcoin and Dotcoin, the Ripple system separates shareholders from their voting rights and is therefore more central than other systems.

    If we look at Lamport's paper on distributed systems consensus, we find that the paper is an example of a theory of parliamentarians, bills, and messengers. It doesn't look like a computer paper.

    Here is a summary. Traditional, pure computer algorithms have nowhere to focus on the Byzantine problem of distributed systems (see FLP and CAP theorems) . Therefore, some sociological theories and concepts have been introduced in the research of distributed systems, including the above game theory, biological principles, and so on. We can think of each computer node as a unit. The computer network is a society composed of units. How do we give the social design rules of this computer node to ensure:

    • When a small number of nodes are too slow, or the failure crashes, the entire network can still output correct results;
    • The response of the entire network should not be too slow. It is unacceptable to wait for an hour to buy a cup of coffee;
    • When the computer network is partitioned (some nodes on the network and the rest of the nodes are completely disconnected), the correct result can still be output stably;
    • The entire system is stable and outputs stable results.

    We can learn from the social mechanisms and incentive mechanisms in human history to achieve the above functions. We have reason to believe that Internet or distributed network systems are inextricably linked to real-world social operations. Because of this, the development of blockchain is not a product of the underworld.

    Today, the most widely discussed consensus algorithm in the entire blockchain ecosystem is nothing more than PoW and PoS. PoW is slow, energy intensive, environmentally unfriendly, and vulnerable to economies of scale, and PoS is considered a good alternative.

    Author | Jiang Yong, Wen Yan, Jia Wen

    Editor | George

    Produced | Blockchain Base Camp (blockchain_camp)