Viewpoint | Meditations on Fault Proof (1)

Editor's Note: In general, we will consider Fault Proof as a concept related to Layer-2, which is the model that Layer-2 uses when reporting its status to Layer-1. But in this article, the author uses the general concept of erroneous proof, and considers how to design a erroneous proof mode to make SPV nodes (approximately so-called "light nodes") obtain higher security.

Fraud Proof is an extremely complicated and annoying concept, but if you want to know some of my experiences, please bear with me and think with me.

-Benihime Morgan 's river artwork-

In a nutshell

  • SPV nodes are very easy to run and expand; with error proof, SPV nodes (Simplified Payment Verification) can have the same security as full nodes.
  • I want to introduce the "SPV +" mode here; regular SPV nodes only need to save the block header (the storage increment is about 4MB per year), and SPV + nodes also need to save the first and last transaction in each block (storage (Approximately 115 MB per year).
  • The "SPV +" node must establish a payment channel with a full node or establish an LN connection.
  • Every time a block is verified for correctness, the SPV + node must pay a small fee to these full nodes; I estimate this fee will not exceed $ 50 / month.
  • The follow-up is to add a few new opcodes: we need a rangeproof off-chain, coupled with witness-commitment technology similar to "SegWit", and can use SPV + nodes conveniently and at low cost.


A. How to make Bitcoin more like physical gold

Bitcoin's design benchmark is gold; although in many ways, Bitcoin is far superior to gold, but when it comes to receiving money, the question comes-how do you know that the money arrived? If you pay by physical means such as gold, it is very simple to have the account arrive-just like all cases where one hand pays and one hand delivers; but if you use bitcoin (digital currency) to pay, protecting asset ownership will become A very abstract thing.

For this problem, Satoshi proposed a perfect software solution that allows you to know if the money has arrived-it is "Bitcoin" software.

Wait! Isn't this going back to the beginning? Exactly how this software works

The mystery revealed that Bitcoin software uses a special mechanism to synchronize with other computers running the software; this mechanism is similar to Dropbox, but the difference is that each file itself guarantees synchronization, so there will be no version control issues. . In other words, "knowing that the money has arrived" and "knowing that you have synchronized" are the same thing.

Satoshi Nakamoto proposed in the white paper two ways to "confirm that the money has arrived":

  1. [Forward approach (traditional)] Run the software and wait for full synchronization.
  2. [Reverse method (experimental)] First, run a "light client"-only strategically synchronize some simple parts; then notice if "alert" appears.

The first method is the so-called "full node", which relies on a positive proof -you should see X. Once you see X, you know you have received the money; the second method says For the "SPV mode" 1 , it relies on negative proof -you shouldn't see Y, and once you see Y, you know you haven't received the money. The Y here is the "alert" mentioned in the white paper, and now you may hear another name-"fraud proof".

B. The theoretical support of "alert"

Personally, what I find most interesting is that the reverse proof mechanism (ie, "alert") is similar to many behaviors in real life.

Imagine the following example:

  • We will not try to put an end to homicides 100%, but will do our best to arrest the inmates after they have occurred (convictions in court and the prisoners should be given the punishment they deserve).
  • We will not try to eliminate 100% of adulterers, but if there is a real adulterer, we will expect him to be eliminated by the market and replaced by good businessmen; if there are too many disputes, we will eliminate torts or rules we do not want businessman.
  • We will not try to guarantee that every scientific research published is 100% error-free, but will make them public to the greatest extent possible, and look forward to receiving criticism or correction.
  • We will not try to prevent 100% of judicial corruption. However, we do require that all aspects of legal proceedings must be recorded to ensure that the fairness of the trial can be traced afterwards by the public and legal scholars.
  • We do not try to become "omnipotent." However, we hope to find the required knowledge and skills in books and websites, and hope that experts will make this information more accurate in the future.

Usually we assume that everything is fine until a serious enough error occurs and we fix it. Without this, it would be difficult for us to verify that everything is 100% correct in real life.

C. The theoretical challenge of “alert”

The problem with "alert" is that Satoshi didn't actually implement the idea. Eric Lombrozo also mentioned this in a tweet last month.

-Eric Lombrozo: "Many of the top tech experts I've talked with say that false proofs are too difficult to achieve, and even impossible in the worst case. Satoshi Nakamoto seems to recognize the difficulty and therefore Never proposed a solution. "-

To achieve erroneous proof, there are two main difficulties:

  1. Resistance to DoS attacks: The reason why Bitcoin full nodes have strong resistance to DoS attacks is because of the asymmetry of the Proof of Work mechanism-a new block can be produced every 10 minutes. It only takes a short time to verify this block. But does this apply to "alert"? Does "alert" implement a PoW mechanism? Who will pay for the service? If no one pays, how can we stop malicious nodes from spamming fake "alert" to cover up the real "alert"?
  2. Proof to the contrary: malicious / careless miners may discard part of the data in the block, and more extremely, miners may create a block without knowing what is in the block at all! If the block contains wrong transactions, how do we find out? If no one finds out, how do you remind others?

To address the first problem, we can use methods other than the blockchain to defend against DoS attacks, that is, the Payment Channel.

For the second question, we can put (very valuable) "review resources" in a specific part of the verification block-in simple terms, we can let the node declare that it really knows what the entire block (all parts) contains Content, and then have the verifier verify the claim and endorse it.


A. SPV mode

Satoshi Nakamoto's SPV model (at Chapter 8 of the white paper ):

  1. Bitcoin's block header is very small (4MB increments per year), and it is easy to verify, regardless of the number of transactions contained in the block.
  2. It can be easily proved that the block contains something (a transaction) "X"-as long as there is "X" itself, the block header, and a valid Merkle Branch containing both .

If you don't understand well, you can refer to the following example:

Suppose we have three block headers: headerA, headerB, headerC; each block header contains a hashMerkleRoot (Herker root hash): hA, hB, hC.

Does the transaction Tx exist in any of these blocks ([header A], [header B])?

Yes, because h ([tx]) = ht, and

h (ht, hs1) = hi1

h (hi1, hs2) = hi2

h (hi2, hs3) = hA

among them:

ht is the hash of transaction Tx;

hs1, hs2, hs3 are hash values ​​provided by the full node ("Fred").

hi1, hi2 are intermediate hash values ​​calculated by the SPV node ("Sally").

The actual meaning of the above proof is that there is a Merkel tree with hA as the root hash value. It has two branches, hi2 and hs3, and the hash value is proof. There is no other possibility; hi2 also has only two: hi1 and hs2. Branches … layer by layer, it turns out that ht must exist in this Merkel tree,

Merkle Branch (contains the hash values ​​provided by the full node, as well as the number order and location information of these hash values ​​on the Merkle branch) is very small, only growing at the rate of log (n). Payers can easily get / generate Merkle Branches and send them to payees along with transactions; this cost is negligible.

In other words, as long as there is a Bitcoin block header, you can know "whether the money has arrived". The block header is easy to get, so it seems that SPV mode can easily achieve unlimited throughput.

B. Problems with SPV mode

The problem is that we can never determine whether an 80-byte block header is really a "Bitcoin Block Header".

The only way is to check all the information of the corresponding block. If there is an invalid transaction or a double spend transaction (or that there is some kind of "defect" in the block, see "Block Flaws" below), the entire block will be considered as an invalid block .

C. Good news

Although we cannot verify whether an 80-byte block header is a Bitcoin block header, fortunately, we can verify the proof of workload of the block header by calculating the hash value of the block header against the current difficulty of the block. (The block header is designed to be very friendly, and it itself contains a set of important information-the difficulty of generating the block and the timestamp information; the two can verify each other.)

In this way, we can check whether the miner has actually performed the hash operation; unfortunately, we still cannot determine whether this block header is valuable. transaction). It ’s like if you asked Matthew, a miner, to buy a box of chocolates, you can easily verify that “Matthew really bought more than 300 knives for a box of chocolates”, but you ca n’t be sure whether these chocolates are delicious or not. Do they really contain chocolate?

D. Review of forward / backward proof

You can eat every chocolate in the box and prove that every chocolate is delicious. This is the "positive proof".

Or you can prove it in reverse along the following lines: This box of chocolates is packaged and does not seem to be passive. In addition, this box of chocolates has a brand endorsement, and China strictly enforces the brand law / trademark law; Many people have bought this brand of chocolate. If there is any problem with the quality, I can just search it and see the relevant news / bad reviews (in fact, after I checked it, I didn't find any negative news ).

Another example of using reverse proof is a refund promise . Suppose you want to buy a car (a product of uncertain quality, like a newly created but not yet verified Bitcoin block), and now there are three cars ("Car A", "Car B" , "Car C"), currently you are most interested in Car C.

If you want to get a positive proof, you have to drive Car C thousands of miles, accompanied by a large team of mechanical engineers to check every part of this car and report the problem to you.

If it is the reverse proof: Suppose that both Car A and Car B provide a legally valid statement "cars with less than 40,000 miles can be refunded if they fail"; but Car C does not have such a commitment , then counter It has been proved that the quality of Car C is not good.

To achieve erroneous proof on Bitcoin, we need one thing-it appears when the block is legal; it never appears when the block is illegal (or vice versa).

In game theory, this is called " separating equilibrium " in "signaling game" (or more precisely " screening game "). Among them, the senders of erroneous certificates are divided into "honest" and "dishonest" categories, and we are trying to sort out the dishonest category through low-cost means.

E. Our needs

We need to find a way to remind us of "block errors" . Ideally, this alert is coming quickly and accurately (that is, "before the transaction is settled", or "before 6 blocks are confirmed", or (for security reasons) within "20 ~ 30 minutes" Respond).

For a concrete example, the ideal situation would be as follows:

  1. "Sally" (SPV node) received a bitcoin for some reason. To show her the information of the transaction, Sally also saw that the transaction was legal.
  2. Sally wants to know if the transaction is confirmed by 6 blocks without running a full node. Therefore, she first downloaded all the bitcoin block headers, and then asked the full node for Merkle Branch (including [1] her transaction [2] latest block header). She got a Merkle Branch , but unfortunately (she didn't know it at all): the block header inside was invalid for some reason …
  3. At this point, “Fred” (full node) must be aware of the problem—one block has one or more “defects” (see below).
  4. There must be some kind of incentive to get Fred to issue some kind of warning (that is, "alert") to Sally.
  5. Under other normal circumstances, Fred cannot be motivated to issue a warning (that is, a "false warning" does not appear when there is no problem).

F. Types of block errors

Blocks can have many defects (see validation.ccp, especially "CheckBlock" ). I divided them into four categories:

  1. "The first type" -bad transactions (invalid transactions, double spend transactions, or repeated transactions ).
  2. "Second type" -missing block data (on the Merkle Tree recording Sally transactions, the data of nodes adjacent to the Sally exchange are unknown or invisible-this may be artificial or Unintentionally).
  3. "Third category"-bad blocks (coinbase misplacement, wrong version , "witness" data loss, [drivechain] massively updated Escrow_DB / Withdrawal_DB). (Translator's Note: drivechain is a sidechain architecture proposed by the author himself, see Peer-to-Peer Bitcoin Sidechains )
  4. "Fourth category"-improper accumulation (exceeding block size / SigOps limit, coinbase transaction fee [must be equal to the sum of all transaction fees in the block], [drivechain] sidechain output-"CTIP" field of Escrow DB ).

First kind of error

The first level of error is very easy to deal with. Sally can directly verify the validity of the transaction by verifying the transaction and reversing the outcome (so that "false" validation returens "true"), as described below. In SPV mode, you can even check nLockTime and CSV, because Sally knows Merkle Branch and all block headers. As long as two transactions are observed to have the same input, double spend transactions can be easily checked out. Duplicate transactions must not pass the test, because they must be double-spend transactions (except for coinbase transactions, see Type III errors for details).

Second type of defect

The second category of defects is most relevant to SPV users-SPV users must assume (except for the block header) that the rest of the block is complete, but (by definition) cannot check whether this is indeed the case. To make matters worse, miners can create a new block without verifying the block content, and they do. Therefore, there may be unknown content in the newly created block. The above assumption is obviously unreasonable.

I will prove that, in theory (though not verified by practice), as long as a valid "block header + Merkle Branch" can be provided to Sally, there should be a complete Merkle Tree (containing Sally's transactions and a certain number of known Valid Bitcoin transactions). Therefore, all the defects related to the lack of blockchain data (that is, the second type of defects) are essentially the "missing Merkle Tree-related data" problem. It can be said that this kind of defect is the problem of the original image of the unknown hash value (easy to handle), or the specific point can be solved by sampling the unknown original image of the hash value (Translator's Note: a hash value The "primary image" refers to the original data that produces this hash value after being input into a hash function).

The solution I proposed was to require Sally to download the last transaction (plus Merkle Branch) of each block in addition to the block header 2 .

Third category of defects

Class III

The third type of defect is very common, but I believe that this small problem can be solved by a specific simple method. For example, if the block version is wrong, the SPV node can directly obtain the correct block version from the block header maintained by itself.

Most other information is missing and can be found in coinbase transactions ; therefore, in addition to all block headers, the SVP node also needs to save the coinbase transactions of each block (Translator's Note: coinbase is a special transaction of additional currency in the block) . With this information, the SPV node can know: [1] whether the coinbase transaction appears only once and appears in the correct position; [2] whether "witness data" exists and what it witnesses; [3] determine all Withdrawal_DB And most Escrow_DB is correct.

As for Escrow_DB of drivechain, SPV nodes on main chain 3 must pay attention to the cumulative effect of transactions between chains in the block; the solution is to be introduced in the fourth category of defects (Chapter 7 of this article).

So we have to add some overhead-introducing the "SPV +" mode (also known as "SPV patch"). In addition to synchronizing the Bitcoin block header (80 bytes every ten minutes), the "SPV +" node must also synchronize the first and last transactions of each block, plus the Merkle Branch related to these two transactions.

  • The old style [Satoshi Nakamoto's Classical SPV]: Synchronize the block header (80 bytes / block) of each block in the blockchain; summarize each time a new transaction is received (Transaction & Merkle Branch).
  • New style [SPV + mode]: 80 byte / block + (coinbase transaction + last transaction in the block) / block + two (different) Merkle Branch / block; each time a new transaction is received Perform a summary (transaction & Merkle Branch); establish payment channels with several other nodes.

How much storage overhead does SPV + mode add? I'm not sure, but if the coinbase transaction is about 1000 bytes, the "last transaction" is about 280 bytes, and each block contains about 5000 transactions, then the overhead of synchronizing a block will be increased to 2192 bytes / block 4 , It is not just 80 bytes, and the growth rate of the overhead is not only O (1), but it will increase to O (log (n)).

About 52,596 blocks are produced in a year, so the storage cost will become about 115 MB / year, not just 4 MB / year. This seems to increase the overhead significantly, but from a global perspective, SPV + is still a very cost-effective solution. If Sally wants to fully check the validity of the transaction, she only needs to download this additional data for each block: (can be) the block produced in the last six months, or the block that records the bitcoin she received (And ~ 10 blocks before and after these blocks), and / or some random check information.

Fourth category of defects

The fourth category of defects is very interesting. Chapter 7 of this article will introduce how to convert the fourth type of defects into the first type of defects, but in short, to solve the fourth type of defects, I require that the transaction hash value is not only a guarantee of the transaction itself, but also explain my own accumulation of Impact of indicators. For example, the transaction must not only ensure that it is "277 bytes", but also state that "after adding itself, the host block size has increased from 500809 bytes to 501086 bytes." In this way, all “fake transactions” can be isolated and identified. In other words, the last transaction will provide important information (total number of transactions, block size, overall transaction fees, etc.).

Before I delve into more technical details, in order to avoid someone being left behind because they don't understand, I will explain the "entire logic" again in the form of a story.


Note 1: The difference between "full node" and "SPV node" may not be as clear as people think. In short, when a full node is downloading a new block, it is in SPV mode relative to the next block. In addition, assuming that 51% of the computing power is running a new software secretly (or, in other words, running a new but compatible protocol), the new software adds block extension data by default. Then, even if other nodes want to become "full nodes", they have to become a mixed mode of some full nodes and some SPV nodes. If those miners later change the protocol back and remove the requirement for extended data, then we will become 100% full nodes again. However, in the process, you may not realize that the node shape has changed, and in fact, you cannot know. Therefore, it is only meaningful to use these data if the protocol itself is assumed to be fixed. Of course, this article assumes this. However, in reality, the agreement may and does change, and miners can always choose to run customized software.

Note 2: To be precise, it is necessary to do this for all the blocks that she wants to "fully verify" using SPV mode (other than those that don't want to verify)

Note 3: For those blockchains that are themselves sidechains, inter-chain transaction errors can be classified as Type 1 errors. To get an error warning, the SPV node of the side chain needs to find two transactions, one that provides a deposit on the main chain, and the other is a transaction that records the amount deposited on the side chain. Therefore, Sally on the side chain must check both the main chain and the side chain to find the error.

Note 4: The total storage cost of a single block is: "block header + first transaction + second transaction + 2 × (32 × log 2 (n))", where "log 2 (n)" refers to "The upper limit of space available based on the number of transactions in a block." Therefore, in this example, the storage cost of a single block is "80 + 1000 + 280 + 2 × (32 × log 2 (5000))", which is approximately 2192 bytes. Note that we don't need the bitmask of the coinbase transaction because we already know its exact structure, but we may need to know the last transaction. (This data should be smaller than encoding self into a branch-hash composed entirely of 0, which I have estimated here.)

Original link: Author: Paul Sztorc translation & proofreading: IAN LIU & A sword