From GB to KB, how does zero-knowledge proof create a concise blockchain?

Written in front: The author of this article is blockchain developer Ronald Mannak. In his article, he explained the significance of zero-knowledge proof to blockchain scalability through a simple description.

Many people have written technical articles on Zero Knowledge Proof (ZKP). I also recently compared the new generic zk-SNARKs in the article. I found that there are few articles about popular science ZKP applications. The purpose of ZKP is not just to protect privacy, it is versatile and can redefine the way blockchain operates.

blockchain

Concise blockchain, from GB to KB

Blockchains can become very large and grow with the number of blocks. The blockchain is designed this way, and we have accepted this fact. However, Coda's recently launched testnet is different. First, Coda's blockchain is fixed in size and will not grow larger. Secondly, its size is only 22KB, which is equivalent to the capacity of a classic home computer Commodore 64 or ZX Spectrum in the 1980s. Coda is as secure as the traditional blockchain, and it can even be said to be more secure than the traditional blockchain. More and more projects will soon launch similar but more functional "simple blockchains". How does such a blockchain work?

Anyone who has ever created a blockchain node understands this pain: It takes hours or even days to synchronize a node. Many blockchains are so large that disk space and bandwidth requirements exceed most people's home computers. This is part of the reason for centralization. Even a popular blockchain like Ethereum has only about 10,000 nodes. Most of them are hosted on AWS and are owned by only a few entities. Blockchain is not as decentralized as many people think.

Why does it take so long to synchronize the blockchain? There are two reasons. The first reason is obvious: it takes a while to download data over a few hundred Gigabytes. Second, the blockchain needs to be verified after downloading, because malicious nodes may send you wrong data.

To verify the blockchain, you must start with the genesis block: execute the first transaction and confirm that the state of the calculation is equal to the state of the download. Go to the next transaction until all transactions in the blockchain have been checked. This is a waste of time; thousands of nodes have gone through this process.

This is necessary because in traditional calculations, the only way to verify that a calculation is correct is to recalculate it. This is good for small-scale calculations, but not good for some calculations that take a lot of time, such as the blockchain example.

ZKP improves efficiency and bandwidth

There is a way to verify the calculation results at a lower cost without re-calculation: zero-knowledge proof (ZKP), of which zk-SNARKs is probably the best known.

How to do it? We will talk about rewriting the blockchain replay function as zk-SNARK. zk-SNARK will output two things: the original output (just like the original replay function) and a small mathematical proof that the result was calculated correctly. This proof can be as small as 200 bytes (yes, less than 1KB).

You do not need all (or even multiple) computers to perform the replay function. One computer can create proofs, and other computers can verify at any time they see fit. No matter how long the original calculation takes (even hours, days, or years, it doesn't matter), verification takes just a few milliseconds. The certificate can be posted online via a USB flash drive, or even printed on a T-shirt.

If the malicious node changes the balance, the proof will not match the result, and other validators will reject the status. If the malicious node changes the zk-SNARK code, the result will also be rejected. (There is also a third parameter, a publicly shared string, which will also associate the proof with the zk-SNARK code. If the code is changed, the proof and the shared string will not match, and the verifier will reject the result.)

We no longer need to redo expensive calculations or download the blockchain (because we already have the blockchain's existence and valid mathematical proof). You just need the current state (such as the last block) and a simple proof that the current state is part of a valid blockchain and it takes a few milliseconds to verify the result.

Recursive combination

Validating a proof is fast, but what about creating a proof? Time is not fixed and it is much less efficient in terms of calculations and memory compared to traditional calculations. In fact, although the zk-SNARK version of the replay function sounds good, it is not a good solution in practice. It will consume a lot of memory and even slower than the original non-zk-SNARK replay function.

But there is another better solution. With some tricks, we can use recursive zk-SNARK. With recursion, we don't have to verify the blockchain from scratch, but we can build on previous states, which is much faster. Note that recursive zk-SNARK is not as efficient as non-recursive zk-SNARK, but recent zk-SNARK constructions have made great progress.

The recursive zk-SNARK program uses the previous state, a proof belonging to the previous state, and a new transaction as inputs. It verifies the previous state (using the provided proof) and checks if the transaction in the new state is valid. If the answer is yes, it will output the new status and a proof.

Once the new state and proof are distributed to the network, all nodes can discard the previous state without any negative impact. New nodes need only download the latest status and proof. This is why Coda is able to have a fixed-size blockchain.

In our previous example, only one node would create a new block and proof. Obviously, the same node does not necessarily need to produce every block. For example, one node can be randomly selected from many nodes (using verifiable random functions, nodes can even choose themselves randomly without cheating). We can do better. We can divide the block output logic into multiple zk-SNARKs.

The end result is that the block producer does not need to keep the complete blockchain history, it only needs the previous state. How small is the occupied capacity? An ordinary Coda node only needs 22 KB to store the certificate, current state and Merkle path. As long as 22 KB, nodes can verify the entire blockchain, query balances, and create transactions. But to generate a block, there are more requirements on the node: it needs the full balance Merkle tree of the previous state. The size of the Merkle tree depends on the number of wallets. If Coda has as many wallets as Ethereum, then Coda block producers need about 1 GB. The smallest full node capacity on Ethereum is (as of December 2019) 230 GB. A huge difference.

In this way, the network has more active nodes, which enhances decentralization and opens up many new possibilities for programs that interact with the blockchain without the need for a solution like Infura or Metamask. Considering that 99% of new users lose their patience before installing Metamask, this could have a huge impact.