Introduction to Blockchain | Exploring Zero Knowledge Proof Series (1): Initially Knowing "Zero Knowledge" and "Certificate"


I think the blockchain is hard to call a "technology." It is more like a field, all-encompassing. Or metaphysically, blockchain is more like an organism, combining various theoretical techniques.

Zero knowledge proof is an important technology to build trust, and it is also an indispensable part of the blockchain organism.

Zero-knowledge proof is the key technology for breaking data and chain calculations on the chain, and is also an important way to realize data privacy protection on the chain.

To explain the "zero knowledge proof", we need to explain the "proof" first, then explain what is "knowledge" and finally explain what is "zero knowledge".

"Certificate" of past and present

What is proof? Many people may, like me, see these two words, they can't help but think of the various triangle-like geometric figures in the middle school exam paper. When the teacher magically draws a guide line, the proof process is suddenly obvious, and then he will regret what he is. did not expect.

Ancient Greece: "Certificate" == "Essence"

The mathematical proof originated from ancient Greece. They invented (discovered) axioms and logic, and they used evidence to convince each other, not by authority. This is a complete "decentralization." Since ancient Greece, this methodology has influenced the entire process of human civilization.

The above picture is a clever proof of the Pythagorean Theorem. There have been many exquisite proofs, magical ideas, and inspiration for genius in history. Once a proposition is proved, God can do nothing. Well, yes, and the "God is not omnipotent" proof: God can't make a stone that he can't lift.

A mathematical proof often hides a profound "hole". I believe many people have read the story of "Ferma's Theorem" [1]. This theorem proves that it spans four hundred years and writes from Fermat that "the space here is too small. I can't write it." When Wiles finally reached the summit, it took the wisdom of many generations. Recently, such as "Pengalai conjecture", a little bit of a sense of time like "Goldbach conjecture", and my very admired Chinese scientist Zhang Yitang has been grinding a sword for ten years, carefully studying "Goldston-Pintz-Yıldırım" and After the proof of "Bombieri-Friedlander-Iwaniec." "Establishment", it proves "the bounded interval between prime numbers" [2].

Since the 17th century, since Leibniz, people have dreamed of finding a mechanical means to automate the proof without relying on the genius of the genius.

Early 20th Century: "Certificate" == "symbolic reasoning"

By the end of the nineteenth century, Cantor, Boolean, Frege, Hilbert, Russell, Brauway, Gödel and others defined the symbolic system of formal logic. The "proof" is a process of reasoning written in the symbolic language of formal logic. Is the logic itself reliable? Is the logic itself "self-contained"? Logical reasoning itself is right, can you prove it? This allowed the mathematician/logiclogist/computer scientist to invent (discover) the symbology, grammar vs. semantics, reliable vs. complete, recursive vs. infinity. (See the "Engine of Logic" book [3] for this wonderful story).

In 1910, Russell published the "Mathematical Principles" of Hong (zhuan). In the book, Russell and Whitehead attempt to "form" the mathematics completely. If this goal is achieved, all mathematical results will be based on a solid foundation. The following picture is a page in "Mathematical Principles (Volume 2)":

One of 110.643 is a proposition: "1+1=2", and then the proof of this theorem. Everyone may be wondering, does 1+1 still need proof? Yes, in the book of mathematical principles, the numbers 0, 1, 2, … have strict definitions. "Addition", "multiplication", and "equalization" must be strictly defined, and then each step of reasoning needs to point out the basis. What does the proof mean? Proof is probably cumbersome, but every step of reasoning is strictly correct. A large number of proofs in the book are mechanical, and a proof structure is constructed according to axioms and inference rules. Finding a proof is like giving it to a person, and then he has no brain to perform a mechanical search in the set of axioms and inference rules.

It seems that people are not far from the "automatic proof of the theorem".

Unfortunately, Gödel proved the "Gödel incompleteness theorem" in 1931 [4], and Turing proved the undecidability of Turing machine downtime in 1936 [5]. These results have completely ended this centuries of fantasy. No matter how elaborate the axiom system is, it is impossible to grasp all the truth.

Proof is not just a strict reasoning, but also a creative thinking that seems to be difficult to mechanize. The proof contains a lot of "knowledge", and each breakthrough will raise our awareness to a new height. Whether it is "insight" or the "algorithm" constructed in the process of reasoning, the connotation of a theorem often goes far beyond the conclusion of the theorem itself.

Sixties: "Certificate" == "Program"

After another half century, in the 1960s, logicians Haskell Curry and William Howard discovered a lot of "magical correspondence" in "logical systems" and "computing systems – Lambda calculus", which was later Named "Curry-Howard Correspondence". This discovery made everyone realize that "writing programs" and "writing proofs" are actually completely conceptually unified. In the next 50 years, the development of relevant theories and techniques made the proof no longer on the draft paper, but could be expressed in terms of procedures. This isomorphic mapping is very interesting: the type of program corresponds to the theorem of proof; the loop corresponds to induction; … (here a book is recommended: "Software Foundations" (6). In the framework of intuitionism, proof means constructing an algorithm, which is actually writing code. (The reverse is also true, um, the code is not a code, it is a mathematical proof, :P)

At present, in the field of computer science, many theoretical proofs have changed from paper sketches to code forms. The more popular "proof programming languages" are Coq, Isabelle, Agda and so on. The proof is constructed programmatically, and the correctness check of the proof can be done mechanically by the program, and many repetitive tasks can be assisted by the program. The building proved by mathematical theory is gradually built into the process like computer software. In December 1996, W. McCune used the automatic theorem proving tool EQP to prove a 63-year-old mathematical conjecture "Ronbins conjecture". The New York Times subsequently published an article entitled "Computer Math Proof Shows Reasoning Power". [7], once again explore whether the machine can replace the possibility of human creative thinking.

The use of machine assistance can effectively help mathematicians' thinking to reach more unknown space, but "find proof" is still the most challenging job. "Verification of verification" must be a simple, mechanical, and limited job. This is a natural "asymmetry".

Eighties: "Certificate" == "Interaction"

By the time 1985, Jobs had just left Apple, and Dr. S. Goldwasser graduated to MIT, and S. Micali, Rackoff co-wrote a classic that can be written into the history of computer science: "The knowledge in the interactive proof system is complex. Sex" [8].

They reinterpreted the term "proof" and proposed the concept of an interactive proof system: by constructing two Turing machines for "interaction" rather than "reasoning" to prove whether a proposition is true in probability. The concept of "proof" has once again been expanded.

The form of interaction proof is a "conversation script" of two (or multiple Turing machines), or Transcript. In this dialogue process, there is an explicit "certifier" role and an explicit "verifier". The prover proves to the verifier that a proposition is established, and also "does not reveal any other knowledge." This is called "zero knowledge proof."

Again, it proves that "knowledge" is condensed, but the proof process does not reveal "knowledge", and the proof verification process remains simple, mechanical, and limited. Does this sound a bit "counter-intuitive"?

Interactive proof

Alice: I want to prove to you that I have a solution to the equation, w^3 – (w+1)^2 + 7 = 0 (solution of the equation: w=3) Bob: Ok, I listened. Alice: But I I won't tell you how much x is, unless you are willing to pay, I will tell you. Bob: Yes, but you have to prove that you have a solution to the equation. I will give you money. Alice: @#$%^& (Black Technology) Bob: ?????? (Black Technology) Alice: &*#$@! (Black Technology) Bob: ?????? (Black Technology) .. …. (Continue Black Technology) Alice: Ok, it’s Bob: Ok, you do have a solution to the equation, but is it that I have saved the money, will you tell me the answer? Alice: Don't talk nonsense, save money!

The above example is an "interactive proof." Suppose Alice knows the solution of the equation, f(w) = 0, so how does Alice make Bob sure she knows w? Alice told Bob a lot of information in the "black technology phase." Well, the key question is, can Bob guess from the a lot of information that Alice said that w is a few, or can he analyze the clues about w? If Bob has this ability, Bob may not need to save money because he has already obtained this valuable information.

Note that if Alice's conversation with Bob is "zero knowledge", then Bob cannot get any other information about w except that he knows that w is a solution of f(w)=0. This is very important, it is to protect Alice's interests.

Now let's review the word "zero knowledge proof", which is called "Zero-Knowledge Proof" in English. The word contains three key parts:

  • zero
  • Know how
  • prove

You may have felt a bit, let's try to interpret it:

  • Zero: Alice leaked the knowledge of "zero" about w, that is, no knowledge of leaks.
  • Knowledge: This refers to w.
  • Proof: It is the "black technology part" in Alice's dialogue with Bob.

What is the use of zero-knowledge proof?

A zero-knowledge proof technique, many people think of anonymous Coin, such as Monero, such as ZCash. Indeed, these Coins are very popular with zero-knowledge proofs. I myself first heard about the word zero-knowledge proof through ZCash. But after a deeper understanding of this technology, I deeply feel that the power of this technology is far more than that.

Zero knowledge proof technology can solve the trust problem of data and the trust problem of calculation!

Zhang San said that he has 100 yuan. Li Si said that he graduated from Peking University. Wang Wu said that he would have lunch with Bafit. Nothing to say, Show me the proof.

So how can the "zero knowledge proof" solve the trust of data? In the previous article "zkPoD: Blockchain, Zero Knowledge Proof and Formal Verification, Realization of Fair Trading without Intermediary, Zero Trust" [9], I mentioned a concept of "simulation":

Zero-knowledge proof technology can "simulate" a third party to ensure that a certain assertion is credible

In other words, when we receive an encrypted data, there is also a zero-knowledge proof. This zero-knowledge proof is that "the X assertion about data is true", then this is equivalent to an angel whispering in our ears, "The X assertion about data is true!"

For this X assertion, it can be very flexible, it can be an NP complexity algorithm. In the vernacular, as long as we can write a program (a polynomial time algorithm) to determine whether a data satisfies the X assertion, then this assertion can be expressed in a zero-knowledge way. Generally speaking, as long as the data judgment is objective, then the zero-knowledge proof is applicable.

Some uses of zero-knowledge proof:

  • Data privacy protection: In a data table, there are more or less information that you don't want to be exposed. For example, my transcripts in the past, I just want to prove to others that my grades are passing, but I don't want others to know me. After taking 61 points or 62 points, this will be very embarrassing. I don't have a heart attack, but the insurance company needs to know this, but I don't want the insurance company to know my privacy information. Then I can prove to the insurance company that I have no heart disease, but the medical records do not need to be exposed. I am a business, I want to lend to a bank, I just want to prove to the bank that I have a healthy business and repayment ability, but I don't want the bank to know some of our trade secrets.
  • Computational Compression and Blockchain Expansion: In many blockchain expansion technologies, Vitalik uses zkSNARK technology to bring dozens of performance improvements to the existing Ethereum framework. Because of the proof of calculation, the same calculation does not need to be repeated many times. In the traditional blockchain architecture, the same calculation is repeated multiple times, such as signature verification, transaction validity check, and smart contract. Execute, etc. These calculations can all be compressed by zero-knowledge proof techniques.
  • End-to-end communication encryption: Users can send messages to each other, but don't worry about the server getting all the message records. At the same time, the message can also show the corresponding zero-knowledge proof according to the requirements of the server, such as the source of the message and the purpose of the message. Ground.
  • Identity authentication: The user can prove to the website that he has a private key or knows a secret answer that is known to the user himself, and the website does not need to know, but the website can verify the identity of the user by verifying the zero-knowledge proof.
  • Decentralized storage: The server can prove to the user that their data is properly preserved and does not reveal anything in the data.
  • Credit record: Credit record is another field that can give full play to the advantages of zero knowledge proof. Users can selectively present their credit history to the other party. On the one hand, they can selectively produce the scores of the records that meet the requirements of the other party, and at the same time prove the credit. The authenticity of the record.
  • Construct a completely fair online trading agreement for digital goods [9].
  • More examples can be any form of data sharing, data processing and data transfer.

Example: Map three staining problem

Let's talk about a classic question, the three-staining problem of the map. How to dye a map in three colors, ensuring that any two adjacent regions are of different colors. We turned this "map three-staining problem" into a "vertex three-staining problem of connected graphs." Suppose each region has a capital (node) and then connect adjacent nodes so that the map coloring problem can become a vertex coloring problem for connected graphs.

Below we design an interactive protocol:

"certifier" Alice

"Verifier" Bob

Alice has a map three-stained answer in his hand, see the picture below. This graph has a total of 6 vertices and 9 edges.

Now Alice wants to prove that Bob has the answer, but she doesn't want Bob to know the answer. What should Alice do?

Alice first makes some "transformations" on the dyed image, making a big shift in color, such as turning all greens into orange, turning all blues into green, and turning all greens into orange. Then Alice got a new stained answer, when she covered each vertex of the new graph with a piece of paper and showed it to Bob.

Looking at the picture below, Bob is going to shoot (see the picture below). He wants to randomly pick an "edge". Note that it is random and does not let Alice predict the random number in advance.

Suppose Bob picks the bottom edge and tells Alice.

At this point Alice uncovers the paper at the ends of the edge and lets Bob check that Bob finds that the colors of the two vertices are different, so Bob thinks this test is isomorphic. At this time, Bob only saw the part of the graph. Can it be convinced that the dyeing of the remaining graph vertices is fine? You must think that this is not enough. Maybe Alice is right? Other unexposed vertices may be indiscriminately dyed.

It doesn't matter, Bob can ask Alice to come again, look at the picture below.

Alice once again changed the color, changed the blue to purple, changed the green to brown, changed the orange to gray, and then covered all the vertices with paper. Then Bob picks an edge, like the one above, chooses a vertical edge, then lets Alice uncover the paper, and if Bob finds again that the vertices at the ends of the edge are different in color, then Bob The time has been a little shaken, maybe Alice really has this dyed answer. However, twice is still not enough, and Bob wants to come back several times.

Then, after repeating these three steps over and over again, the probability that Alice can cheat and successfully swindle Bob will be reduced exponentially. Suppose that after n rounds, Alice’s probability of cheating is Here |E| is the number of all sides in the graph. If n is large enough, this probability Pr will become very very small and become "insignificant".

However, the local staining situation that Bob sees every time is the result of Alice's transformation. No matter how many times Bob sees it, he can't spell out a complete three-stained answer. In fact, Bob obtained a lot of "information" in the process, but did not get real "knowledge."

Information vs. knowledge

  • Information "Information"
  • Knowledge "Knowledge"

In the interactive proof of the map three-staining problem, Bob gets a lot of information after repeating the interaction many times, but this is like Alice sending a bunch of random numbers to Bob, Bob doesn't "know" more. For example, if Alice tells Bob "1+1=2", Bob gets this information, but Bob doesn't get more "knowledge" because this fact is well known.

If Alice tells Bob that 2^2^41-1 is a prime number, it is obvious that this is "knowledge" because it is necessary to calculate whether the number is a prime number, which requires a lot of computational power.

If Alice tells Bob that there are two vertices in green, then Bob gets valuable "knowledge" because, based on the information he just got, Bob can use a Turing machine to solve three in a shorter time. Dyeing problem. If Alice reveals to Bob that the color of the leftmost vertex is orange, then it is clear that this "information" does not substantially help Bob solve the problem.

We can try to define that if Bob's "information" obtained during the interaction can help Bob's ability to directly crack Alice's secrets, then we say that Bob "gets knowledge." Thus, the definition of the word knowledge is related to Bob's computing power. If the information does not increase Bob's computing power, then the information cannot be called "knowledge." For example, during Alice's interaction with Bob, Alice throws a coin every time and tells Bob that from the information point of view, Bob's information is just an "event", but Bob doesn't get any "knowledge" because Bob is completely You can flip the coin yourself.

The following is a summary of the book "Foundations of Cryptography – Basic Tools" [10]

  1. "Knowledge" is related to "calculation difficulty", while "information" is not
  2. "Knowledge" is related to what the public knows, and "information" is mainly related to something that is publicly available.

Note: I was asked if the definition of information and knowledge here is related to the complexity of Kolmogorov. According to the algorithm information theory, the amount of information of a string can be measured by the length of the smallest program that generates the string. I don't understand this question very much. I hope that the professionals passing by will leave a message.

Verifiable calculation and circuit satisfiability

Looking at the map three-staining problem above, do you not feel it, as if this is just an academic question, how is it related to the real problem? The map three-staining problem is an NP-Complete problem, which is a noun in "computation theory." There is also a problem called "circuit satisfies the problem" which is also an NP-Complete problem. NP-Complete is a kind of problem. His solving process is difficult to complete in polynomial time, that is, "solving is difficult", but the process of verifying the solution can be completed by polynomial time, that is, "simple verification".

So what is the circuit? Here are three different "arithmetic circuits":

It can be seen that a circuit consists of a number of gates, including an additive gate and a multiplication gate. Each gate has several input pins and several output pins. Each gate does an addition, or multiplication. Don't look so simple, the code we usually run (no dead loop) can be represented by an arithmetic circuit.

What does this mean? We will try to solve the privacy protection of data by combining "zero knowledge proof" and "circuit satisfiability problem" below.

Let's think about a scenario: Bob gives Alice a piece of code P, and an input x, let Alice run it again, and tell Bob the result. Perhaps this calculation consumes resources, and Bob outsources the calculation process to Alice. Then Alice runs again and gets the result y. Then tell y to Bob. The following questions are coming:

How to let Bob believe that the result of code P running must be y without running the code?

Here is the time to think, everyone can think for five minutes…

(five minutes later……)

One of Alice's approach is to take the entire computing process down with a mobile phone that contains the computer's CPU, as well as the memory, the state of each transistor throughout the run. Obviously this is unrealistic. So is there a more feasible solution?

The answer is that Bob converts the program P into a completely equivalent arithmetic circuit and then hands the circuit to Alice. Alice only needs to calculate this circuit, and then the process can be taken with a mobile phone, or written down with paper, if the circuit size is not that big. Alice simply enters parameter 6 into the circuit and then records the value of all the pin lines connected to the gate during the operation of the circuit. And the value of the last circuit output pin is equal to y, then Bob can be sure that Alice did the calculation. Alice needs to write the input and output of all the gates of the circuit to a piece of paper and hand it to Bob. This paper is a proof of calculation.

In this way, Bob can verify that the proof on the paper is correct without repeating the calculation circuit. The verification process is very simple:

Bob in turn checks whether the input and output of each gate can satisfy an addition equation or a multiplication equation .

For example, Gate 1 is an addition gate. Its two inputs are 3, 4, and the output is 7. It is easy to know that the calculation of this gate is correct. When Bob checks all the doors, he can be sure that Alice did the calculations and did not cheat.

The content on this sheet of paper is a solution "satisfaction" of the arithmetic circuit P.

The so-called circuit satisfiability means that there is a solution that satisfies the circuit. If the output value of this solution is equal to a certain value, then the solution can "represent" the computational process of the circuit.

For Alice, if Bob verifies in this way, she has no room for cheating. But this method obviously has a drawback:

  • Disadvantages: If the circuit is large, the proof is very large, and Bob's inspection proves that the workload is also large.
  • Disadvantage 2: Bob knows all the circuit operation details, including input, during the verification process.

Black technology

Let's make some changes to the scenes of Alice and Bob. If, Alice has a secret, such as online banking passwords. Bob wants to know if Alice's online banking password is 20 characters long. And Alice thought about it and told him that the password should not be a problem. At this point, Bob converts a code that calculates the length of the string into circuit Q and sends it to Alice. Alice used the circuit Q to calculate her own password, and then sent the pins of all the gates of the circuit to Bob with the result of the operation 20.

Wai…t, this is a problem. After Bob gets all the internal details in the circuit operation process, don't he know the password? Yes, Alice obviously can't do this. So what should Alice do?

The answer is that there are many ways. The most familiar readers of blockchain technology are zkSNARK [11], zkSTARK [12], bullet proof BulletProof [13], and some relatively niche technologies can help Alice do To:

Alice proved to Bob that she had calculated the circuit in a zero-knowledge and used her secret input.

In other words, these "zero-knowledge circuit sufficiency proof protocols" provide Alice with a powerful weapon to prove to Bob that her online banking password is 20 in length, and Bob can't get any other useful anymore. Information. In addition to the online banking password, Alice can theoretically prove to Bob certain features of any of her private data, but does not reveal any other information.

The Zero-Knowledge Circuit Satisfiability Protocol provides one of the most direct technologies for protecting privacy/sensitive data.

In recent years, zero-knowledge proof construction technology has evolved rapidly and has been used more and more in the field of blockchain technology. The latest zero-knowledge proof technology, some technologies can make Bob high-speed verification certificate (a few milliseconds verification on mobile devices); some technologies can help all people to help verify (non-interactive zero-knowledge proof); some technologies Support for very small proof sizes (small to tens of bytes). We will introduce the follow-up articles in a step-by-step manner.

Written at the end

Whether it is the subtle number theory theorem, the map three-staining problem, or the circuit satisfiability problem. What is the significance of proving existence? All the proofs reflect the "asymmetry" of "proof" and "validation". Proof can be a very computationally intensive, or mental, activity, whether it is the "Femama Theorem" that takes hundreds of years, or the POW proof in Bitcoin, which condenses the cost of finding proof. Energy, proving that the process may be extraordinary and complex, occasionally requires genius to be born. The verification process must (or should) be a very simple, mechanical, activity that can be terminated within the (polynomial) validity time. In a sense, this asymmetry truly embodies the meaning of proof and demonstrates the value of zero-knowledge proof.

Roughly speaking, "proof" is the product of "logic", but "logic" and "calculation" are inextricably linked. You may feel vaguely aware of the relationship between "proof" and "calculation". They run through: such as mechanical reasoning, proof of expression, and interaction calculations. This is a fun but more ambitious philosophical issue.

The content of the article is inevitably inaccurate or not rigorous, and please ask the professionals to take the time to correct.


  • [1] Simon, Singer, Xue Mi. Fermat's Theorem: A mystery that puzzled the 358 years of the world's wise [M]. Shanghai Translation Publishing House, 1998.
  • [2] Alec Wilkinson. The Pursuit of Beauty: Yitang Zhang solves a pure-math mystery. The New Yorker. Feb. 2015.
  • [3] Martin, Davis, Zhang Butian. The Engine of Logic [M]. Hunan Science and Technology Press, 2012.
  • [4] Raymond Smullyan. Gödel's Incompleteness Theorems, Oxford Univ. Press. 1991.
  • [5] Turing, Alan. "On computable numbers, with an application to the Entscheidungsproblem." Proceedings of the London mathematical society 2.1 (1937): 230-265.
  • [6] Pierce, Benjamin C., et al. "Software foundations." Chinese translation:
  • [7] Kolata, Gina. "Computer math proof shows reasoning power." Math Horizons 4.3 (1997): 22-25.
  • [8] Goldwasser, Shafi, Silvio Micali, and Charles Rackoff. "The knowledge complexity of interactive proof systems." SIAM Journal on computing 18.1 (1989): 186-208.
  • [9] zkPoD: Blockchain, zero-knowledge proof and formal verification, achieving fair trade without intermediation, zero trust. Ambi Lab. 2019.
  • [10] Oded, Goldreich. "Foundations of cryptography basic tools." (2001).
  • [11] Gennaro, Rosario, et al. "Quadratic span programs and succinct NIZKs without PCPs." Annual International Conference on the Theory and Applications of Cryptographic Techniques. Springer Berlin, Heidelberg, 2013.
  • [12] Ben-Sasson, Eli, et al. "Scalable, transparent, and post-quantum secure computational integrity." IACR Cryptology ePrint Archive 2018 (2018): 46.
  • [13] Bünz, Benedikt, et al. "Bulletproofs: Short proofs for confidential transactions and more." 2018 IEEE Symposium on Security and Privacy (SP). IEEE, 2018.

Author: Yu Guo

This article has been updated to Github:

Source: Ambi Lab