The "Input Aliasing" vulnerability in zkSNARKs contract libraries affects many coin-mixing projects

Author: p0n1 (SECBIT Labs)

A large number of zero-knowledge proof projects have introduced the "Input Aliasing" vulnerability by misusing a zkSNARKs contract library. The vulnerability enables proof forgery, double-spending, replay, and other attacks, and the cost of an attack is extremely low. Many open source projects in the Ethereum community are affected, including the three most commonly used zkSNARKs development libraries (snarkjs, ethsnarks, and ZoKrates) and three recently popular coin-mixing (anonymous transfer) applications (hopper, Heiswap, and Miximus). The whole incident traces back to a piece of code that Christian Reitwiessner, the inventor of the Solidity language, posted two years ago.

Double-spending vulnerability: where the problem first surfaced

Semaphore is an anonymous signaling system built on zero-knowledge proofs. It evolved from an earlier project by the well-known developer barryWhiteHat.

The Russian developer poma was the first to point out that the project may have a double-spending vulnerability [1].

The problem is at line 83 of the contract [2]; please take a close look.

This function requires the caller to construct a zero-knowledge proof showing that he is entitled to withdraw money from the contract. To prevent double-spending, the function also reads the "spent list" (nullifiers_set) to check whether a specified element of the proof has already been marked. If it is in the spent list, the contract treats verification as failed and the caller cannot withdraw the money. The developers assumed that the same proof therefore could not be submitted repeatedly for profit, and considered this check an effective defense against double-spending and replay attacks.

However, contrary to expectations, a fatal problem was overlooked here. Starting from a proof that has already been submitted successfully, an attacker can exploit the "input aliasing" vulnerability to quickly "forge" a new proof by slightly modifying the original input. The forged proof passes the zero-knowledge proof verification on line 82 of the contract and bypasses the double-spending check on line 83.

The problem dates back to 2017, to the example zkSNARKs verification contract published by Christian Reitwiessner, the inventor of the Solidity language [3]. Since then, almost every contract on Ethereum that uses zkSNARKs has followed this implementation, so all of them may be attacked through the process described below.

Coin-mixing applications: the area hit hardest by this security issue

The earliest and most widespread application of zero-knowledge proof technology on Ethereum is the coin-mixing contract, also described as anonymous transfer or private transaction. Since Ethereum itself does not support anonymous transactions and the community's demand for privacy protection keeps growing, many popular projects have emerged. Taking the coin-mixing contract as an example, this section describes the security threat that the "input aliasing" vulnerability poses to zero-knowledge projects.

Coin mixing, or anonymous transfer, involves two main points:

  1. Prove that you have a sum of money
  2. Prove that the money has not been spent

For the sake of easy understanding, here is a brief description of the process:

  1. A wants to spend a sum of money.
  2. A has to prove that he owns the money. A presents a zkproof showing that he knows the preimage of a hash (HashA), that this hash is a leaf of the tree identified by root, and that another hash of the same preimage is HashB. HashA is the witness and HashB is the public statement. Since A does not need to reveal HashA, he stays anonymous.
  3. The contract verifies the zkproof and checks whether HashB is in the spent list. If it is not, the money has not been spent yet and may be spent (A's call is allowed).
  4. If the money may be spent, the contract puts HashB into the spent list, indicating that the money represented by HashB has been spent and cannot be spent again.

Line 82 of the contract, verifyProof(a, b, c, input), proves that the money is legitimate; input[] is the public statement, that is, the public parameters. Line 83 checks whether the money has already been spent, using require(nullifiers_set[input[1]] == false).
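The flow described above can be sketched as a toy model in Python. This is only an illustration under stated assumptions: `ToyMixer`, `Proof`, and the Python `verify_proof` are hypothetical stand-ins, and the real pairing check is replaced by a plain comparison. Only the names verifyProof, input, and nullifiers_set come from the contract.

```python
from typing import NamedTuple

class Proof(NamedTuple):
    """Hypothetical stand-in for (a, b, c): it simply remembers what it proves."""
    root: int
    nullifier: int  # HashB in the description above

def verify_proof(proof, inputs):
    """Toy replacement for verifyProof(a, b, c, input); no real cryptography."""
    return list(inputs) == [proof.root, proof.nullifier]

class ToyMixer:
    def __init__(self):
        self.nullifiers_set = set()  # the "spent list"

    def withdraw(self, proof, inputs):
        if not verify_proof(proof, inputs):    # line 82: is the proof valid?
            return False
        if inputs[1] in self.nullifiers_set:   # line 83: already spent?
            return False
        self.nullifiers_set.add(inputs[1])     # step 4: mark HashB as spent
        return True

mixer = ToyMixer()
proof = Proof(root=1234, nullifier=42)
first = mixer.withdraw(proof, [1234, 42])    # accepted
second = mixer.withdraw(proof, [1234, 42])   # rejected: 42 is on the spent list
print(first, second)
```

In this simplified model, replaying exactly the same input is rejected, which is the behavior the developers were relying on.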

Many zkSNARKs contracts, especially coin-mixing contracts, have core logic similar to lines 82 and 83, so they all share the same security issue and can be exploited through the "input aliasing" vulnerability.


Vulnerability analysis: how can one sum of money be spent 5 times anonymously?

The verifyProof(a, b, c, input) function above performs calculations and checks on an elliptic curve using the values passed in. At its core it uses a function called scalar_mul() to implement scalar multiplication on the elliptic curve [4].

  /// @return the product of a point on G1 and a scalar, i.e.
  /// p == p.scalar_mul(1) and p.add(p) == p.scalar_mul(2) for all points p.
  function scalar_mul(G1Point point, uint s) internal returns (G1Point r) {
      uint[3] memory input;
      input[0] = point.X;
      input[1] = point.Y;
      input[2] = s;
      bool success;
      assembly {
          // Call precompiled contract No. 7 (ECMUL)
          success := call(sub(gas, 2000), 7, 0, input, 0x80, r, 0x60)
          // Use "invalid" to make gas estimation work
          switch success case 0 { invalid() }
      }
      require (success);
  }

As we know, Ethereum has several built-in precompiled contracts that perform cryptographic operations on elliptic curves, which reduces the gas cost of verifying zkSNARKs on chain. The implementation of scalar_mul() calls Ethereum's precompiled contract No. 7, which implements scalar multiplication on the elliptic curve alt_bn128 according to EIP 196 [5]. The Yellow Paper gives the formal definition of this operation, which we usually call ECMUL or ecc_mul.

In cryptography, the coordinates {x, y} of a point on an elliptic curve take values in a finite field of integers mod p, written Zp or Fp. That is, for a point {x, y} on the curve, both x and y are elements of Fp. Some of the points on an elliptic curve form a large cyclic group; the number of points in it is called the order of the group, denoted q. Elliptic curve cryptography is carried out in this cyclic group. If the order q of the cyclic group is prime, the scalars can be treated as elements of the finite field of integers mod q, written Fq.

Generally, a large cyclic group is chosen as the basis for the cryptographic computation. In the cyclic group, pick any point other than the point at infinity as the generator G (the order q of the group is usually a large prime, in which case every such point works equally well); every other point can then be produced as G + G + …. The group has q elements, that is, q points in total, so we can number the points 0, 1, 2, 3, …, q-1. Point 0 is the point at infinity, and point 1 is the G just mentioned, also called the base point. Point 2 is G + G, and point 3 is G + G + G.

So there are two ways to represent a point. The first is to give its coordinates {x, y}, where x and y belong to Fp. The second is to write it as n*G; since G is public, it is enough to give n, where n belongs to Fq.

Look at the signature of scalar_mul(G1Point point, uint s): it takes point as the generator and computes point + point + … + point, adding s copies of point in total. This is the second way above of representing a point in the cyclic group.

In a Solidity smart contract, the uint256 type has to be used to encode elements of Fq. However, the maximum value of uint256 is greater than q, so multiple uint256 values correspond to the same element of Fq after the mod operation. For example, s and s + q represent the same point, namely point s. This is because point q is actually equivalent to point 0 in the cyclic group (the points are numbered 0, 1, 2, 3, …, q-1). Similarly, s + 2q and so on also correspond to point s. We call this phenomenon, in which several large integers can be entered for the same value in Fq, "input aliasing": the numbers are aliases of one another.
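The claim that s and s + q name the same point can be checked with a small pure-Python model of the group. This is a sketch, not the precompile itself: the helper names `add` and `scalar_mul` (Python versions) are illustrative, and the curve y^2 = x^3 + 3, the generator (1, 2), and the constants are taken to be the alt_bn128 parameters specified in EIP 196. It needs Python 3.8 or later for the modular inverse via pow.

```python
# alt_bn128 (G1) parameters, assumed here to match EIP 196.
P = 21888242871839275222246405745257275088696311157297823662689037894645226208583  # field modulus p
Q = 21888242871839275222246405745257275088548364400416034343698204186575808495617  # group order q
G = (1, 2)  # generator of G1 on y^2 = x^3 + 3

def add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # a + (-a) is the point at infinity
    if a == b:
        slope = 3 * x1 * x1 * pow(2 * y1, -1, P) % P        # tangent line
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P   # chord through a and b
    x3 = (slope * slope - x1 - x2) % P
    y3 = (slope * (x1 - x3) - y1) % P
    return (x3, y3)

def scalar_mul(point, s):
    """Double-and-add: point + point + ... + point, with s copies."""
    result, addend = None, point
    while s > 0:
        if s & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        s >>= 1
    return result

s = 123456789
same_point = scalar_mul(G, s) == scalar_mul(G, s + Q)   # s and s + q give one point
wraps_to_zero = scalar_mul(G, Q) is None                 # point q is point 0
print(same_point, wraps_to_zero)
```

Both values should come out as True, matching the statement that point q coincides with point 0.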

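The elliptic curve implemented by Ethereum's precompiled contract No. 7 is alt_bn128, y^2 = x^3 + 3 over Fp. Its p and q, as given in EIP 196 [5], are:

  p = 21888242871839275222246405745257275088696311157297823662689037894645226208583
  q = 21888242871839275222246405745257275088548364400416034343698204186575808495617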

The q value here is the order of the group mentioned above. Within the range of the uint256 type, uint256_max / q is slightly more than 5, which means that at least 5 different integers represent the same point (5 "input aliases").
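The count can be reproduced with a few lines of integer arithmetic. In this sketch the helper `aliases` is a hypothetical name, and Q is assumed to be the alt_bn128 group order from EIP 196.

```python
UINT256_MAX = 2**256 - 1
Q = 21888242871839275222246405745257275088548364400416034343698204186575808495617  # group order q

def aliases(s):
    """Every uint256 value congruent to s modulo Q, smallest first."""
    return list(range(s % Q, UINT256_MAX + 1, Q))

whole_multiples = UINT256_MAX // Q   # how many times q fits into the uint256 range
large = aliases(Q - 1)               # aliases of the largest canonical value
small = aliases(0)                   # aliases of the smallest canonical value
print(whole_multiples, len(large), len(small))
```

Under these assumptions the quotient is 5, so every value has at least 5 representations. Because uint256_max is not an exact multiple of q, sufficiently small values such as 0 fit one extra alias.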

What does this mean? Recall that the verifyProof(a, b, c, input) function calls scalar_mul(G1Point point, uint s). Each element of the input[] array is in fact such an s. For each s there are at least 4 other values within the uint256 range that produce exactly the same result as the original value.

Therefore, once a user has presented a zero-knowledge proof to the contract, the contract puts input[1] (that is, one particular s) into the spent list. The user (or any other attacker) can then submit the proof again with each of the other 4 values. These 4 values have never been added to the spent list, so the "forged" proofs pass verification without trouble. With 5 "input aliases", one sum of money can be spent 5 times, and the cost of the attack is very low!
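The attack can be simulated with a toy contract in Python. This is a sketch under loud assumptions: `VulnerableMixer` and the Python `verify_proof` are hypothetical, and the verifier is modeled only by the one property that matters here, namely that each input is used as an ECMUL scalar and is therefore effectively reduced mod q. No real proof is checked.

```python
UINT256_MAX = 2**256 - 1
Q = 21888242871839275222246405745257275088548364400416034343698204186575808495617  # group order q

def verify_proof(proven_inputs, inputs):
    """Toy verifier: accepts any inputs that equal the proven statement mod Q."""
    return [x % Q for x in inputs] == list(proven_inputs)

class VulnerableMixer:
    def __init__(self):
        self.nullifiers_set = set()  # the "spent list"

    def withdraw(self, proven_inputs, inputs):
        if not verify_proof(proven_inputs, inputs):   # line 82
            return False
        if inputs[1] in self.nullifiers_set:          # line 83: compares raw uint256 values
            return False
        self.nullifiers_set.add(inputs[1])
        return True

root = 1234
nullifier = Q - 1            # chosen large so that exactly 5 aliases fit in uint256
proven = (root, nullifier)   # the statement of one honestly generated proof

mixer = VulnerableMixer()
results = []
k = 0
while nullifier + k * Q <= UINT256_MAX:
    # Same proof every time; only the encoding of input[1] changes.
    results.append(mixer.withdraw(proven, [root, nullifier + k * Q]))
    k += 1

replay = mixer.withdraw(proven, [root, nullifier])   # exact replay is still rejected
print(results, replay)
```

In this model one proof yields five successful withdrawals, while an exact replay of an input that was already used is rejected. This is why the check on line 83 looks sound at first glance.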

Many more projects are affected

The problem is by no means limited to Semaphore. Many other coin-mixing projects and zkSNARKs projects on Ethereum likewise allow "input aliasing".

The most influential of these are several well-known zkSNARKs libraries and frameworks, including snarkjs, ethsnarks, and ZoKrates. Many applications directly import or borrow from their code, which plants hidden security risks. The three projects therefore quickly released security fixes. In addition, a number of well-known coin-mixing projects that use zkSNARKs, such as hopper, Heiswap, and Miximus, promptly followed with fixes of their own. These projects are very popular in the community, and Heiswap has even been called "Vitalik's favorite project".

Fixing the "input aliasing" vulnerability

In fact, every project that uses a zkSNARKs contract library should audit itself immediately to assess whether it is affected. So how should the problem be fixed?

Fortunately, the fix is simple. Just add a range check on the input parameters to the verification function, forcing each input value to be less than the q value mentioned above. In other words, input aliases are strictly forbidden, so that no point can be represented by more than one number.
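The range check itself is tiny. The following Python sketch shows the idea only; `inputs_are_canonical` is a hypothetical helper name, and Q is assumed to be the alt_bn128 group order from EIP 196. In a contract the same comparison would be applied to each input[i] before it is used in the verification.

```python
Q = 21888242871839275222246405745257275088548364400416034343698204186575808495617  # group order q

def inputs_are_canonical(inputs):
    """Range check to run before verification: every input must be less than q."""
    return all(0 <= x < Q for x in inputs)

spent = 42                                        # a value already on the spent list
canonical_ok = inputs_are_canonical([1234, spent])
alias_ok = inputs_are_canonical([1234, spent + Q])
print(canonical_ok)   # True: the canonical encoding is accepted
print(alias_ok)       # False: the alias is rejected before verification
```

With this check in place, each element of Fq has exactly one accepted encoding, so the comparison against the spent list can no longer be sidestepped.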

The deeper problem exposed here is worth reflecting on

The security hole caused by "input aliasing" deserves serious reflection from the community. Let us review the whole story. In 2017, Christian posted his zkSNARKs contract implementation on the Gist website. As a computation library, his implementation can be considered free of security problems: it violates no cryptographic common sense and does its job of verifying proofs in a contract perfectly. Indeed, as the inventor of the Solidity language, Christian certainly did not make a low-level mistake here. Yet today, two years later, this code has caused a security storm. Over those two years, countless peers and experts may have read or used this code of only about two hundred lines without finding any problem.

Where does the core problem lie? The implementer of the underlying library and the users of the library may understand the program interface differently. In other words, the implementer of the underlying library did not anticipate improper use by application developers, while the application developers did not understand the underlying implementation and its usage caveats deeply enough, and so made the wrong security assumptions.

Fortunately, the commonly used zkSNARKs contract libraries have been updated quickly, eliminating "input aliasing" at the level of the underlying library. SECBIT Labs believes that updating the underlying libraries will largely remove the risk for future users. However, if the seriousness of the problem is not widely publicized, some developers will still unfortunately use a vulnerable version of the code, or develop from an outdated tutorial (as with the tokens whose value went to zero because of integer overflow), and plant security risks once again.

The "input aliasing" vulnerability inevitably recalls the "integer overflow" vulnerability that was exposed so frequently. The two have much in common: both stem from false assumptions shared by many developers; both are related to the uint256 type in Solidity; both affect a very wide range of projects; and in both cases vulnerable tutorial code and library contracts circulate on the Internet. However, the "input aliasing" vulnerability is clearly harder to detect, has stayed latent longer, and requires more background knowledge (elliptic curves and cryptographic theory). SECBIT Labs believes that with the rise of zkSNARKs, zero-knowledge proof applications, and privacy technologies, more new applications will emerge in the community, and more hidden security threats may come to light. We hope that in this wave of new technology the community will fully absorb the painful lessons of the past and take security seriously.


[1] https://github.com/kobigurk/semaphore/issues/16
[2] https://github.com/kobigurk/semaphore/blob/602dd57abb43e48f490e92d7091695d717a63915/semaphorejs/contracts/Semaphore.sol#L83
[3] https://gist.github.com/chriseth/f9be9d9391efc5beb9704255a8e2989d
[4] https://github.com/iden3/snarkjs/blob/0349d90824bd25688e3013ca26f7f73b51bc7755/templates/verifier_groth.sol#L202
[5] https://github.com/ethereum/EIPs/blob/master/EIPS/eip-196.md

Source: Ambi Lab