Discussion | Random Number Security Problem in Blockchain

Foreword

"Random numbers" are not uncommon in computer programs. Developers often use random numbers to simulate and predict values. In C++ programs, we often use certain seeds to generate random numbers. In computer programs, random numbers can be divided into true random numbers and pseudo-random numbers. True random numbers are very difficult to achieve, such as using sieves, runners, and so on. For pseudo-random numbers, we are divided into:

1. Strong pseudo-random numbers: Unpredictable random numbers, often used in cryptography.
2. Weak pseudo-random number: A random number that is easy to predict.

The random number has three characteristics, as follows:

• Randomness: There is no statistical bias, it is a completely messy sequence, that is, distribution uniformity and independence.
• Unpredictability: The next occurrence cannot be inferred from the past series
• Non-reproducibility: the same sequence cannot be reproduced unless the sequence itself is saved

The characteristics of random numbers have a certain relationship with the classification of random numbers. For example, weak pseudo-random numbers only need to satisfy randomness, while strong random numbers need to satisfy randomness and unpredictability, and true random numbers need to satisfy 3 at the same time. Features.

The above is the case of macroscopic random numbers, but in the case of blockchains, the random numbers are different. For distributed APPs, the blockchain inherently has drawbacks in the generation of random numbers. It is a difficult challenge to get all the users to get the same random number value and not let the delay too high.

In the blockchain, there are many application scenarios for random numbers, such as sweepstakes, verification codes, tokens, password application scenarios (generating keys, generating salt), and so on.

The security of random numbers is also a long-standing problem. A broad principle is that the generation of randomness is best not controlled by any individual. For example, the way to obtain randomness from the data of a certain block of Bitcoin in the future is not convincing, because these randomness is ultimately determined by an individual, and it cannot be proved that the relevant person has not done evil.

Let's take a detailed analysis of the random number generation in the blockchain and implement the code implementation for the current random number security.

Random number in blockchain

We all know that PoW's solution is to let the whole network perform the hash calculation together, and set a problem of calculating the hash when the question is made. Whoever calculates it first will win. Therefore, this means that the probability of winning with high computing power is high, and the probability of winning with low computing power is low, so that the winner is random in this way. There is also a random number setting in the setting of the calculation puzzle. This random number needs to be visible to all nodes and can be unpredictable.

For Pos, in order to solve the problem of wasted power in Pow, it chose to randomly elect a node to pop out, and the probability of being selected is related to the share it has. If the system can guarantee that the process is unpredictable and real, then the malicious node can increase the probability of the double-flower attack by increasing its own share and increasing the probability of being selected. Compared with Pow, the Pos attacker must first become a large-capital player if the attacker wants to achieve the attack. If the price of the coin falls due to the attack, the attacker will suffer the greatest loss.

According to our analysis, both of the above cases involve the generation of random numbers. We assume that there is a problem with the random number here, for example, there is a predictable situation. Then the attacker can do evil indefinitely. In Pow, he knows the content of the random number ahead of other nodes, and can calculate it one step ahead of time, which means he can have a higher probability of obtaining the accounting rights. For Pos, the attacker can change the conditions of the random number based on the generation mechanism of the random number to increase the probability of making the choice to be the biller, thus improving the right to book.

Traditionally, PoS schemes attempt to start with data from the chain, such as using the hash value of the previous block, the timestamp of the previous block, etc. as the source of the random number, but these bring additional security risks. Because the information of the block itself is written by the node, and then the subsequent blockers are elected according to the information inside. There is a suspicion of circular argumentation, and the security will not be too good. This is part of the reason why the PoS solution is traditionally considered to be less reliable than PoW.

A blockchain is a distributed system that requires the computational results of each node to be verifiable and agreeable. However, the traditional pseudo-random number generation algorithm is related to the physical state or operation state of a single machine. Different machines or nodes will exhibit different operation results, but the blockchain cannot be designed this way.

So for DAPP, there are three solutions for reference:

The first is to introduce third parties and let trusted third parties provide random numbers for contracts; the second is to achieve multi-party collaboration through contract setting to achieve pseudo-random number generation, and to provide consistent random numbers for other contracts. The third is to allow the contracts on all nodes to collect the same seed, and then calculate the same random number sequence by pseudo-random algorithm.

Let's take a look at it:

1 Trusted third party participates in the generation

In the blockchain system, in order to make the random numbers of all nodes the same, we can connect all the nodes of the blockchain to the third-party platform. Simply put, we can find an application to help us generate random numbers, and all nodes trust this application.

Although this can solve the problem of random number generation, we know that the blockchain is a representative of decentralization. It is somewhat contrary to theory if a centralized application is introduced to help. However, whether the third party is trustworthy and whether it can provide high quality random numbers is a problem. If this applies to other attackers to join forces or be hacked, the threat is huge.

However, there are still some landing DAPPs that use this design thinking. For example: External oracles: Oraclize.

Oraclize provides a bridge between Ethereum and the Internet. With Oraclize, smart contracts can request data via the web API. Such as the current exchange rate, weather forecast or stock price. One of the biggest effects is the ability to provide pseudo-random numbers. Some contracts use the URL connector in Oraclize to connect to [random.org] to get pseudo-random numbers. As shown below:

In this application, Oracle is the party that provides the data. Since blockchain applications such as Bitcoin scripts and smart contracts do not have direct access to the data they need, they need such data: the price of assets and financial applications. This app is mostly used for weather related information for point-to-point insurance. Gambling random number generation. Oraclize is a oracle, independent of the blockchain system, the smart contract sends a request to Oraclize, and when Oraclize listens to the relevant request on the chain, it generates a random number and returns the result to the blockchain.

The following code is a function that is accessed in Ethereum:

Basic contract participation

The second method is one of the most consistent distributed ideas in the blockchain. He needs different participants in the blockchain system to cooperate to generate pseudo-random numbers. The specific algorithm we explain is as follows:

First, in the generation of random numbers, in order to achieve the security features of generating random numbers, we need to use the theory of cryptography to enrich our concept. First let's look at the promise and open (Commitment & Open).

In the promised and open application scenario, if there are two users A and B here. The two of them did not face each other, but they wanted to play a rock-paper-scissor game and decide the winner. So what should we do? If the time is not synchronized, then one person can see the situation of another person, which means that they can not compete fairly. So we have to take some measures for each person's results:

• They make their own choices and then make their choices a hash; (H(A), H(B))
• Exchange this hash;
• After both parties receive the hash of the other party, they exchange the choices of both parties;
• Verify that the other party's choice is consistent with the previous hash;

In the process, you can also add your own public key to make evidence to prevent future remorse.

In this way, both parties know the choice of the other party, and can also confirm that the other party's choice is made in advance. This hash value is called a promise, because it contains confidential information, but does not leak confidential information, and finally sends the corresponding secret information, which is called open promise.

Let's talk about how to record and punish a bad node in the process of producing a random number. Below we introduce the secret sharing scheme:

Secret sharing means that a person can split a piece of information that needs to be kept secret into n shares and send them to n people. As long as the malicious nodes do not exceed a certain number, eventually everyone can combine the pieces of information to restore the original information. And even if the distributor cheats, everyone can check it out. For example, the password sharing scheme (3, 9) divides a secret into nine copies and then assigns it to nine people. As long as three of the nine copies are put together, the secret can be recovered.

For example, in the direct coordinate system below, we all know that y = ax + b linear coordinates requires two points to determine this line. So we can use this equation to do the sharing scheme of (2, n). We can find n points on the image and distribute them to n users. Then, if two users put the points together, they can be restored to a straight line, and the value of b can be obtained. The value of b is our hidden secret.

Knowing this, if the evil node does not send a random number open method, then we can also use a reasonable number of users to open the secret. Ensure the normal operation of the system. If he wants to cheat on the split information, everyone can check it out and kick him off.

Finally, because we are a blockchain, we need to broadcast the information in the protocol process, we can write directly to the chain, which can simplify the implementation, and does not require all voting nodes to be online at the same time, and if someone cheated, cheating The record will always be saved on the chain. Taking appropriate penalties will also cause malicious nodes to close their hands.

After talking about the basics of cryptography, we refer to a diagram to represent this process:

Public seed collection generation

In the third way, the contract uses the information on the common block to perform the seed value of the random number, and the smart contract generates a pseudo-random number based on the seed. The biggest drawback of this method is that once the hacker knows the random number generation algorithm and can get the correct seed, it can easily launch a random number attack on the smart contract.

However, unlike traditional centralized systems, the information on the blockchain is visible to all, and the seeds on the blockchain are almost “transparent”. It is the block information on the chain, and the smart contracts on all nodes can be obtained. In principle, the malicious contract used by the hacker can also obtain these values.

Below we list several dangerous random number generation functions":

In the example, we see that the block.number function is used. And in our experiments, we can see:

In addition, we can use randomness to generate random numbers, but this is also not safe.

It can be seen that this method also needs to be designed more deeply. When the hacker gets the random number algorithm, he can make a hundred percent prediction based on the content of the algorithm to do evil.

In our time, we found that many contracts use the block.blockhash(block.number) function as the seed for generating random numbers. This is very dangerous. The block.number variable gets the current block height. But when it is not executed, this "current block" is a future block, that is, only when a miner packages and publishes the transaction, the future block becomes the current block, so the contract can reliably acquire the area. Block hash of the block.

So this information will only return a value of 0 no matter how it works.

Source: Public Number: Blockchain Research Laboratory