# Getting started with blockchain | What is a hash function

Today we take a look at an important concept of bitcoin: the hash function. This article first introduces what are the hash function and the hash function's three characteristics of collision resistance, privacy and puzzle friendliness. Combined with the characteristics of the blockchain, the specific application of the hash function in the proof of the blockchain mining workload is shared, and the relationship between the hash function and the flood control modification is briefly explained.

01

What is a hash function?

Hash Function is a widely used function in the hash algorithm of encryption algorithm, also known as hash function or hash function. As a public function, the hash function can map the message M of any length into a short length and fixed length value H(M), which is called H (M) as a hash value or Message Digest. Hash operation is a one-way cryptosystem, that is, an irreversible mapping from plaintext to ciphertext. There is only an encryption process and no decryption process. Its function expression is:

y = H(x)

Secure Hash Algorithm 256 (SHA-256) is a hash function commonly used by Bitcoin. This article uses the SHA-256 algorithm as an example. Regardless of the digital format of the input and the size of the file, the hash output is a fixed-length bit string, such as 256 bits in binary, that is, 256 strings of 0 or 1, and 64 in hexadecimal numbers. Bit, for example:

0xd7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592

Convert to binary:

1101011110101000111110111011001100000111110101111000000010010100011010011100101010011010101111001011000000001000001011100100111110001101010101100101000111100100011011010011110011011011011101100010110100000010110100001011111100110111110010011110010110010010

02

Three characteristics of the hash function

To make the hash function password safe, you need to add the following three features: collision resistance, privacy, and puzzle-friendly.

Characteristic 1: Collision resistance

The Hash function H takes a variable length data block M as input and produces a fixed length hash value h = H(M). Let M be the original image of H. Since H is a many-to-one mapping, there are multiple original images for any given hash value H. If x ≠ y is satisfied and H(x) = H(y), ie two different inputs x and y produce the same output H(x) = H(y), it is called a collision. If no one can find a collision for the hash function H(x), the function is said to have collision resistance. For a given data X, it is easy to calculate the hash value H = H(M); but it is difficult to inversely calculate X according to H; it is difficult to find X and Y such that H(X) = H(Y). In fact, by compressing large-space messages into small spaces, collisions must exist. Suppose the length of the hash value is fixed to 256 bits. If the order is 1, 2, … 2^256+1, the 2^256+1 input values ​​are calculated one by one, and two input values ​​can be found. The hash value is the same. But don't be too happy, because you have time to figure it out, it is yours.

For a hash function with a hash value of 256 bits, an average of 2^128 hash calculations is required to find the collision pair. If the computer performs 10,000 hash calculations per second, it takes about 10^27 years to complete 2^128 hash calculations.

Feature 2: Concealment

Concealment refers to the probability distribution when the input r is selected from a higher-order minimum 墒, and it is impossible to determine x under the given H(r||x) condition. Simply put, you can't get input from the output. Let y=H(x), if we know y, it is difficult to quickly find x that satisfies the condition, then the hash function H(x) is said to be concealed. Concealment means that it is almost impossible to find its inverse function x'=H'(y). In fact, there should be more than one x satisfying the condition, and even one of the hidden requirements requires no one. This is because the unidirectionality of the hash function mentioned above is not feasible for a given hash value at 2^128 hash calculations.

One application of privacy is "commitment", with information (message) and a random number (nounce) as input, and the resulting output is promise. The promise here consists of two things: by committing, others can't know your input; and once you open up the promise, you can't deceive others that the promise is another message. Therefore, the application here is hidden. In addition, each commitment requires a new random number, which is important for the security of the promise.

Feature 3: Puzzle friendly

If for any n-bit output value y, assuming k is selected from the high-order minimum entropy distribution, if a feasible method cannot be found, find x in a much smaller time than the n-th power of 2, guaranteeing H(k||x) = y is true, then we call the hash function H puzzle-friendly. In the search puzzle application, we will create a search puzzle that is a mathematical problem that requires searching a large space to find a solution. To put it simply, it is required to have randomness. Any input can get a fixed number of bits of output. It is difficult to find any correlation between input and output. Even if the input is slightly adjusted, the output obtained is random, except by adjusting. There is no other better way to enter the content for "violent trials."

Collision resistance, concealment, and friendly puzzles, each with its own focus, but also intertwined. In particular, the privacy and the puzzle are very similar. For readers who want to understand deeply, what needs to be pointed out is that the secret is to determine x, and the puzzle is friendly to find k. For the hash function H(x), we make the following summary:

• For y = H(x), knowing that x calculates its hash value y is simple;
• Collision resistance: it is difficult to find different x, y, so that H(x) = H(y);
• Concealment/puzzle friendly (approximate understanding): For y = H(x), knowing the hash value y, it is difficult to find x.

03

The meaning of the hash function

Hash function and proof of workload

First recall the block structure. The block of Bitcoin consists of the block header and the list of transactions contained in the block. The size of the block header is 80 bytes, the 4-byte version number, the 32-byte hash value of the previous block, and 32. The Merkle Root Hash of the byte, the 4-byte time suffix (current time), the current difficulty value of 4 bytes, and the random number of 4 bytes. The list of transactions included in the block is appended to the head of the block. The first transaction is a coinbase transaction, which is a special transaction for the miner to get rewards and fees.

The general structure of the block is shown in the figure: among them:

• Version is the version number of the current block
• The hash value of the multiple transactions contained in the block constitutes the Merkel data structure of the block
• Nbits is the computational complexity of the current Bitcoin protocol settings
• Ntime is the timestamp of the block
• Pre_hash is the hash value of the previous block

Block_header = version + Merkel + nbits + ntime + pre_hash

A block header with a fixed length of 80 bytes is the input string for the bitcoin workload proof. Constantly change the random number in the block header, that is, the value of nonce, and do double SHA256 operation (ie SHA256(Block_Header)) for each changed block header, and make the result value and the target value of the current network. Contrast, if it is less than the target value, the problem solving is successful and the workload is proved to be completed. Summary: It can be easily understood that the process of bitcoin workload proofing is a process of performing a SHA256 hash operation by continuously changing the block header (ie, trying different random values) as input to find a specific format hash value. That is, a certain number of leading zeros are required, and the more the number of leading zeros required, the greater the difficulty.

Hash function and tamper resistance

When x changes only one byte, does y change only one byte, can this small difference tamper with the facts?

No!

If the input x has a 1 bit change, the hash result will have more than half of the bit changes in the hash value, which is also called the "avalanche effect".

The blockchain is called a chain because each block holds the target hash value of the previous block (the creation block has no previous hash value). A transaction in a block has been tampered with because the Mercker tree algorithm has changed the target hash value of this block and caused a chain reaction. Based on this block, the hash values ​​of all subsequent blocks will change, and any accounting node can easily find that the books have been tampered with, thus ensuring that all blocks on the blockchain have tamper-proof functions. .

The collision resistance of the hash function is used to verify the integrity of the block and transaction, and can be identified as soon as it is falsified.

Finally, we summarize the article:

1. The hash function inputs the information x of any length, and the length of the output hash value H(x) is a fixed 256 bits;

2. The hash function has three important characteristics of collision resistance, concealment and puzzle friendly. Specifically:

• For y=H(x), knowing that x calculates its hash value y is simple;
• Collision resistance: it is difficult to find different x, y, so that H(x) = H(y);
• Concealment/mystery friendly (approximate understanding): For y = H(x), knowing the hash value y, it is difficult to find x;

3. Do a double SHA256 operation (ie SHA256 (Block_Header)) for each changed block header, and the result value is smaller than the target value (there are enough leading zeros), and the workload is proved to be completed;

4. The anti-tamper modification under the hash function means that even if the input x changes by 1 bit will change everything, it cannot be tampered with.

Today's currency letter small class will come here first, Jianghu Road is far, see you next time.

– END-