## Hash function

Hash function: h=H(Data)

## definition

The hash function H takes the variable size data Data as input and produces a fixed length h value.

A cryptographic hash function is a mathematical function. The characteristics of the hash function itself:

1, input arbitrariness: the input of the function can be any size of data;

2, output fixed: the output of the function is a fixed size of data;

3, can carry out effective calculation: that is, in a reasonable time, the input data can be calculated to obtain the output.

For blockchain technology and encrypted digital currency, to make the hash function password-safe, you also need to have the following characteristics:

##### 1) Collision resistance

The concept of collision: If there are two different values X, Y, H(X) = H(Y), then the hash function H is called a collision. Collision resistance means that two different values X, Y cannot be found such that H(X) = H(Y).

From the interpretation of the collision resistance and the characteristics of the hash function, it is easy to conclude that the collision is an inevitable phenomenon. Because the input space of the hash function is data of any size, the output is a fixed size of data. This means that the input space is larger than the output space, so collisions are inevitable.

For example, if we define that the output of a hash function has only 0 and 1 results, then it is clear that collisions are very easy to happen.

Then a good hash function should look like this:

- Any y, find x, make H(x) = y, very difficult
- Given x1, find x2, making H(x1) == H(x2), very difficult
- Find any x1, x2, making H(x1) == H(x2), very difficult

For example, for a 256-bit output hash function, the worst case is to perform 2256+1 hashes, and an average of 2128 hashes. This level is almost 10^27 years for a PC. Therefore, we can think that this is a collision resistance.

It is because of the collision resistance that there is a saying that the hash is on the chain. The so-called hash chaining is a typical deposit scenario: this scenario allows us to write the hash output as a message digest into the block structure of the blockchain. For example, if there is a very important file, the file is very large, and it is not feasible to write the file itself into the blockchain. Owner A hopes that the file will be safe and reliable in subsequent use, and there is no risk of being tampered with. Owner A then hashes the file and writes the hash value to the blockchain. In the subsequent use process, you only need to do a hash of the file to be used, and compare it with the hash value in the blockchain. If the hash values are the same, the proof is not tampered with. If the hash values are different, the proof is tampered with.

##### 2) Unidirectional (secret)

Unidirectional (concealment) means that if we only know the output of the hash function h = H(X), there is no feasible algorithm to calculate the input value X.

Common hash algorithms are: MD5, SHA1, SHA224, SHA256, SHA384, SHA512, SM3

SHA-256 is a more efficient hash function used primarily by the Bitcoin world.

#### Hash pointers and data structures:

A hash pointer is a data structure that is a hash pointer to the data storage location and location data. When we learn C, we know that a normal pointer is an address that points to the location of the data store. But the hash pointer not only tells you where the data is stored, but also tells you if the data in this location has been tampered with. That is, a pointer that can clarify the hash value of the data under a certain timestamp.

A linked list constructed by a hash pointer is actually a blockchain.

As shown below:

Common linked list structure:

Common linked list structure

Blockchain structure:

Block linked list structure

Therefore, a typical application of blockchain is “tamper-proof”.

Merkle trees: A binary tree using a hash pointer is the Merkel tree. In the structure of the Merkel tree, all the data blocks are grouped in pairs, and the hash pointer of each two data blocks is stored in the parent nodes of the two data blocks, and the hash pointers of the two parent nodes are again They are stored in their parent node, and so on, reaching the root node of the Meck tree.

Its structure is as shown:

Merkel tree

The characteristics of the Merkel tree – the subordinate certificate

Suppose that if it is necessary to prove that a certain data block belongs to a certain Merkel tree, only the information of the data block and the block information of the data block leading to the root node of the tree can be verified. You can ignore other parts of the tree.

As shown in the figure below: If we want to verify whether D3 belongs to the ROOT tree, we only need to display the information of the nodes D3, N2, N5.

Affiliation

(Author: BitTribeLab Todd)