Introduction to Blockchain Technology | Distributed Hash Table (Part 1)

DHT, a distributed hash table, is a key technology for distributed storage and download, and is now widely used in P2P networks. To understand the technical implementation of a distributed hash table, you first need to know what a hash algorithm and hash table are.

The hash algorithm is simply a function. This function has some special properties:

1. The length of the output is fixed regardless of the complexity of the input.

2. No matter how small the input changes, the output will be completely different from before.

3. The original information cannot be reversed by hash value, which is irreversible.

A hash table, also known as a hash table, is used to store key-value pairs. When encrypting a file, you not only need to store the hash of the file, but you also need to store the file itself. This requires storing the files and hashes in pairs for easy searching.

For ordinary hash tables, the cost of expansion is very large. Because the normal Hash calculates the address as follows: Hash(Key)%M , after the expansion, the position of the element has changed. It is mathematically proven that the expansion is twice as large, which results in the least number of elements of the hash. This is why the hash table is always expanded to twice the original size in order to reduce the movement of elements during expansion.

So the essence of the hash table is that when you look for a certain information, it is ultimately a process of taking the hash value to find a location, but we also see that when there are new nodes to join or the old node When exiting, you need to re-store key-value pairs. When the amount of information is large, it will easily lead to network congestion.

In order to solve the problem of node changes, in 1999, MIT's Karger et al. invented a consistent hash, which really allowed distributed storage to enter a stage where it could be scaled up.

What is a consistent hash?

First, organize the hash space into a virtual ring. Let's take the example of SHA256. SHA256 has 2^256 hashes, which combines all possible hash values ​​into a circle, from 0 to 2^256−1, sorted clockwise, and at 12 o'clock, 0 coincides with 2^256−1 .

Then, the server used by each node for storage performs a hash calculation. You can select the server's IP address or host name for hash calculation (use IP address to prevent host name overlap, it is very important here), and calculate it according to the calculation. I will put the node servers in the corresponding positions of the ring in order.

Finally, the key of the data (key, the keyword of the data) is calculated using the same hash algorithm, and the position of the data on the ring is determined. From this position, the clock is “walked” along the ring, the first one. The server you encounter is the server it should be located to.

In this way, when a node stops working for some reason, it only affects the data between its counterclockwise direction and the previous node. Similarly, when a new node is added, it only affects its inverse. The data in the hour hand direction to the previous node.

Is there a disadvantage to ordinary consistency hashing? Of course there is. To sum up, it is: without considering the situation of each machine, it can not achieve a good load balancing.

Is there a solution? Yes, it is virtual machine technology. For example, when there are only two nodes in a hash ring:

It can be seen that there is a possibility that the above allocation is uneven. In order to achieve load balancing without adding physical devices, we add some other content to the IP of the two nodes (for example, #1. #2) Compute the hash again and get Node 1.1, Node 1.2, Node 2.1, Node 2.2, so that it implements the following situation:

When these virtual nodes join, the allocation of data will naturally change. Of course, there is only one physical entity of these virtual machines, which will naturally be stored in the real physical entity, and naturally the load can be balanced as much as possible.

In the underlying technology of IPFS, DHT is a very important part, and the reason why IPFS will prefer the fixed IP of the public network is because the fixed IP does not change the position in the hash ring, and thus does not cause the node to change. Additional network load. This is one of the reasons why mine revenues will be higher and more stable.

At this point, the basic technology of the distributed hash table DHT has been introduced.

(Author: white area)

We will continue to update Blocking; if you have any questions or suggestions, please contact us!

Share:

Was this article helpful?

93 out of 132 found this helpful

Discover more

NFT

FIFA Teams Up with Modex for Epic NFT Collection at Club World Cup 🏆⚽️

FIFA and Modex are teaming up for a special limited NFT collection that will have practical uses in the real world, j...

Blockchain

Breaking News: Andreessen Horowitz Plans to Raise $3.4 Billion Fund for Early-stage Crypto Businesses

According to reports, Andreessen Horowitz is preparing to gather a $3.4B fund specifically for upcoming crypto ventur...

Blockchain

FTX’s Redemption Plan: A Second Chance for Crypto Dreamers

FTX announces revised plan to reimburse creditors affected by bankruptcy with significant payouts.

Blockchain

Swiss National Bank Partners with SIX Digital Exchange for CBDC Pilot Project

The Swiss National Bank (SNB) has teamed up with SIX Digital Exchange (SDX) to launch a trial project for a new wCBDC...

Blockchain

Crypto.com: Empowering Charities Through Digital Innovation

Popular cryptocurrency exchange, Crypto.com, has signed a Memorandum of Understanding (MOU) with Singapore Charities,...

Market

OKX Teams Up with Komainu and CoinShares: Adding a Traditional Touch to Institutional Asset Trading

Fashionista to read OKX collaborates with Komainu and CoinShares to drive institutional use of digital assets.