Science | Understanding Ethereum's P2P Network

Author: Larry Hu

Translation & Proofreading: haiki & A Jian

Source: Ethereum enthusiasts

This article aims to help you understand P2P networks and explains some implementation details of Ethereum's P2P network. P2P technology uses the abundant resources of end-user devices to alleviate the shortcomings of centralized systems. Since the 1990s, this technology has been adopted by well-known software such as eMule, BitTorrent and Skype. P2P technology is also a core component of blockchain systems such as Bitcoin and Ethereum. Many people have heard of P2P but don't know what it is, so let's start with what a P2P network is.

What is a P2P network?

 

A peer-to-peer (P2P) network is an overlay network; that is, it is built on top of the open Internet. From a mathematical perspective, a P2P network can be viewed as a directed graph G = (V, E), where V is the set of peer nodes in the network and E is the set of edges, that is, the connections between peer nodes. Each peer node p has a unique identifier, pid. An edge (p, q) in E means that p can send messages to q over a direct path; that is, p uses q's pid as the destination address when sending messages to q over the overlay. In the underlying TCP/IP network, similar IP addresses may indicate that hosts are geographically close to each other, but pids in the overlay rarely have such a clear, direct association.

Ideally, there should be a path between every pair of peer nodes. But because each node has only a partial view of the network topology and of the other peer nodes, the overlay relies on intermediate nodes to forward messages to the correct destination. The graph structure provides multiple intermediate paths between each pair of nodes, so the graph stays connected even as peer nodes come and go, which gives the network its resilience. For each peer node, the connectivity of the graph is reflected in its adjacency relationships with other peer nodes. When peer nodes join or leave the network, neighboring peer nodes may hold incorrect adjacency information. Overlay maintenance mechanisms are therefore used to keep adjacency information up to date, so that connectivity between all nodes is maintained.
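To make the graph view concrete, here is a minimal sketch in Python of a tiny overlay stored as a map from each pid to the pids it can reach directly. The pids "a" to "d", the overlay itself, and the helper names find_path and remove_peer are all hypothetical and exist only for illustration. The sketch shows how a message can be forwarded through intermediate nodes, and how a second path keeps the overlay connected after one peer leaves.

```python
from collections import deque

def find_path(edges, src, dst):
    """Breadth-first search over directed overlay edges.

    Returns the list of pids a message would pass through, or None
    if dst cannot be reached from src.
    """
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        # sorted() only makes the result deterministic for this example.
        for nxt in sorted(edges.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

def remove_peer(edges, pid):
    """Return a new edge map without pid, as if that peer had left."""
    return {p: {q for q in qs if q != pid}
            for p, qs in edges.items() if p != pid}

# Hypothetical overlay: an edge (p, q) means p can send directly to q.
overlay = {
    "a": {"b", "c"},
    "b": {"d"},
    "c": {"d"},
    "d": {"a"},
}

print(find_path(overlay, "a", "d"))                    # forwarded via "b"
print(find_path(remove_peer(overlay, "b"), "a", "d"))  # still reachable via "c"
```

Node "a" has no direct edge to "d", so the message is forwarded by an intermediate node. When "b" leaves, the path through "c" still exists, which is the resilience described above.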

Participants in a P2P network provide some of their resources to other participants. No centralized coordinator is needed, and each peer node can contribute computing cycles (CPU), disk storage, and network bandwidth. In the traditional client-server model, the server provides resources and the client consumes them. In a P2P network, by contrast, peer nodes are both providers and consumers of network resources. P2P networks can therefore address some shortcomings of the client-server model, such as limited scalability and single points of failure.

Generally speaking, a P2P network has a threshold: a node must contribute resources above this threshold to join the network. The criteria for measuring resource contributions should be fair; for example, the contribution of each peer node should fall within a statistical range around the overall average of the P2P system. Resource contribution should also be mutually beneficial: the benefits that users obtain in return for contributing are what attract them to join P2P applications.

How does Ethereum's P2P network work?

 

Geth, Ethereum's official client software, implements a peer discovery protocol (the RLPx node discovery protocol) based on an overlay maintenance mechanism called the Kademlia distributed hash table. Although Kademlia was designed to locate and store content efficiently in a P2P network, Ethereum's P2P network uses it only to discover new peer nodes.

Kademlia

In the Ethereum network, each client node has an enode ID, which is hashed to a 256-bit value using the SHA3 algorithm. Kademlia defines distance using the XOR operation, so the distance between two 256-bit numbers is their bitwise exclusive OR. Each peer node maintains a data structure containing 256 buckets, where bucket i stores up to 16 nodes whose distance from the node is between 2^(i-1) and 2^i. To find new peer nodes, an Ethereum node chooses itself as the target x and picks from its buckets the 16 nodes closest to x. It then asks each of these 16 nodes to return, from its own buckets, the 16 nodes that are "closer" to x, which yields up to 16×16 newly discovered nodes. Next, it queries the 16 of these newly discovered nodes that are closest to x and asks them in turn for 16 nodes closer to x. This process continues iteratively until no new nodes are found.
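The distance and bucket rules can be sketched in a few lines of Python. The function names node_hash, xor_distance and bucket_index are hypothetical. The sketch assumes that bucket i covers distances d with 2^(i-1) <= d < 2^i, which makes i simply the bit length of d. It also uses hashlib.sha3_256 as a stand-in hash: Ethereum uses the original Keccak-256 variant, which differs from the standardized SHA3-256 in hashlib, so the hash values here are illustrative only.

```python
import hashlib

def node_hash(enode_id: bytes) -> int:
    """Hash an ID to a 256-bit integer (stand-in hash, see the note above)."""
    return int.from_bytes(hashlib.sha3_256(enode_id).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: the bitwise XOR of the two values, read as an integer."""
    return a ^ b

def bucket_index(distance: int) -> int:
    """Return i such that 2**(i-1) <= distance < 2**i (assumed bucket rule)."""
    if distance == 0:
        raise ValueError("distance 0 is the node itself; it has no bucket")
    return distance.bit_length()

# Small numbers keep the example readable; real values are 256 bits wide.
d = xor_distance(0b1010, 0b0110)
print(bin(d), bucket_index(d))

# With real hashes the same rule applies, giving an index from 1 to 256.
me, other = node_hash(b"node-a"), node_hash(b"node-b")
print(bucket_index(xor_distance(me, other)))
```

Under this rule, half of all possible IDs fall into bucket 256, a quarter into bucket 255, and so on, so a node knows far more about its close neighborhood than about distant regions of the ID space.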

-Exclusive OR operation diagram-

-bucket map corresponding to distance-
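The iterative lookup described above can be simulated without any networking. In the Python sketch below, the network dictionary (node ID mapped to the IDs that node knows) is a hypothetical stand-in for real peers, small integers stand in for 256-bit hashes, and the names closest, find_node and lookup are illustrative, not Geth's. The loop stops when a whole round discovers no new nodes.

```python
K = 16  # nodes requested and returned per step, as in the text

def closest(ids, target, k=K):
    """The k IDs with the smallest XOR distance to target."""
    return sorted(ids, key=lambda n: n ^ target)[:k]

def find_node(network, responder, target, k=K):
    """What a peer would answer: the k nodes it knows closest to target."""
    return closest(network.get(responder, ()), target, k)

def lookup(network, self_id, k=K):
    """Iteratively discover peers, using our own ID as the target."""
    target = self_id
    known = set(network[self_id])
    asked = set()
    while True:
        before = len(known)
        # Query the k closest known nodes that have not been asked yet.
        for peer in closest(known, target, k):
            if peer in asked:
                continue
            asked.add(peer)
            known.update(n for n in find_node(network, peer, target, k)
                         if n != self_id)
        if len(known) == before:  # a full round found nothing new
            return known

# Hypothetical 32-node network: node i knows only i+1 and i+2 (mod 32).
network = {i: {(i + 1) % 32, (i + 2) % 32} for i in range(32)}
found = lookup(network, 0)
print(len(found), closest(found, 0)[:5])
```

Node 0 starts out knowing only two peers and ends up with a much larger set, learned entirely from the answers of other nodes. A real client would also have to handle timeouts and unresponsive peers, which this sketch leaves out.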

Peer-to-peer communication

Geth uses UDP to exchange P2P network information. There are 4 types of UDP messages. A ping message requests a pong message in return; this pair is used to determine whether a neighboring node is responsive. A findnode message requests a neighbors message in return, which contains a list of 16 nodes already known to the responding node. Once a connection between peer nodes is established, Geth nodes exchange blockchain information over encrypted and authenticated TCP connections.
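The request and response pairing of the four messages can be sketched as a small Python handler. This is not the real wire format, which carries more fields and is encoded and signed; the classes Ping, Pong, FindNode and Neighbors and the function handle are hypothetical names. The sketch also assumes that the 16 nodes in a neighbors reply are the ones closest to the requested target.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Ping:
    sender: int

@dataclass(frozen=True)
class Pong:
    sender: int

@dataclass(frozen=True)
class FindNode:
    sender: int
    target: int

@dataclass(frozen=True)
class Neighbors:
    sender: int
    nodes: Tuple[int, ...]

def handle(self_id, known, msg):
    """Return the reply this node would send, or None if no reply is due."""
    if isinstance(msg, Ping):
        # A ping is answered with a pong, proving the node is responsive.
        return Pong(sender=self_id)
    if isinstance(msg, FindNode):
        # Assumption: reply with the 16 known nodes closest to the target.
        nearest = sorted(known, key=lambda n: n ^ msg.target)[:16]
        return Neighbors(sender=self_id, nodes=tuple(nearest))
    # pong and neighbors are themselves replies and need no answer.
    return None

known = set(range(1, 100))
print(handle(7, known, Ping(sender=9)))
print(handle(7, known, FindNode(sender=9, target=0)).nodes)
```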

 

Data structures

The Geth client uses two data structures to store information about other nodes. The first is a long-term database called db, which is stored on disk and persists across client restarts. The db contains information about every node the client has interacted with. Each db record contains the node ID, IP address, TCP port, UDP port, the time this client last sent a ping to the node, the time it last received a pong from the node, and the number of times the node has failed to respond to a findnode message. If more than one day has passed since the last pong was received from a node, the node is removed from db.
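The fields of a db record and the one-day expiry rule can be sketched as follows in Python. The class DbRecord, its field names and the function expire are hypothetical and do not reflect how Geth actually lays out its on-disk database; the addresses and timestamps are made-up example values.

```python
from dataclasses import dataclass

ONE_DAY = 24 * 60 * 60  # seconds

@dataclass
class DbRecord:
    node_id: int
    ip: str
    tcp_port: int
    udp_port: int
    last_ping_sent: float = 0.0      # when we last sent this node a ping
    last_pong_received: float = 0.0  # when we last received its pong
    findnode_failures: int = 0       # unanswered findnode requests

def expire(db, now):
    """Keep only records whose last pong is at most one day old."""
    return {nid: rec for nid, rec in db.items()
            if now - rec.last_pong_received <= ONE_DAY}

now = 1_000_000.0  # arbitrary example timestamp, in seconds
db = {
    1: DbRecord(1, "192.0.2.10", 30303, 30303, last_pong_received=now - 3600),
    2: DbRecord(2, "192.0.2.11", 30303, 30303, last_pong_received=now - 2 * ONE_DAY),
}
print(sorted(expire(db, now)))  # node 2 has been silent for two days
```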

The second data structure is a short-term database called table. The table is empty whenever the client restarts. It contains 256 buckets, and each bucket stores up to 16 records. Each record stores information about another Ethereum node: its node ID, IP address, TCP port and UDP port. If a node fails to respond to findnode messages more than 4 times in a row, it is removed from the table.
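A minimal in-memory sketch of such a table in Python is shown below. The class Table and its methods are hypothetical names, small integers stand in for 256-bit hashes, and each record is reduced to a failure counter. As a simplification, a full bucket simply refuses new entries; what Geth does in that case is not covered by the description above.

```python
from collections import defaultdict

BUCKET_SIZE = 16   # records per bucket
MAX_FAILURES = 4   # more consecutive findnode failures than this: evict

class Table:
    def __init__(self, self_id):
        self.self_id = self_id
        # bucket index -> {node_id: consecutive findnode failures}
        self.buckets = defaultdict(dict)

    def _index(self, node_id):
        # Assumed rule: bucket i holds XOR distances in [2**(i-1), 2**i).
        return (node_id ^ self.self_id).bit_length()

    def add(self, node_id):
        """Insert a node; returns False if it is us or the bucket is full."""
        if node_id == self.self_id:
            return False
        bucket = self.buckets[self._index(node_id)]
        if node_id in bucket:
            return True
        if len(bucket) >= BUCKET_SIZE:
            return False
        bucket[node_id] = 0
        return True

    def record_findnode(self, node_id, responded):
        """Update the failure count after a findnode attempt."""
        bucket = self.buckets[self._index(node_id)]
        if node_id not in bucket:
            return
        if responded:
            bucket[node_id] = 0
        else:
            bucket[node_id] += 1
            if bucket[node_id] > MAX_FAILURES:
                del bucket[node_id]

    def __contains__(self, node_id):
        return node_id in self.buckets.get(self._index(node_id), {})

table = Table(self_id=0)
table.add(5)
for _ in range(5):  # five failures in a row
    table.record_findnode(5, responded=False)
print(5 in table)
```

Because a successful response resets the counter, only an unbroken run of failures removes a node.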

When a client starts for the first time, its db is empty and it knows only 6 hard-coded bootstrap nodes. As the client then begins to discover peer nodes, it adds them to db and table according to the mechanisms described above.

If you want to read more about the Ethereum P2P network, you can refer to the following articles contributed by members of the Ethereum community:

  • "RLPx Node Discovery Protocol" by Felix Lange, Gustav Simonsson, and Roman Mandeleil
  • "Peer to Peer" by Felix Lange
  • "Kademlia Peer Selection" by James Ray

References:

Vasilios Darlagiannis (2010). P2P Systems and Overlay Networks [PDF file]. Retrieved from: https://www.iti.gr/iti/files/document/seminars/p2p_eketa_090610_v2.pdf

S. Umamaheswari and Dr. V. Leela (2011, Mar. 01). P2P Overlay Maintenance Algorithm [PDF file]. Retrieved from: http://journals.sagepub.com/doi/pdf/10.1260/1748-3018.6.3.555
