# Technology Sharing | Safe and Fair Block Dag Sorting

### Foreword

In previous articles, we saw that a problem that may arise in DAG topology is the order of defining blocks and transfers. In this article, we describe Taraxa's mechanism, which not only ensures sorting in Block DAG, but also ensures that the sorting process is safe and fair.

### Why is sorting challenging?

What is the difficulty in sorting in Block DAG?

If we only look at any DAG data structure, we can only establish partial ordering , which means that we can only determine the ordering relationship between some blocks, but not the ordering relationship between all blocks. Here is an example.

In the image above, we can say with 100% certainty that blue blocks must appear before green and orange blocks in the sort. why? Since the green and orange blocks point to the blue block, this means that the blue block already exists when the orange and green blocks are created, otherwise the orange and green blocks wouldn't know they would point to it.

But how do you know the ordering relationship between the green and orange blocks? There is no definite answer to this question. The answer depends on the type of sorting algorithm used. The order may be completely different for different algorithms.

In the ranking process, we not only have to retain a degree of freedom, but also need to make many choices to ensure that the ranking algorithm is fair and safe, because we need to ensure that there is no dishonesty participant to bias most honest network participants Or attack.

Now look at the sorting algorithm of Block DAG designed by Taraxa and see how we can solve these problems.

### Popular games

In Taraxa's Block DAG topology, each block includes backward pointers that not only point to a single parent block (as in a single chain topology), but also to what the block block proposer has seen and believed All blocks on the DAG boundary (edge). There are many concepts here, let's unravel them one by one.

What is a boundary? In Block DAG (or any DAG data structure), the boundary refers to the latest set of blocks with no other blocks pointing to them.

In the image above, the blue blocks are "boundary" blocks, while the rest are not. Boundary blockchains are interesting because these blocks are the basis of all blocks, and other blocks created by block proposers (analog block creators) are behind them in terms of order. In a way, these blocks will be referenced in the next new block. However, due to the different network propagation delays encountered by each node, you will see a completely different situation when encountering any given block, so not every node will see the same boundary (or the same Block DAG). Therefore, block proposers are likely to build their blocks based on different boundaries at any given moment.

What is a pointer or parent block? A pointer is indicated by an arrow in the figure. It is data embedded in a block that "points" to or references another block. In Taraxa's DAG block structure, the pointer points to the parent block set. The parent block is the block pointed to by the pointer, so it can be understood as the parent of their previous block.

In the image above, we see that the blue block has two pointers to the two parent blocks represented by the green block. This means that when the proposer creates the blue block, the proposer knows the two (2) blocks on the Block DAG boundary and considers these two blocks valid, and therefore points the created blue block to both of them.

The process of connecting these pointers to a block is actually a popular contest-in simple terms, which block will be pointed to by the most blocks and considered valid. In addition, if a block that points to a particular block is itself "popular" (meaning many other blocks point to them), then "popularity" will also continue to that particular block (parent block).

Now let's measure the popularity of each block.

### Weight-popular score

To measure the popularity of each block, we adopted the GHOST protocol proposed by Zohar and Sompolinsky. GHOST helps calculate the weight of each block (the "popularity" of each block) based on the pointer embedded in the Block DAG.

What is "weight"? If the Block DAG is rotated sideways, each child block and its descendants can be considered to be isolated from the parent block, so the weight of any block under GHOST is the sum of the weights of the block and all its descendants.

Weights represent all collective efforts to gradually build other blocks on this block, even if such efforts are not coordinated and spread across multiple generations.

The GHOST protocol is very simple,

Each block itself naturally has a weight of 1

The weight of each block is equal to its own weight (1) plus the weight of all the blocks pointing to it

Start from the boundary of the DAG block and then measure backwards

Here is an example.

Starting from the boundary blocks, we can assign weights to each block. The blue block is on the boundary of the DAG, and no block points to it, so it has a weight of 1. The green block has only one block (blue) pointing to it, so we add the green block's own block weight (1) plus the weight of the block pointing to it (1), we get 2, which is New weights for green blocks. Moving forward, the orange block points to 3 squares, so its weight is the sum of the weights of the three squares plus its own weight: 1: 5 + 20 + 12 + 1 = 38.

Looks good, why can't we sort by weight now? Okay, not quite. There are some very subtle issues with this simple strategy, let's explore them further.

### Method of obfuscation

Let's look at a simple example to understand why just using weight values ​​as the logical basis of the ranking mechanism is not enough.

(a) We see that there are two orange blocks, both of which have a weight value of 1, so their order is ambiguous.

(b) Added another block, which points to the previous two blocks, but now their weight values ​​are both 2 and there is still an ambiguous order.

(c) Several blocks have been added. The problem of orange blocks has not disappeared, and their weight is still the same. Not only that, the situation is worse now, the new problem is that the yellow blocks also have vague ordering.

In the example above, we assumed that everyone was honest-but we still had problems. What if you introduce a malicious player? That player can actively obfuscate the ordering on the block DAG structure. Let us look at the following example.

(b) We have 2 ambiguous orange blocks.

(d) Another block appeared and solved the problem, and we now have a clear ordering. Long live!

(e) But the good times are not long … A malicious player is actively seeking to maintain a fuzzy ordering and has created a block (red), which increases the boundary block, and the weight of this block is back to us The problem is that the two orange blocks are not well sorted again.

Another implication of this attack is that malicious players can not only keep the vague ordering, but also make the network in a state of continuous chaos by adding blocks before and after switching the ordering.

Although these are very simple examples, such problems may occur on a larger scale, that is, the entire block cluster is in an uncertain ordering state.

So how do we solve this problem? We next introduce the concepts of anchor chains and ghost pointers.

### Anchor chain-the most popular route

The next step after calculating the weights is to determine the anchor chain in the Block DAG. If the weight is the popularity score of each block, then the anchor chain is the most popular path through Block DAG. The blocks observed when crossing (walking) paths can also be widely observed through the network (very common).

How to build an anchor chain? Simple! Just choose a starting point-or in Taraxa's example, the starting point is the previous block with real-time finality (our article on real-time finality is about to be published)-and then walk forward Among the sub-blocks, the block with the highest weight is selected at each step. Here is an example,

In the above figure, starting from block G, there are 3 weighted blocks to choose from (381,765,381), of which the heaviest block is 765. According to this logic, continue to go in this direction, the path from the G block to the end point is the anchor chain (green) constructed.

This is the method of anchor chain generation, which "anchores" the Block DAG into a path, and we can now build a sorting algorithm based on this path.

Although this method of calculating the anchor chain solves the problem that each pointed block has the same weight, it leads to another problem, because the attacker can still change the anchor chain continuously (not converging quickly). So we introduce ghost pointers to help the network converge to a fixed anchor chain faster, so as to achieve deterministic ordering.

### Not all pointers are the same

To further prevent the network from getting out of order, we introduced a special pointer called a ghost pointer. The basic idea is that when creating each block, it points to all the blocks on the boundary that it has seen and verified as legitimate, but such blocks are special.

The special block is the termination block of the anchor chain. In addition to the need to acknowledge that there are other legal blocks on the border of the Block DAG, each newly added block will also point to a ghost pointer of this termination block. An important change here is that only the blocks pointed by ghost pointers are considered in the calculation of block weights, and the blocks pointed by other pointers are not considered.

The parent block becomes a block with only a special ghost pointer pointing to it. In the Taraxa Block DAG structure, each block will only point to a parent block. Other blocks are simply observing other blocks of the block tree (original leaves can be understood as blocks), or "ancestors"-in Ethereum, they are called "uncles" or "ommers". This can be mentioned earlier, and it can also be very clear about how many parent blocks a block can have.

This solves a basic problem, namely, how to ensure that all blocks on the boundary are not presented in an ambiguous form during the implicit voting process represented by the DAG graph.

Let's take a quick look at an example,

Thick arrows are ghost pointers (calculate weight)

Fine dot arrows are confirmation pointers (no weight)

(a) Two blocks (1.1 and 1.2) point to the first genesis block (0). The thick line is a ghost pointer, because only one parent block exists and both blocks point to it. The order is ambiguous now, but we can use the lowest hash for comparison, suppose (1.1) wins, and be considered as a candidate for a ghost pointer for future blocks.

(b) 3 blocks have been added, but due to network delays, not all of these blocks can see every new block at the same time, which means that some nodes can see some blocks faster Piece. For example, fewer network hops caused by closer physical distances will increase the speed at which nodes see blocks. Both block (2.1) and block (2.2) have seen the previous two blocks (1.1 and 1.2), so they both point the ghost pointer at these two blocks and honestly identify (1.1) as the anchor chain The termination block. However, (2.3) does not see (1.1), so it can only use the ghost pointer to point to (1.2) and cannot do anything else. Note that according to our rules, the weight of the block has been updated, but only the portion of the sub-block that points to it using the ghost pointer is calculated.

(c) The next block appears, block (3.1) sees blocks (2.1) and (2.2) at the same time, block (3.2) sees blocks (2.1) and (2.3) at the same time, Block (3.3) sees blocks (2.1, 2.2, and 2.3) at the same time. At release time each block selects the terminating block on the anchor chain they see, points its ghost pointer to it, and then continues.

Together with the anchor chain, the ghost pointer helps to force the network to converge on the anchor chain and stabilize the overall ranking. Next, we will describe how to finally sort the blocks based on the anchor chain.

### Sort by anchor chain

Using the ghost pointer, let's recalculate the weights from the previous Block DAG example. Note again that only the blocks pointed to by the ghost pointer can be weighted into the parent block.

Once the anchor chain is drawn, we construct epochs around each block (anchor block) on the anchor chain. The epoch is the number of blocks that the anchor block can observe, or the block that the anchor block points directly or indirectly. Think of them as friends of super popular anchor blocks.

In the above image, we use the red dotted line to draw each anchor block epoch. Unfortunately, the first anchor block with a weight of 25 was only epoch himself. The next anchor block with a weight of 21 has epoc, including itself and two other blocks with a weight of 1 that it can observe. The third anchor block weighs 18 and only one anchor block can be observed. The weight of the next block is 17, and its epoch is 3, which includes the weight of one block which can be directly observed, and the weight of the other block which is 2 it is indirectly observed. In this way we continue to distinguish until the epoch of each anchor block is drawn.

Now we are ready to sort the blocks ! The blocks are first in epoch order from oldest to newest (from left to right). In each epoch, determine which block appears first by looking at which block points to which block, and using weight values. Or if this method fails, the block hash is used as a method for judging a block with the same distance as the anchor block.

Looking at the epoch graph, G is the first (1). There is only one block in the next epoch, so this block with a weight of 25 is the second block (2). Move to the next epoch.The two blocks with a weight of 1 are before the anchor block with weight 21 (because they are pointing to the block with weight 21) .The way to compare these two blocks is to compare whose hash value is lower. Determine (3) and (4), and then the fifth block (5) is a block with a weight value of 21. We keep going until all blocks in all epochs are sorted.

As shown below, the numbers in each block represent order, not weight.

We're finally done!

But are we really done? What about those blocks that haven't been sorted yet?

There are always some blocks near the block DAG structure boundary that are not part of the anchor epoch. But don't worry, as more blocks are added to the boundary, they will eventually be included.

Doesn't the anchor chain (and thus the order) arise over time?

Yes! There is a risk of reordering within the Block DAG structure. This risk declines exponentially over time, but it is never really eliminated, which is why Taraxa needs to implement a real-time final process (article will be published soon). The introduction of true real-time finality in the Block DAG structure without the risk of reordering is the basis for building DApps in the network.

Stay tuned.