Observe digital currency mining from a statistical perspective

Author: Luxor Tech

Translation: Zoe Zhou

Source: Crypto Valley

Simple definition
The "luck" of digital currency mining is essentially a matter of probability. Imagine that each miner receives lottery tickets in proportion to the hash power they provide. For the sake of explanation, suppose you provide 1 EH/s of hash power and the total hash power in the network is 100 EH/s. You then hold 1 of every 100 lottery tickets, so your probability of winning any given block is 1%. Statistically, for every 100 blocks found, you should expect to find 1 of them.
Now suppose you find 2 blocks out of 100, which is one more than statistically expected. Then you are lucky! If instead you find 0 blocks out of 100, you are unlucky. In the long run you should find about one of every 100 blocks, but there will be deviations in the short term.
The description above should give you a basic understanding of "luck", but if you want to learn more about it, keep reading!
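The lottery analogy can be made concrete with a small calculation. The sketch below is only an illustration, assuming each of the 100 blocks is an independent draw with a 1% chance of being yours (a binomial model); the function name `prob_blocks_found` is made up for this example.

```python
from math import comb


def prob_blocks_found(k, total_blocks=100, share=0.01):
    """Binomial probability of finding exactly k out of total_blocks blocks.

    share is the miner's fraction of the network hash power (1 EH/s of
    100 EH/s in the example above).
    """
    return comb(total_blocks, k) * share**k * (1 - share) ** (total_blocks - k)


# Chance of finding 0, 1, 2 or 3 of the next 100 blocks.
for k in range(4):
    print(f"P(find {k} of 100 blocks) = {prob_blocks_found(k):.3f}")
```

Under this model, finding 0 blocks is about as likely as finding exactly 1, which is why short-term results can differ so much from the long-run average.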
Key terms
  • Mining pool
A mining pool is a group of miners who work together to reduce the volatility of their returns. Miners contribute their processing power over the network, and rewards are allocated according to the amount of work each miner contributed toward finding blocks, with the pool operator charging a fee.
Once mining difficulty rose to the point where it could take years for a small miner to find a block alone, joining a mining pool became the norm. Pooling lets miners combine their resources so that blocks are found faster, and each miner receives part of the block reward on a steadier basis.
  • PPS mode: Pay-Per-Share, i.e. payment for each share
In this distribution mode, the miner is paid a fixed, predictable income every day based on its share of the computing power in the mining pool. The goal of this mode is to eliminate the "luck" component and reduce the miner's risk by transferring that risk to the operator of the mining pool. Operators charge fees to cover the losses this risk may cause.
  • PPLNS mode: Pay-Per-Last-N-Shares (the purest form of team mining)
The name means "pay income based on the last N shares". In this distribution mode, the actual income of the mining pool on a given day is distributed to the miners in proportion to their computing power. That is, once the pool finds a block, the coins in that block are divided among the miners according to the proportion of shares each one contributed.
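The difference between the two modes can be sketched in a few lines. This is a simplified illustration, not any pool's actual payout code: the function names, the fee values, and the sample numbers are all assumptions made for the example.

```python
def pps_payout(shares, share_difficulty, network_difficulty, block_reward, fee=0.02):
    """PPS: each share is paid its expected value, whether or not a block is found.

    A share at share_difficulty has a share_difficulty / network_difficulty
    chance of also being a valid block, so that is its expected value.
    """
    expected_blocks = shares * share_difficulty / network_difficulty
    return expected_blocks * block_reward * (1 - fee)


def pplns_payout(miner_shares, window_shares, block_reward, fee=0.01):
    """PPLNS: when the pool finds a block, split it over the last N shares.

    miner_shares is how many of the last window_shares (N) shares
    were submitted by this miner.
    """
    return block_reward * (1 - fee) * miner_shares / window_shares


# Hypothetical numbers: a 12.5 BTC block reward and a miner holding
# 250 of the last 1000 shares when the pool finds a block.
print(pplns_payout(miner_shares=250, window_shares=1000, block_reward=12.5))
print(pps_payout(shares=1000, share_difficulty=1,
                 network_difficulty=1_000_000, block_reward=12.5))
```

In the PPS function the pool pays out even if no block is found, which is how the "luck" risk ends up with the operator; in the PPLNS function nothing is paid until a block actually arrives.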

Calculate "Luck"

  • Probability of mining a block
The probability that a single hash mines a block is 1 / (2^32 × Difficulty).
As of February 19, 2020, the BTC difficulty is 15,546,745,765,549.
So the probability that a single hash mines a block is 0.000000000000000000001498%.
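The arithmetic can be checked directly. The sketch below uses the difficulty quoted above; the 1 EH/s hash rate is the illustrative figure from the opening example, not a measured value.

```python
# BTC difficulty as of February 19, 2020 (from the text above).
DIFFICULTY = 15_546_745_765_549

# Expected number of hashes needed to find one block.
expected_hashes_per_block = 2**32 * DIFFICULTY

# Probability that a single hash mines a block.
p_per_hash = 1 / expected_hashes_per_block

print(f"probability per hash: {p_per_hash:.4e}")
print(f"as a percentage:      {p_per_hash * 100:.4e} %")

# Illustrative assumption: a miner running at 1 EH/s (1e18 hashes per second).
hashrate = 1e18
expected_seconds = expected_hashes_per_block / hashrate
print(f"expected time per block at 1 EH/s: {expected_seconds / 3600:.1f} hours")
```

The percentage printed is roughly 1.5 × 10^-21 %, matching the figure above.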
  • "luck"
The correct way to calculate "luck" is to compare the expected and actual number of shares for each round.
"Luck" = average (expected shares per round / actual shares per round)
The higher the "luck" value, the luckier the mining. A "luck" of 200% means you found a block after submitting only half the expected number of shares.
  • "Luck Statistics"
"Luck Statistics" is the inverse of the above. This is how we view "luck" as a mining pool operator.
"Luck Statistics" = average (actual shares per round / expected shares per round)
The smaller the "Luck Statistics" value, the better the "luck". A "Luck Statistics" value of 0.5 means that you submitted half the shares expected to be needed to find a block.
  • "Luck" variance
When looking at "luck" over a period, measure the period in blocks rather than in time (hours, days, etc.). The variance of "luck" is especially important when drawing conclusions from "luck" or comparing two different miners. One miner may have mined many more blocks than another, which greatly affects how much its "luck" figure can be trusted.
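Both definitions are simple averages over rounds, as the sketch below shows. It assumes each round is recorded as a pair (expected shares, actual shares); the function names and the sample rounds are invented for illustration.

```python
from statistics import mean


def luck(rounds):
    """Miner's view: average of expected / actual shares per round."""
    return mean([expected / actual for expected, actual in rounds])


def luck_statistic(rounds):
    """Pool operator's view: average of actual / expected shares per round."""
    return mean([actual / expected for expected, actual in rounds])


# One round that needed only half the expected shares.
print(luck([(1000, 500)]))            # 2.0, i.e. 200% luck
print(luck_statistic([(1000, 500)]))  # 0.5

# Hypothetical history of three rounds: one lucky, one unlucky, one average.
rounds = [(1000, 500), (1000, 2000), (1000, 1000)]
print(luck(rounds), luck_statistic(rounds))
```

Because both figures are averages of per-round ratios, over several rounds one is not simply the reciprocal of the other.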

Visualizing "Luck Statistics"

  • Finding the right distribution
We start with the basic Poisson distribution. It is a discrete probability distribution that gives the probability of a given number of events occurring within a fixed interval of time or space, provided the events occur at a known, constant average rate and independently of the time since the last event. The problem with the Poisson distribution is that it is discrete rather than continuous. It deals with the number of events over a fixed period of time, which is not how we look at "Luck Statistics".
The next step is to look at the gamma distribution, which is continuous. The gamma distribution describes the waiting time between events in a Poisson point process, that is, a process in which events occur continuously and independently at a constant average rate. It answers the question "how long do we have to wait for n random events to occur?" When the shape parameter of the gamma distribution is an integer, the distribution is called the Erlang distribution. This matters for "Luck Statistics" data, because the shape parameter here, the number of blocks, is always a positive integer.
Strictly speaking, "Luck Statistics" follows a negative binomial distribution, which the Erlang distribution approximates closely.
  • Erlang distribution
We don't need to delve into the formula for this distribution, but we can think of the Erlang distribution as a generalization of the exponential distribution. It is a continuous distribution and is usually used to model the waiting time until an event (here, mining a block).
Using this distribution makes "luck" easier to calculate, and the approximation becomes more accurate as network difficulty increases. At the current network difficulty, the error is no more than one in a million.
If this is difficult to understand, the next section will help you visualize it.
  • Probability Density Distribution Function (PDF)
Using the Erlang distribution, the PDF shows how likely the "Luck Statistics" value is to be near any given value. The probability that the value equals an exact number (e.g. 1.00000000000) is 0%. Instead, the PDF is used to find the probability that the "Luck Statistics" value falls within a certain range (e.g. below 1.0).
You can calculate the PDF using R or Python, but the easier way is to use the Wolfram Alpha website. The reference formula for Wolfram Alpha is as follows:
quantile(ErlangDistribution[Number of Blocks, Number of Blocks], optional%)
For a single block, the PDF shows a wide range of potential results, and the "Luck Statistics" value is likely to fall well below the mean of 1.0.
If we increase the number of blocks to 14, which is about 10% of the blocks the network finds in a day, the distribution starts to look more like a normal curve. The "Luck Statistics" value is now more likely to be close to 1.0, but there is still a lot of room for variation.
If we increase the number of blocks to 144, which is a full day of network blocks, we see a normal-looking curve with a fairly narrow range of potential results. Over 144 blocks, the "Luck Statistics" value is unlikely to be lower than 0.7 or higher than 1.3.
The PDF helps to show why "Luck Statistics" data should be judged on large samples (i.e. averaged over more blocks).
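For readers who prefer Python to Wolfram Alpha, the density can be evaluated with the standard library alone. The sketch below assumes, as in the Wolfram Alpha expressions in this article, an Erlang distribution whose shape and rate both equal the number of blocks; the function name `luck_stat_pdf` is made up for this example.

```python
import math


def luck_stat_pdf(x, n_blocks):
    """Density of Erlang(shape=n_blocks, rate=n_blocks) at x, defined for x > 0.

    pdf(x) = n^n * x^(n-1) * exp(-n*x) / (n-1)!
    Computed in log space so that n_blocks = 144 does not overflow.
    """
    if x <= 0:
        return 0.0
    n = n_blocks
    log_pdf = n * math.log(n) + (n - 1) * math.log(x) - n * x - math.lgamma(n)
    return math.exp(log_pdf)


for n in (1, 14, 144):
    std_dev = 1 / math.sqrt(n)  # the mean is 1.0, the variance is 1/n
    mode = (n - 1) / n          # most likely "Luck Statistics" value
    print(f"{n:>3} blocks: std dev {std_dev:.3f}, mode {mode:.3f}, "
          f"density at 1.0 = {luck_stat_pdf(1.0, n):.3f}")
```

The standard deviation shrinks as 1/sqrt(number of blocks), and the most likely value moves from 0 for a single block toward 1.0, which is the narrowing described above.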
  • Cumulative Distribution Function (CDF)
The CDF is a great way to analyze "Luck Statistics" data. Suppose your mining pool has a "Luck Statistics" value of 1.3 over the past 1, 10, or 140 blocks. Is that merely unlucky, or almost impossible? The CDF answers this and similar questions.
As before, you can model this in R or Python, but you can also use the Wolfram Alpha website or Excel.
Wolfram Alpha: CDF[ErlangDistribution[nblocks, nblocks], Luck Statistic]
Excel: =GAMMA.DIST(Luck Statistic, nblocks, 1/nblocks, 1)

For a "Luck Statistics" value of 1.3 over 1 block, the result is 0.727468. This means that if the block were re-run 100 times, about 73 of the runs would be luckier than this one and about 27 would be unluckier.
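The same CDF can be computed in Python without extra libraries. This is a minimal sketch assuming the Erlang(nblocks, nblocks) model from the Wolfram Alpha expression in this article; the function name `erlang_cdf` is chosen for this example.

```python
import math


def erlang_cdf(luck_stat, n_blocks):
    """CDF of Erlang(shape=n_blocks, rate=n_blocks) at luck_stat.

    For an integer shape n: P(X <= x) = 1 - sum_{k=0}^{n-1} e^(-nx) (nx)^k / k!
    Each term is computed in log space to stay numerically stable.
    """
    if luck_stat <= 0:
        return 0.0
    rate_x = n_blocks * luck_stat
    upper_tail = sum(
        math.exp(k * math.log(rate_x) - rate_x - math.lgamma(k + 1))
        for k in range(n_blocks)
    )
    return 1.0 - upper_tail


print(f"{erlang_cdf(1.3, 1):.6f}")  # 0.727468, the single-block figure
for n in (10, 140):
    print(f"Luck Statistics 1.3 over {n} blocks: CDF = {erlang_cdf(1.3, n):.4f}")
```

The closer the result is to 1, the rarer it is to be this unlucky over that many blocks.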

We put together the table below to show the probability of seeing a specific "Luck Statistics" value over a given number of blocks. As you can see, some values are so bad that they are almost impossible (for example, a "Luck Statistics" value averaging 1.5 over a large number of blocks).
Implausibly good values also occur. We usually don't worry much about those, because we don't lose money on them, but they can be a reason to check whether the data is correct.

Conclusion

  • PPS mining pool
As we described at the beginning, a PPS mining pool removes mining variance for its miners. Bad "luck" therefore hurts the mining pool, good "luck" benefits the mining pool, and the miners are never affected. The only thing miners need to worry about is that if the mining pool goes bankrupt, they will not be paid their outstanding balances and will suffer downtime until they switch to a new mining pool.
As a PPS mining pool operator, we follow our "luck" very carefully. We need to ensure that there is sufficient liquidity to cover short-term variance. We also want to make sure everything is working. Usually, when we are unlucky, we check whether we could really be that unlucky. If we find that our "luck" is worse than 99% of cases, we start to consider other factors, such as attacks on the mining pool or technical bugs. We will discuss this further with an example of a mining pool attack below.
Since we launched our own BTC mining pool, our average "Luck Statistics" value over the 9 blocks found so far is 0.502. As a mining pool operator we are happy about this, but it is still within the range of possibility (the value would be even better in 4% of cases). From running 10 other mining pools, we know that bad "luck" can also occur, so we don't expect this to continue.
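A check of this kind could look like the sketch below. It is an illustration only, again assuming the Erlang(nblocks, nblocks) model; the function names and the way the 99% threshold is applied are assumptions, not a description of any operator's actual monitoring.

```python
import math


def erlang_cdf(luck_stat, n_blocks):
    """P(Luck Statistics <= luck_stat) for Erlang(shape=n_blocks, rate=n_blocks)."""
    if luck_stat <= 0:
        return 0.0
    rate_x = n_blocks * luck_stat
    upper_tail = sum(
        math.exp(k * math.log(rate_x) - rate_x - math.lgamma(k + 1))
        for k in range(n_blocks)
    )
    return 1.0 - upper_tail


def share_of_luckier_outcomes(luck_stat, n_blocks):
    """Fraction of possible outcomes that would have been luckier than this one."""
    return erlang_cdf(luck_stat, n_blocks)


def needs_investigation(luck_stat, n_blocks, threshold=0.99):
    """True when the pool is unluckier than `threshold` of all outcomes."""
    return share_of_luckier_outcomes(luck_stat, n_blocks) > threshold


# The figures quoted above: a value of 0.502 over 9 blocks.
print(f"{share_of_luckier_outcomes(0.502, 9):.1%} of outcomes would be luckier")
print(needs_investigation(0.502, 9))   # lucky run, nothing to investigate
print(needs_investigation(1.5, 144))   # very unlucky over a full day of blocks
```

The first line reproduces the roughly 4% figure, and the last call shows a value that would cross the 99% line and prompt a closer look for attacks or bugs.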
  • PPLNS mining pool
A PPLNS mining pool works the opposite way from a PPS mining pool. The variance risk is borne not by the mining pool but by the miners. A period of bad "luck" means that miners earn less, while a period of good "luck" means they earn more.
In a PPLNS mining pool, miners may leave after a run of bad luck. This behavior is an instance of the "Gambler's Fallacy": the mistaken belief that if an event happened more often than usual in the past, it is less likely to happen in the future (and vice versa). In reality, the next mining pool is no more likely to have better "luck".
Miners in a PPLNS mining pool should still pay close attention to the "luck" of their pool. If the bad luck is statistically improbable, the mining pool may be under attack or may have a bug.
  • Block withholding attack
This is one of the most common attacks on mining pools, and it usually comes from competing mining pools. A PPS mining pool under this attack can go bankrupt; a PPLNS mining pool can lose its miners.
Block withholding occurs when a miner does not return a valid block hash that it has found.
The attacker (a miner) uses custom settings so that it does not return any hash smaller than a preset threshold. The threshold is usually slightly lower than the network target (the target is inversely proportional to difficulty). As a result, the miner is still rewarded for the shares it submits that meet the pool's share target but miss the network target, yet it never sends a valid block hash to the mining pool. Because the miner is paid as if its work could produce a network block, a PPS mining pool loses money, and the honest miners in a PPLNS mining pool lose income.
There are many ways to mitigate these attacks, but none is clear-cut or foolproof. Typically the mining pool monitors the individual "luck" of each miner and locks an account once it becomes statistically improbable that the miner would not have produced a hash below the network target.
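The per-miner monitoring described above can be sketched as a simple probability test. This is a hedged illustration rather than any pool's real defence: it assumes the miner has returned no block at all, that each share independently has a share difficulty / network difficulty chance of being a block, and the function names and the 1% cut-off are invented for the example.

```python
import math


def prob_no_block(shares, share_difficulty, network_difficulty):
    """Chance that an honest miner finds no block in `shares` submitted shares."""
    p_block_per_share = share_difficulty / network_difficulty
    # (1 - p)^shares, computed via log1p to stay accurate when p is tiny.
    return math.exp(shares * math.log1p(-p_block_per_share))


def looks_like_withholding(shares, share_difficulty, network_difficulty, alpha=0.01):
    """Flag a miner whose zero-block streak is less likely than `alpha`."""
    return prob_no_block(shares, share_difficulty, network_difficulty) < alpha


# Toy numbers: each share has a 1-in-1000 chance of being a block.
for shares in (1000, 5000):
    p = prob_no_block(shares, share_difficulty=1, network_difficulty=1000)
    flagged = looks_like_withholding(shares, 1, 1000)
    print(f"{shares} shares, no block: probability {p:.4f}, flagged: {flagged}")
```

A small miner can legitimately go a very long time without a block, so a test like this only becomes informative once the miner has submitted several blocks' worth of expected work, which is one reason no method is foolproof.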