Original: "Falsifying the log growth model of Bitcoin value"
This article explores whether there is a relationship between time and the price of Bitcoin. The proposed log-log model [1,2,3] is tested for statistical validity against the least squares assumptions, for stationarity in each variable, and for potentially spurious relationships using the Engle-Granger approach to cointegration. All but one of these tests refute the hypothesis that time is an important predictor of Bitcoin price.
Various sources [1, 2 and 3] have proposed a log price ~ log time (aka logarithmic growth) model that explains a large part of Bitcoin's price movements and has therefore been put forward as a mechanism for estimating future Bitcoin prices.
The scientific method is difficult for most people to grasp. It is counterintuitive, and it can lead to conclusions that do not reflect personal beliefs. The method rests on one fundamental concept: it is acceptable to be wrong.
According to the great modern philosopher of science Karl Popper, testing a hypothesis for an outcome that would show it to be wrong is the only reliable way to add weight to the argument that it is right. If rigorous and repeated tests cannot show that a hypothesis is incorrect, then with each test the hypothesis becomes more likely to be correct. This concept is called falsifiability. This article aims to falsify the logarithmic growth model of Bitcoin value, as broadly defined in [1, 2 and 3].
All analyses were performed using Stata 14. This article does not serve as financial advice.
Define the problem
To falsify a hypothesis, we must first state what it is:
The null hypothesis (H0): Bitcoin price is a function of the number of days Bitcoin has existed
Alternative hypothesis (H1): Bitcoin price is not a function of the number of days Bitcoin has existed
The authors of [1, 2 and 3] chose to test H0 by fitting an ordinary least squares (OLS) regression of the natural logarithm of the Bitcoin price on the natural logarithm of the number of days Bitcoin has existed. No diagnostics accompanied the model, nor was any reasoning given for log-transforming both variables. The model does not consider the possibility of a spurious relationship due to non-stationarity, nor the possibility of interactions or other confounding factors.
In this article, we will explore the model and run it through conventional regression diagnostics, determine whether the log transformations are necessary or appropriate (or both), and explore possible confounding variables, interactions, and sensitivity to confounding.
Another issue to be explored is non-stationarity. Stationarity is an assumption of most statistical models. It is the concept that no moment of the series trends over time, for example that neither the mean nor the variance has a trend with respect to time.
Following the stationarity analysis, we will explore the possibility of cointegration.
Medium is relatively limited in its support for mathematical notation. The usual notation for an estimate of a statistical parameter is to place a hat over it. Instead, we write the estimate of a term inside square brackets. For example, the estimate of β is [β]. If we represent a 2×2 matrix, we will write it as [r1c1, r1c2 \ r2c1, r2c2], and so on. Subscripts are replaced by @: for the 10th position in the vector X, we would usually subscript X with 10; instead we write X@10.
Ordinary least squares
Ordinary least squares regression is a method of estimating the linear relationship between two or more variables.
First, let's define the linear model: Y is a function of X, plus some error.
Y = βX + ε
where Y is the dependent variable, X is the independent variable, ε is the error term, and β is the multiplier of X. The goal of OLS is to estimate β such that the sum of the squared errors is minimized.
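As a minimal sketch of what OLS computes in the one-regressor case, the Python snippet below uses the textbook closed-form solution. It assumes the model also carries an intercept α, which the equation above omits. The function name `ols_fit` and the sample numbers are illustrative only and are not taken from the original analysis, which was run in Stata.

```python
def ols_fit(x, y):
    """Return (alpha, beta) minimising the sum of squared errors of y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    beta = s_xy / s_xx
    alpha = mean_y - beta * mean_x
    return alpha, beta


# Illustrative data lying exactly on y = 1 + 2x, so the fit recovers both parameters.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
alpha, beta = ols_fit(x, y)
print(alpha, beta)
```

With real, noisy data the same two formulas apply; the estimates then come with standard errors whose validity depends on the assumptions discussed in this section.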
For [β] to be a reliable estimate, some basic assumptions (known as the Gauss-Markov assumptions) must be satisfied:
- There is a linear relationship between the dependent and independent variables
- The errors are homoscedastic (that is, they have constant variance)
- The errors have a mean of zero
- There is no autocorrelation in the errors (that is, the errors are not correlated with their own lags)
Let's first look at the untransformed scatter plot of price v days (data from Coinmetrics).
Figure 1-Price v days. The spread of the data is too wide to judge linearity visually.
In Figure 1, we come across a good reason to take the logarithm of the price: the range is too wide. Taking the logarithm of the price (but not of days) and re-plotting gives us a familiar logarithmic pattern (Figure 2).
Figure 2-Log price v days. A clear logarithmic pattern emerges.
Taking the logarithm of days as well and plotting again, we get the obvious linear pattern identified by the authors of [1, 2 and 3] (Figure 3).
Figure 3-A clear linear relationship emerges
This confirms the choice of log-log: it is the only transformation that really shows a good linear relationship.
Figure 4-The square root transform is not much better than the untransformed series
Therefore, the preliminary analysis cannot reject H0.
The log-log fit regression is given in Figure 5 below, where [β] = 5.8
Figure 5 — Log-Log Regression Results
Using this model, we can now estimate the residuals [ε] and fitted values [Y] and test other hypotheses.
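To make the log-log step concrete, here is a hedged Python sketch. A straight line in log-log space, ln(price) = α + β·ln(days), is the same thing as a power law, price = e^α · days^β. The data below are synthetic stand-ins and not the Coinmetrics series. Only the slope 5.8 echoes the value reported above; the intercept, the alternating disturbance and the helper `ols_fit` are assumptions made for illustration.

```python
import math


def ols_fit(x, y):
    """Closed-form simple OLS: returns (alpha, beta) for y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    beta = s_xy / s_xx
    return mean_y - beta * mean_x, beta


# Synthetic data: 200 "days" and a log price following a power law
# plus a small alternating disturbance (hypothetical values).
days = list(range(1, 201))
true_alpha, true_beta = -17.0, 5.8
log_days = [math.log(d) for d in days]
log_price = [
    true_alpha + true_beta * ld + (0.1 if d % 2 else -0.1)
    for d, ld in zip(days, log_days)
]

alpha, beta = ols_fit(log_days, log_price)
fitted = [alpha + beta * ld for ld in log_days]            # the fitted values [Y]
residuals = [lp - f for lp, f in zip(log_price, fitted)]   # the residuals [ε]
print(round(beta, 3), round(sum(residuals), 6))
```

The residuals of an OLS fit with an intercept always sum to (numerically) zero, so that sum says nothing about model quality; the diagnostics that follow look at how the residuals are distributed instead.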
If the assumption of constant variance (that is, homoscedasticity) in the error term holds, the errors will vary randomly around 0 for every value of the predicted values. The residuals-versus-fitted (RVF) plot (Figure 6a) is therefore a simple and effective graphical way to check this assumption. In Figure 6a, we see a strong pattern rather than random scatter, indicating non-constant variance (i.e., heteroscedasticity) of the error term.
Figure 6a-RVF plot. The pattern here indicates a problem.
Heteroscedasticity like this causes the estimates of the coefficient [β] to have a larger variance and therefore to be less precise, and it produces p-values that are more significant than they should be. This is because the OLS procedure does not detect the increased variance. When we then calculate t-values and F-values, we use an underestimate of the variance, which leads to higher significance. This also affects the 95% confidence interval of [β], which is itself a function of the variance (through the standard error).
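A crude numeric counterpart to eyeballing an RVF plot is to compare the spread of the residuals at low and high fitted values. The Python sketch below does this in the spirit of a Goldfeld-Quandt comparison. It is not a formal test and not the procedure used in the original analysis; the function name and the toy residuals are hypothetical.

```python
def variance_ratio(fitted, residuals):
    """Mean squared residual in the upper half of fitted values divided by the lower half."""
    pairs = sorted(zip(fitted, residuals))  # order observations by fitted value
    half = len(pairs) // 2
    lower = [r for _, r in pairs[:half]]
    upper = [r for _, r in pairs[-half:]]

    def mean_square(values):
        return sum(v * v for v in values) / len(values)

    return mean_square(upper) / mean_square(lower)


fitted = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
fanning = [0.1, -0.1, 0.1, -0.1, 1.0, -1.0, 1.0, -1.0]   # spread grows with the fitted value
steady = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]    # constant spread

ratio_fanning = variance_ratio(fitted, fanning)
ratio_steady = variance_ratio(fitted, steady)
print(ratio_fanning, ratio_steady)
```

A ratio far from 1 hints at heteroscedasticity, while a ratio near 1 is what constant variance would produce. A pattern in the mean of the residuals, as opposed to their spread, would not be caught by this check.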
The Breusch-Godfrey [6 & 7] statistic for autocorrelation is also significant, providing further evidence of this problem.
Figure 6b-Autocorrelation detected in the residuals
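The analysis above reports the Breusch-Godfrey statistic. As a simpler stand-in, the Python sketch below computes the Durbin-Watson statistic, which only looks at first-order autocorrelation. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. The residual vectors are toy values chosen for illustration.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


trending = [-2.0, -1.0, 0.0, 1.0, 2.0]     # each residual resembles its neighbour
alternating = [1.0, -1.0, 1.0, -1.0]       # each residual flips sign

dw_trending = durbin_watson(trending)
dw_alternating = durbin_watson(alternating)
print(dw_trending, dw_alternating)
```

Residuals that drift slowly from negative to positive, as a curved pattern in an RVF plot would produce, push the statistic well below 2.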
This is usually the point at which we would stop and re-specify the model. However, since we know the impact of these issues, it is relatively safe to continue the regression analysis with the understanding that they exist. There are various ways to deal with (mild forms of) these problems, such as bootstrapping or using a robust estimator for the variance.
Figure 7-Different estimates show the effect of heteroscedasticity
As shown in Figure 7, although there is a small increase in variance (see the widened confidence intervals), for the most part the heteroscedasticity present does not actually have much detrimental effect.
Normality in error
The assumption that the error term is normally distributed with zero mean is less important than linearity or homoscedasticity. Non-normal but unskewed residuals will make the confidence intervals too optimistic. If the residuals are skewed, you may end up with a little bias. As can be seen from Figures 8 and 9, the residuals are heavily skewed. The Shapiro-Wilk test for normality gives a p-value of 0. The residuals do not fit the normal curve well enough for the confidence intervals to be unaffected.
Figure 8-Histogram of the error term, overlaid with a normal distribution (green). The error term should be normal, but it is not.
Figure 9-Normal quantile plot of error terms. The closer the point is to a straight line, the better the normal fit.
The concept of leverage is that not all data points in a regression contribute equally to the estimation of the coefficients. Points with high leverage can change the coefficients significantly depending on whether or not they are present. In Figure 10, we can clearly see that there are far too many points of concern (above the mean residual and above the mean leverage).
Figure 10-Leverage v squared residuals.
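For a regression with one regressor and an intercept, the leverage of observation i has a simple closed form: 1/n plus its squared distance from the mean of X, divided by the total squared deviation of X. The Python sketch below applies that formula to a toy X vector (hypothetical values) and flags points above the mean leverage, which is the cut-off the figure above uses.

```python
def leverages(x):
    """Leverage of each observation in a simple regression with an intercept."""
    n = len(x)
    mean_x = sum(x) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    return [1.0 / n + (xi - mean_x) ** 2 / s_xx for xi in x]


x = [1.0, 2.0, 3.0, 4.0, 10.0]   # the last point sits far from the rest
h = leverages(x)
mean_h = sum(h) / len(h)
high_leverage = [xi for xi, hi in zip(x, h) if hi > mean_h]
print([round(hi, 2) for hi in h], high_leverage)
```

The leverages always sum to the number of estimated parameters (two here), so the mean leverage is 2/n. A point is a concern when high leverage coincides with a large residual.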
Least squares (OLS) summary
The basic diagnostics indicate that, with the exception of linearity, all of the Gauss-Markov assumptions are violated. This is relatively strong evidence for rejecting H0.
A stationary process is said to be integrated of order 0 (i.e., I(0)). Non-stationary processes are I(1) or higher. Integration in this context is more like a poor man's integration: it is the sum of the lagged differences. I(1) means that if we subtract the first lag from each value in the series, we are left with an I(0) process. It is relatively well known that regression on non-stationary time series can lead to the identification of spurious relationships.
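The relationship between I(1) and I(0) can be shown in a few lines of Python. In this sketch, a running sum of shocks builds an I(1) series, and subtracting the first lag recovers the shocks. The shock values are arbitrary illustrative numbers.

```python
from itertools import accumulate

shocks = [1, -2, 3, -1, 2, -3, 1, -1]   # illustrative I(0) innovations
walk = list(accumulate(shocks))         # I(1): each value is the sum of all shocks so far

# First difference: subtract the first lag from each value.
diffs = [walk[t] - walk[t - 1] for t in range(1, len(walk))]
print(walk)
print(diffs)
```

The differenced series equals the original shocks from the second one onward, which is the sense in which differencing an I(1) process once leaves an I(0) process.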
In Figures 11 and 12 below, we can see that we cannot reject the null hypothesis of the augmented Dickey-Fuller (ADF) test. The null hypothesis of the ADF test is that the data are non-stationary. This means that we cannot say the data are stationary.
Figures 11 and 12-GLS augmented ADF tests for a unit root in log price and log days.
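The idea behind the unit-root test can be sketched with the plain Dickey-Fuller regression, which regresses the change in a series on a constant and its own first lag. This Python sketch is deliberately simplified: it has no augmentation lags and no GLS detrending, so it is not the test run in the figures above. The simulated series, the seed and the function name are assumptions, and the critical value of -2.86 is the approximate large-sample 5% value for the constant-only case.

```python
import math
import random


def dickey_fuller_t(y):
    """t-statistic on gamma in: y[t] - y[t-1] = alpha + gamma * y[t-1] + error."""
    lagged = y[:-1]
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    n = len(dy)
    mean_x = sum(lagged) / n
    mean_d = sum(dy) / n
    s_xx = sum((x - mean_x) ** 2 for x in lagged)
    s_xd = sum((x - mean_x) * (d - mean_d) for x, d in zip(lagged, dy))
    gamma = s_xd / s_xx
    alpha = mean_d - gamma * mean_x
    ssr = sum((d - alpha - gamma * x) ** 2 for x, d in zip(lagged, dy))
    se_gamma = math.sqrt(ssr / (n - 2) / s_xx)
    return gamma / se_gamma


rng = random.Random(0)
shocks = [rng.gauss(0.0, 1.0) for _ in range(500)]

walk, level = [], 0.0            # unit-root series: y[t] = y[t-1] + shock
for e in shocks:
    level += e
    walk.append(level)

ar1, level = [], 0.0             # stationary series: y[t] = 0.5 * y[t-1] + shock
for e in shocks:
    level = 0.5 * level + e
    ar1.append(level)

CRITICAL_5PCT = -2.86            # approximate; constant, no trend
t_walk = dickey_fuller_t(walk)
t_ar1 = dickey_fuller_t(ar1)
for name, stat in (("random walk", t_walk), ("AR(1)", t_ar1)):
    verdict = "reject unit root" if stat < CRITICAL_5PCT else "cannot reject unit root"
    print(name, round(stat, 2), verdict)
```

Under the null hypothesis of a unit root the statistic does not follow the usual t distribution, which is why dedicated Dickey-Fuller critical values are used instead of standard t tables.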
The Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test is a complementary test for stationarity to the ADF test. Its null hypothesis is that the data are stationary. As shown in Figures 13 and 14, we can reject stationarity at most lags in both variables.
Figure 13 and Figure 14-KPSS tests against the null of stationarity
These tests show that the two series are, beyond doubt, non-stationary. This is a bit problematic. If the series are not at least trend stationary, OLS can be misled into identifying spurious relationships. One thing we could do is take the log daily difference of each variable and rebuild our OLS model. However, because this problem is quite common in econometric series, a more robust framework is available to us, called cointegration.
Cointegration is a way of dealing with a pair (or more) of I(1) processes and determining whether a relationship exists between them and what that relationship is. To understand cointegration, we give a simplified example of a drunk and her dog. Imagine a drunk walking her dog home on a leash. The drunk wanders all over the place, and the dog walks fairly randomly too: sniffing trees, barking, chasing and scratching. However, the dog's overall direction of travel will stay within a leash length of the drunk. We can estimate that wherever the drunk is on her walk home, the dog will be within a leash length of her (it may be on one side or the other, but it will be within the leash length). This gross simplification is a rough metaphor for cointegration: the dog and the owner move together.
Contrast this with correlation. Suppose a stray dog follows the drunk's dog for 95% of the way home and then runs off chasing a car to the other side of town. There would be a very strong correlation between the path of the stray and the path of the drunk (literally R²: 95%), but, much like the drunk's many one-night stands, that relationship does not mean anything. It cannot be used to predict where the drunk will be: for some parts of the journey it holds, and for other parts it is wildly inaccurate.
To find the drunk, we first need to see which lag-order specification our model should use.
Figure 15-Lag-order selection. The minimum AIC is used to decide.
By choosing the minimum AIC, we determine here that the most suitable lag order to investigate is 6.
Next, we need to determine whether a cointegrating relationship exists. The simple Engle-Granger framework [8, 9, 10] makes this relatively easy. If the test statistic is more negative than the critical value, there is a cointegrating relationship.
Figure 16-The test statistic is nowhere near being below any critical value
The results in Figure 16 provide no evidence that a cointegrating equation exists between log price and log days.
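The two steps of the Engle-Granger procedure can be sketched in Python using the drunk-and-dog picture. Step one regresses one series on the other. Step two runs a Dickey-Fuller style regression on the residuals. This is a simplified illustration on simulated random walks: it uses no augmentation lags (the analysis above selected 6), and the seed, series names and helper functions are assumptions. The critical value of -3.34 is the approximate asymptotic 5% value for two variables with a constant.

```python
import math
import random


def ols_fit(x, y):
    """Closed-form simple OLS: returns (alpha, beta) for y = alpha + beta * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    beta = s_xy / s_xx
    return mean_y - beta * mean_x, beta


def residual_df_t(resid):
    """t-statistic on gamma in: e[t] - e[t-1] = gamma * e[t-1] + error (no constant)."""
    lagged = resid[:-1]
    de = [resid[t] - resid[t - 1] for t in range(1, len(resid))]
    s_xx = sum(x * x for x in lagged)
    gamma = sum(x * d for x, d in zip(lagged, de)) / s_xx
    ssr = sum((d - gamma * x) ** 2 for x, d in zip(lagged, de))
    return gamma / math.sqrt(ssr / (len(de) - 1) / s_xx)


def engle_granger_t(x, y):
    alpha, beta = ols_fit(x, y)                                    # step 1: long-run regression
    resid = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
    return residual_df_t(resid)                                    # step 2: unit-root test on residuals


rng = random.Random(1)
owner, stray = [], []
a = b = 0.0
for _ in range(500):
    a += rng.gauss(0.0, 1.0)     # the drunk: a random walk
    b += rng.gauss(0.0, 1.0)     # the stray: an unrelated random walk
    owner.append(a)
    stray.append(b)
dog = [2.0 * o + rng.gauss(0.0, 1.0) for o in owner]   # on the leash: tied to the owner

CRITICAL_5PCT = -3.34            # approximate; two variables, constant
t_dog = engle_granger_t(owner, dog)
t_stray = engle_granger_t(owner, stray)
for name, stat in (("dog", t_dog), ("stray", t_stray)):
    verdict = "cointegrated" if stat < CRITICAL_5PCT else "no evidence of cointegration"
    print(name, round(stat, 2), verdict)
```

The residual-based test needs its own critical values, which are more negative than the ordinary Dickey-Fuller ones, because the residuals come from an estimated regression.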
In this study, we did not consider any confounding variables. Given the evidence above, it is highly unlikely that any confounder would have a significant impact on our conclusion: we can reject H0. We can say that "there is no relationship between log days and log Bitcoin price". If there were one, there would be a cointegrating relationship.
Given that all but one of the Gauss-Markov assumptions for valid linear regression are violated, that there is no detectable cointegration, and that both variables are non-stationary, there is sufficient evidence to reject H0. There is therefore no valid linear regression between log price and log days that can be used to reliably predict out-of-sample prices.
Davidson, R., and J. G. MacKinnon. 1993. Estimation and Inference in Econometrics. New York: Oxford University Press.
Durbin, J., and G. S. Watson. 1950. Testing for serial correlation in least squares regression. I. Biometrika 37: 409–428.
Engle, R. F., and C. W. J. Granger. 1987. Co-integration and Error Correction: Representation, Estimation and Testing. Econometrica 55: 251–276.
MacKinnon, J. G. 1990, 2010. Critical Values for Cointegration Tests. Queen's Economics Department Working Paper No. 1227, Queen's University, Kingston, Ontario, Canada. Available at http://ideas.repec.org/p/qed/wpaper/1227.html.
Schaffer, M. E. 2010. egranger: Engle-Granger (EG) and Augmented Engle-Granger (AEG) cointegration tests and 2-step ECM estimation. http://ideas.repec.org/c/boc/bocode/s457210.html