"Open source" wealth: the data on the chain is a neglected gold mine

Almost all data in the blockchain industry is public, and this data is "open source" wealth. The gold mine is in plain sight, yet too many entrepreneurs fail to see it.

Every crypto venture capital fund is currently pursuing one thing: backing the next unicorn (a company worth more than $1 billion). Mining hardware manufacturers like Bitmain (which sought a Hong Kong IPO last year) and exchanges like Coinbase were among the first companies to join the unicorn club.

In general, crypto unicorns fall into two categories: financial services (Coinbase, Kraken, Circle, Binance) and mining hardware development (Bitmain, Bitfury). In both industries, many promising companies, including Bakkt, are lining up to become unicorns.

However, this article focuses on a different field that may produce many unicorns in the future: the blockchain data industry.

1

Data hidden behind public data

In the current information age, investment companies need data to gain a competitive advantage. In the blockchain world, almost all data is public: every transaction that occurs on the network is permanently recorded in the public ledger, including the transaction amount and the addresses involved. In a sense, however, the data is still "hidden", because data stored in a blockchain is difficult to process into really useful information.

As TD research partners and Raul Jordan, a lead developer of the Ethereum 2.0 Prysm client, point out, the storage layer (LevelDB) used by Bitcoin, Ethereum and many other blockchains is optimized for transactional integrity, not for relational storage or retrieval. LevelDB is a key-value store: it has no relational data model and does not support SQL queries, which makes it difficult to extract valuable information from data in this format. In addition, research has shown that LevelDB is prone to data corruption problems, which makes blockchain data even harder to process.

A clear example of blockchain data being "hidden" is Bitcoin Private (BTCP), a chain created by a merged fork of Zclassic (itself a fork of Zcash) and Bitcoin in order to introduce zero-knowledge proofs. About 2 million BTCP were secretly issued during the Zclassic/Bitcoin fork-merge in March 2018, and this went unnoticed until December 2018, when Coinmetrics published an analysis of the total BTCP supply. Probably because BTCP is not a popular project, it took about nine months for the event to be noticed. Still, it shows that even "public" blockchain data requires specialized processing before it yields meaningful information.

2

Basic data analysis

2.1

Source data analysis

The complexity of processing and analyzing blockchain data creates a perfect opportunity for data scientists and engineers to enter the blockchain industry by founding companies that solve these complex problems. Accurate data analysis has many applications in the crypto industry, so let's explore this model.

But before going into the discussion, ask a simple question: who needs to query blockchain data, and why?

The answer is everyone.

In fact, every crypto user needs to query blockchain data regularly. For example, a user who wants to know whether a transaction has been confirmed simply opens a block explorer website and searches by address or transaction ID. What happens behind the scenes is this: the user "searches" the entire blockchain for information about a particular address or transaction, and the block explorer company performs the task on the user's behalf and delivers the results. In the background, the company does not actually query the blockchain itself; it queries a relational database that was built from the blockchain data.
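The idea of answering user searches from a relational database built out of chain data can be sketched as follows. This is a minimal illustration, not how any particular block explorer is implemented: the sample blocks, the `transfers` table, and the helper names `build_index`, `transfers_for_address` and `confirmations` are all hypothetical, and the chain tip is approximated by the highest indexed block.

```python
import sqlite3

# Hypothetical, hand-written sample data standing in for decoded blocks.
SAMPLE_BLOCKS = [
    {"height": 1, "txs": [
        {"txid": "tx-a", "sender": "addr1", "receiver": "addr2", "amount": 5.0},
    ]},
    {"height": 2, "txs": [
        {"txid": "tx-b", "sender": "addr2", "receiver": "addr3", "amount": 2.0},
        {"txid": "tx-c", "sender": "addr1", "receiver": "addr3", "amount": 1.5},
    ]},
]


def build_index(blocks):
    """Copy block data into an in-memory relational table with address indexes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transfers ("
        "txid TEXT PRIMARY KEY, height INTEGER, "
        "sender TEXT, receiver TEXT, amount REAL)"
    )
    # Indexes make address lookups cheap; a key-value store keyed by
    # transaction hash offers no such lookup out of the box.
    conn.execute("CREATE INDEX idx_sender ON transfers(sender)")
    conn.execute("CREATE INDEX idx_receiver ON transfers(receiver)")
    for block in blocks:
        for tx in block["txs"]:
            conn.execute(
                "INSERT INTO transfers VALUES (?, ?, ?, ?, ?)",
                (tx["txid"], block["height"], tx["sender"],
                 tx["receiver"], tx["amount"]),
            )
    conn.commit()
    return conn


def transfers_for_address(conn, address):
    """Return every indexed transfer that touches the given address."""
    cursor = conn.execute(
        "SELECT txid, height, sender, receiver, amount FROM transfers "
        "WHERE sender = ? OR receiver = ? ORDER BY height, txid",
        (address, address),
    )
    return cursor.fetchall()


def confirmations(conn, txid):
    """Count confirmations relative to the highest indexed block (0 if unknown)."""
    row = conn.execute(
        "SELECT height FROM transfers WHERE txid = ?", (txid,)
    ).fetchone()
    if row is None:
        return 0
    tip = conn.execute("SELECT MAX(height) FROM transfers").fetchone()[0]
    return tip - row[0] + 1


if __name__ == "__main__":
    index = build_index(SAMPLE_BLOCKS)
    print(transfers_for_address(index, "addr3"))
    print(confirmations(index, "tx-a"))
```

A search by address or transaction ID then becomes an ordinary indexed SQL query against the copy, rather than a scan over the chain's own storage.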

To cover its costs, such a company needs a source of income. The most popular Ethereum block explorer, Etherscan, generates revenue through advertising, and almost all other blockchain data service companies use a similar revenue model, whether they provide DApp data, such as Dappradar, or price and market capitalization information for different cryptocurrencies, such as CoinMarketCap. But in many cases this revenue model is vulnerable to negative effects, especially where the legitimacy of the projects promoted in the ads is in doubt.

2.2

Compliance analysis

The block explorer market as a whole has reached a value in the tens of billions of dollars, because the consumer-facing (2C) services these companies provide are needed by every blockchain participant, and there is still a lot of room for growth.

But for some business-facing (2B) customers, block explorer level data services are not enough. Investors, funds and blockchain companies such as cryptocurrency exchanges need more sophisticated data collection and analysis. In the United States and Europe, exchanges require very intensive data analysis to comply with anti-money laundering laws and to ensure that users do not use their platforms for illegal activities, such as selling stolen or extorted cryptocurrencies.

These companies are also obligated to ensure that cryptocurrencies are not used to support illegal businesses or activities. Companies such as Chainalysis and Elliptic provide blockchain data analysis tools, and governments and exchanges use these tools to combat the illegal use of cryptocurrencies. Most exchanges prefer to work with professional data analytics providers, while Coinbase decided to bring the capability in-house by acquiring the controversial data analytics company Neutrino.

3

Economic data analysis

3.1

What are the indicators of economic data?

Another layer of blockchain data is the analysis of economic signals, which investors, funds, and research institutions need in order to make investment decisions. Three kinds of data matter most here: exchange data, off-chain data, and on-chain activity data.

From an economic point of view, exchange data is the most important but also the most difficult to obtain, because an exchange does not want its own platform's data to be easily mined. A Bitwise report makes the reason plain: it found that 95% of reported exchange trading volume is fake.

Off-chain data is also difficult to obtain, because it covers almost everything except on-chain transactions, which makes it not only complex but also hard to quantify. At present, off-chain data is collected and analyzed only by a few communities and small websites that rely on donations to survive. Coin Dance, for example, provides historical data on the number of nodes running on the Bitcoin network and its forks, and on client distribution. Electric Capital's developer activity report is a good example of a community contribution that is supported by a fund's operations.

3.2

Economic data provider

The need for accurate, standardized data has spawned many data companies. It is worth mentioning that businesses currently focus almost exclusively on exchange data and on-chain data; demand for off-chain data is not yet clear.

There are already a large number of data analysis companies, as can be seen from the quality of the data available from various sources this year. Platforms worth watching include Kaiko, Coinmetrics and Messari. These companies have made outstanding contributions to the data ecosystem and are likely to help data-driven investing generate higher returns.

Kaiko is one of the leading providers of exchange data. Since 2014 it has been aggregating exchange data, collecting data on more than 1,000 cryptocurrencies from more than 30 exchanges. Kaiko offers a monthly subscription to its exchange data, or a subscription with unlimited API calls, for a monthly fee of up to 25,000 euros. Interested customers can also purchase historical price and volume data for currencies on the exchanges. Kaiko also plans to expand its products to include OTC data. Recently, it provided data to Bitwise for Bitwise's investigative report on exchange trading volume, which concluded that 95% of reported exchange volume was fake. One of the main results of this data and analysis was the creation of the "Real 10" volume index, which aggregates the trading volume of 10 exchanges that have proven to report reliable volume figures.
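The aggregation behind a trusted-exchange volume index of this kind can be sketched in a few lines. This is only an illustration of the idea, not Bitwise's or Kaiko's actual methodology: the exchange names, the volume figures, and the function names `trusted_volume` and `trusted_share` are all made up for the example.

```python
# Hypothetical allow-list of exchanges whose reported volume is trusted.
TRUSTED_EXCHANGES = {"exchange_a", "exchange_b"}

# Hypothetical reported 24-hour volumes, in millions of dollars.
REPORTED_VOLUME = {
    "exchange_a": 120.0,
    "exchange_b": 80.0,
    "exchange_c": 3800.0,  # not on the allow-list
}


def trusted_volume(reported, trusted):
    """Sum reported volume over the trusted exchanges only."""
    return sum(volume for name, volume in reported.items() if name in trusted)


def trusted_share(reported, trusted):
    """Fraction of total reported volume that comes from trusted exchanges."""
    total = sum(reported.values())
    if total == 0:
        return 0.0
    return trusted_volume(reported, trusted) / total


if __name__ == "__main__":
    print(trusted_volume(REPORTED_VOLUME, TRUSTED_EXCHANGES))
    print(trusted_share(REPORTED_VOLUME, TRUSTED_EXCHANGES))
```

The hard part in practice is deciding which exchanges belong on the allow-list; once that is settled, the index itself is a filtered sum.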

Messari's original vision was to increase the transparency of the crypto ecosystem by encouraging crypto projects to disclose important information about the project and its founders, such as the developers involved, early investors, and ICO and pre-ICO details. The disclosure database is public and free. In addition, Messari's OnChainFx provides an improved data source for the major cryptocurrencies, including a variety of innovative metrics such as the "Y2050" market capitalization (the total market value of the coins that can be released by 2050) and the "Real 10" trading volume of all listed cryptocurrencies. Although Messari does not charge for this public data and these services, it does charge for the conclusions of its data analysis and for the special tools it has developed to analyze the data. In addition to a range of on-chain activity data, Messari also plans to provide its customers with historical price and volume data.

4

Challenges facing data analysis companies

Although there are many high-quality companies in the blockchain data field, the main challenge they need to address is unlocking the true potential of this data. Current data analysis faces the following two problems:

1. Transaction data can be misleading because it does not include the over-the-counter (OTC) market. Many OTC desks settle by direct transfer after negotiating a price. Because OTC trading is generally conducted privately, data companies need to sign contracts with multiple OTC dealers to obtain such data.

2. In many cases, activity derived from on-chain data does not represent real economic activity. One example is an entity transferring coins between different wallets under its own control, such as an exchange moving coins from its hot wallet to a cold wallet or back. To exclude such events from the data, you need to maintain lists of the wallet addresses used by each exchange, which usually falls within the scope of a data analysis company's work.
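The second problem, filtering out transfers between wallets controlled by the same entity, can be sketched as follows. This is a minimal illustration under stated assumptions: the address labels in `ADDRESS_OWNER`, the sample transfers, and the function names `is_internal` and `economic_volume` are hypothetical, and real address labeling is far less complete than a simple lookup table suggests.

```python
# Hypothetical address labels: which entity controls which address.
# Addresses with no entry are treated as unknown, independent owners.
ADDRESS_OWNER = {
    "hot-1": "exchange_x",
    "cold-1": "exchange_x",
}

# Hypothetical on-chain transfers.
TRANSFERS = [
    {"sender": "user-a", "receiver": "hot-1", "amount": 10.0},   # deposit
    {"sender": "hot-1", "receiver": "cold-1", "amount": 500.0},  # internal
    {"sender": "cold-1", "receiver": "hot-1", "amount": 200.0},  # internal
    {"sender": "hot-1", "receiver": "user-b", "amount": 4.0},    # withdrawal
]


def is_internal(transfer, owners):
    """True when sender and receiver are labeled with the same known entity."""
    sender_owner = owners.get(transfer["sender"])
    receiver_owner = owners.get(transfer["receiver"])
    return sender_owner is not None and sender_owner == receiver_owner


def economic_volume(transfers, owners):
    """Total transferred amount, excluding transfers within a single entity."""
    return sum(t["amount"] for t in transfers if not is_internal(t, owners))


if __name__ == "__main__":
    raw = sum(t["amount"] for t in TRANSFERS)
    print(raw, economic_volume(TRANSFERS, ADDRESS_OWNER))
```

In this made-up sample, most of the raw volume is the exchange shuffling its own coins, which is why the quality of the address list matters more than the filtering logic itself.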

How close are these companies to becoming unicorns?

Because of the nature of their business and customers, blockchain data analytics companies are now among the more likely candidates to approach billion-dollar valuations. Their clients are governments, legislatures, and exchanges that must comply with KYC regulations, all of which are well-funded entities that benefit from data analytics.

Another advantage of data analysis companies is the high barrier to entry, which reduces competition. Companies that entered blockchain data analysis early have accumulated a wealth of historical information, such as information about ransomware attacks and legal terms, that is not readily available to new entrants (or, arguably, to traditional software service providers).

Exchange data analysis companies can provide detailed price and transaction information, and being first to market allowed them to work with exchanges early and aggregate early exchange data that is not available to new competitors. Of course, a latecomer that brings enough innovation in methods and skills to on-chain data analysis and information extraction, or that can offer deeper information mining, still has a chance to claim a place.

5

Reflections

In a decentralized world like blockchain, data analysis providers are an important link, but unfortunately they are centralized, and they may publish misleading analysis results out of self-interest.

To avoid such problems, is it possible to decentralize data analysis? Could a decentralized protocol be built to index and query Web 3.0 data in a decentralized manner and to share the results of the analysis? This is a new model worth considering, and some new projects may well be born from it.

(End)