Distributed storage blockchain system: the future of human data infrastructure?

A blockchain system that provides distributed storage lets data producers keep control of their own data, which is a start. Beyond that, data producers can realize the value of their data through different methods and channels. Data produced by robots can be shared in the same way.

A future in which people own their own data and can trade it freely is a better future, and one worth working toward.

Written by: Wu Weilong, co-founder of Genaro


Genaro co-founder and Chief Technology Officer of Fun Technology. He was a core developer at Silicon Valley's Maxim Integration Company and provided algorithms for Samsung. After moving into blockchain development, he was among the first blockchain developers, a technical geek with rich experience in innovation. His experience covers blockchain virtual machines, P2P storage, consensus algorithms and many other low-level technologies.

Every action a person takes leaves a series of records in the world. Some are recorded in memory, such as something interesting that happened on a wedding anniversary; others are recorded as data, such as which souvenirs were bought that day and which restaurant was visited.

In the Internet age, the latter kind is recorded by various applications, retained in company databases, and then put to use through a series of calculations. For example, a person who books a hotel through Ctrip will find that the discount coupons sent after the booking happen to be for amusement parks in the very area they plan to visit.

Internet companies use user data to maximize their own interests. If a person earns 15,000 yuan a month and has paid out 3,000 yuan, every Internet company is analyzing that person's data to find a way to squeeze out the remaining 12,000 yuan.

This practice has penetrated every aspect of everyone's life. It shows that records of personal behavior are valuable as data: every action a user takes helps Internet companies better understand that user's consumption habits and spending power.

This valuable data is taken and used directly by Internet companies, for free. Can we keep the value of that data for ourselves? To borrow today's most popular phrase, we can try doing it with blockchain.


Infrastructure to realize data value

When we talk about the value of data, we first need to estimate its volume. Suppose each consumption record takes 80 bytes (following the 80-byte file requirement of credit card and savings card ETF), each person makes 5 purchases per day, and we count only China's urban population of 200 million. The daily consumption records then come to 7.2 TB of data once mirrored. As the number of purchases, the number of consumers, and the length of the accumulation period grow, such data reaches the PB level very quickly.
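The arithmetic behind this estimate can be sketched in a few lines of Python. The three inputs are the ones quoted above; the function names are illustrative only. Note that the raw product of the inputs comes to about 80 GB per day. The text does not spell out how the 7.2 TB figure follows from them, so the sketch keeps the two numbers separate rather than guessing at the missing factor.

```python
# Back-of-the-envelope volume estimate. Decimal units are assumed throughout.
GB = 10**9
TB = 10**12
PB = 10**15


def raw_bytes_per_day(bytes_per_record: int, records_per_person: int, people: int) -> int:
    """Raw size of one day's records, with no mirroring or overhead."""
    return bytes_per_record * records_per_person * people


def days_to_reach(target_bytes: int, bytes_per_day: int) -> float:
    """How many days of accumulation are needed to reach a target volume."""
    return target_bytes / bytes_per_day


# Inputs quoted in the text: 80 bytes, 5 purchases a day, 200 million people.
raw_daily = raw_bytes_per_day(80, 5, 200_000_000)

# The daily figure quoted in the text, taken as given.
quoted_daily = 7_200 * GB

print(f"raw product of the inputs: {raw_daily / GB:.0f} GB per day")
print(f"days to reach 1 PB at 7.2 TB per day: {days_to_reach(PB, quoted_daily):.0f}")
```

Whichever daily figure is used, the point of the paragraph holds: the total grows linearly with every extra purchase, consumer and day, and nothing is ever deleted.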

Can this data be stored and processed by a blockchain system in the usual sense? No. Blockchain systems in the usual sense, that is, most public chains, are distributed systems in which every computer must store the same files for the system to function. Clearly they cannot provide PB-level storage, let alone preserve the value of the data on top of it.

So we need to store the data in a distributed storage system and realize its value through the settlement function of the blockchain. Simply put, the data itself lives in a distributed storage system, while its state is kept on the blockchain for later processing and use.

Here, distributed storage means storing data under different encryption methods so that the data corresponds one-to-one with an account on the chain. Later, different encrypted-computation tools can be used to call and process the data quickly, and data production and computation can be done semi-anonymously. Compared with a traditional Internet service such as Dropbox, storage combined with a blockchain offers more than keys and local privacy: processing functions can be added to meet data sharing and computing needs, so the data can be distributed and extended more effectively and its value realized.

"State" refers to the source of the data and the changes made to it, or the result of an operation on the data. These states are kept on the blockchain so that operations and changes can be traced, which makes it easier to tell which data is more valuable and to pay out the corresponding value through immediate settlement.

That is why we say the value of data can be realized only by combining a blockchain system with a distributed storage system, not by a blockchain system alone. A blockchain system that stores data in a distributed storage system also differs from a general public chain in both design and implementation.

Besides combining a blockchain system with distributed storage, the value of data can be approached from another angle: use the blockchain only as a settlement ledger and keep the data local, placing the results on the blockchain once the local computation is complete. In that case the local computation needs trusted computing as an intermediary, and the oracle problem has to be considered. For this part, the solutions of existing trusted computing projects can serve as a reference.
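To make the split between off-chain data and on-chain state concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the design of any particular chain: the `StateRecord` fields, the `trace` helper and the use of a plain dictionary as the "ledger" are all hypothetical. The only thing kept "on chain" is a hash of the encrypted data plus a link to the previous record, which is enough to trace the source of the data and its changes.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class StateRecord:
    """Hypothetical on-chain entry; the data itself stays in distributed storage."""
    owner: str        # on-chain account the data corresponds to
    data_hash: str    # SHA-256 of the encrypted data held off-chain
    operation: str    # what happened, e.g. "create", "update", "compute"
    prev_record: str  # hash of the previous record for this data, "" for the first

    def record_hash(self) -> str:
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return sha256_hex(payload)


def trace(ledger: dict, tip: str) -> list:
    """Walk back from the newest record to the origin; return operations oldest first."""
    operations = []
    while tip:
        record = ledger[tip]
        operations.append(record.operation)
        tip = record.prev_record
    return list(reversed(operations))


# A dictionary keyed by record hash stands in for the chain.
ledger = {}
first = StateRecord("account-1", sha256_hex(b"ciphertext v1"), "create", "")
ledger[first.record_hash()] = first
second = StateRecord("account-1", sha256_hex(b"ciphertext v2"), "update", first.record_hash())
ledger[second.record_hash()] = second

print(trace(ledger, second.record_hash()))  # ['create', 'update']
```

Each record is a few hundred bytes regardless of how large the underlying file is, which is why the state can live on every node while the data cannot.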


Explain "Blockchain System" and "Distributed Storage System"

To understand the blockchain system that provides distributed storage more clearly, we first look at the "blockchain system" and the "distributed storage system" separately.

A blockchain uses the storage resources of distributed nodes so that every node stores the ledger of the entire network. It relies on a consensus mechanism to ensure that the changes nodes make to the stored content are valid, and in this way maintains a complete, searchable database. What this system stores is the changes to, or the totals of, the balances of the accounts created on the chain. Some more complete systems also store multiple accounts in order to maintain the data state of sub-accounts in the database.

Therefore, the main function of the system is to record changes in state and then synchronize them. For nodes, whether the consensus is PoW, PoS or PoX, the core requirement is to follow the specific voting rules and synchronize each new change into the storage of all nodes.

A system that uses only a blockchain structure does not support users' personal data, that is, the data whose value we want to realize. The data on a blockchain is account data and settlement data, stored identically on every node.

So what is a distributed storage system?

A distributed storage system shares the storage resources of distributed nodes and spreads the data of the storing party across them, using file integrity verification and erasure coding. To reduce redundancy, the nodes of the network do not all hold the same stored information, whereas the nodes of a blockchain system do.

Returning to the consumer data example from the beginning of this article: existing Internet companies already store such data in distributed systems, using Raft, multi-level disaster recovery and similar techniques to keep appropriate backups so that data is not lost, and have built efficient, low-overhead systems that store massive amounts of data.

In other words, where distributed storage is concerned, almost all companies agree that it is the best way to store large volumes of data at this stage. With or without blockchain, distributed storage is already a relatively mature technology in wide real-world use.


Blockchain system providing distributed storage

Having distinguished these two kinds of distributed system, the blockchain system and the distributed storage system, we now turn to the blockchain system that provides distributed storage. It differs from a general public chain: it is a design that combines distributed storage with a special-purpose blockchain.

The core logic of an ordinary blockchain system is to cover all account-related transaction attributes: accounts (public and private keys), transfers between accounts (signature and consensus systems), and conditional transfers (opcodes and their corresponding encoding).

A blockchain system that provides data storage needs to cover these three attributes as well. Because it also provides storage, judgment logic must be added to its opcodes so that the storage state can serve as a condition in later transactions, and so that both the storage state and the state backed by the data can be traced on the chain. This is why a blockchain system that provides storage needs a special design.

Specifically, the blockchain must maintain the state of the data under given conditions and protect that state, so that it changes as it should once a transaction is confirmed as correct. If the state of the distributed storage is not tied into the blockchain system by a suitable design, a gap will inevitably open between the two. An attacker who exploits that gap can have the state of a storage operation recorded early or late, which affects account balances in the blockchain system. This gap is where the system is insecure.

The difference between a blockchain system that provides distributed storage and a general public chain therefore lies mainly in the "state" part: the former records certain storage states and reflects them in the accounts.

In addition, storage-related state lets smart contracts obtain status in a timely manner, so a sub-ledger that uses external data can be designed. This streamlines projects and reduces the chance that on-chain assets are lost because of an oracle.


Design and challenges of distributed storage systems

After introducing the basic concepts, the next step is to introduce the design and implementation of the system. For clarity, the design of a distributed storage system will be discussed first, followed by the design of a blockchain system that provides distributed storage.

The design of a distributed storage system mainly has to solve three problems: how to upload files safely; how to store files securely; and how to keep files from being "stolen" by the storage provider.

1. How to upload files safely

Data is encrypted and split into segments on the client before upload, then distributed to storage providers' space by the system's distribution method. Storage heartbeat checks ensure that the data can be retrieved in full whenever the user needs it.

Throughout the process the user's data is encrypted locally, so the user need not worry about the data being read surreptitiously, and the storage provider does not bear the risk of holding plaintext.
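The segment-and-distribute step can be sketched as follows. This is a simplified illustration, not the system's actual distribution method: the input is assumed to be ciphertext already produced on the client, the round-robin placement is a stand-in for whatever placement rule the system uses, and the function and provider names are hypothetical.

```python
import hashlib


def split_into_shards(ciphertext: bytes, shard_size: int) -> list:
    """Cut already-encrypted bytes into fixed-size segments (the last may be shorter)."""
    return [ciphertext[i:i + shard_size] for i in range(0, len(ciphertext), shard_size)]


def assign_shards(shards: list, providers: list) -> list:
    """Round-robin placement (an assumption). Returns (provider, shard hash) pairs.

    The client keeps the hashes so it can check each shard when it comes back.
    """
    placement = []
    for index, shard in enumerate(shards):
        provider = providers[index % len(providers)]
        placement.append((provider, hashlib.sha256(shard).hexdigest()))
    return placement


def reassemble(shards: list) -> bytes:
    """Concatenate shards in their original order to recover the ciphertext."""
    return b"".join(shards)


ciphertext = b"bytes that were encrypted on the client beforehand"
shards = split_into_shards(ciphertext, shard_size=16)
placement = assign_shards(shards, ["provider-a", "provider-b", "provider-c"])

for provider, shard_hash in placement:
    print(provider, shard_hash[:12])
```

Because providers only ever receive ciphertext segments, they hold neither the key nor the complete file.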

2. How to securely store files

Under the redundancy principle of distributed storage, if a resource must remain available even when any two nodes (N = 2) fail, 2N+1 copies need to be maintained. In other words, a file must be saved to 5 servers.

With a design that treats 12 nodes as a batch, any 5 nodes can go offline without affecting availability. However, if the system meets a replay attack, for example because the encrypted resources are distributed unreasonably, an attacking node can find ways to prevent files from being retrieved safely. The distribution method therefore has to be designed to ensure secure storage.

In addition, during storage a scoring system can rate node quality, ensuring the quality of the nodes that provide distributed storage services as well as that of the consensus nodes.
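The redundancy figures above lend themselves to a short calculation. The sketch below computes the 2N+1 copy count and, for a batch of 12 nodes that tolerates 5 losses, the probability that the batch stays retrievable. The availability model is an assumption added for illustration: it treats every node as going offline independently with the same probability, which real networks only approximate.

```python
from math import comb


def replicas_needed(tolerated_failures: int) -> int:
    """The 2N+1 rule from the text: copies needed to tolerate N failed nodes."""
    return 2 * tolerated_failures + 1


def batch_availability(total_nodes: int, tolerated_failures: int, p_offline: float) -> float:
    """Probability that at most `tolerated_failures` of `total_nodes` are offline.

    Assumes independent failures with identical probability `p_offline`.
    """
    return sum(
        comb(total_nodes, k) * p_offline**k * (1 - p_offline) ** (total_nodes - k)
        for k in range(tolerated_failures + 1)
    )


print(replicas_needed(2))  # 5
for p in (0.05, 0.10, 0.20):
    print(f"p_offline={p:.2f} -> availability {batch_availability(12, 5, p):.6f}")
```

The calculation covers accidental outages only. It says nothing about a node that deliberately withholds or replays data, which is why the distribution method and node scoring still matter.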

3. How to keep files from being stolen by the storage provider

First, the data is encrypted on the client's local machine, so the file is already encrypted before upload and the storage provider cannot see the user's data. Second, in the redundant storage layer, no storage provider's communication directory lists all the parties holding fragments of a file, which makes collusion harder to some extent.

After solving the above three problems, the system can be called a secure distributed storage system and can provide distributed storage services.

Across the whole process, designing a sound method for verifying that stored data can be retrieved is a relatively big challenge. It must not only produce reliable results through challenges, but also withstand replay attacks and attacks by other special methods, so as to improve the availability of the storage system.
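One common way to make such a check resistant to replay is a challenge-response scheme with single-use nonces. The sketch below illustrates that general idea and is not the verification method of the system described here: the `Auditor` class and `proof` function are hypothetical names, and the scheme assumes the client precomputes a limited number of challenges before handing the shard to the provider.

```python
import hashlib
import secrets


def proof(nonce: bytes, shard: bytes) -> str:
    """Answer to a challenge: a hash that can only be computed with the shard in hand."""
    return hashlib.sha256(nonce + shard).hexdigest()


class Auditor:
    """Client side. Precomputes challenges before upload, then needs no copy of the shard."""

    def __init__(self, shard: bytes, challenge_count: int):
        self._unused = {}
        for _ in range(challenge_count):
            nonce = secrets.token_bytes(16)
            self._unused[nonce] = proof(nonce, shard)

    def next_challenge(self) -> bytes:
        """Return a nonce that has not been used yet (raises StopIteration when none remain)."""
        return next(iter(self._unused))

    def verify(self, nonce: bytes, answer: str) -> bool:
        # Each nonce is removed on first use, so a recorded answer cannot be replayed.
        expected = self._unused.pop(nonce, None)
        return expected is not None and secrets.compare_digest(expected, answer)


shard = b"encrypted shard held by the provider"
auditor = Auditor(shard, challenge_count=3)

nonce = auditor.next_challenge()
answer = proof(nonce, shard)          # computed by an honest provider
print(auditor.verify(nonce, answer))  # True
print(auditor.verify(nonce, answer))  # False: the same answer replayed
```

The trade-off is that the number of audits is fixed at upload time; schemes that allow unlimited audits need heavier cryptography than a plain hash.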


Design and implementation of blockchain system providing distributed storage

The blockchain system that provides distributed storage mainly serves two kinds of participants. The first is the node, which takes part in building the distributed system by providing storage; its degree of participation and its quality are reflected mainly in the storage it provides. The second is the user, who obtains the storage state through smart contracts and transfers between accounts accordingly.

We therefore need a system in which a node's storage quality determines whether it earns better returns, and in which users can use the stored state in smart contracts. The special design lies mainly in two parts: a technical part, which addresses usability, and a governance part, which addresses the quality of distributed service providers.

From a technical point of view, a blockchain system supporting distributed storage must update the storage state and be convenient to use. Corresponding opcodes and state logic therefore have to be added to the original virtual machine, so that storage and chain stay linked and no state leaks.

From a governance point of view, because storage is a low-power resource, the PoS consensus has to be modified. Under a hybrid consensus, a node must contribute to the storage system in addition to staking assets on the chain.

The advantage is that, since storage by itself cannot generate especially large returns, using block rewards to subsidize the nodes that contribute more motivates nodes to provide stable storage. A node that does not meet the criteria for producing blocks can still take part in building the chain by jointly backing a block-producing node, helping a trusted, staked node stay on the list of block producers.

Beyond these two angles, from the perspective of the economic model, the staking conditions need to be fine-tuned according to a Pareto distribution after each increase in storage, so that the stake distribution of the whole system does not stagnate at a particular storage stage. A series of such adjustments and updates keeps the entire storage ecosystem developing in a healthy direction.


Use of data

A blockchain system that supports distributed storage provides a secure way to store and use the valuable data everyone produces, and ensures that users own their data. Only on this basis can tools give users the value of the data that belongs to them.

Realizing the value of data, including how to compute on stored data and make full use of the results, requires trusted computing and other more cutting-edge technology. That is a topic for separate discussion and is not covered in detail here.

At this stage, the data in the system can be used in two ways:

1. Smart contracts. Recordable storage state broadens what smart contracts can do. Backed by data, smart contracts can take on new uses and new forms, which may give rise to a new ecosystem.

2. Cross-chain. The system can provide distributed data storage services for other blockchain systems, and can also reprocess the on-chain state of other blockchains.

A blockchain system that provides distributed storage lets data producers keep control of their own data, which is a start. Beyond that, data producers can realize the value of their data through different methods and channels. Data produced by robots can be shared in the same way.

A future in which people own their own data and can trade it freely is a better future, and one worth working toward.

Source: Carbon chain value