This article describes the emergence of file storage systems based on blockchain technology and their impact on the wider ecosystem. Blockchain-based file storage (BFS) is a more promising alternative to both centralized storage and P2P file systems that lack incentives. If BFS can solve its usability and technical problems, it may become a new storage architecture that promotes the formation of a decentralized Internet.
Recently, many people have begun to pay attention to decentralized systems, because they can remove intermediaries, create new economies, and give users unprecedented control over their data. Smart contracts allow users to create applications with these advantages. With these tools, users may soon get a new Internet called the decentralized web (or web3), where applications will be more powerful than they are now, and decentralized applications will be built on economical and secure blockchain systems.
However, we soon discovered that, as a "decentralized operating system", the blockchain cannot run resource-intensive applications such as CryptoKitties without performance optimization. Events like this drew more attention to the problem and prompted efforts to improve performance on top of the existing foundation, so that useful decentralized applications can be built.
For example, the blockchain itself is a very poor storage device: storing files on a blockchain is very expensive. A blockchain is a ledger shared by thousands of users, each of whom must keep a copy of every piece of data, so it cannot practically carry more than a few megabytes of data.
Therefore, many important technical developments for decentralized systems must happen outside the blockchain, such as Layer 2 solutions, dedicated P2P networks, file storage, and more. The blockchain itself and these independent components will be combined to build the decentralized Internet.
Although decentralized systems are epoch-making, few people will be willing to join the decentralized Internet if the underlying technology is congested. This means that the decentralized technology stack (beyond decentralized assets) must be not much worse than existing solutions.
Currently, every part of the stack is incomplete, and the DNS, storage, and compute layers are at a particularly early stage. A typical slogan we all know is: "We will take advantage of the blockchain's immutability to build a DApp. It doesn't matter if the files have to be stored in a centralized place."
I would argue that even though we can build working applications on a blockchain this way, they are not truly "decentralized applications", because their data is not stored and exchanged in a trustless, decentralized way. In other words, whatever other advantages these applications have, they do not come from being fully decentralized.
This brings up my core point of view:
Blockchain-based file storage system -> Decentralized data -> Decentralized Internet
BFS will be the backbone of the web3 architecture: it will decentralize data and thereby promote the decentralized Internet. Without truly decentralized data, there will be no truly decentralized applications, and no truly decentralized Internet.
Compared to other storage solutions, a secure, fair, and economical BFS system has many advantages, both as a general-purpose system and within the web3 architecture. I will also analyze the technical and commercial barriers that BFS must overcome to keep growing in popularity.
First, centralized cloud storage began to dominate the market (1990s – now)
In the 1990s, files were stored on many different servers, and users could retrieve data directly from those computers. Although users had full control over their files, setting up these servers required extensive networking and encryption experience, as well as a lot of time.
Initially, this did not seem to be a problem: in 1997 the entire Internet held about 1.5 terabytes of data, and the Internet was worth far less than it is now. A well-known computer scientist at the time summed it up well: "There may be a few thousand petabytes of information all told; and the production of tape and disk will reach that level by the year 2000."
This situation has changed in recent years: machine-to-machine interaction now produces more data than humans do, and this data matters to both users and businesses because it lets them find new ways to draw conclusions from it, for example in artificial intelligence, ultra-high-definition video, and financial models.
Demand for storing, holding, and analyzing this data keeps growing, and users find it hard to manage the data themselves. As of 2018, about 32 ZB (that is, 32,000,000,000 TB) of data exists.
This was Amazon's entry point. Having become a major player in e-commerce, Amazon needed to develop many internal APIs and a great deal of infrastructure to handle the massive amounts of data associated with its business. The Amazon team built a complete suite of internal software that saved many of its departments a lot of time, because they no longer had to worry about infrastructure.
In 2006, Amazon released S3 and EC2, marking the arrival of the centralized cloud era.
Cloud services give users easy access to Amazon's powerful tools, and because they are so simple to use, they meet the storage needs of many enterprises. Amazon (followed by Microsoft, Google, and private clouds like Facebook's) has gradually come to hold more and more of the Internet's data.
As a result, individual users lose sovereignty and control over their data. With a single point of failure, large numbers of data owners can fall victim to data corruption, data loss, and server downtime, causing billions of dollars in losses as well as the loss of human knowledge and culture. These problems, together with growing awareness that large centralized cloud servers can undermine the privacy of individuals and businesses, have fueled the movement against data centralization, both intellectually and in practice.
Even within the decentralized Internet, Amazon S3 has become the choice of many because of its ease of use. A large number of DApps currently launch their products on Amazon's services, either promising to decentralize their data in the future or claiming that you can have decentralized applications without decentralized data.
However, data centralization is a fundamental problem: applications and infrastructure hosted on a centralized cloud are not decentralized. The Internet is just a collection of computers that transfer files to each other, so centralized storage leads to centralized data, which leads to a centralized network.
It follows that a good decentralized storage solution would solve a long-standing problem of the decentralized network. Many other components of web3 are already being worked on, and data will eventually be retrievable in a trustless way. Without decentralized storage, even public chains themselves are not completely decentralized, since most blockchain master nodes run on centralized cloud storage.
Decentralized storage to create decentralized data
Second, a challenger arises: peer-to-peer file storage systems (2001-present)
Peer-to-peer file storage systems emerged as an alternative to centralized cloud servers, without the risk of centralization. Five years before Amazon S3 appeared, BitTorrent already enabled files to be transferred efficiently between users. By 2009, peer-to-peer applications accounted for about 50% of Internet traffic. However, although BitTorrent lets users share files with each other, it does not let you store and find files the way Amazon S3 or Dropbox does, so it is not a file storage solution.
IPFS aims to build a true peer-to-peer, decentralized file storage system on top of BitTorrent's ideas. In IPFS, all users join a single system and speak a common protocol, which allows them to find and transfer files among each other.
For example, the Internet Archive and many DApps have begun to use IPFS for file storage and advertise their architecture as decentralized. For many early use cases, IPFS is certainly sufficient.
IPFS gathers users into one system where they can find each other through a Distributed Hash Table (DHT), gives them a common communication language in the IPFS protocol, and has no single point of failure. It therefore does lay a foundation of decentralized storage for the new decentralized Internet. Many well-known DApps, such as OpenBazaar and Augur, use IPFS.
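To make the DHT idea concrete, here is a minimal sketch of how a distributed hash table can decide which nodes are responsible for a piece of content. It uses the XOR distance metric of Kademlia, the design that IPFS's DHT is based on, but it is only a toy: the node names, the `closest_nodes` helper, and the choice of k = 2 are assumptions made for illustration, not IPFS's actual API or parameters.

```python
import hashlib

def key_of(data: bytes) -> int:
    """Map arbitrary bytes to a 256-bit integer key."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def closest_nodes(node_names, content: bytes, k: int = 2):
    """Return the k nodes whose IDs are closest to the content key (XOR metric)."""
    target = key_of(content)
    return sorted(node_names, key=lambda n: key_of(n.encode()) ^ target)[:k]

# Hypothetical node names; in a real network these would be peer IDs.
nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
holders = closest_nodes(nodes, b"hello, decentralized web")
print(holders)
```

Because every participant computes the same keys and distances, any of them can work out where to look for a file without asking a central index.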
Unfortunately for community projects and open-source enthusiasts, IPFS has some underlying issues that hinder its rapid expansion. The most important are:
1. Files in IPFS are distributed across many nodes, but because these nodes have little incentive to keep them, rarely accessed files gradually disappear. This rules out many commercial applications, such as videos that need to be retained for a long time or archives of older blockchain data.
2. Although the DHT allows users to quickly locate each other and find files in IPFS, it is not secure for file retrieval. A malicious attacker can target a specific file at very low cost.
Many projects patch the first issue by storing files on IPFS nodes hosted on centralized Amazon infrastructure. This means running several Amazon-hosted nodes yourself to ensure that your files remain on the IPFS network (as long as those Amazon nodes keep working). However, this reintroduces centralization and defeats the purpose of using IPFS. To make decentralized data work, we need to draw inspiration from these systems while adding an incentive layer and stronger security, and ultimately create decentralized data that scales as well as centralized data.
Third, the blockchain-based file storage system (after 2020)
Public chains use cryptoeconomic incentives and penalties to steer untrusted participants toward the desired consensus. A BFS with strong cryptoeconomic incentives, supported by other parts of the decentralized technology stack such as a secure DHT alternative, could therefore become the de facto infrastructure of the decentralized Internet.
For the decentralized Internet, the ideal file storage solution needs to match or beat centralized solutions while remaining decentralized.
In a healthy network, storage providers offer large amounts of storage space, and their storage and bandwidth are backed by cryptographic guarantees. BFS also puts new techniques to use, such as erasure coding, proofs of storage, and proofs of space. Many innovative players have entered the field, and dozens of projects are innovating on technology and product in different ways.
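As an illustration of the erasure coding idea, the sketch below splits a file into k data shards plus a single XOR parity shard, so that any one lost shard can be rebuilt from the rest. This is the simplest possible scheme and is an assumption chosen for brevity; production systems typically use Reed-Solomon codes that tolerate several simultaneous losses. The function names `encode` and `recover` are hypothetical.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int):
    """Split data into k equal-size shards (zero-padded) plus one XOR parity shard."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    shards = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, shards)
    return shards + [parity]

def recover(shards, missing: int) -> bytes:
    """Rebuild the shard at index `missing` by XOR-ing all the other shards."""
    others = [s for i, s in enumerate(shards) if i != missing]
    return reduce(xor_bytes, others)

original = b"decentralized data"
shards = encode(original, k=3)
lost = shards[1]
shards[1] = None  # simulate the node holding shard 1 going offline
rebuilt = recover(shards, missing=1)
print(rebuilt == lost)
```

With one parity shard the storage overhead is only 1/k, which is why erasure coding is usually preferred over keeping full replicas of every file.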
A collection of individuals and professional storage providers that follow blockchain rules could weaken the dominance of any centralized company, even giants like Amazon. In addition to removing intermediaries from data transactions, blockchain-based solutions have the following advantages:
1. You control your own data, and it resists censorship
With cryptoeconomic incentives, a node that fails to store and serve data is penalized economically. When files are stored with a high degree of redundancy, companies and even government agencies will find it difficult to censor them. Thanks to decentralization, no intermediary (such as Google or AWS) manages your data on your behalf.
2. Strong resistance to severe black swan events and network downtime
Through traditional sharding or erasure coding, files can be split into pieces held by many parties. With enough nodes, natural disasters, human or machine errors, and other failures can hardly affect the system.
3. Potential speed advantages over centralized systems
Since many nodes store different parts of a file, the parts can be downloaded in parallel. As with BitTorrent, parallel downloads can be much faster than downloading from a centralized cloud server.
4. Prices are likely to be very low, enabling new economic models
Storage and data are widely regarded as valuable, yet a great deal of hard disk space sits idle. Storage providers can monetize this unused capacity. Because the marginal cost of storing files on a disk that is already paid for is very small, storage can be offered very cheaply while still leaving the provider a net profit. Current solutions already show significant savings: Sia costs less than $2/TB/month, while S3's standard service costs $23/TB/month.
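The price gap quoted above is easy to put into yearly terms. The short calculation below uses the two per-terabyte prices from the text; the 10 TB workload is a hypothetical figure chosen for illustration, and the comparison covers storage only, ignoring bandwidth, request fees, and redundancy.

```python
SIA_USD_PER_TB_MONTH = 2.0   # upper bound quoted in the text ("less than $2")
S3_USD_PER_TB_MONTH = 23.0   # S3 standard price quoted in the text

def yearly_cost(tb: float, usd_per_tb_month: float) -> float:
    """Storage-only cost in USD for keeping `tb` terabytes for twelve months."""
    return tb * usd_per_tb_month * 12

tb_stored = 10  # hypothetical workload
sia = yearly_cost(tb_stored, SIA_USD_PER_TB_MONTH)
s3 = yearly_cost(tb_stored, S3_USD_PER_TB_MONTH)
print(f"Sia: ${sia:,.0f}/yr, S3: ${s3:,.0f}/yr, ratio: {s3 / sia:.1f}x")
```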
For the decentralized Internet, the ideal file storage solution should match or beat centralized solutions while remaining decentralized. BFS can combine the user experience of centralized services with the decentralization of IPFS. The main problem with centralized solutions is precisely that they are centralized. In other words, a perfect BFS would be the perfect file storage solution: by offering decentralized data, it can draw users who care about decentralization away from centralized solutions, because switching would cost them relatively little.
Fourth, blockchain-based storage systems still have many problems to solve
The previous sections described the benefits of BFS over IPFS and centralized solutions. In practice, however, the combined storage capacity of the two best-known production-grade BFS projects in 2018 was several thousand times smaller than that of the large cloud providers in 2016, and total cloud storage capacity is expected to grow significantly in the next few years. After talking with many blockchain users and traditional companies, I conclude that much remains to be done before the decentralized Internet can disrupt centralized solutions. Amazon S3 and its peers offer features and optimizations that blockchain-based solutions and IPFS cannot currently match. Practical adoption is held back by both technical and usability issues.
Blockchain-based file storage systems are still very young
Amazon S3 currently has a huge advantage in upload and download performance, as well as a much wider range of features.
From an upload perspective, decentralized solutions are less efficient than centralized ones. Uploads generally go through a decentralized marketplace in which storage providers and storage "buyers" must first be matched and reach agreement. This matching and communication, together with the fact that individual nodes are much slower than enterprise-grade centralized machines, is the bottleneck of decentralized uploading. Either the system incurs a long initial delay before data is uploaded (while the storage contract is verified on-chain), or the data is uploaded first and the upload transaction (recording the successful buyer-provider match) is confirmed in a block afterwards. Either way, the whole process takes from a few seconds to a few minutes.
Several remedies are being developed: parallel uploading, in which different shards or parts of a file are sent to different nodes at the same time to make full use of the available bandwidth; long-term buyer-provider contracts; batched on-chain buyer-provider matching; off-chain (layer 2) negotiation of storage contracts; and faster consensus and block propagation techniques.
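Parallel uploading can be sketched with a thread pool that sends each shard to a different node at the same time. Everything here is a stand-in: `upload_shard` merely sleeps instead of touching the network, and the node names and shard contents are hypothetical. The point is only the shape of the technique, in which total upload time approaches that of the slowest single shard rather than the sum of all shards.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def upload_shard(node: str, shard: bytes) -> tuple[str, int]:
    """Stand-in for a network upload; sleeps briefly instead of sending data."""
    time.sleep(0.05)
    return node, len(shard)

def parallel_upload(shards: dict[str, bytes]) -> dict[str, int]:
    """Upload every shard concurrently and return bytes accepted per node."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        results = pool.map(lambda item: upload_shard(*item), shards.items())
        return dict(results)

# Hypothetical assignment of file parts to storage nodes.
shards = {"node-a": b"part-1", "node-b": b"part-2!", "node-c": b"part-3!!"}
receipt = parallel_upload(shards)
print(receipt)
```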
Scalability is another major factor limiting blockchain performance. If each transaction commits a 50 MB file to storage, each block holds 25 storage transactions, and a new block is produced every 30 seconds, then the entire system can store only about 1.3 PB per year, which is dwarfed by what large cloud providers store today. There are other bottlenecks as well: storage proof mechanisms are still very slow, so the system cannot reach its maximum capacity. Layer 2 solutions and other scaling schemes can help, but the cryptographic proof algorithms also need to become more efficient.
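The capacity estimate can be checked with a few lines of arithmetic, using the three parameters given in the text and decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes). Worked through this way, the yearly total comes to roughly 1.3 petabytes.

```python
MB = 10**6
PB = 10**15

file_size_bytes = 50 * MB   # one file per storage transaction
txs_per_block = 25
block_interval_s = 30

blocks_per_year = 365 * 24 * 3600 // block_interval_s
bytes_per_year = file_size_bytes * txs_per_block * blocks_per_year
print(f"{blocks_per_year:,} blocks/year, {bytes_per_year / PB:.2f} PB/year")
# prints "1,051,200 blocks/year, 1.31 PB/year"
```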
Downloads suffer from the same problems as uploads. Download speed and latency are limited by buyer-provider matching and communication, and by the speed of individual nodes. Downloaders can either pay for download capacity in advance (Sia, Storj) or pay as they download (Filecoin). With pay-as-you-go, matching and payment happen on every download, so even if the operation is handled off-chain it takes much longer than a centralized solution. The remedies are similar to those for uploading.
Beyond performance, there are still many features that blockchain-based solutions cannot yet provide.
For example, with current solutions every downloader must be a registered user on the blockchain and hold tokens, whereas with a centralized cloud service anyone can view content in a browser or app without any background knowledge (this is, of course, really a usability problem). Current solutions deliver encrypted files to the user, but because transaction information is public, others can still see that a user is transferring a particular file hash to someone else. This is a serious problem for many companies. A genomics company, for example, does not want others to know what information it transmits and does not want any of its data made public, not even a hash.
At the same time, it is very difficult to design efficient proof mechanisms (such as proofs of storage), and it is difficult to guarantee that the corresponding file was actually uploaded (proof of upload). Companies also want services that are professional and secure. Service level agreements (SLAs) and file access permissions (who can view which files) are hard to implement, and most such features are still at a very early stage of development (we are at perhaps 20% of what centralized systems offer).
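To show why proofs of storage are hard to make efficient, here is a deliberately naive challenge-response sketch. Before handing a file to a provider, the owner precomputes a few random nonces and the hash each should produce; later, the provider can only answer a challenge if it still holds the exact bytes. This is an illustrative assumption, not the scheme used by Sia, Storj, or Filecoin: each challenge can be used only once, and the provider must read the whole file for every proof, which is exactly the kind of cost real designs try to avoid.

```python
import hashlib
import os

def respond(nonce: bytes, stored: bytes) -> str:
    """Provider side: prove possession by hashing the nonce together with the file."""
    return hashlib.sha256(nonce + stored).hexdigest()

def make_challenges(data: bytes, n: int):
    """Owner side: precompute n (nonce, expected answer) pairs before deleting the local copy."""
    nonces = [os.urandom(16) for _ in range(n)]
    return [(nonce, respond(nonce, data)) for nonce in nonces]

file_data = b"sample-record-0001" * 100  # hypothetical file contents
challenges = make_challenges(file_data, n=3)

nonce, expected = challenges.pop()  # each challenge is single-use
honest_ok = respond(nonce, file_data) == expected
cheater_ok = respond(nonce, file_data[:-1]) == expected  # provider dropped a byte
print(honest_ok, cheater_ok)
```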
The usability of BFS and of blockchains themselves is another big problem.
The lack of cross-chain integration and of flexible payment options is a major problem. A BFS usually sits outside the public chain that a DApp's users have chosen: Filecoin, 0Chain, and Sia, for example, each have their own blockchain. DApp users do not want to learn a complicated new public chain just to upload files. Cross-chain integration and cross-chain payments would make the whole system easier to use. For example, NEO DApp users (who presumably already hold some NEO and GAS tokens) could pay in GAS to upload files through a simple API. It would then not matter whether the BFS runs on NEO itself or is connected across chains. Ideally, the mechanics of all token payments should be as intuitive as possible.
Second, the experience of actually using files is also very poor. In the Filecoin and Sia systems, for example, both uploaders and downloaders have to download the entire blockchain, which takes a few hours. They then need to create an account on an exchange and understand cryptocurrencies and wallets. With Amazon S3, by contrast, you can manage all uploaded files through a web interface, and downloads are abstracted away from end users so completely that they do not even know the files in their browser come from Amazon, until an Amazon outage takes down large parts of the Internet. Clearly, the user experience of blockchains and digital currencies will take a long time to improve. One solution is to keep these complicated steps on the upload side, so that downloaders can browse content on any device through a simple JS module with no installation.
Fifth, a virtuous cycle between blockchain-based file storage and the decentralized Internet
A decentralized network makes possible a data exchange system that needs no intermediary, giving users of Internet applications an unprecedented experience. As Olaf Carlson-Wee of Polychain Capital has suggested, we keep comparing web3 to web2 for now, but over time web3 applications will come to sound like science fiction, and the shape of web3 is still blurry today. Tools for storing and sharing data are essential to such data exchange and to decentralization, whether the data is blockchain data, front-end data, metadata, or large multimedia files. Although BFS systems are still maturing, they deserve more attention and effort to bring the technology to completion. Blockchain-based file storage is not just a concept; it addresses a problem that urgently needs solving. By solving it, we can enjoy the benefits of decentralized solutions together with the speed and ease of use of centralized systems.
Sixth, blockchain-based file storage promotes the decentralized Internet
Decentralized data creates a decentralized Internet. As described above, the Internet is essentially a set of computers that store and transfer data, connected to each other through a series of communication protocols. Decentralized data is trustless and is stored and shared in a decentralized manner. At the time of writing, there are 32 million blockchain wallets, and millions of users have access to decentralized communication protocols (e.g., gossip protocols, Tor) and can verify data without trust (both are properties of blockchains themselves). However, these have not yet produced truly functional decentralized apps, because a robust data storage layer is still missing.
Whether the decentralized Internet will completely replace the centralized Internet remains to be seen, and depends on whether BFS can beat centralized services. As more participants join decentralized storage applications, more users will be brought to the decentralized Internet. I hope this article has introduced you to blockchain-based file storage and its importance.
Author: Eric Wang is a co-founder of Archon Cloud, a blockchain-based file storage system, where he leads R&D and related work.