Foreword: In Web 2.0, data is owned by the platform, which is why platform data leaks occur from time to time. With the advent of Web 3.0, can data ownership be unbundled from application logic? If so, how? This article was written by Kyle Samani and translated by "SL" of the "Blue Fox Notes" community.
A year ago, I explained the Web3 stack as I understood it at the time. More recently, I described the Web3 investment thesis. As I emphasized there, a key implication of Web3 is the unbundling of data ownership from application logic. In this article, I will explain the specific problems inherent in this unbundling and how we think about investing in the Web3 stack.
Where do the database and the data live?
For practical purposes, most of today's applications can be viewed as a UX on top of a database. There are exceptions, such as video streaming and video games, but the generalization usually holds. In fact, almost every major consumer service (Facebook, Reddit, Evernote, Twitter, Google Docs, Gmail, iMessage, and so on) can be reduced to a UX on top of a database. (Blue Fox Notes: UX stands for user experience, that is, the experience a user has when interacting with a product through its interface.)
In the Web2 model, application providers store and manage user data, so users do not have to manage it themselves. In addition, Web2 application providers are always online because they run servers 24/7, even though users are often offline (for example, on the subway, on a poor network connection, or with a dead battery). In the Web3 model, there is no centralized application provider, so the paradigm for data ownership needs to change.
This raises several questions:
1. Where does the user store her data (call the user Alice, and assume she does not run her own server)?
2. If Alice is offline, how does a sender deliver content to her?
Of course, the answer must be: store the content somewhere that is always online and accessible, and make sure that when Alice comes back online she knows where to find the content sent to her. This paradigm covers both P2P applications such as messaging and traditional database applications such as news, social media, and note-taking.
There are some mechanical challenges in doing this:
1. Someone other than Alice needs to store and index the content so that Alice can find and download it later.
2. Alice needs to know where to find the index.
3. Through the index, Alice needs to actually find and download the underlying data itself.
4. Whoever stores the data should not be able to read the content (if it is private) or censor it.
Solving these problems unbundles data ownership from application logic and allows Web 3.0 to flourish.
Before exploring how contemporary Web 3.0 entrepreneurs are solving these problems, it is worth looking at others who have tried to solve them in the past.
Previous attempts to decentralize the Internet
Several teams, including but not limited to Zeronet, Freenet, and Scuttlebutt, have tried to "decentralize the Internet." Their efforts predate the modern crypto era as we know it. Most of them focused on narrow use cases, such as censorship-resistant messaging and message boards.
If you are curious, you can try these systems yourself. You will find that their user experience falls short in many ways, but by far the biggest problem is speed. Everything is slow.
Why are they so slow?
Because they are all logically decentralized.
These systems use some variation of the following architecture. I will describe it in the context of an encrypted P2P messaging application:
These systems are based on the idea that if someone sends Alice a message while she is offline, they send it to Bob instead, and Bob stores the message on Alice's behalf.
When Alice comes back online, she asks Bob whether she missed any messages while she was offline (the message index).
Unfortunately, Alice has no guarantee that: 1) Bob is currently online; 2) Bob was online the whole time Alice was offline; or 3) Bob actually has the full index of messages she missed while offline. To work around this, Bob can ask his peers whether they saw any messages sent to Alice. However, those peers may also be offline, and they may hold only an incomplete index.
In this paradigm, message delivery cannot be guaranteed, because it is not clear where a message should be delivered or who should store the message index. The problem compounds when the recipient comes back online, because she does not know where to find either the list of messages sent to her or the message data itself.
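The delivery problem described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `Peer` class, the `collect_index` helper, and the message ids are hypothetical names invented for illustration and do not correspond to the actual protocol of Zeronet, Freenet, or Scuttlebutt.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical peer that may hold some of the messages sent to Alice."""
    name: str
    online: bool = True
    held: set[str] = field(default_factory=set)  # ids of messages stored for Alice

    def missed_messages(self) -> set[str] | None:
        # An offline peer cannot answer at all.
        return set(self.held) if self.online else None


def collect_index(peers: list[Peer]) -> tuple[set[str], int]:
    """Ask every peer; return the merged ids and the number of unreachable peers."""
    found: set[str] = set()
    unreachable = 0
    for peer in peers:
        reply = peer.missed_messages()
        if reply is None:
            unreachable += 1
        else:
            found |= reply
    return found, unreachable


# Three messages were sent while Alice was offline.
sent_while_offline = {"m1", "m2", "m3"}
peers = [
    Peer("bob", online=False, held={"m1", "m2", "m3"}),  # has everything, but is offline
    Peer("carol", held={"m1"}),                          # online, partial index
    Peer("dave", held={"m3"}),                           # online, partial index
]

found, unreachable = collect_index(peers)
missing = sent_while_offline - found  # Alice cannot compute this herself
```

In this run Alice recovers `m1` and `m3` but not `m2`. More importantly, she has no way of knowing that `m2` exists, since the only peer holding it did not answer.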
Scuttlebutt, a project focused on building a P2P social network, attempts to solve this problem by adopting a mutual friend model similar to Facebook's. That is, once Alice and Bob become friends, they share their friends lists with each other so that Bob can index and store the content posted by Alice's friends on her behalf. This requires Alice to tell all her friends that Bob is her proxy, and vice versa. Then, when Alice's friends post updates while she is offline, they can send those updates to Bob, who hosts them for Alice.
Zeronet and Freenet are more general projects, not just P2P social networks. They use a similar model, but without mutual friend selection. This adds a lot of complexity to the system and makes it slower. In the Scuttlebutt model, friends agree in advance to help define the paths that information takes. Freenet and Zeronet users, by contrast, have to ping other users at random and ask them what information they know about. This is the key reason these systems are so slow.
Suppose that Alice somehow manages to piece together the index of all the content she missed while offline. That is, she knows that Carol sent her a photo and that Dave stored the photo at "dave.com/alicepic1.png". How can Alice access the photo if Dave is offline?
These are all issues that need attention. Decentralization of the Internet is difficult.
Logical versus architectural (de)centralization
The root cause of all of the above problems is the lack of logically centralized storage and indexing. What is logically centralized storage? To answer this question, it helps to understand the three vectors of decentralization in a distributed system:
> Architectural: the number of computers in the system
> Political: the number of people who can influence the system
> Logical: the number of interfaces through which external agents interact with the system
To better understand these concepts, see the earlier Blue Fox Notes article "Ethereum Founder Vitalik: How to Understand the "Decentralization" of Blockchain".
The Web 2.0 monopolies solve all of the above problems because they rely on logically centralized storage. That is, when Alice comes back online, she simply sends a request to a central web server, which maintains a central store holding every message she has missed since going offline. The web service queries the database it controls, which contains all user messages, and returns the right ones.
The problem with this model is that Web 2.0 systems couple all forms of centralization: they are not only logically centralized, but also politically and architecturally centralized. So can a storage system be logically centralized while being architecturally and politically decentralized?
Fortunately, the answer is yes: IPFS for contract-based storage and Arweave for permanent storage. (Contract-based storage means storing X bytes of data for a period of time Y with a retrievability guarantee Z. AWS, GCP, Azure, Filecoin, and Sia are all contract-based storage systems.)
These systems are logically centralized, but architecturally and politically decentralized. What does this mean? The best way to understand it is to look at how a computer retrieves a basic file from another server on the network (location-based addressing) and compare that with the IPFS/Arweave approach (content-based addressing).
In the Web 2.0 architecture, if Alice wants to download an image from a server, she goes to a URL like:
website.com/image.png. What happens when Alice tries to access this URL?
Using DNS, Alice finds out where the server for website.com is, and she asks that server for the image it hosts at "/image.png" on its local filesystem. Assuming the server is willing to cooperate, it checks whether a file exists at /image.png and, if so, returns it.
Notice how fragile this system is: if the file is moved, changed, or the server is busy, or if the server does not want to cooperate for any reason, the request will fail.
This is the foundation on which the web is built today.
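To make that fragility concrete, here is a minimal sketch of location-based addressing. The `Server` class, the `fetch` function, and the in-memory `dns` table are hypothetical stand-ins for real web servers and DNS resolution, chosen only to keep the example self-contained.

```python
from typing import Optional


class Server:
    """A toy web server: a mapping from filesystem paths to file contents."""

    def __init__(self, files: dict[str, bytes], cooperative: bool = True):
        self.files = files
        self.cooperative = cooperative

    def get(self, path: str) -> Optional[bytes]:
        # The server may refuse to answer, or the file may simply not be there.
        if not self.cooperative:
            return None
        return self.files.get(path)


def fetch(url: str, dns: dict[str, Server]) -> Optional[bytes]:
    """Resolve the host through DNS, then ask that one server for the path."""
    host, _, path = url.partition("/")
    server = dns.get(host)
    if server is None:
        return None
    return server.get("/" + path)


server = Server({"/image.png": b"image bytes"})
dns = {"website.com": server}

first = fetch("website.com/image.png", dns)  # the file is where the URL says it is

# The operator moves the file: the content is unchanged, but the old URL now fails.
server.files["/img/image.png"] = server.files.pop("/image.png")
after_move = fetch("website.com/image.png", dns)
moved = fetch("website.com/img/image.png", dns)
```

The URL names a place, not the content. The same bytes still exist after the move, but anyone holding the old URL gets nothing back.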
In a content-based addressing system like IPFS or Arweave, the address that Alice accesses looks something like QmTkzDwW…
Although it is not human readable, it is derived deterministically from the content itself. That is, only one piece of content in the world will produce that exact string when hashed. The magic of IPFS and Arweave is that they handle all the complexity of letting a computer resolve QmTkzDwW… into a web page.
(For more on IPFS, see the earlier Blue Fox Notes article "IPFS: Blockchain-based Decentralized Storage Network".)
Content on the IPFS and Arweave networks is stored on many machines. Regardless of how many machines store a given piece of content, or where in the world they are located, these protocols resolve addresses like QmTkzDwW… correctly.
This is the magic of content-based addressing. It exposes a single logical interface, the content-based address, that always resolves correctly regardless of where the underlying data is stored or how large the network of computers is (this network is what is architecturally and politically decentralized).
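The idea can be sketched in a few lines. This is an illustration under simplifying assumptions, not how IPFS or Arweave are implemented: the address here is a plain SHA-256 hex digest, whereas real IPFS content identifiers are constructed differently, and `Node`, `address_of`, and `resolve` are hypothetical names.

```python
import hashlib
from typing import Optional


def address_of(content: bytes) -> str:
    """Derive the address deterministically from the content itself."""
    return hashlib.sha256(content).hexdigest()


class Node:
    """A toy storage node holding content keyed by its address."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        addr = address_of(content)
        self.blocks[addr] = content
        return addr

    def get(self, addr: str) -> Optional[bytes]:
        return self.blocks.get(addr)


def resolve(addr: str, nodes: list[Node]) -> Optional[bytes]:
    """Ask nodes in turn; accept the first reply whose hash matches the address."""
    for node in nodes:
        content = node.get(addr)
        if content is not None and address_of(content) == addr:
            return content
    return None


photo = b"alice's photo"
honest_a, honest_b, tampering = Node(), Node(), Node()
addr = honest_a.put(photo)
honest_b.put(photo)                         # same content, same address
tampering.blocks[addr] = b"something else"  # a node serving the wrong bytes

# Alice does not care which node answers, and a bad reply is detectable.
result = resolve(addr, [tampering, honest_b])
```

Because the address is a hash of the content, the requester can verify any reply by rehashing it. It therefore does not matter which machine answers, or whether a particular machine can be trusted.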
Of the four main technical challenges listed at the beginning of this article, content-based addressing solves three (1, 3, and 4): storing the content, making the content available for download, and ensuring that hosts cannot read private information. But one problem remains: how do you know where to look for the data?
Although IPFS and Arweave are logically centralized and architecturally and politically decentralized, they are file systems, not databases. That is, there is no way to query them and ask, "Please give me all the messages that Bob sent to Alice between dates X and Y."
Fortunately, there are several ways to solve this problem:
The first way is to store the message index directly on a blockchain. The blockchain itself is logically centralized, but architecturally and politically decentralized. Using a decentralized service such as The Graph or a centralized service such as dFuse, Alice can query the index stored on the blockchain. The blockchain does not store the underlying data, only a hash of the data. The hash is simply a pointer to the content stored on IPFS or Arweave. Both The Graph and dFuse are live today, and many applications already use this pattern: a hash of the content is stored on-chain, while the data itself is stored in a content-addressed system.
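This first approach, with the index on-chain and the data in content-addressed storage, can be sketched as follows. Everything here is a hypothetical stand-in: `chain_index` is a plain list playing the role of the blockchain, `content_store` is a dict playing the role of IPFS or Arweave, and `messages_between` plays the role of an indexing service. None of it reflects the real API of The Graph or dFuse, and encryption of the message bodies is omitted.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """What goes on-chain: metadata plus a hash, never the content itself."""
    sender: str
    recipient: str
    timestamp: int
    content_hash: str


content_store: dict[str, bytes] = {}  # stand-in for IPFS/Arweave
chain_index: list[IndexEntry] = []    # stand-in for an append-only blockchain


def send(sender: str, recipient: str, timestamp: int, body: bytes) -> str:
    """Store the body off-chain and record only its hash in the index."""
    content_hash = hashlib.sha256(body).hexdigest()
    content_store[content_hash] = body
    chain_index.append(IndexEntry(sender, recipient, timestamp, content_hash))
    return content_hash


def messages_between(sender: str, recipient: str, start: int, end: int) -> list[bytes]:
    """Query the index, then follow each hash to the underlying content."""
    hits = [
        e for e in chain_index
        if e.sender == sender and e.recipient == recipient
        and start <= e.timestamp <= end
    ]
    return [content_store[e.content_hash] for e in sorted(hits, key=lambda e: e.timestamp)]


send("bob", "alice", 10, b"hi")
send("carol", "alice", 20, b"photo")
send("bob", "alice", 50, b"late")

# "All the messages that Bob sent to Alice between time 0 and time 30."
inbox = messages_between("bob", "alice", 0, 30)
```

The index supplies the query that a content-addressed file system lacks on its own, and each hash it returns can be resolved by any node that holds the content.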
The second way is to use Textile. Textile has built a unique technology called Threads, which acts as a private, encrypted, personal database on top of IPFS. Because this database is built on IPFS, it is logically centralized, but architecturally and politically decentralized. Because it is logically centralized, senders and receivers know where to write and where to read information. In addition, Textile recently released Cafes, which let users set up servers to host their Threads (rather than hosting Threads locally). Textile's next step is to build an economic layer that incentivizes validators to host Cafes for other users, similar to the way Filecoin serves as an economic layer for IPFS.
The third way is to use OrbitDB. OrbitDB is similar to Textile's Threads, except that OrbitDB is designed primarily for public data (for example, to build a decentralized Twitter), while Textile's Threads natively integrate encryption and key management for private information (such as P2P messaging). Like Textile, OrbitDB is available today, and OrbitDB is developing an economic layer on top of the underlying technology.
Finally, a number of teams are building performant traditional databases with different vectors of decentralization: Fluence is building one that will run in a BFT, permissionless environment, and Bluzelle is building a two-tier system with a politically centralized set of master nodes and an architecturally decentralized set of replica nodes.
Given that some smart contract teams are working on large-scale solutions to the data availability problem, such as Solana's Replicators (see the earlier Blue Fox Notes article "Why is Solana the "world computer" that blockchain developers need?"), we are skeptical about adding a BFT layer to a traditional database. Instead, we have chosen to invest in "crypto-native" databases such as Textile, as well as in developer API layers such as The Graph and dFuse.
With the protocols and services above (IPFS, Filecoin, Arweave, The Graph, dFuse, Textile, and OrbitDB), there is a clear path for Web 3.0 to come to fruition. All of these services exist today. Although they are still immature in terms of product readiness and the scale at which their cryptoeconomic networks have been tested, solutions now exist for the most important problem: politically and architecturally decentralized systems that expose a single, logically centralized interface.
What is left?
Now that we have a logically centralized but architecturally decentralized solution for storage, indexing, and retrieval, we can consider higher-order logic. For example:
How does Alice manage multiple identities? For example, Alice may not want to use the same public key across applications such as Facebook, Google, Snapchat, and Reddit. What if she wants to manage these identities from one interface without linking them publicly?
Suppose Alice wants to send Bob a private message but store it on IPFS or Arweave. These are public systems, so the two of them need a handshake that provides perfect forward secrecy (PFS). How do they set up PFS asynchronously and manage all the related keys?
Given that traditional encryption schemes are designed for communication between two parties, how does the system support private communication among large groups of people (such as message boards or large group chats)?
How does the system support common UX patterns such as group discovery, user data recovery, and content removal?
While these are distinct technical challenges, I see all of them as "higher-order logic" problems.
Textile's Threads address exactly these problems. In many ways, Textile can be thought of as iCloud for IPFS. The analogy is not perfect, but it is useful: iCloud abstracts data backup across devices and applications, giving users and developers a better experience, and Textile provides a set of higher-order logic tools on top of IPFS that let developers build applications seamlessly while ensuring that users' data on IPFS is synchronized and backed up across devices.
Looking to the future
The Web 3.0 ecosystem is very diverse in many ways, including the types of problems being solved, the locations of the teams, the economic models used, and so on. Although no logically centralized entity coordinates the whole thing, the Web 3.0 stack is coming together, which is great. However, this also means that there is a lot of entropy in the system, which makes it hard for people to see the higher-level themes. I have tried to distill them in this article as follows:
The biggest challenge in moving from Web 2.0 to Web 3.0 is the transition from systems that couple all three vectors of centralization (logical, architectural, and political) to systems that are logically centralized but architecturally and politically decentralized.
We believe that Web 3.0 will be a paradigm shift that will unlock trillions of dollars in value in the next decade.
Risk warning: Articles in Blue Fox Notes do not constitute investment advice or recommendations. Investing is risky and should take individual risk tolerance into account. Readers are advised to research projects in depth and make their own investment decisions carefully.