Conversation with Qi Zhou, founder of EthStorage: Data Availability and Decentralized Storage
Introduction
This is the final episode of the decentralized Rollup interview series, which explores the decentralization of Rollups from the perspective of data availability and decentralized storage. We invited Qi Zhou, the founder of EthStorage, to discuss how a DA layer can reuse the security properties of the Ethereum mainnet, EIP-4844 and Danksharding, and the security trade-offs of different DA designs. Dr. Zhou also explained how EthStorage will integrate with EIP-4844 in the next Ethereum upgrade.
Guest Introduction
I am very pleased to share with you our ideas about Ethereum's DA technology and the decentralized storage we have built on top of it. I joined the Web3 industry full-time in 2018, having previously worked as an engineer at Google and Facebook, and I hold a PhD from the Georgia Institute of Technology. Since 2018 I have been following and working on Web3 infrastructure, mainly because I did similar work at large companies, including distributed systems and distributed storage, and I believe there is a lot of room for improvement in this area across the blockchain space. From our early work on execution sharding (Ethereum's sharding 1.0), to sharding 2.0, which is data sharding, and later to data availability, all of this has been innovation and work around Web3 infrastructure.
So we work closely along the Ethereum roadmap, learning, researching, and participating in and contributing to this community. At the end of last year, we were honored to receive support from the Ethereum Foundation for our research on data availability sampling. We did some theoretical work for the Foundation, including research on Danksharding and on how data can be recovered effectively. At the same time, we developed EthStorage, an Ethereum data layer built on Ethereum's DA technology, which uses Ethereum smart contracts to verify off-chain data storage at scale. This is very meaningful for Ethereum, so I am very happy to share with you today how EthStorage builds a data storage layer network on top of DA technology.
Interview Section
Part 1: Discussion on the Definition of DA
How does Data Availability (DA) ensure the security of Rollups?
First of all, in the process of researching DA, I found that many people have some misunderstandings about its definition, so I am very glad to discuss it today. I have previously discussed DA, and the important role it plays in Ethereum L2, with many members of the Ethereum Foundation, such as Dankrad Feist.

Earlier we touched on the basic working mechanism of Ethereum Rollups: transaction execution is moved off-chain, and then a proof method (a fraud proof or a validity proof) convinces the L1 smart contract that the off-chain execution results are correct.

The core goal is to reuse the security of the Ethereum network itself while greatly expanding its computing capacity. As just mentioned, scaling computation means taking on-chain computation off-chain. How, then, can we preserve Ethereum's security at the same time?

Take Optimistic Rollups as an example: we must ensure that anyone can challenge a sequencer that misbehaves, and to do that, it is essential to know exactly what the original off-chain transactions were. If the data of each off-chain transaction cannot be obtained, I cannot find the original transaction record with which to challenge the sequencer on-chain. DA ensures security precisely because it makes the data of every off-chain transaction available on-chain.
Expanding Block Space
All transaction data still needs to be posted on-chain, so even when it no longer has to be executed there, it still generates a huge volume of data. The core problem DA solves can therefore be understood as a very effective technique for expanding block space. If you are familiar with blockchain structure, you know that every block contains a large amount of transaction content; the capacity a block provides for that content is what we call block space.

Currently, each Ethereum block offers roughly 200-300 KB of space, a number that clearly cannot meet Ethereum's future scaling needs. A quick calculation (worked through in the sketch below): 200 KB of space divided by roughly 100 bytes per transaction gives about 2,000 transactions per block; dividing by Ethereum's 12-second block time caps Ethereum's TPS at a level on the order of 100. That is a very small number relative to the entire Ethereum scaling plan.
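As a sanity check on those figures, here is the back-of-envelope arithmetic as a minimal Python sketch; the 200 KB, 100-byte, and 12-second values are the rough numbers quoted above, not exact protocol constants.

```python
# Back-of-envelope TPS ceiling implied by Ethereum's block space,
# using the rough figures quoted in the interview.
BLOCK_SPACE_BYTES = 200 * 1024   # ~200 KB of block space
AVG_TX_BYTES = 100               # rough average transaction size
BLOCK_TIME_SECONDS = 12          # post-merge slot time

txs_per_block = BLOCK_SPACE_BYTES // AVG_TX_BYTES     # ~2,000 transactions
tps_ceiling = txs_per_block / BLOCK_TIME_SECONDS      # ~170 TPS
print(f"{txs_per_block} txs/block -> ~{tps_ceiling:.0f} TPS ceiling")
```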
So what Ethereum L2 cares about is how to put a large amount of transaction data into block space while preserving security. Both fraud proofs and validity proofs can then reuse the data in Ethereum's block space to perform their checks, so that the correctness of off-chain execution results is ultimately guaranteed by Ethereum. That is essentially the relationship between DA and Ethereum's security.
Understanding DA from the Perspective of Network Bandwidth Cost and Storage Cost
The main costs of DA are two-fold: network bandwidth cost and storage cost.
In terms of network bandwidth cost: in a P2P network, Bitcoin and Ethereum currently broadcast blocks by gossiping them to all P2P nodes, announcing to everyone, "I have a new block, and here is what it looks like." The advantage of this approach is that it is very secure, and every network node eventually holds a copy.

The disadvantage is the large overhead in bandwidth and latency. After the PoS upgrade, Ethereum produces a block every 12 seconds; if blocks become too large, propagating them may take longer than 12 seconds, many blocks will fail to be produced, and the bandwidth demanded of the network will eventually grow to a level nobody can accept. So you can think of DA as solving the bandwidth problem of putting large amounts of blockchain data on-chain.
The second cost is storage. The Ethereum Foundation has discussed this at length: in the core protocol design, the block data uploaded through DA is not kept forever.

This raises another question: if all this data goes on-chain but is discarded by the Ethereum protocol after a week or two, do we have better decentralized solutions for preserving this DA data?
This was one of our original motivations when designing EthStorage. On the one hand, many Rollups need their data to remain available for longer. On the other hand, with this data I can use DA to better build fully on-chain applications, for example fully on-chain NFTs, the front-ends of many DApps, and the large volume of articles or comments people write on social networks. All of these can be uploaded to the blockchain through the DA network at a lower cost while obtaining the same security guarantees as Ethereum L1.
That is where our research on Ethereum DA, including discussions with many Ethereum core contributors, has led us: Ethereum needs a storage layer, a decentralized, modular storage layer that requires no upgrade to Ethereum's own protocol, to solve the problem of long-term data retention.
Part 2: Discussion on Different DA Schemes
Relationship between EIP-4844 and Danksharding, and why EIP-4844 needs to be deployed
Proto-danksharding, also known as EIP-4844, can in my view be regarded as the most significant part of Ethereum's next upgrade. The reason for doing 4844 is this: when the sharding roadmap was being drawn up around 2020-2021, the estimated timeline for the full Danksharding upgrade was quite long, perhaps three to five years.

Meanwhile, it was predicted that many Rollups would soon be running on Ethereum, yet the data interface Danksharding provides is completely different from the calldata interface Rollups use today. A brand-new interface would leave a large number of Ethereum applications unable to upgrade quickly and capture the benefits of Danksharding seamlessly.
When I attended Devcon last year, Vitalik also mentioned that he hoped Ethereum could serve these Layer 2s better by letting them develop their contracts against the same interface Danksharding will use. Then, when Danksharding ships, they can directly inherit the new benefits it provides without having to upgrade their existing, already tested contracts.
So EIP-4844 is effectively a greatly simplified version of Danksharding. It provides an application interface similar to Danksharding's, including a new opcode, DATAHASH, and a new data object, the binary large object, or blob.

These data objects are designed to make Rollups compatible with Danksharding's data structures in advance: Danksharding will offer the same concepts of data hashes and blobs, and EIP-4844 implements these ideas ahead of time in the next Ethereum upgrade. If you look at the interface and the new instructions EIP-4844 adds, you can already glimpse how the future Danksharding will interact with Ethereum's application layer.
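To make the blob interface concrete, here is a minimal Python sketch, based on the public EIP-4844 specification, of how a blob's KZG commitment is turned into the versioned hash that the DATAHASH opcode exposes to contracts. The dummy commitment is just a placeholder; real commitments come from a KZG library.

```python
import hashlib

# Constants from the EIP-4844 specification.
FIELD_ELEMENTS_PER_BLOB = 4096       # a blob is 4096 field elements x 32 bytes = 128 KB
VERSIONED_HASH_VERSION_KZG = b"\x01"

def kzg_to_versioned_hash(kzg_commitment: bytes) -> bytes:
    """Versioned hash = one version byte followed by the last 31 bytes
    of SHA-256 of the 48-byte KZG commitment."""
    assert len(kzg_commitment) == 48
    return VERSIONED_HASH_VERSION_KZG + hashlib.sha256(kzg_commitment).digest()[1:]

# Placeholder commitment, for illustration only.
dummy_commitment = bytes(48)
print(kzg_to_versioned_hash(dummy_commitment).hex())
```

The version byte is what allows the commitment scheme to be swapped out later without changing the contract-facing interface.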
From the application perspective, then, Ethereum is thinking about how an early upgrade can let applications enjoy its scaling technologies without paying additional upgrade costs later.

There is one problem, however: EIP-4844 does not solve the full block-space expansion; that is what Danksharding solves. Ethereum's block space today is about 200 KB, while the Danksharding specification targets 32 MB per block, an increase of more than a hundredfold. So EIP-4844 by itself does not actually solve the bandwidth problem of putting blocks on-chain.
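To see why that expansion forces a change in how blocks are distributed, here is a rough per-link bandwidth estimate (my own illustrative arithmetic, using the block sizes quoted in this interview; gossip sends each block to several peers, so real usage is a multiple of this):

```python
# Rough bandwidth needed just to relay one full block per slot over a
# single link, at today's block size versus Danksharding's target.
BLOCK_TIME_SECONDS = 12

for label, block_bytes in [("~200 KB today", 200 * 1024),
                           ("32 MB under Danksharding", 32 * 1024 * 1024)]:
    mbps = block_bytes * 8 / BLOCK_TIME_SECONDS / 1e6
    print(f"{label}: {mbps:.2f} Mbit/s per link")
```

At 32 MB per slot, full gossip to even a handful of peers already demands well over a hundred megabits per second, which is why Danksharding moves to sampling instead.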
How Danksharding solves the problem of block space expansion
Under 4844's design, data still goes on-chain the same way calldata does today, through P2P broadcast, so it is ultimately limited by the physical bandwidth bottleneck of the P2P network. Danksharding changes this broadcast model: it uses data availability sampling so that everyone can be confident the block data is downloadable without having to download all of it.

In a sense this is similar in spirit to ZK. Through sampling, I know that the 32 MB of block data Danksharding brings per block really exists in the network, but I do not need to download all 32 MB and store it locally. A machine with enough bandwidth and storage can still do so, but ordinary validators do not need to.
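A simplified model shows why a handful of samples is enough; this is a hedged sketch of the general DAS argument, not Danksharding's exact 2D scheme. Assume erasure coding lets anyone reconstruct the block from any 50% of its chunks, so an adversary who wants the block to be unavailable must withhold more than half of them, and each uniform random sample then hits a missing chunk with probability at least 1/2:

```python
# Detection probability after k random samples, in the simplified model
# where an unavailable block has at least half of its chunks withheld.
def confidence_after_samples(k: int, withheld_fraction: float = 0.5) -> float:
    """Probability that at least one of k samples lands on a withheld
    chunk, exposing the unavailable block."""
    return 1.0 - (1.0 - withheld_fraction) ** k

for k in (10, 20, 30):
    print(f"{k} samples -> detection probability {confidence_after_samples(k):.10f}")
# 30 samples already give roughly 1 - 2**-30 confidence in this model.
```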
Development and experience on the EIP-4844 testnet
We recently launched our internal EIP-4844 testnet and deployed the corresponding contracts for testing, covering blob data upload, contract invocation, and data verification end to end. So once EIP-4844 goes live, we can deploy our contracts immediately.

At the same time, we hope to use the contracts we have already developed to work alongside Ethereum developers, giving Rollup teams time to develop, learn, and build the various tools they need around EIP-4844.

We have therefore recently contributed a good deal of code to the Ethereum EIP-4844 toolset, including smart-contract support for the new opcode, since Solidity does not yet support the DATAHASH opcode. All of this work is done in sync with developers from the Ethereum Foundation.
Applications and Limitations of the Data Availability Committee (DAC)
Because more than 90% of the fees L2 users pay today go to data availability, many L2 projects have looked for ways to reduce data-upload costs: zkSync has launched zkPorter, and Arbitrum has launched Arbitrum Nova, each providing its own data layer through a DAC, a data availability committee.

Such a committee introduces additional trust assumptions in order to approximate Ethereum-level security. When choosing committee members, projects usually pick well-known names, such as data service providers or large companies, to store the data. But this approach faces many challenges and doubts, because it arguably violates the permissionless principle of decentralization, that anyone can participate. In practice, most data committees are organizations closely tied to the Layer 2 project itself.

Take Arbitrum Nova: when I last checked, there were perhaps six or seven such nodes, committee nodes running on Google's or Amazon's cloud, storing the data behind all execution on Arbitrum Nova. The benefit is that its execution cost is roughly one-thousandth of Ethereum's, because it does not need to write all the data to Ethereum Layer 1. But it remains relatively centralized, so high-value applications still have serious reservations: with tens or hundreds of millions of dollars at stake, you are forced to trust that the committee's data is really available.
So when we designed EthStorage, there was no notion of a data committee at all. We want anyone to be able to participate as a data provider, using cryptographic proofs to show that they have actually stored the data. Theoretically, under the committee model, I may claim to run seven or eight committee nodes while in fact keeping only one physical copy of the data, presented behind seven or eight addresses that can all serve it.

How, then, do you prove that the data has enough physical replicas to guarantee its safety? This is a key innovation we made in EthStorage, and a point we emphasized when presenting to the Ethereum Foundation's ESP (Ecosystem Support Program). EthStorage's ZK cryptography lets Layer 2 data-provider nodes join permissionlessly while proving how many storage replicas they hold, which guarantees data security far better.

So I see the DAC as a stopgap for the cost of uploading data to Layer 1. We believe that EthStorage's cryptographic techniques, combined with proof verification in Ethereum Layer 1 contracts, offer a better solution for data storage. As Ethereum's 4844 goes live, we will actively share these innovations, and how they perform on the network, with everyone.
EthStorage vs. DAC
EthStorage is essentially an Ethereum storage Rollup. Imagine a Layer 2 that is not an EVM execution environment but a very large database, a key-value store, which can reach 10 TB, hundreds of TB, or even PB scale.

How can I ensure the data in this database is as secure as Ethereum? First, all of this large-scale data is published to Ethereum Layer 1 through DA, so everyone can see that the data was obtainable from Ethereum's DA layer. But that availability is not permanent: Ethereum's DA discards the data after roughly two to four weeks.

The second step is that, once uploaded, the data is stored on our Layer 2 nodes. This is where we differ from a DAC: our storage nodes are permissionless, so anyone can participate, prove their storage, and earn the corresponding rewards. The approach is inspired by the storage-proof designs of systems like Filecoin and Arweave, but we have to design a storage-proof network and proof system specifically for the Ethereum DA framework and Ethereum smart contracts. In this respect we believe we make a unique contribution to the Ethereum ecosystem and to the decentralized storage field as a whole.
Storage proof mechanism
Basically, every storage-proof mechanism, including Filecoin's and Arweave's, first encodes the user's data. Crucially, the encoding is keyed on the data provider's address: each provider has its own distinct address, and encoding the data against that address produces what is called a unique replica. For example, in a traditional centralized database or distributed system, the data "hello world" might be stored on four or five different physical machines, each copy literally "hello world". In EthStorage, whether there are four, five, or even ten or twenty copies, each "hello world" is encoded into different data according to each provider's address and stored in a different place.

The benefit, as the toy sketch below illustrates, is that we can use cryptographic mechanisms to prove that many distinct addresses, that is, distinct storage providers, have each encoded the data and produced storage proofs over their encoded copy. Filecoin and Arweave are similar in this respect, but they target only static data, whereas we target the hot data of Ethereum DA, and the existence of this many physical copies can be verified through Ethereum smart contracts. For each encoded copy stored in the network, we can prove that it corresponds to distinct underlying bytes, because it is derived from a different storage provider's address.
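Here is a deliberately toy illustration of the unique-replica idea, not EthStorage's actual encoding: mask the data with a keystream derived from the provider's address, so every provider holds different bytes for the same logical data.

```python
import hashlib

def toy_encode_replica(data: bytes, provider_address: bytes) -> bytes:
    """Toy unique-replica encoding: XOR data with a keystream derived from
    the provider's address. Real schemes use sealing functions that are
    also expensive to recompute, so a provider cannot cheaply re-derive
    the replica on demand instead of physically storing it."""
    keystream = bytearray()
    counter = 0
    while len(keystream) < len(data):
        keystream += hashlib.sha256(
            provider_address + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ k for b, k in zip(data, keystream))

data = b"hello world"
replica_a = toy_encode_replica(data, bytes.fromhex("11" * 20))
replica_b = toy_encode_replica(data, bytes.fromhex("22" * 20))
assert replica_a != replica_b   # same data, provably distinct physical copies
# XOR masking is its own inverse, so decoding uses the same function:
assert toy_encode_replica(replica_a, bytes.fromhex("11" * 20)) == data
```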
So in the design process we optimize and improve on ideas from existing decentralized storage, while also adapting them to Ethereum's DA scheme: supporting modification of dynamic data, proving storage efficiently, and optimizing gas consumption in Ethereum contracts. A great deal of cutting-edge technology and research still has to be completed here.
How EthStorage maintains permissionless storage proofs
Ethereum has a type of node called an archive node, which keeps the full history of Ethereum transactions, including the world state. A huge challenge under Danksharding is that the plan will generate roughly 80 TB of data per year; after three or four years of operation that is 200-300 TB, and still growing. This poses a real problem for archive nodes, because running one carries no token-economic incentive to encourage anyone to keep the data.

EthStorage therefore first has to solve the token-economic incentives for permanent data storage. Here we adopt a discounted-cash-flow model like Arweave's to fund the incentives, and implement it so that it executes efficiently in a smart contract.
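For intuition, here is a minimal sketch of the discounted-cash-flow argument behind Arweave-style permanent storage, with illustrative numbers rather than EthStorage's actual parameters: if the cost of storing a gigabyte for a year falls by a fixed fraction annually, the cost of storing it forever is a convergent geometric series, so a finite upfront payment suffices.

```python
def perpetual_storage_cost(annual_cost: float, annual_decline: float) -> float:
    """Sum of annual_cost * (1 - annual_decline)**t over t = 0, 1, 2, ...,
    which converges to annual_cost / annual_decline."""
    return annual_cost / annual_decline

# E.g. $1 per GB-year today, with storage costs falling 10% per year:
print(perpetual_storage_cost(1.0, 0.10))   # a $10 endowment covers forever
```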
The second point is permissionless participation. Our incentive design encourages 10, 50, even 100 nodes to keep the data in the network, so any new node can contact any of them, synchronize the corresponding data, and become a storage provider itself. There is room for further optimization of the data incentives.

Third, a storage node could face very high costs if it had to store all the data at once, which in the long run may be several hundred TB or even reach PB scale. So we further developed what we call data sharding (see the back-of-envelope sketch below). An ordinary node then needs only about 4 TB of capacity (4 TB in the current design, possibly raised later to, say, 8 TB) to store one portion of the network's archived data, and our incentive mechanisms ensure that, stitched together, the copies across our Layer 2 network cover all of the data.
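A back-of-envelope sketch of what that sharding implies, combining the ~80 TB/year figure quoted above with a replication target of my own choosing (purely hypothetical):

```python
# How many 4 TB shards (and 4 TB nodes) are needed as Danksharding data
# accumulates at ~80 TB/year?
ANNUAL_DATA_TB = 80
SHARD_SIZE_TB = 4
REPLICAS_PER_SHARD = 10   # hypothetical replication target, not a spec value

for years in (1, 2, 4):
    total_tb = ANNUAL_DATA_TB * years
    shards = -(-total_tb // SHARD_SIZE_TB)   # ceiling division
    nodes = shards * REPLICAS_PER_SHARD
    print(f"year {years}: {total_tb} TB -> {shards} shards, {nodes} x 4 TB nodes")
```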
Many problems have to be solved along the way: the sheer volume of archival data, token incentives, permissionless access, and so on. All of these are enforced automatically by the smart contracts we deploy on Layer 1. On our side, we simply provide the data network: anyone willing to bear the data costs can download the data, generate storage proofs, submit them to the Ethereum network, and earn the corresponding rewards. Our contracts are essentially complete in design and have begun debugging on the Ethereum 4844 devnet.