Science | Blockchain data availability issues and solutions

In 2017, activity across the blockchain space centered on Ethereum: the price of Ether soared, people flocked to build applications (this was before the "buidl" campaign), and big companies began to participate. But this unprecedented success outstripped Ethereum's processing capacity.

– That's right… at that gas price, a transaction can take days to confirm –

CryptoKitties was the first distributed application to achieve massive success. But it (along with other applications) consumed the Ethereum network's resources, causing the chain's pool of pending transactions (the mempool) to grow at an unprecedented rate. Overnight, people began talking about Plasma and state channels as the way to solve our scaling problems. But beyond the discussion and excitement, development of these solutions proved difficult, and some community members began to doubt their feasibility. Now that these systems are coming online, people are realizing that the real problem that plagued developers for so long is, plainly, a data availability problem.

This article covers the background of the data availability problem and how different Layer 2 solutions address it, including Plasma, state channels, and elastic sidechains.

Data availability problem

As Vitalik has explained, the data availability problem arises when a malicious miner publishes a block's header to the chain while withholding some or all of the block's data. This kind of attack can:

  • Trick the network into accepting an invalid block, with no way to prove the block's invalidity.
  • Prevent nodes from learning the current state.
  • Prevent nodes from creating blocks or transactions, because they lack the information needed to build the necessary proofs.
But data availability is not just about hiding block data. In general, whenever some participants withhold data from others (also known as censorship), we can call it a data availability problem. As far as we know, this has not been a problem on mainnet, but availability there is still costly. In fact, over the past 18 months, we have seen a 6.5x increase in the state stored on each Ethereum node (running geth with fast sync enabled).

Obviously, this is not sustainable for a truly decentralized network. As the blockchain grows, the number of computers capable of participating in the network as nodes keeps shrinking. So how do we deal with this?

Simple! Keep the opening and closing events on-chain, and let clients handle everything that happens in between off-chain. This is the core of every execution/Layer 2 scaling solution: we move execution off the chain and use the underlying chain as a settlement layer. But this creates a problem: clients participating in a Layer 2 network must keep all the off-chain transactions relevant to them, or else they are at the mercy of whatever others tell them.


Suppose you go to a casino to play poker. First, you go to the counter and exchange dollars for chips (think of this as an on-chain transaction). Then you sit down at a table and play for a few hours (these are off-chain transactions) – sometimes you win, sometimes you lose. After winning a big hand, you tell the casino you want to cash out your chips.
But as you get up, someone knocks you out cold. When you wake up, your memory is fuzzy and you can't remember the details of the game (this is "unavailable data"). While you were out, the people at the table decided to pretend the last hand never happened and resumed play from just before it – cheating you out of the money you had won.
If a similar situation occurred on a blockchain, this kind of cheating would be impossible, because the whole world knows what did and didn't happen. But since all of this happened off-chain, and you lost your transaction history, you have no choice but to accept the history your peers tell you.

In practice

In Plasma, each participant must maintain a complete transaction history and enough witness data to prove whether their crypto-assets were spent in each Plasma block. This makes every participant a node in the Plasma system, though one that stores only its own transaction data. The requirement exists because, on Plasma, anyone can collude with the chain's operator and submit invalid transactions to steal other people's assets. The only way participants can prevent this is to hold a complete, valid transaction record for all of their assets.
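The "witness data" here is essentially Merkle proofs: to show whether an asset was spent in a given Plasma block, a participant checks a transaction's inclusion against the block's Merkle root. Below is a minimal, generic sketch of that mechanic in Python – an illustration only, not any Plasma implementation's actual tree or proof format:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Fold a list of leaf hashes up to a single Merkle root."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:            # duplicate the last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Collect (sibling, leaf_is_left) pairs proving inclusion of leaves[index]."""
    proof, level, i = [], list(leaves), index
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = i + 1 if i % 2 == 0 else i - 1
        proof.append((level[sibling], i % 2 == 0))
        level = [h(level[j] + level[j + 1]) for j in range(0, len(level), 2)]
        i //= 2
    return proof

def verify(leaf, proof, root):
    acc = leaf
    for sibling, leaf_is_left in proof:
        acc = h(acc + sibling) if leaf_is_left else h(sibling + acc)
    return acc == root

# A block of five transactions; a participant keeps a proof for its own tx only.
txs = [h(f"tx{i}".encode()) for i in range(5)]
root = merkle_root(txs)
proof = merkle_proof(txs, 3)
```

Storing a proof (a few hashes per block) is far cheaper than storing the whole chain, which is how Plasma clients can stay light while still being able to defend their assets.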
State channels have lower data requirements, because participants only need to agree on the current state rather than on every state update (i.e., every transaction). This lets a contract settle with a single transaction, without replaying any transaction history. Moreover, since each state carries an auto-incrementing nonce, and the smart contract only treats a state as valid if it is signed by both parties, participants only need to store the latest state.
Note: participants in a state channel may still want to keep historical states, so that if the counterparty loses its state history, they can settle at an earlier, more favorable state (i.e., parties who keep history can still try to cheat).
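The nonce rule can be sketched as follows. This is a toy model in Python, with HMACs standing in for the real digital signatures a channel contract would verify on-chain; all names here are illustrative:

```python
import hmac, hashlib
from dataclasses import dataclass

# Toy stand-in for digital signatures: HMAC with a per-party secret.
# A real channel contract would verify ECDSA/EdDSA signatures instead.
KEYS = {"alice": b"alice-secret", "bob": b"bob-secret"}

def sign(party: str, payload: bytes) -> bytes:
    return hmac.new(KEYS[party], payload, hashlib.sha256).digest()

@dataclass
class ChannelState:
    nonce: int
    balances: dict   # party -> amount
    sigs: dict       # party -> signature over (nonce, balances)

    def payload(self) -> bytes:
        return repr((self.nonce, sorted(self.balances.items()))).encode()

class ChannelContract:
    """Accepts a state only if it is newer and signed by both parties."""
    def __init__(self):
        self.latest = None

    def submit(self, state: ChannelState) -> bool:
        if self.latest and state.nonce <= self.latest.nonce:
            return False                       # stale state: rejected
        for party in ("alice", "bob"):
            if not hmac.compare_digest(state.sigs.get(party, b""),
                                       sign(party, state.payload())):
                return False                   # missing/invalid signature
        self.latest = state
        return True

def make_state(nonce, balances):
    s = ChannelState(nonce, balances, {})
    s.sigs = {p: sign(p, s.payload()) for p in KEYS}
    return s

contract = ChannelContract()
assert contract.submit(make_state(1, {"alice": 5, "bob": 5}))
assert contract.submit(make_state(2, {"alice": 7, "bob": 3}))
assert not contract.submit(make_state(1, {"alice": 5, "bob": 5}))  # replay fails
```

Because any stale state is rejected, each party only ever needs to retain the highest-nonce state signed by both sides.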


Teams are now doing everything they can to reduce the amount of data that clients must maintain or submit to the main network, for example via ZK-SNARKs or RSA accumulators. These are big improvements, but they do not solve the data availability problem. In fact, we cannot truly solve this problem for a single client, because doing so would require the client to be online 100% of the time and never lose the data it stores (sounds a lot like a blockchain, doesn't it?).
However, since hardware meeting that requirement does not exist, it is widely believed that the solution to the data availability problem is an incentivized watchtower network (e.g., PISA) or a similar construction. These networks consist of staked watchtowers that back up data for paying users and, when those users cannot raise an objection themselves (i.e., are offline), challenge suspicious transactions on their behalf. If a watchtower fails to issue a challenge within the allotted period, it loses its stake, which is awarded to another watchtower on the network that did submit the challenge (assuming one did). This fault/penalty protocol has multiple layers of redundancy, so users can be confident they will not be cheated while offline or after losing their transaction/state history.
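A toy model of this incentive structure might look like the following Python sketch. It is hypothetical, not PISA's actual protocol: watchtowers post a stake, and one that fails to challenge a fraudulent exit within the window is slashed, with its stake awarded to a watchtower that did challenge.

```python
CHALLENGE_WINDOW = 10  # blocks a watchtower has to respond to fraud

class Watchtower:
    def __init__(self, name, stake, online=True):
        self.name, self.stake, self.online = name, stake, online

    def challenge(self, fraud_block):
        # In this toy model, any online tower spots and challenges the fraud.
        return self.online

def resolve_fraud(towers, fraud_block, current_block):
    """After the window closes, slash non-responders and pay a challenger."""
    if current_block - fraud_block < CHALLENGE_WINDOW:
        return None                     # window still open, nothing to resolve
    challengers = [t for t in towers if t.challenge(fraud_block)]
    slashed = [t for t in towers if t not in challengers]
    if challengers:
        reward = sum(t.stake for t in slashed)
        for t in slashed:
            t.stake = 0                 # offline towers lose their stake...
        challengers[0].stake += reward  # ...which goes to the challenger
        return challengers[0].name
    return None

towers = [Watchtower("t1", 100, online=False), Watchtower("t2", 100)]
winner = resolve_fraud(towers, fraud_block=5, current_block=20)
```

The point of the slashing rule is that staying online and honest is the profitable strategy, so users can trust the network's coverage without trusting any single tower.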
The reason these solutions took so long is that community members used to scorn the idea of trusting a third party and wanted solutions that eliminated third parties entirely. As the impracticality of that idea became clear, people began proposing crypto-economic models (such as the one above) to reduce how much these third parties must be trusted.

SKALE's solution

SKALE's elastic sidechains address data availability during the block proposal process. Once a validator creates a block proposal, it communicates it to the other validators using the data availability protocol described below. The protocol ensures that block proposals reach a supermajority (> 2/3) of validators.
The five-step protocol works as follows:
  1. The proposing validator A sends the block proposal P – as the list of hashes of all the transactions that make up P – to all of its peer nodes.
  2. On receiving P's transaction hashes, each peer matches them against its local pending-transaction queue to reconstruct P. For any transaction not found in the queue, the peer sends a lookup request to A, who responds with the bodies of those transactions, letting the peer finish reconstructing the proposal and add it to its proposal storage database PD.
  3. Each peer then sends A a receipt carrying a threshold-signature share.
  4. Once A has collected signature shares from a supermajority (more than two-thirds) of the nodes, including itself, it combines them into a supermajority signature S. This signature serves as proof that most validators hold P.
  5. A broadcasts the supermajority signature S to all other validators.
Note: each validator holds a BLS private key share PKS[i]. The key shares are generated using the Joint-Feldman distributed key generation (DKG) algorithm, which runs when an elastic sidechain is created and whenever validators are randomly rotated. Check out our articles on BLS and DKG for more information!
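The five steps above can be sketched in Python as follows. This toy replaces BLS signature shares and their aggregation with plain hashes so the control flow – hash gossip, reconstruction from the mempool, supermajority receipt collection – stays visible; it is an illustration, not SKALE's implementation:

```python
import hashlib

def h(s: str) -> str:
    return hashlib.sha256(s.encode()).hexdigest()

class Validator:
    def __init__(self, name, pending_txs):
        self.name = name
        self.mempool = {h(tx): tx for tx in pending_txs}  # local pending queue
        self.proposal_db = {}                             # PD: proposals stored
        self.seen_sigs = []                               # supermajority sigs S

    def lookup(self, hashes):
        # The proposer answers a peer's request for missing transaction bodies.
        return [(x, self.mempool[x]) for x in hashes]

    def receive_proposal(self, proposer, tx_hashes):
        # Steps 2-3: rebuild P from the local mempool, fetch misses, sign a receipt.
        missing = [x for x in tx_hashes if x not in self.mempool]
        for x, body in proposer.lookup(missing):
            self.mempool[x] = body
        pid = h("".join(tx_hashes))
        self.proposal_db[pid] = [self.mempool[x] for x in tx_hashes]
        return (self.name, h(self.name + pid))            # receipt "share"

def propose(proposer, peers, txs):
    tx_hashes = [h(tx) for tx in txs]                     # step 1: gossip hashes
    receipts = [(proposer.name, "self-receipt")]
    for peer in peers:                                    # steps 2-3
        receipts.append(peer.receive_proposal(proposer, tx_hashes))
    assert 3 * len(receipts) > 2 * (len(peers) + 1)       # step 4: > 2/3 receipts
    S = h("|".join(name for name, _ in receipts))         # aggregate stand-in
    for peer in peers:                                    # step 5: broadcast S
        peer.seen_sigs.append(S)
    return S

txs = [f"tx{i}" for i in range(4)]
proposer = Validator("A", txs)
peers = [Validator(f"B{i}", txs[:2]) for i in range(3)]   # peers lack tx2, tx3
S = propose(proposer, peers, txs)
```

Any later vote on P must carry S, so a proposal can only win consensus if a supermajority of validators actually hold its data.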
In a further consensus step, every validator voting on proposal P must provide a data availability receipt: its vote must include the supermajority signature S, and honest validators ignore any vote that does not contain S. Therefore, assuming a supermajority of validators are honest, this protocol guarantees data availability, meaning any proposal P that wins consensus will be available to every honest validator.

Summary

All in all, if you're wondering what the developers working on execution-layer scaling solutions have been busy with for the past 18 months, chances are most of their time went into this problem. Although no solution perfectly addresses every scaling problem, a lot of new and exciting work is underway, and we are confident about the future!

Learn more

If you are interested in trying SKALE, join the SKALE community on Discord and check out the developer documentation! You can also read SKALE's technical overview and consensus overview at any time for insight into how SKALE works and why it delivers 20,000 TPS.
Original link:
Author: Artem Payvin

Translation & Proofreading: TrumanW & Ajian

(This article is from EthFans; reprinting without the author's permission is strictly forbidden.)