Babbitt Accelerator Technology Open Class | Storj uses erasure codes to solve the data problem in distributed cloud storage

The Babbitt Accelerator Technology Open Course is a Geekhub Global Online program of in-depth dialogues and courses. We regularly invite experienced technologists from around the world to deconstruct blockchain technology online and deliver cutting-edge, high-quality blockchain content to the Geekhub technology community. Community members can also join the live sessions to explore blockchain technology, its development, and its future.

In December 2018, IDC released the latest version of the white paper "Data Age 2025", which predicts that the total amount of global data in 2025 will reach 175ZB.

How big is 175ZB?

A search engine tells me that if all 175ZB of content were stored on DVDs, the stack of DVDs would be 23 times as high as the distance between the Earth and the Moon (the closest distance of the Moon is about 393,000 kilometers), or long enough to circle the Earth 222 times (one circuit is about 40,000 kilometers).

The explosive growth of data places higher demands on data storage. Cloud storage is favored by more and more people for its high efficiency and high degree of commercialization. IDC predicts that by 2025, 49% of the world's stored data will reside in public cloud environments. However, centralized cloud storage still has problems such as high cost, low security, and privacy leakage. Decentralized cloud storage, with its low cost, high security, and full use of idle resources, is attracting more and more attention.

On May 8th, the sixth Babbitt Accelerator-Geekhub Global Online session, a special event on distributed cloud storage, invited Kevin Leffew to share content related to Storj V3. Kevin Leffew is a Storj technical spokesperson, Storj global market leader, leader of the Storj open source partner program, a member of the Microsoft Azure Blockchain Workbench team, founder of the Financial Technology Club of Keke University, and founder of blockchain network education.

First, the V3 network components

The Storj V3 network consists of three parts: users / clients, satellites, and storage nodes. The user / client is the user or application that uploads or downloads data. Storage nodes store and distribute data and earn tokens for the storage and bandwidth they provide: the more storage and bandwidth a node provides, the more incentives it can earn. In the Storj system, anyone can run their own satellite or create an account on a trusted third-party satellite.


In the Storj system, a total of eight steps are required to achieve data storage, data retrieval, data maintenance, and payment. The first step is to select the best nodes. Storage nodes provide storage services in exchange for economic returns: the shorter a node's response time, the lower its latency, the higher its throughput, the larger its bandwidth and disk space, the better its geographical location, the longer its uptime, and the more often it responds correctly, the faster it stores data and the more incentives it earns. The second step is to encrypt the file and process it with an erasure code. Storj believes that erasure codes reduce bandwidth usage and at the same time avoid waiting on long-tail response times, which brings a huge performance advantage. Storj also believes that erasure codes can achieve very high durability at low expansion factors. The file is then split into pieces and transferred to the storage nodes, and the corresponding metadata is stored on the satellite. Throughout this process, the automatic repair system keeps running so that data can be repaired in time, and the network pays storage nodes on schedule. The seventh step is retrieval: the user uses the metadata to identify where the pieces were stored and then retrieves the file pieces. The final step is to decrypt and reassemble the original data on the local device.
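The long-tail point can be illustrated with a small sketch. If transfers to all nodes run in parallel and only some of them need to finish, the elapsed time is set by the slowest transfer that is still needed, not by the slowest node overall. The function name and the latency figures below are hypothetical and are not taken from Storj's code.

```python
def time_to_finish(latencies, needed):
    """Elapsed time until `needed` parallel transfers have completed.

    `latencies` holds one completion time (in seconds) per node.
    """
    if not 1 <= needed <= len(latencies):
        raise ValueError("needed must be between 1 and the number of nodes")
    # With all transfers running in parallel, the wait ends when the
    # `needed`-th fastest transfer completes.
    return sorted(latencies)[needed - 1]


# Hypothetical completion times for six nodes; one node is very slow.
latencies = [0.2, 0.3, 0.25, 4.0, 0.22, 0.31]

wait_for_all = time_to_finish(latencies, needed=6)   # held up by the slow node
wait_for_four = time_to_finish(latencies, needed=4)  # the slow node is ignored
```

With these made-up numbers, waiting for every node takes as long as the slowest one, while waiting for any four of the six finishes as soon as the fourth-fastest node responds.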

Second, the use of erasure code

In this session, Storj also described specifically how erasure codes are used in the Storj network.

Erasure codes first appeared in the 1950s. As a data protection method, an erasure code splits a data file into fragments, expands and encodes them with redundant data, and stores the resulting pieces in different locations. Only a subset of the pieces is needed to reconstruct the file. The relationship between the number of file pieces and the number needed for reconstruction can be expressed by the following formula.


At the same time, erasure codes can reduce the storage space required without reducing the reliability of the data. The following table compares the storage space needed when erasure codes are combined with replication and when erasure codes are used alone.


It can be expressed by a simple formula. K represents the minimum number of symbols needed to recover the data, M represents the redundant symbols added to provide additional protection after failures, O represents the number of churning nodes, and N represents the total number of symbols created by the erasure coding process.
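The idea that any K of the N pieces are enough to rebuild a file can be illustrated with a toy code. The sketch below is a minimal Reed-Solomon-style construction over the prime field GF(257), written for illustration only. It is not Storj's implementation, and the names `encode`, `decode`, and `_interpolate` are hypothetical. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points, x):
    """Evaluate at x the unique polynomial through `points` (Lagrange form, mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, k, n):
    """Turn `data` into n pieces (keyed 1..n); any k of them can rebuild it."""
    padded = data + bytes(-len(data) % k)  # pad to a multiple of k
    pieces = {x: [] for x in range(1, n + 1)}
    for start in range(0, len(padded), k):
        stripe = padded[start:start + k]
        # The k data bytes are the polynomial's values at x = 1..k.
        points = list(zip(range(1, k + 1), stripe))
        for x in pieces:
            pieces[x].append(_interpolate(points, x))
    return pieces


def decode(available, k, length):
    """Rebuild the original bytes from any k surviving pieces."""
    if len(available) < k:
        raise ValueError("not enough pieces to reconstruct the data")
    chosen = list(available.items())[:k]
    out = bytearray()
    for s in range(len(chosen[0][1])):
        points = [(x, symbols[s]) for x, symbols in chosen]
        out.extend(_interpolate(points, x) for x in range(1, k + 1))
    return bytes(out[:length])


data = b"hello storj"
pieces = encode(data, k=3, n=6)
# Pretend nodes 1, 3 and 4 are offline; pieces 2, 5 and 6 survive.
survivors = {x: pieces[x] for x in (2, 5, 6)}
restored = decode(survivors, k=3, length=len(data))
```

Each stripe of k bytes defines a polynomial of degree below k, and every piece holds one evaluation of it, so any k evaluations pin the polynomial down again. Production systems use optimized finite-field arithmetic instead of this direct interpolation.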


At present, Storj V3 can be used for large files, video files, database snapshots, etc. It is believed that the Storj V3 network will bring different experiences to all users!

Third, the community questions:

1. What is the mining certification mechanism of the Storj project now? How efficient is it?

The Storj network is distributed, and although we like our storage node operators, the software does not trust anyone. Proof of work for storage nodes is one of several ways to ensure that only committed storage node operators can join the network. On average, each storage node operator must spend at least a few CPU hours to find a valid network node identifier.

2. How does Storj solve the redundancy problem of storage files?

The Storj network uses erasure codes to solve the problem of data redundancy. First, erasure codes can achieve high reliability at a low expansion factor. Reliability is not tied directly to the expansion factor, which means that reliability can be improved without increasing overall network traffic. Second, erasure codes take up less hardware space, and data recovery costs do not increase. Third, erasure codes require more CPU time, but compared with 9x replication, a k = 18, n = 36 erasure code uses less than half of the total bandwidth for repair, less than one-third of the bandwidth for storage, and less than one-third of the disk space. The erasure code is also about ten times more durable than replication.
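The storage comparison in this answer can be checked with a short calculation. The sketch below computes the expansion factor n / k for both schemes and estimates the chance of losing a file under a simple model in which each piece is lost independently with the same probability. That model and the 10% failure probability are assumptions made for illustration, not Storj's published durability model, and the function names are hypothetical.

```python
from math import comb


def expansion_factor(k, n):
    """Stored bytes per byte of original data for a k-of-n scheme."""
    return n / k


def loss_probability(k, n, p_fail):
    """Probability that fewer than k of n pieces survive.

    Assumes each piece is lost independently with probability p_fail.
    """
    # The file is lost once more than n - k pieces have failed.
    return sum(
        comb(n, failed) * p_fail**failed * (1 - p_fail) ** (n - failed)
        for failed in range(n - k + 1, n + 1)
    )


P_FAIL = 0.1  # assumed per-piece failure probability, for illustration only

# 9x replication behaves like a 1-of-9 scheme: any single copy is enough.
replication = (expansion_factor(1, 9), loss_probability(1, 9, P_FAIL))
erasure = (expansion_factor(18, 36), loss_probability(18, 36, P_FAIL))
```

The k = 18, n = 36 code stores 2 bytes per original byte against 9 for replication, which is consistent with the "less than one-third of the disk space" figure. Under the assumed failure model it also has the lower loss probability of the two.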

3. The Storj project's mining reward is now the ERC20-type STORJ token. It is well known that Ethereum's TPS is about 7, so how can Storj handle high-frequency micropayments?

At Storj, we are building the next generation of distributed cloud storage platforms. We make decentralization possible, so that anyone can rent out their extra hard drive space, just as you can rent out an extra room on Airbnb or your car on Turo. One of the challenges we face is implementing a payment system that is accurate, timely, and scalable to millions of users. The current version of the Ethereum network can handle approximately 15 payments per second, which means that even if the network processed only our STORJ payments, the entire Ethereum network would take more than 4 hours to complete all of these transactions. At the peak of our payment processing, our transactions accounted for approximately 8% of all Ethereum transactions.
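The arithmetic behind this estimate is simple to reproduce. At about 15 payments per second, a four-hour run corresponds to 15 × 3,600 × 4 = 216,000 payments, so "more than 4 hours" implies a payout batch larger than that. The sketch below uses a hypothetical helper name, and the payment count is derived from the figures in the answer rather than reported by Storj.

```python
def hours_to_settle(payments, payments_per_second):
    """Hours needed to process `payments` at a fixed throughput."""
    return payments / payments_per_second / 3600


# Number of payments that fit into exactly four hours at 15 per second.
payments_in_four_hours = 15 * 4 * 3600

hours = hours_to_settle(payments_in_four_hours, payments_per_second=15)
```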

Now we are using Raiden to solve this problem. Raiden is well developed, and nodes can easily make payments using native STORJ tokens (the tokens used for all node storage and bandwidth payments). At the same time, each month we generate a payment report that includes each node ID, summary data, the operator's wallet address, the node creation date, and audit information. We use the node creation date to calculate the escrow accrual and the audit information to determine any failures.

4. How do you evaluate Filecoin, the incentive layer project for IPFS? How will Storj compete with Filecoin after it goes live?

First of all, IPFS is not yet online. Although IPFS proposes a new "proof of spacetime" algorithm, this algorithm does not have any code behind it. At the same time, IPFS uses replication plus erasure codes, which is inefficient. Finally, Storj Labs has more experience in decentralization, networking, and related areas. With the Storj V2 version, we built a 100PB network and accumulated a lot of experience. Filecoin is anonymous and has not produced a complete test network. Although Storj Labs is currently in beta, the V3 network is open source, and we are actively working with the community to ensure that our next version is safe to run. It is expected to be officially available this fall.

This chart illustrates the mathematical shortcomings of the Filecoin method. The table compares the approach of combining erasure codes with replication.


5. Hard drive mining is affected by various factors such as geographical location, storage demand, network bandwidth, and hard disk space. If you mine Storj in China, what is the yield now?

Reputation is very important for storage nodes. Whether a node is trusted affects how much data it stores and how much of its bandwidth is used. Each reputation factor is described by the following attributes:

Frequency – how often the reputation factor is updated

Time – when in the lifetime of a node on the network the reputation factor is updated

Duration – the time required to establish or update the reputation factor

Issuer – the entity that assigns or establishes the reputation factor on the network

Stored Location – the location where the artifact or data associated with the reputation factor is stored

Artifact – a description of the data or object used to verify the reputation factor

Range of values – the minimum and maximum possible values used to evaluate the reputation factor

Tardigrade Minimum – the smallest possible value at which the node is still allowed to store Tardigrade Satellite data

Reputation Factor Scoring Unit – the unit of measure for the reputation factor

Disqualification threshold – the threshold at which the storage node moves to the failed state based on the given reputation factor

In the V3 network, more factors affect trust. The transparency of the reputation factors is also very important. First, high transparency makes the network more stable and better performing, because storage nodes can then guarantee high availability and sufficient storage and bandwidth for long-term use. Second, high transparency ensures that the incentive mechanism operates normally. Designing a set of incentives and checks and balances is an iterative process that ensures all types of behavior on the network are rewarded appropriately.

Finally, I would like to thank Han and the original community for their support of this event. Past courses are available on the live room homepage:

Series 1: layer2 expansion
1. Ethereum 2.0
2. plasma
3. chain core expansion technology analysis
4. embrace layer2
5. blockchain scalability and design philosophy

Series 2: Distributed storage, digital cornerstone
1. Are your "chain" files actually saved?
2. distributed storage combined with public chain incentive engineering practice
3. blockchain storage value belief

Series 3: consensus mechanism VRF-security bridge
1. What did introducing VRF into the consensus mechanism bring?
2. Algorand
3. DFINITY
4. Achieving fairness in the real world: the use of VRF in DEXON

Series 4: cross-chain dialogue COSMOS ……

Thank you for your support.
