Why is Filecoin different? IPFS founder talks about its proof system

Source: IPFS Force Zone Compilation

Original link: https://filecoin.io/blog/filecoin-proof-system/

Like other large technological innovations, blockchain is a combination of several mature technologies that we have used and trusted for decades. Consensus mechanisms have been studied since the 1970s, and proof-of-work was developed in the 1990s as a tool to combat spam. Together they allow users of a distributed system to reach agreement without a central arbiter.

Filecoin is built on a variation of proof-of-space. It is also related to proof-of-stake, except that the stake is not tokens but the amount of proven storage, which determines the probability that a miner produces a block. When building a distributed storage network, we set out to build a proof construction in which consensus is achieved by producing data storage. With the release of the testnet, we will launch a new set of storage-based proof systems for achieving decentralized consensus.
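As a rough illustration of proven storage acting as the stake, the sketch below picks a block producer with probability proportional to each miner's proven storage. The miner names, storage figures, and function names are made up, and a single weighted draw is only an assumption here; Filecoin's actual leader election is more involved.

```python
import random

def win_probabilities(proven_storage):
    """Map each miner to its share of the total proven storage."""
    total = sum(proven_storage.values())
    return {miner: amount / total for miner, amount in proven_storage.items()}

def pick_block_producer(proven_storage, rng):
    """Draw one miner, weighted by proven storage (hypothetical election rule)."""
    miners = list(proven_storage)
    weights = [proven_storage[m] for m in miners]
    return rng.choices(miners, weights=weights, k=1)[0]

# Hypothetical miners and their proven storage in TiB.
storage = {"miner_a": 60, "miner_b": 30, "miner_c": 10}
probs = win_probabilities(storage)
winner = pick_block_producer(storage, random.Random(0))
```

A miner that proves twice as much storage is twice as likely to win a given draw, regardless of how many tokens it holds.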

When we announced Filecoin in 2017, we set out to create a decentralized storage network built on a strong decentralized market. To cultivate this market, decentralize its functions, and encourage early miners to participate, we created a cryptographic token that is a by-product of Filecoin consensus. The token is generated on the basis of useful work (that is, useful proof-of-replication and proof-of-spacetime).

The story of these proofs

Juan Benet recently explored the history of Filecoin's proof structure in an interview with the Zero Knowledge podcast. Here is an excerpt from this interview:

"Filecoin is driving blockchain development in a number of different ways.

Proof of replication is a proof system that verifies that storage miners actually hold the content they claim to store and are not cheating. But how do you prove to the network that you really are storing something rather than lying?

Filecoin also tries to solve some other interesting problems, including higher-throughput consensus, interoperability, and content-addressed linked data structures.

Ultimately, all of this is about taking all the unused storage space on the planet, organizing it with incentives, building the most powerful storage network, and driving down the price of storage.

Filecoin's proof of replication is both a proof of storage and a proof of space. The two are slightly different. In Filecoin, units of data are stored in so-called sectors. You seal specific data into a sector on disk with a slow encoding process and submit a proof to the blockchain. Sealing is computationally intensive. To produce such a proof, you must have done a specific piece of work on the client's raw data stored on Filecoin.

A proof system is a cryptographic protocol with a prover and a verifier: the prover proves something to the verifier. For example, in proof-of-work, the prover must perform some computation, that is, spend a certain number of computation cycles. The typical proof is a hash.
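As an illustration outside the interview itself, the prover/verifier split in proof-of-work can be sketched as follows. This is a minimal hashcash-style example assuming SHA-256 and a leading-zero-bits difficulty rule; the function names and the `difficulty_bits` parameter are chosen for this sketch, not taken from any particular chain.

```python
import hashlib

def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()

def prove(message: bytes, difficulty_bits: int) -> int:
    """Prover: search for the first nonce whose hash has enough leading zero bits."""
    nonce = 0
    while True:
        digest = hashlib.sha256(message + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return nonce
        nonce += 1

def verify(message: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Verifier: a single hash is enough to check the prover's work."""
    digest = hashlib.sha256(message + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty_bits

nonce = prove(b"block header", 12)
```

The asymmetry is the point: the prover tries thousands of nonces on average at this difficulty, while the verifier computes one hash.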

A proof of storage is a simple proof system that shows I hold certain data. For example, I can prove to you that I have data X without showing you X itself, which may be several GB, and do so in a much more succinct way.

Then there is proof of retrievability. Not only do I prove that I have X, but the proofs themselves can be used to recover X in case I turn malicious and try to withhold it.

Proof of space is another type: I can convince you that I am dedicating a certain amount of storage space. If I set aside 1 GB and fill it with a randomly generated GB of data, I can prove that I am storing that random GB and nothing else in that space. This lets miners use storage space as a form of proof-of-work.

The interesting part is combining proof of space with ordinary proof of data possession: I want X to be useful, not just a random string. The hardest part is constructing a proof of space that also stores useful data. That is what proof of replication is: a basic primitive in the cryptographic protocol of the Filecoin network.

Other proofs of storage were created to make clouds more trustworthy, because a provider can prove to you that it is keeping your data. But they go essentially unused in ordinary centralized clouds, where trust is contractual. Now they are used throughout the decentralized space, because there we rely on incentive structures rather than contractual agreements to guarantee things.

We also use SNARKs, because practical proofs of replication produce a large amount of output. We want to run many challenges against these replication proofs and aggregate them so that they can go on chain in a very small, compact form. SNARKs are a good way to do this: you prove that a computation was carried out correctly, and then you put that SNARK proof on chain. Any party can then verify a small number of inputs together with the SNARK proof and know that the proof was generated correctly.

In proof of replication, we take, say, 32 GB and apply a very slow encoding that produces a lattice-like graph arranged in layers, where each node might be a 32-byte segment. Generating the graph is a sequential process in which each node is hashed in turn. Because of the hash function, the nodes must be computed one after another.

One of the graphs generated is a DRG (depth-robust graph). These are connected with expander graphs to form a complex lattice structure. Finally, we encode the raw data into what we call a replica and commit to it as a value. You can take the same source data and encode it multiple times, ending up with multiple distinct, uniquely encoded replicas.
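As an illustration outside the interview itself, here is a toy sketch of the encoding idea under heavy simplifying assumptions. A single hash chain stands in for the layered depth-robust and expander graphs: each 32-byte key is derived from the previous one, so the keys cannot be computed in parallel, and each data node is XORed with its key. The `replica_id` parameter and all function names are hypothetical; this is not Filecoin's actual sealing algorithm.

```python
import hashlib

NODE_SIZE = 32  # bytes per node, matching the 32-byte segments in the text

def key_chain(replica_id: bytes, num_nodes: int):
    """Yield one 32-byte key per node; each key is a hash of the previous key."""
    prev = hashlib.sha256(replica_id).digest()
    for index in range(num_nodes):
        prev = hashlib.sha256(prev + index.to_bytes(8, "big")).digest()
        yield prev

def xor_with_keys(data: bytes, replica_id: bytes) -> bytes:
    """XOR each 32-byte node with its key; applying this twice restores the data."""
    if len(data) % NODE_SIZE != 0:
        raise ValueError("data length must be a multiple of NODE_SIZE")
    out = bytearray()
    keys = key_chain(replica_id, len(data) // NODE_SIZE)
    for offset, key in zip(range(0, len(data), NODE_SIZE), keys):
        node = data[offset:offset + NODE_SIZE]
        out.extend(a ^ b for a, b in zip(node, key))
    return bytes(out)

# XOR is its own inverse, so encoding and decoding are the same operation.
encode = decode = xor_with_keys

data = bytes(range(64))
replica_1 = encode(data, b"replica-1")
replica_2 = encode(data, b"replica-2")
```

The same source data yields two different replicas because the key chain depends on the replica identifier, and either replica decodes back to the original data with the matching identifier.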

Once that is done, to prove that we encoded correctly, we only need to answer a sample of challenges showing that we stored the replica. Suppose we randomly sample 1,000 challenges across the proof and evaluate them inside a SNARK. We take the encoded data, decode it, and show that it leads all the way back to the root we committed to. This is where we want succinctness. Without it, each challenge is a 32-byte "leaf" plus the entire Merkle path back to the root, which is a considerable amount of data, and that is then multiplied by 1,000. If the raw proof comes to something like 100 KB or even megabytes, we can compress it with a SNARK to, I think, about 200 bytes.
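As an illustration outside the interview itself, the sketch below shows what answering one challenge looks like without a SNARK, and estimates how large 1,000 such answers would be. It assumes a plain binary Merkle tree over SHA-256 and a 32 GiB sector in binary units; Filecoin's real trees, hash functions, and layering differ, and all names here are chosen for the example.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(leaves):
    """Return every level of a Merkle tree; the leaf count must be a power of two."""
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def open_leaf(levels, index):
    """Collect the sibling hashes on the path from a leaf up to the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])
        index //= 2
    return path

def verify_leaf(root, leaf, index, path):
    """Recompute the root from a leaf and its sibling path."""
    node = h(leaf)
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

leaves = [bytes([i]) * 32 for i in range(8)]
levels = build_tree(leaves)
root = levels[-1][0]
path = open_leaf(levels, 5)

# Back-of-the-envelope size for the example in the text.
sector_bytes = 32 * 2**30                               # 32 GiB sector
leaf_bytes = 32
depth = (sector_bytes // leaf_bytes).bit_length() - 1   # 2**30 leaves, 30 levels
bytes_per_challenge = leaf_bytes + depth * 32           # leaf plus 30 sibling hashes
total_bytes = 1000 * bytes_per_challenge                # 992,000 bytes
```

Under these assumptions, 1,000 openings of a single tree already come to roughly 1 MB, which is consistent with the order of magnitude mentioned in the text and shows why compressing the proof to a few hundred bytes matters for on-chain storage.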

A great story in all this work is what we call the proof roller coaster. Over time, you end up creating many different constructions, each suited to different parameters across all these use cases.

Choosing these parameters, and choosing the proofs used in Filecoin, is the biggest reason it has taken us so long to ship all of this. You choose a construction that has a particular shape and produces artifacts of a certain size, and maybe that is fine. Then you adjust a parameter, like, "Hey, maybe we want the sector to be a little bigger," and that forces some other parameters to change.

Soon you are in a large parameter space with many different variables, where adjusting one thing forces many other things to change. With so many algorithms being optimized at once, managing the complexity is very hard. With these constructions and the slow encoding, you want the encoding to be slow enough, yet fast enough that it is not expensive. Dialing that in just right is a very difficult challenge, and then you have to pin down a specific SNARK construction so that you can do all of this efficiently and succinctly.

All this parameter optimization became so stressful and difficult that we actually had to write software to deal with it. We have a constraint solver that handles the constrained-optimization problem of choosing proof constructions and parameters in Filecoin. This was an unexpected outcome; other groups can now use it to make their lives easier, but we had to write it ourselves.

The tool is called Orient. It is on GitHub, and everything is open source (see Filecoin's parameters in Orient and Ubercalc). It has its own language in which you define specific algorithms and the artifacts they generate, combine them into larger algorithms, and work with all these variables and parameters.

You can then feed in experimental results, such as how long a given hash function takes, plug data into some parameters, and compute the others. For example, given this hash function and the time it takes inside or outside a SNARK, the solver tells you which construction to use because it minimizes time or minimizes the on-chain footprint. All of this is computed by the solver.
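As an illustration outside the interview itself, the kind of trade-off described here can be sketched with a brute-force search. This is not Orient or its language. Every hash name, cost figure, and parameter range below is invented for the example, and the cost model is deliberately crude (it has no security constraints at all).

```python
from itertools import product

# Hypothetical candidates; every number here is invented for illustration.
HASH_FUNCTIONS = {
    # name: (sealing seconds per GiB per layer, SNARK constraints per hash)
    "hash_fast_outside": (40.0, 25_000),
    "hash_snark_friendly": (400.0, 500),
}
LAYERS = [4, 8, 11]
CHALLENGES = [500, 1000, 2000]

def evaluate(hash_name, layers, challenges, sector_gib=32):
    """Estimate sealing time and SNARK size for one parameter choice."""
    seconds_per_gib, constraints_per_hash = HASH_FUNCTIONS[hash_name]
    return {
        "hash": hash_name,
        "layers": layers,
        "challenges": challenges,
        "seal_seconds": seconds_per_gib * sector_gib * layers,
        "snark_constraints": constraints_per_hash * challenges * layers,
    }

def solve(max_seal_seconds, min_challenges):
    """Pick the feasible candidate with the smallest SNARK, or None."""
    candidates = (evaluate(*combo)
                  for combo in product(HASH_FUNCTIONS, LAYERS, CHALLENGES))
    feasible = [c for c in candidates
                if c["seal_seconds"] <= max_seal_seconds
                and c["challenges"] >= min_challenges]
    return min(feasible, key=lambda c: c["snark_constraints"]) if feasible else None

best = solve(max_seal_seconds=6 * 3600, min_challenges=1000)
```

With a six-hour sealing budget, only the hash that is fast outside the SNARK is feasible in this toy model; relaxing the budget to a day makes the SNARK-friendly hash win because its circuit is far smaller. Changing one input flips the preferred construction, which is the effect the interview describes.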

Building blockchain technology has become so complicated, both in the individual primitives and in how they are woven into a chain, that we need software to help us write software. It is like chip manufacturing: chips could be laid out by hand until they reached a certain density, and after that it was no longer possible. Designers had to start using software to lay out the chips. I think blockchains have reached that point, and some of the constructions we are working on need software to help us design them.

I don't think any other network uses proof of replication. This is one of our strengths. We created this field, and it is a differentiating factor.

We are also the only ones with this kind of liquid market structure, one organized around asks and bids. Under this structure, miners and clients can reason about prices together and then strike deals on that basis.

I think we are also the only network whose consensus is backed by useful storage. In other networks, consensus may be backed by a proof of space, but in our case the space is useful. These are the three biggest things that set Filecoin apart.

Then there is the tight integration with IPFS, through libp2p and other components that are already heavily used on IPFS. It will be easy to back up all of that data directly to Filecoin. It is worth mentioning that IPFS is an open network, and we have seen other networks start adding support for it, which is also cool. That is why it should be a decoupled layer."
