Filecoin Ultimate Guide: How It Works, a Protocol Deep Dive, and Possible Improvements

Original author: Vaibhav Saini. Original link: Medium. Compiled by: Continental Star. Editor's note: The original title was "The most comprehensive and useful Filecoin ultimate guide".

Since the decentralization revolution began in 2009, many promising projects have emerged and changed how we perceive and live in the world. Protocol Labs is one of them; it created projects such as IPFS.

The ultimate goal of IPFS is to replace HTTP, but it lacks an incentive layer to drive mass adoption. This is where Filecoin comes in. Since it was announced, Filecoin has generated a lot of interest in the community, and with the launch of its testnet in December 2019, there is now a lot to explore.

There is a lot of information about its technology and economics on the Internet, and it is both confusing and overwhelming. So we have gathered all of it into a single source. If you like learning high-tech Web3 concepts such as Filecoin through simple explanations and interactive tutorials, head here.

First, we'll discuss the technical aspects of Filecoin; its economic aspects will be covered in the next article. But before delving into the core technology, let's look at the state of the file storage market today.

State of the file storage market today

Today, Amazon S3 is the dominant force in file storage on the Internet. There are many reasons:
1. It is incredibly cheap: $0.023 per GB of storage per month and $0.004 per 10,000 read requests.
2. It is very fast.
3. It is reliable: it has had a few outages, some of which took large parts of the Internet offline, but it still offers 99.9% uptime.
4. It is highly scalable.
5. It provides a great developer experience and integrates easily with other Amazon services (such as CloudFront) to scale.

In a world with such a great cloud storage service, any competitor must perform better, or at least as well. At small scale, decentralized networks do not perform well. However, if IPFS is adopted on a large scale (with a higher adoption rate than BitTorrent), it may prove to be a better version of the Internet and thus open up a whole new economy.
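To make the comparison concrete, the arithmetic below estimates a monthly S3 bill. It is a minimal sketch that assumes S3 Standard list prices of $0.023 per GB-month of storage and $0.0004 per 1,000 GET requests (the same as $0.004 per 10,000); real prices vary by region, tier, and date, and data-transfer fees are ignored.

```python
# Assumed S3 Standard list prices in USD; real pricing varies by region and over time.
STORAGE_USD_PER_GB_MONTH = 0.023
GET_USD_PER_1000_REQUESTS = 0.0004


def monthly_s3_cost(stored_gb: float, get_requests: int) -> float:
    """Estimate one month of storage plus read-request charges (transfer fees ignored)."""
    storage = stored_gb * STORAGE_USD_PER_GB_MONTH
    reads = get_requests / 1000 * GET_USD_PER_1000_REQUESTS
    return round(storage + reads, 4)


# 1,000 GB stored and one million reads in a month.
print(monthly_s3_cost(1000, 1_000_000))  # → 23.4
```

Storage dominates the bill in this example; a decentralized network competing on price alone would have to undercut roughly this per-terabyte figure.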

Technology Overview

We divide it into 4 parts: 1. an overview of how the Filecoin network works; 2. an in-depth study of the Filecoin protocol; 3. other issues (not discussed in the white paper); 4. possible improvements to the Filecoin protocol.

01 Overview of how the Filecoin network works

There are three groups of users in Filecoin: clients, storage miners, and retrieval miners.

Clients pay to store and retrieve data, and they can choose from the available service providers. If they want to store private data, they need to encrypt it before submitting it to a provider.

Storage miners store client data in exchange for rewards. They decide how much space they are willing to reserve for storage. After a client and a storage miner reach an agreement, the miner is obliged to keep providing proofs that it is storing the data. Anyone can check these proofs and make sure the storage miner is reliable.

Retrieval miners serve data to clients on request. They can get the data either from clients or from storage miners. Retrieval miners and clients exchange data and coins using micropayments: the data is split into pieces, and the client pays a small amount of coins for each piece. A retrieval miner can also act as a storage miner.

Finally, the network consists of all the full nodes that verify the behavior of clients and miners. These nodes keep track of available storage, check storage proofs, and repair faults.

Some terms used in this article:

Piece: A piece is some part of the data that a client stores in the decentralized storage network. For example, data (possibly a whole directory) can be deliberately divided into many pieces, and each piece can be stored by a different set of storage miners.

Sector: A sector is some disk space that a storage miner provides to the network (think of it as a unique ID tied to a specific portion of a specific storage provider's disk). Miners store clients' pieces in their sectors and earn tokens for the service. In order to store pieces, storage miners must pledge their sectors to the network.

AllocTable: The AllocTable is a data structure that keeps track of pieces and the sectors they are assigned to. It is updated in every block of the ledger, and its Merkle root is stored in the latest block. In practice, the table is used to keep the state of the DSN, allowing quick lookups during proof verification.
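As a rough illustration, the sketch below keeps a toy AllocTable as a mapping from piece IDs to (miner, sector) pairs and commits to it with a binary Merkle root, the kind of value a block would carry. The class name, the leaf encoding, and the odd-node padding rule are assumptions made for illustration, not Filecoin's actual data layout.

```python
from __future__ import annotations

import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Binary Merkle root; an odd node at any level is paired with itself."""
    if not leaves:
        return sha256(b"")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


class AllocTable:
    """Hypothetical toy AllocTable: piece ID -> (miner, sector number)."""

    def __init__(self) -> None:
        self.assignments: dict[str, tuple[str, int]] = {}

    def assign(self, piece_id: str, miner: str, sector: int) -> None:
        self.assignments[piece_id] = (miner, sector)

    def root(self) -> bytes:
        """Merkle root over sorted entries, so every node derives the same value."""
        leaves = [f"{p}:{m}:{s}".encode() for p, (m, s) in sorted(self.assignments.items())]
        return merkle_root(leaves)


table = AllocTable()
table.assign("piece-1", "minerA", 7)
table.assign("piece-2", "minerB", 3)
print(table.root().hex())  # the commitment a new block would carry
```

Sorting the entries makes the root independent of insertion order, so any full node holding the same table derives the same commitment, and any change to an assignment changes the root.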

Order: An order is a statement of intent to request or offer a service. Clients submit bid orders to the markets to request a service (the storage market for storing data, the retrieval market for obtaining data), while miners submit ask orders to offer a service.

Order book: An order book is a set of orders. Filecoin maintains separate order books for the storage market and the retrieval market.

Pledge: A pledge is a commitment to provide storage (specifically, a sector) to the network. Storage miners must submit a pledge to the ledger (the Filecoin blockchain) before they can accept orders in the storage market. The pledge specifies the size of the pledged sector and the collateral deposited by the miner.

Users express their intent by placing orders. A client submits a bid order specifying the price it is willing to pay; a miner submits an ask order specifying the price it charges. When a bid and an ask match, the client and the miner both sign a deal order and submit it to the blockchain. Bid and ask orders together make up the storage market (for storing files) and the retrieval market (for retrieving files). Let's dig into these markets and see how they work.
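The matching step can be sketched in a few lines. This is a hypothetical simplification: it fills a bid from the cheapest ask whose price and free space satisfy it and settles at the ask price; Filecoin's actual deal flow, including signing and on-chain submission, is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Ask:
    """A storage miner's offer: `space_gb` available at `price` per GB."""
    miner: str
    price: float
    space_gb: int


@dataclass
class Bid:
    """A client's request: up to `price` per GB for `size_gb` of storage."""
    client: str
    price: float
    size_gb: int


@dataclass
class Deal:
    client: str
    miner: str
    price: float
    size_gb: int


def match(bid: Bid, asks: list[Ask]) -> Optional[Deal]:
    """Fill the bid from the cheapest compatible ask, or return None if none fits."""
    compatible = [a for a in asks if a.price <= bid.price and a.space_gb >= bid.size_gb]
    if not compatible:
        return None
    best = min(compatible, key=lambda a: a.price)
    best.space_gb -= bid.size_gb  # reserve the space promised in the deal
    return Deal(bid.client, best.miner, best.price, bid.size_gb)


asks = [Ask("m1", 0.05, 100), Ask("m2", 0.03, 50), Ask("m3", 0.02, 10)]
# m3 is cheapest but has only 10 GB free, so m2 wins the 20 GB bid.
print(match(Bid("alice", 0.04, 20), asks))
```

Settling at the ask price is one possible design; a real market could equally settle at the bid price or split the difference.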

Storage market

The storage market is a decentralized exchange, run by the network, for storing data on Filecoin; all of its asks and bids are stored on the blockchain.

A client submits a bid order to the storage market's order book (using the Put protocol, described in the next section). The client must deposit the coins specified in the order and specify how many copies it wants stored. It can do this either by submitting multiple orders or by specifying a replication factor in a single order. Higher redundancy (a higher replication factor) gives higher tolerance to storage faults (described below).

Storage miners pledge their storage to the network by depositing collateral in a pledge transaction on the blockchain, via Manage.PledgeSector. The collateral (in Filecoin tokens) is locked for the whole time the service is provided and is returned if the miner produces proofs of storage for the data it committed to store. If some storage proofs fail, a proportional amount of the collateral is lost. Once the pledge transaction appears on the blockchain, the miner can offer its storage in the storage market: it sets its price and adds an ask order to the market's order book.

All storage assignments are public to every participant in the network. In every block, the network checks whether the required proofs for each assignment are present and valid, and acts accordingly:
1. If any proof is missing or invalid, the network penalizes the storage miner by taking part of its collateral.
2. If a large number of proofs are missing or invalid (as defined by the system parameter Δfault), the network considers the storage miner faulty, settles the order as failed, and reintroduces a new order for the same piece into the market.
3. If every storage miner storing the piece is faulty, the piece is lost and the client is refunded.
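The three rules above can be sketched as a per-block check. The values of DELTA_FAULT (standing in for Δfault) and PENALTY are hypothetical, the sketch counts missed proofs cumulatively (the exact counting rule is an assumption), and it only reports re-posted orders and refunds instead of acting on a real market.

```python
from __future__ import annotations

from dataclasses import dataclass

DELTA_FAULT = 3   # hypothetical value of Δfault: missed proofs before a miner is faulty
PENALTY = 0.10    # hypothetical fraction of collateral slashed per missed proof


@dataclass
class Assignment:
    piece: str
    miner: str
    missed: int = 0
    faulty: bool = False


def process_block(assignments: list[Assignment], valid_proofs: set[tuple[str, str]],
                  collateral: dict[str, float], lost: set[str]) -> tuple[list[str], list[str]]:
    """Apply the three enforcement rules for one block.

    Returns (pieces whose orders are re-posted, pieces newly lost and refunded).
    Mutates `assignments`, `collateral` and `lost` in place.
    """
    reposted: list[str] = []
    for a in assignments:
        if a.faulty or (a.piece, a.miner) in valid_proofs:
            continue
        # Rule 1: missing or invalid proof -> slash part of the miner's collateral.
        collateral[a.miner] *= 1 - PENALTY
        a.missed += 1
        # Rule 2: too many missed proofs -> miner is faulty, order goes back to the market.
        if a.missed >= DELTA_FAULT:
            a.faulty = True
            reposted.append(a.piece)
    # Rule 3: a piece whose every storage miner is faulty is lost; refund the client once.
    newly_lost: list[str] = []
    for piece in sorted({a.piece for a in assignments} - lost):
        if all(a.faulty for a in assignments if a.piece == piece):
            lost.add(piece)
            newly_lost.append(piece)
    return reposted, newly_lost
```

With two miners storing the same piece, the first miner going silent only costs it collateral and re-posts the order; the piece is declared lost and refunded only once the second miner is faulty too.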

Retrieval market

The retrieval market is an off-chain exchange in which clients and retrieval miners discover each other in a peer-to-peer manner. Once a client and a miner agree on a price, they start exchanging data and coins piece by piece using micropayments.

Let's see how it works.

Retrieval miners advertise the pieces they serve by gossiping their ask orders over the network: they set prices and add ask orders to the market's order book.
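A minimal sketch of the piece-by-piece exchange follows. It assumes the client already knows the SHA-256 hash of every piece (for example, from the content identifier) and pays a fixed, hypothetical price per verified piece; real retrieval payments would run over payment channels, which are not modeled here.

```python
from __future__ import annotations

import hashlib


def split_pieces(data: bytes, piece_size: int) -> list[bytes]:
    """Cut data into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def retrieve(served: list[bytes], expected_hashes: list[bytes],
             price_per_piece: int, budget: int) -> tuple[bytes, int]:
    """Accept pieces one at a time, paying only for pieces that verify.

    Returns the bytes received and the total amount paid to the miner.
    """
    received: list[bytes] = []
    paid = 0
    for piece, expected in zip(served, expected_hashes):
        if budget - paid < price_per_piece:
            break  # client cannot afford the next piece
        if hashlib.sha256(piece).digest() != expected:
            break  # bad piece: stop the exchange without paying for it
        received.append(piece)
        paid += price_per_piece  # micropayment for one verified piece
    return b"".join(received), paid
```

Because each payment follows a single verified piece, the miner risks at most one unpaid piece if the client walks away, and the client never pays for data it has not verified.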

The following figure summarizes all the activity that takes place in the network.

02 In-depth study of the Filecoin protocol

Filecoin introduces the concept of a decentralized storage network (DSN). A DSN is a scheme describing a network of independent clients and storage providers. It aggregates the storage offered by multiple independent providers and coordinates itself to provide clients with data storage and retrieval. Coordination is decentralized and requires no trusted parties: the secure operation of these systems comes from protocols that coordinate and verify the operations carried out by each party. Depending on its requirements, a DSN can use different coordination strategies, including Byzantine agreement, gossip protocols, or CRDTs. A DSN scheme implements three operations: Put, Get, and Manage. Put lets a client store data under a unique identifier. Get lets a client retrieve data by its identifier. Manage lets the network's participants coordinate: it controls the storage available for rent, audits providers, and repairs possible faults. The Manage protocol is run by storage providers, often together with clients or a network of auditors (this raises the question of Byzantine faults, discussed below).
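To make the three operations concrete, here is a hypothetical in-memory DSN. Identifiers are SHA-256 hashes of the content (an assumption; the text only requires unique identifiers), Put replicates naively onto the first m providers, Get enforces integrity by rejecting bytes that do not hash to the identifier, and Manage only audits; repair and Byzantine coordination are left out.

```python
from __future__ import annotations

import hashlib


class ToyDSN:
    """Hypothetical in-memory DSN with Put / Get / Manage over n providers."""

    def __init__(self, n_providers: int) -> None:
        self.providers: list[dict[str, bytes]] = [{} for _ in range(n_providers)]

    def put(self, data: bytes, m: int) -> str:
        """Store `data` on m providers and return its content-derived identifier."""
        if not 1 <= m <= len(self.providers):
            raise ValueError("m must be between 1 and the number of providers")
        key = hashlib.sha256(data).hexdigest()
        for store in self.providers[:m]:  # naive placement: the first m providers
            store[key] = data
        return key

    def get(self, key: str) -> bytes:
        """Return the first copy whose hash matches `key` (data integrity check)."""
        for store in self.providers:
            data = store.get(key)
            if data is not None and hashlib.sha256(data).hexdigest() == key:
                return data
        raise KeyError(key)

    def manage(self) -> list[int]:
        """Audit step only: indices of providers holding data that fails its hash."""
        return [i for i, store in enumerate(self.providers)
                if any(hashlib.sha256(v).hexdigest() != k for k, v in store.items())]


dsn = ToyDSN(n_providers=4)
key = dsn.put(b"hello filecoin", m=2)
dsn.providers[0][key] = b"tampered"  # provider 0 turns Byzantine
print(dsn.get(key))    # b'hello filecoin', served by provider 1
print(dsn.manage())    # [0]
```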

A DSN has several properties, the first two of which are required:
1. Data integrity: a client always receives the same data that was stored, and a storage provider cannot convince the client that it received the right data when it did not.
2. Retrievability: a client will be able to retrieve its data over time.

Optional DSN properties:

1. Public verifiability: anyone on the network can verify that data is being stored, without knowing the data itself.

2. Auditability: it is possible to verify that data was stored for the correct period of time.

3. Incentive compatibility: the scheme rewards good service providers and penalizes bad ones.

4. Confidentiality: clients who want to store their data privately must encrypt it before submitting it to the network.

Fault tolerance: A DSN must tolerate two kinds of faults.

Management faults: These are Byzantine faults caused by participants in the Manage protocol (storage providers, clients, and auditors). A DSN scheme relies on the fault tolerance of its underlying Manage protocol, and violating its fault-tolerance assumptions can compromise the liveness and safety of the system. For example, consider a DSN scheme whose Manage protocol uses Byzantine agreement (BA) to audit storage providers, that is, to check that they are storing all the data they should according to the terms of their deals. In such a protocol, the network receives proofs of storage from the providers and runs BA to agree on the validity of these proofs. If the BA protocol tolerates up to f faults out of n total nodes, then the DSN can tolerate f < n/2 faulty nodes. If these assumptions are violated, the audits can be compromised, rendering the whole system useless.

Storage faults: These are Byzantine faults that prevent clients from retrieving data: storage miners lose their pieces, or retrieval miners stop serving them. A successful Put execution is (f, m)-tolerant if its input data is stored on m independent storage providers (out of n in total) and it can tolerate up to f Byzantine providers. The parameters f and m depend on the protocol implementation; protocol designers can fix f and m or leave the choice to the user, extending Put(data) into Put(data, f, m). A Get on the stored data succeeds as long as at most f of those storage providers are faulty. For example, consider a simple scheme in which every storage provider stores all of the data. In this scheme, m = n and f = m - 1. Is it always the case that f = m - 1? No: some schemes use erasure coding, in which each storage provider stores only a part of the data, so that any x of the m storage providers are needed to retrieve it. In that case, f = m - x.
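The relationships above fit in a few helper functions. This is only an arithmetic sketch with illustrative names: x = 1 models full replication, and a larger x models an erasure code in which any x of the m shares suffice.

```python
def put_tolerance(m: int, x: int = 1) -> int:
    """Largest f for a Put on m providers when any x of them suffice to rebuild the data.

    x = 1 models full replication (f = m - 1); x > 1 models erasure coding (f = m - x).
    """
    if not 1 <= x <= m:
        raise ValueError("need 1 <= x <= m")
    return m - x


def get_succeeds(m: int, x: int, faulty: int) -> bool:
    """Get works while at most f of the m providers are faulty (at least x stay honest)."""
    return faulty <= put_tolerance(m, x)


def manage_tolerates(f: int, n: int) -> bool:
    """The BA-based Manage example: the DSN tolerates f faulty nodes when f < n / 2."""
    return 2 * f < n


print(put_tolerance(5), put_tolerance(5, 3))  # → 4 2
```

The trade-off is visible in the numbers: with m = 5 providers, full replication stores 5 full copies and tolerates 4 faults, while a 3-of-5 erasure code stores only 5/3 of the data's size but tolerates 2 faults.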

Consensus algorithm

The Filecoin DSN protocol can be implemented on top of any consensus protocol that allows Filecoin proofs to be verified. Proof-of-work schemes typically require solving hard puzzles whose solutions are not reusable, or that take a large amount of wasted computation to find.

Non-reusable work: Most permissionless blockchains require miners to solve a hard computational puzzle, such as inverting a hash function. The solutions to these puzzles are usually useless and have no intrinsic value beyond securing the network. Some blockchains try to put part of this computing power to useful work, such as Ethereum (executing smart contract logic) and Primecoin (finding new prime numbers).

Wasted work: Solving these puzzles is very expensive in terms of hardware and energy, especially when the puzzles depend only on raw computing power. When a mining algorithm is embarrassingly parallel, computing power is the main factor in solving it.

Attempts to reduce waste: Ideally, most of a network's resources should go to useful work. Some efforts require miners to use more energy-efficient resources. For example, Spacemint requires miners to dedicate disk space rather than computation. Although disks are more energy efficient, they are still "wasted" because they are filled with random data. Other efforts replace puzzle solving with traditional Byzantine agreement based on proof of stake, in which stakeholders vote on the next block in proportion to their share of the currency in the system.

Filecoin takes a different route: instead of spending resources on wasteful proof-of-work computation, miners participate in consensus through the useful work of storing data.

Useful work: We consider the work done by miners in a consensus protocol useful if the result of the computation is valuable to the network beyond securing the blockchain.

Filecoin proposes a useful-work consensus protocol in which the probability that the network elects a miner to create a new block (the miner's voting power) is proportional to the storage it is currently providing to the network. The Filecoin protocol is designed so that miners prefer investing in storage over investing in computing power to parallelize mining computations. Miners offer storage and reuse the computation that proves the data is stored in order to participate in consensus.
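Power-proportional leader election can be sketched as a weighted draw. This is a simplification under stated assumptions: the shared seed stands in for the unpredictable, verifiable randomness a real protocol needs, the power values are hypothetical, and exactly one leader is chosen per round.

```python
from __future__ import annotations

import hashlib


def elect_leader(power: dict[str, int], seed: bytes) -> str:
    """Choose one miner with probability proportional to its storage power."""
    total = sum(power.values())
    if total <= 0:
        raise ValueError("no storage power in the network")
    # Derive a pseudo-random integer in [0, total) from the shared seed.
    draw = int.from_bytes(hashlib.sha256(seed).digest(), "big") % total
    for miner in sorted(power):  # fixed order so every node picks the same winner
        if draw < power[miner]:
            return miner
        draw -= power[miner]
    raise AssertionError("unreachable: draw is always below total power")


power = {"minerA": 1, "minerB": 3}  # e.g. units of proven storage
wins = sum(elect_leader(power, r.to_bytes(4, "big")) == "minerB" for r in range(10_000))
print(wins / 10_000)  # expected to be near 0.75, minerB's share of the power
```

A miner that adds storage raises its share of the total, and with it its chance of producing the next block, which is the incentive the paragraph above describes.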

Modeling mining power

Power fault tolerance: In the Filecoin white paper, power fault tolerance is an abstraction that reframes Byzantine faults in terms of each participant's influence on the outcome of the protocol. Every participant controls some power, where n is the total power in the network and f is the fraction of the power controlled by faulty or adversarial participants.

Filecoin power: In Filecoin, the power of miner M at time t is the sum of M's storage assignments at that time. The influence of M is the fraction of the total network power that M holds. In Filecoin, power has the following properties:

1. Public: The total amount of storage currently in use in the network is public. By reading the blockchain, anyone can compute the storage assignments of each miner, and therefore the power of each miner and the total power at any point in time.

2. Publicly verifiable: For each storage assignment, miners are required to generate Proofs-of-Spacetime, proving that the service is being provided. By reading the blockchain, anyone can verify that the power claimed by a miner is correct.

3. Variable: At any point in time, miners can add new storage to the network by pledging a new sector and filling it. In this way, miners can change their power over time.
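Because assignments are public, computing power and influence is plain arithmetic over the chain's records. The sketch below assumes a hypothetical record format of (miner, size, start epoch, end epoch) with half-open active intervals, and it ignores whether the required proofs were actually submitted.

```python
from __future__ import annotations

from fractions import Fraction
from typing import List, Tuple

# Hypothetical on-chain records: (miner, size_gib, start_epoch, end_epoch), active on [start, end).
Assignment = Tuple[str, int, int, int]


def power_at(assignments: List[Assignment], miner: str, t: int) -> int:
    """Power of `miner` at epoch t: total size of its assignments active at t."""
    return sum(size for m, size, start, end in assignments if m == miner and start <= t < end)


def influence_at(assignments: List[Assignment], miner: str, t: int) -> Fraction:
    """Fraction of the network's total power held by `miner` at epoch t."""
    total = sum(size for _, size, start, end in assignments if start <= t < end)
    return Fraction(power_at(assignments, miner, t), total) if total else Fraction(0)


chain = [("A", 32, 0, 10), ("B", 64, 0, 5), ("A", 32, 5, 20)]
print(influence_at(chain, "A", 3), influence_at(chain, "A", 7))  # → 1/3 1
```

The second call shows the "variable" property: by epoch 7, B's assignment has expired and A has pledged more storage, so A holds all of the network's power.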

To learn more about how power is used (mathematically) in the consensus algorithm, see the white paper.

We also need a mechanism to prevent malicious miners from using three types of attacks to get rewarded for storage they do not provide: Sybil attacks, outsourcing attacks, and generation attacks.

Sybil attacks: By creating multiple Sybil identities, a malicious miner can pretend to store (and get paid for) more copies than it actually stores, while storing the data only once.

Outsourcing attacks: A malicious miner may commit to storing more data than it actually stores, relying on quickly fetching the data from other storage providers when challenged.

Generation attacks: A malicious miner may claim to store a large amount of data while actually using a small program to generate that data efficiently on demand. If the program is smaller than the data it purports to store, the miner increases its chance of winning a Filecoin block reward, which is proportional to the storage the miner is currently using.

Storage providers must convince their clients that they are storing the data they have been paid to store. In practice, a storage provider generates a proof of storage (PoS) that the blockchain network (or the client itself) verifies.

To make storage publicly verifiable, Filecoin introduces two new proof schemes: Proof-of-Replication (PoRep) and Proof-of-Spacetime (PoSt).

Proof-of-Replication (PoRep) is a novel proof of storage that allows a server (the prover P) to convince a user (the verifier V) that some data D has been replicated to its own uniquely dedicated physical storage. The scheme is an interactive protocol in which P: (a) commits to storing n distinct replicas (physically independent copies) of some data D, and then (b) convinces V, through a challenge/response protocol, that P is indeed storing each of the replicas. PoRep improves on PoR and PDP schemes by preventing Sybil attacks, outsourcing attacks, and generation attacks.

Proof-of-Spacetime (PoSt): Proof-of-storage schemes let a user check whether a storage provider is storing the outsourced data at the moment of the challenge. How can PoS schemes be used to prove that some data was stored throughout a period of time? The natural answer is to require the user to repeatedly (for example, every minute) send challenges to the storage provider. However, the communication complexity of each interaction can become a bottleneck in systems such as Filecoin, where storage providers must submit their proofs to the blockchain network. To solve this problem, Filecoin introduces a new proof, Proof-of-Spacetime, in which a verifier can check whether a prover stored its outsourced data over a range of time. The intuition is:

1. Require the prover to generate sequential proofs of storage (in this case, Proofs-of-Replication) as a way to determine time.

2. Recursively compose the executions to generate a short proof.
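The core idea of PoRep, that each replica is physically distinct even when the underlying data is the same, can be illustrated with a toy encoding. This sketch XORs the data with a SHA-256 keystream derived from a hypothetical replica ID. It is not Filecoin's construction and is not secure: the encoding is fast to recompute, so it does not stop generation attacks, which is exactly why real PoRep relies on a deliberately slow, sequential encoding.

```python
import hashlib


def keystream(replica_id: bytes, length: int) -> bytes:
    """SHA-256 in counter mode, keyed by the replica ID."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(replica_id + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def seal(data: bytes, replica_id: bytes) -> bytes:
    """Encode `data` into a replica unique to `replica_id` (toy: XOR with keystream)."""
    return bytes(d ^ k for d, k in zip(data, keystream(replica_id, len(data))))


def unseal(replica: bytes, replica_id: bytes) -> bytes:
    """Recover the original data; XOR with the same keystream undoes `seal`."""
    return seal(replica, replica_id)


def respond(replica: bytes, offset: int, size: int) -> bytes:
    """Prover's answer to a challenge: the requested bytes of its stored replica."""
    return replica[offset:offset + size]


def check(data: bytes, replica_id: bytes, offset: int, size: int, answer: bytes) -> bool:
    """Verifier recomputes the expected replica bytes (it knows the data in this toy)."""
    return seal(data, replica_id)[offset:offset + size] == answer
```

A Sybil miner claiming replicas under two IDs would have to answer challenges on two different byte strings; keeping a single copy is not enough, because bytes from one replica fail the check for the other.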

The prover receives a random challenge c from the verifier and generates Proofs-of-Replication sequentially, using the output of each proof as the input to the next for a specified number of iterations. This ensures that the work miners do is useful and can be reused for consensus (as mentioned above). PoRep and PoSt use zk-SNARKs to make the proofs very short and easy to verify.
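The iterated challenge can be sketched with Merkle proofs. In this toy Proof-of-Spacetime, the prover commits to its stored blocks with a Merkle root, and each round's challenge index is derived from the hash of the previous round's output, so the rounds must be computed in order. It is a simplification: the seed is a hypothetical verifier challenge, and the zk-SNARK compression that makes real proofs short is not modeled, so here the verifier checks every round.

```python
from __future__ import annotations

import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(blocks: list[bytes]) -> list[list[bytes]]:
    """All levels of a binary Merkle tree; odd levels are padded by duplicating the last node."""
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
            levels[-1] = level  # keep the padded level so sibling lookups work
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_path(levels: list[list[bytes]], index: int) -> list[bytes]:
    """Sibling hashes from leaf `index` up to (but excluding) the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])
        index //= 2
    return path


def verify_path(root: bytes, block: bytes, index: int, path: list[bytes]) -> bool:
    node = h(block)
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root


def challenge_index(chain: bytes, n_blocks: int) -> int:
    return int.from_bytes(h(chain), "big") % n_blocks


def post_prove(blocks: list[bytes], levels: list[list[bytes]], seed: bytes, rounds: int):
    """Each round's output (the challenged block) feeds the next round's challenge."""
    chain, proofs = seed, []
    for _ in range(rounds):
        idx = challenge_index(chain, len(blocks))
        proofs.append((idx, blocks[idx], merkle_path(levels, idx)))
        chain = h(chain + blocks[idx])
    return proofs


def post_verify(root: bytes, n_blocks: int, seed: bytes, proofs) -> bool:
    chain = seed
    for idx, block, path in proofs:
        if idx != challenge_index(chain, n_blocks) or not verify_path(root, block, idx, path):
            return False
        chain = h(chain + block)
    return True


blocks = [bytes([i]) * 32 for i in range(5)]
levels = build_levels(blocks)
root = levels[-1][0]
proofs = post_prove(blocks, levels, b"verifier-challenge", rounds=4)
print(post_verify(root, len(blocks), b"verifier-challenge", proofs))  # → True
```

Chaining is the point of the design: a prover cannot know which block round k will ask for until it has produced round k - 1, so it must keep the data available for the whole sequence.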

Smart contract

Smart contracts enable Filecoin users to write stateful programs that can spend tokens, request storage or retrieval of data in the markets, and verify storage proofs. Users interact with a smart contract by sending transactions to the ledger that trigger calls to functions in the contract. We have extended the smart contract system to support Filecoin-specific operations (e.g., market operations and proof verification).

Filecoin supports data storage-specific contracts, as well as more general smart contracts:

File contracts: Users can program the conditions under which they request or offer storage services. Several examples are worth mentioning: (1) contracting miners: a client can designate in advance which miners provide the service, without participating in the market; (2) payment strategies: a client can design different reward strategies for miners; for example, one contract can pay a miner more and more the longer it stores the data, and another can set storage prices reported by a trusted oracle; (3) ticket services: a contract can let a miner deposit tokens and pay storage/retrieval fees on behalf of its users; (4) more complex operations: a client can create a contract that allows its data to be updated.
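The escalating payment strategy in example (2) can be sketched as a small state machine. The class, its fields, and the per-epoch payout hook are hypothetical, and the sketch assumes some other component has already verified the storage proof before `on_valid_proof` is called.

```python
from dataclasses import dataclass


@dataclass
class EscalatingStorageContract:
    """Hypothetical file contract whose per-epoch payout grows the longer data is kept."""
    client_deposit: int   # tokens locked by the client
    base_rate: int        # payout for the first proven epoch
    step: int             # increase in payout for each further epoch
    epochs_paid: int = 0
    miner_earned: int = 0

    def on_valid_proof(self) -> int:
        """Pay the miner for one epoch with a (separately verified) storage proof."""
        rate = self.base_rate + self.step * self.epochs_paid
        payout = min(rate, self.client_deposit)  # never pay out more than is deposited
        self.client_deposit -= payout
        self.miner_earned += payout
        self.epochs_paid += 1
        return payout


contract = EscalatingStorageContract(client_deposit=100, base_rate=10, step=5)
print([contract.on_valid_proof() for _ in range(6)])  # → [10, 15, 20, 25, 30, 0]
```

Back-loading the rewards gives the miner a reason to keep the data for the full term, since dropping out early forfeits the most valuable epochs.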

General smart contracts: Users can associate programs with their transactions, as in other systems (such as Ethereum), that do not depend directly on the use of storage. We foresee applications such as decentralized naming systems, asset tracking, and crowdfunding platforms.

Cross-chain interaction

Bridges are tools designed to connect different blockchains. Although this work is still in progress, we plan to support cross-chain interaction, bringing Filecoin storage to other blockchain-based platforms and bringing the functionality of other platforms into Filecoin.

Filecoin on other platforms: Other blockchain systems, such as Bitcoin, Zcash, and especially Ethereum and Tezos, allow developers to write smart contracts; however, these platforms offer very limited storage capabilities at a high cost. We plan to provide bridges that bring storage and retrieval support to these platforms. We note that IPFS is already used by several smart contracts (and protocol tokens) as a way to reference and distribute content. Adding support for Filecoin would let these systems guarantee the storage of IPFS content in exchange for Filecoin tokens.

Other platforms in Filecoin: We plan to provide bridges connecting other blockchain services with Filecoin. For example, integration with Zcash would allow requests to store data to be sent privately.

03 Some other issues

Here we list some potential issues that are not discussed in depth in the white paper.

Scalability of the retrieval market: The micropayment system of the retrieval market adds a lot of overhead to the retrieval protocol. To achieve retrieval speeds that match today's centralized infrastructure, Filecoin and IPFS will need a dense network of state channels.

Censorship (illegal content): As we have seen in the past with Napster and The Pirate Bay, a lack of censorship eventually leads to illegal content spreading across the network, effectively bringing the dark web to the surface. One possible solution is an AI-based protocol that learns over time to automatically detect illegal content and take the necessary measures. But for the network to remain democratic, such a protocol would need to be governed by the users themselves (which introduces Byzantine behavior) to decide whether a given piece of content requires action.

In short, censorship is a different problem for different people, so it calls for a more personalized approach rather than a centralized, one-size-fits-all one. Filecoin's job is to create a market for data storage, not to set content moderation policy. This "personalized" moderation layer can therefore be left to applications built on top of Filecoin.

04 Possible improvements to the Filecoin protocol

Here we list some possible improvements to the Filecoin protocol.

Tahoe-LAFS encryption scheme: When a value is added, the client first encrypts it (with a symmetric key), then splits it into segments of manageable size, and then erasure-codes each segment for redundancy. For example, "2-of-3" erasure coding means that a segment is encoded into 3 shares in total, but any 2 of them are enough to reconstruct the original segment (see zfec for more details). These shares are then distributed to and stored on specific storage nodes. Storage nodes are simply share repositories; users do not rely on them for data integrity or confidentiality.
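A "2-of-3" code is small enough to write out by hand. This sketch uses XOR parity: two data shares plus one parity share, any two of which rebuild the segment. It is a stand-in for the Reed-Solomon-style codes that zfec implements for general k-of-n, and it omits the encryption step that Tahoe-LAFS performs first.

```python
from __future__ import annotations


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode_2_of_3(segment: bytes) -> list[bytes]:
    """Split a segment into two halves plus an XOR parity share."""
    half = (len(segment) + 1) // 2
    a = segment[:half]
    b = segment[half:].ljust(half, b"\0")  # pad the second half to equal length
    return [a, b, _xor(a, b)]


def decode_2_of_3(shares: dict[int, bytes], length: int) -> bytes:
    """Rebuild the original `length`-byte segment from any two of shares 0, 1, 2."""
    if 0 in shares and 1 in shares:
        a, b = shares[0], shares[1]
    elif 0 in shares and 2 in shares:
        a = shares[0]
        b = _xor(a, shares[2])
    elif 1 in shares and 2 in shares:
        b = shares[1]
        a = _xor(b, shares[2])
    else:
        raise ValueError("need at least two of the three shares")
    return (a + b)[:length]


segment = b"hello world"
shares = encode_2_of_3(segment)
print(decode_2_of_3({1: shares[1], 2: shares[2]}, len(segment)))  # → b'hello world'
```

Each share is half the segment's size, so storing all three costs 1.5x the original data while surviving the loss of any one storage node.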

Finally, the encryption key and some information to help locate the right storage nodes become part of the "capability string" (see the documentation on the encoding process for more). The important point is that the capability string is both necessary and sufficient to retrieve a value from the Grid; if too many nodes become unavailable (or go offline) and you can no longer retrieve enough shares, the operation fails. There are write caps, read caps, and verify caps. A less powerful capability can be derived offline from a more powerful one: someone holding a write cap can turn it into a read cap without interacting with any server. A verify cap can confirm the existence and integrity of a value but cannot decrypt its contents. Both mutable and immutable values can be stored in the Grid; naturally, immutable values have no write caps at all.

Awesome IPFS is a great community-maintained list of projects, tools, and almost anything else related to IPFS. To see more, or to add your own entry to the list, visit Awesome IPFS on GitHub.

About the author: Vaibhav Saini is a co-founder of TowardsBlockchain, a startup incubated at the Massachusetts Institute of Technology Cambridge Innovation Center. He is a senior blockchain developer and has worked on multiple blockchain platforms, such as Ethereum, Quorum, EOS, Nano, Hashgraph, and IOTA.