The Ultimate Guide to Filecoin: Digging into the Filecoin White Paper

Foreword: According to the current plan, Filecoin will launch its testnet on December 11 and its mainnet in March 2020. As a decentralized storage project, Filecoin could disrupt the centralized storage market by building storage and retrieval markets that work like a decentralized Airbnb for storage. What exactly is it, technically, and how should it be understood? This article was written by "Vasa" and translated by "SIEN" of the "Blue Fox Notes" community.

Since decentralized innovation took off in 2009, many promising projects have changed our perception of the world and our way of life. One of the names behind them is Protocol Labs, which has spawned amazing projects such as IPFS.

The ultimate goal of IPFS is to replace HTTP, but it lacks an incentive layer that could help it achieve large-scale adoption.

This is where Filecoin comes in. Since it was announced, Filecoin has attracted a lot of attention in the community. But because of its token economics (its crowdfunding and investment strategy), it has also lost a lot of supporters. Clearly, some people are dissatisfied with its plans.

There is a lot of information about its technology and token economics on the web, some of it confusing and overwhelming. Here we bring it all together in one place and explain what Filecoin is. So fasten your seat belt and grab a cup of coffee: this will be a long journey.

Before we delve into its core technology, let's analyze the current state of the file storage market.

The state of the file storage market

Today, Amazon S3 is the giant of file storage on the Internet, and there are many reasons for that:

  • Very cheap: $0.023 per GB, 0.04 cents per 10,000 read requests;
  • Very fast
  • Reliable: it has had several major outages that took a large portion of the Internet offline, but it still has 99.9% uptime.
  • Highly scalable
  • Great developer experience: it integrates easily with other Amazon services (e.g., CloudFront)

With cloud storage services this good already available, any competitor must offer better service, or at least match them. At small scale, decentralized networks do not perform well.

However, if IPFS is adopted on a large scale (larger adoption than BitTorrent), then this may prove to be a better version of the Internet and will open up a whole new economy.

Technical overview

There are four parts:

  • How does the Filecoin network work?
  • In-depth study of the Filecoin protocol
  • Other issues (not covered in the white paper)
  • Possible improvements to the Filecoin protocol

How does the Filecoin network work?

There are three groups of users in Filecoin: customers, storage miners, and retrieval miners.

Customers pay for services that store and retrieve data. They can choose from the available service providers. If they want to store private data, they need to encrypt it before submitting it to the service provider.

Storage miners store customers' data in exchange for rewards. They decide for themselves how much space to offer. After a customer and a storage miner reach an agreement, the miner is obliged to keep providing proofs that it is storing the data. Anyone can inspect these proofs and confirm that the storage miner is reliable.

Retrieval miners serve data to customers on request. They can obtain the data from customers or from storage miners. Retrieval miners and customers exchange data and tokens using micropayments: the data is split into pieces, and the customer pays a small amount of tokens for each piece. A retrieval miner can also act as a storage miner.

Finally, the network consists of all the full nodes that verify the behavior of customers and miners. These nodes measure available storage, check storage proofs, and repair data faults.

Some terms in this article:

Piece: A piece is a part of the data that a customer stores in the decentralized storage network. For example, data (say, a picture of a cat) can be deliberately split into many pieces, and each piece can be stored by a different storage miner.

Sector: A sector is some disk space that a storage miner provides to the network (it can be thought of as a unique ID associated with a particular portion of a particular storage miner's disk space). Miners store customers' pieces in their sectors and earn tokens for their services. To store pieces, a storage miner must first pledge its sectors to the network.

Allocation table: The allocation table is a data structure that keeps track of pieces and the sectors they are assigned to. It is updated at every block of the ledger, and its Merkle root is stored in the latest block. In practice, the table holds the state of the DSN (decentralized storage network) so that it can be looked up quickly when verifying proofs.

Order: An order is a statement of intent to request or offer a service. Customers submit buy orders to the markets to request a service (data storage in the storage market, data retrieval in the retrieval market), and miners submit sell orders to offer a service.

Order book: An order book is a set of orders. Filecoin maintains separate order books for the storage and retrieval markets.

Pledge: A pledge is a commitment to offer storage (specifically, a sector) to the network. Storage miners must submit their pledges to the ledger (the Filecoin blockchain) before they can accept orders in the storage market. A pledge consists of the size of the pledged sector and the collateral deposited by the storage miner.

Users express their intent by placing orders. A customer submits a buy order specifying the price they are willing to pay. A miner submits a sell order specifying the price they want to charge. When a buy order and a sell order match, the customer and the miner both sign a deal order and submit it to the blockchain.
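The matching step can be illustrated with a minimal sketch. It assumes a simple rule that is not taken from the white paper: a buy order matches the cheapest sell order whose price does not exceed the bid and whose size covers the request. The `Order` class and the `match_orders` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Order:
    party: str    # who placed the order (customer or miner)
    price: float  # tokens per unit of storage; the unit is left abstract
    size: int     # amount of storage requested or offered


def match_orders(bid: Order, asks: List[Order]) -> Optional[Tuple[Order, Order]]:
    """Return (bid, ask) for the cheapest compatible sell order, or None."""
    compatible = [a for a in asks if a.price <= bid.price and a.size >= bid.size]
    if not compatible:
        return None
    return bid, min(compatible, key=lambda a: a.price)


asks = [Order("miner-a", 5.0, 100), Order("miner-b", 3.0, 50), Order("miner-c", 2.0, 10)]
deal = match_orders(Order("customer-1", 4.0, 40), asks)
```

Here `miner-a` is too expensive and `miner-c` offers too little space, so the customer is matched with `miner-b`. In the real protocol both parties would then sign the deal order and submit it to the blockchain.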

Together, the buy and sell orders form the storage market (the market for storing files) and the retrieval market (the market for retrieving files). Let's look at each market and see how it works.

Storage market

The storage market is a decentralized exchange run by the network, in which all buy and sell orders for storing data on the Filecoin network are recorded on the blockchain.

Customers submit buy orders to the storage market order book using the Put protocol. Customers must deposit the tokens specified in the order and state how many replicas they want stored. They can either submit multiple orders or specify the number of replicas in a single order. Higher redundancy (Blue Fox Note: that is, more replicas) gives higher tolerance to storage faults.

Storage miners pledge storage to the network by depositing collateral through a pledge transaction on the blockchain (Manage.PledgeSector). The collateral is locked for the period during which the miner commits to providing the service, and it is returned if the miner generates the proofs of storage for the data it committed to store.

If some storage proofs fail, the storage miner loses a proportional amount of collateral.

Once the pledge transaction appears on the blockchain (and therefore in the allocation table), miners can offer their storage in the storage market: they set a price and submit a sell order to the market order book via Put.AddOrders.

When a sell order matches a buy order (via Put.MatchOrders), the customer sends the piece to the miner.

On receiving the piece, the miner runs Put.ReceivePiece. Once the data has been received, both the miner and the customer sign a deal order and submit it to the blockchain (in the storage market order book).

A storage miner's storage is divided into sectors, each containing pieces assigned to that miner. The network keeps track of every storage miner's sectors through the allocation table. At this point (when the deal order is signed), the network assigns the data to the miner and records the assignment in the allocation table.

When a storage miner's sector is full, the sector is sealed. Sealing is a slow, sequential operation that transforms the data in a sector into a replica: a unique physical copy of the data that is tied to the miner's public key. Sealing is a necessary step in Proof-of-Replication.

Once storage miners have been assigned data, they must repeatedly generate proofs of replication to show that they are storing it. The proofs are posted to the blockchain and verified by the network.

All storage allocations are public to every participant in the network. At every block, the network checks that the required proof is present for each allocation, checks that it is valid, and acts accordingly:

  • If any proof is missing or invalid, the network penalizes the miner by taking part of its collateral;
  • If a large number of proofs are missing or invalid (as defined by the system parameter Δfault), the network considers the storage miner faulty, settles the order as failed, and reintroduces the piece to the market as a new order;
  • If every storage miner storing a piece is faulty, the piece is lost and the customer is refunded.
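The first two rules above can be sketched as a per-block audit. This is only an illustration under stated assumptions: the `Allocation` class, the `audit_block` function, the fixed `penalty`, and the reading of Δfault as "more than `delta_fault` missed proofs" are all hypothetical choices, and the refund rule is left out.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    miner: str
    piece: str
    missed_proofs: int = 0   # missing or invalid proofs observed so far
    failed: bool = False     # True once the order is settled as failed


def audit_block(allocations, valid_proofs, collateral, penalty, delta_fault):
    """Apply the per-block checks and return the pieces to re-offer on the market.

    valid_proofs: set of (miner, piece) pairs with a valid proof in this block.
    collateral:   dict mapping miner -> remaining collateral, updated in place.
    """
    reorder = []
    for alloc in allocations:
        if alloc.failed or (alloc.miner, alloc.piece) in valid_proofs:
            continue
        alloc.missed_proofs += 1
        # Missing or invalid proof: take part of the miner's collateral.
        collateral[alloc.miner] = max(0, collateral[alloc.miner] - penalty)
        if alloc.missed_proofs > delta_fault:
            # Too many faults: settle the order as failed and re-offer the piece.
            alloc.failed = True
            reorder.append(alloc.piece)
    return reorder


allocs = [Allocation("m1", "p1"), Allocation("m2", "p2")]
collateral = {"m1": 10, "m2": 10}
# Miner m1 proves storage in every block; miner m2 never does.
results = [audit_block(allocs, {("m1", "p1")}, collateral, penalty=2, delta_fault=2)
           for _ in range(3)]
```

After three blocks, `m2` has lost part of its collateral in each block and its piece `p2` is put back on the market, while `m1` is untouched.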

Retrieval market

This is an off-chain, peer-to-peer market in which customers and retrieval miners find each other. Once a customer and a miner agree on a price, they begin exchanging data for tokens through micropayments.

Let's see how it works.

Retrieval miners advertise their service by posting their sell orders on the network: they set a price and add a sell order to the market order book.

Customers submit buy orders to the retrieval market order book. Each retrieval miner checks whether its sell orders match a customer's buy order.

Once orders match, the retrieval miner sends the piece to the customer in parts: the miner sends part of the data, and the customer sends a micropayment in return, part by part. When the piece has been fully received, the miner and the customer sign a deal order and submit it to the blockchain.
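The part-by-part exchange can be sketched as a simple loop. This is a toy model, not the Filecoin payment mechanism: the `retrieve` function, the chunk size, and the integer token balances are assumptions made for illustration, and no payment channel or signature is modeled.

```python
def retrieve(chunks, price_per_chunk, customer_balance):
    """Exchange data for tokens one chunk at a time.

    Returns (received_chunks, miner_earnings, remaining_balance). The exchange
    stops as soon as the customer can no longer pay for the next chunk.
    """
    received, earned = [], 0
    for chunk in chunks:
        if customer_balance < price_per_chunk:
            break
        received.append(chunk)               # the miner sends one chunk
        customer_balance -= price_per_chunk  # the customer pays for that chunk
        earned += price_per_chunk
    return received, earned, customer_balance


data = b"a picture of a cat, split into chunks"
chunks = [data[i:i + 8] for i in range(0, len(data), 8)]
received, earned, balance = retrieve(chunks, price_per_chunk=2, customer_balance=7)
```

The point of paying per chunk is that neither side ever risks more than one chunk's worth of data or tokens if the other party stops cooperating.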

Summary

The following figure shows all the activities that take place on the network.

Figure: Example execution of the Filecoin network, grouped by participant and sorted chronologically by row.

In-depth study of the Filecoin protocol

Filecoin introduces the concept of a decentralized storage network (DSN). A DSN is a scheme describing a network of independent customers and storage providers. It aggregates the storage offered by multiple independent storage providers and self-coordinates to provide customers with data storage and data retrieval services.

Coordination is decentralized and requires no trusted third party: the safe operation of the system is achieved through protocols that coordinate and verify the actions of the individual parties. (Blue Fox Note: unlike Airbnb, where a centralized company does the coordinating, no third party needs to be trusted.)

DSNs can use different coordination strategies depending on the requirements of the system, including Byzantine agreement, gossip protocols, or CRDTs.

A DSN implements three operations: Put, Get, and Manage. Put lets a customer store data under a unique identifier. Get lets a customer retrieve data using that identifier. Manage coordinates the network: it measures the space available for lease, audits storage providers, and repairs possible data faults. The Manage protocol is run by storage providers, often together with customers or a network of auditors.
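To make the three operations concrete, here is a toy in-memory sketch. It is not how Filecoin works internally: the `ToyDSN` class is hypothetical, the identifier is assumed to be the SHA-256 hash of the data (content addressing, in the spirit of IPFS), and providers are plain dictionaries.

```python
import hashlib


class ToyDSN:
    """In-memory stand-in for a DSN with several storage providers."""

    def __init__(self, providers):
        self.providers = {name: {} for name in providers}

    def put(self, data: bytes, copies: int = 2) -> str:
        """Store data on `copies` providers and return its identifier."""
        key = hashlib.sha256(data).hexdigest()
        for name in list(self.providers)[:copies]:
            self.providers[name][key] = data
        return key

    def get(self, key: str) -> bytes:
        """Return the data for `key`, rejecting altered copies."""
        for store in self.providers.values():
            data = store.get(key)
            if data is not None and hashlib.sha256(data).hexdigest() == key:
                return data
        raise KeyError(key)

    def manage(self, key: str):
        """Audit: list the providers currently holding an intact copy."""
        return [name for name, store in self.providers.items()
                if key in store and hashlib.sha256(store[key]).hexdigest() == key]


dsn = ToyDSN(["p1", "p2", "p3"])
key = dsn.put(b"cat picture")
dsn.providers["p1"][key] = b"tampered"   # a faulty provider alters its copy
```

Because the identifier is derived from the content, `get` can detect the tampered copy on `p1` and fall back to `p2`, and `manage` reports that only `p2` still holds an intact copy.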

The DSN has several attributes. The first two are required.

  • Data integrity means that customers always get back exactly the data they stored: storage providers cannot convince a customer to accept altered or falsified data.
  • Retrievability means that customers can retrieve their data over time.

Other properties of the DSN:

  • Public verifiability: anyone in the network can verify that data is being stored, without needing to know the data itself.
  • Auditability: it can be verified that data was stored for the correct period of time.
  • Incentive compatibility: the design rewards providers that serve well and punishes those that do not.
  • Confidentiality: customers who want their data stored privately must encrypt it before submitting it to the network.

Fault tolerance

A DSN should tolerate two types of faults:

  • Management fault tolerance

Management faults are Byzantine faults caused by participants in the Manage protocol (storage providers, customers, and auditors). A DSN scheme relies on the fault tolerance of its underlying Manage protocol. If the fault-tolerance assumptions for management faults are violated, the liveness and safety of the system can be compromised.

For example, consider a DSN scheme in which the Manage protocol requires Byzantine agreement (because nodes can lie to auditors) to audit storage providers, that is, to check whether they are storing all the data they should be storing under the agreed terms.

In such a protocol, the network collects proofs of storage from storage providers and runs Byzantine agreement on the validity of these proofs. If the Byzantine agreement protocol tolerates up to f faults among n total nodes, then the DSN can tolerate f < n/2 faulty nodes. If these assumptions are violated, audits can be compromised and the system becomes useless.

  • Storage fault tolerance

Storage faults are Byzantine faults that prevent customers from retrieving their data: for example, storage miners lose their pieces or retrieval miners stop serving them. A successful Put execution is (f, m)-tolerant if its input data is stored on m independent storage providers (out of n in total) and it can tolerate up to f Byzantine providers. The parameters f and m depend on the protocol implementation; protocol designers can fix f and m, or leave the choice to the user by extending Put(data) to Put(data, f, m).

A Get execution on stored data succeeds if there are fewer than f faulty storage providers. For example, consider a simple scheme in which Put is designed so that every storage provider stores all of the data. In this scheme, m = n and f = m - 1.

So is it always the case that f = m - 1? No. Some schemes use erasure coding, in which each storage provider stores a specific portion of the data such that any x of the m storage providers are enough to retrieve it; in this case, f = m - x.
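The two cases above reduce to the same arithmetic, which a small helper makes explicit. The function name `tolerated_faults` is ours; full replication is treated as the special case x = 1, where any single provider suffices.

```python
def tolerated_faults(m: int, x: int) -> int:
    """Faulty providers tolerated when any x of the m providers suffice."""
    if not 1 <= x <= m:
        raise ValueError("need 1 <= x <= m")
    return m - x


full_replication = tolerated_faults(m=5, x=1)  # every provider holds all the data
erasure_coded = tolerated_faults(m=5, x=3)     # any 3 of 5 providers can rebuild it
```

With five providers, full replication tolerates four faults (f = m - 1), while a scheme that needs any three providers tolerates two (f = m - x), in exchange for each provider storing less data.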

Consensus algorithm

Filecoin's DSN protocol can be implemented on top of any consensus protocol that allows Filecoin's proofs to be verified. Proof-of-work schemes usually require miners to solve puzzles whose solutions are not reusable or that require a great deal of wasteful computation to find. (Blue Fox Note: that is, the work has no memory and starts over every time.)

  • Non-reusable work

Most permissionless blockchains require miners to solve hard computational puzzles, such as inverting a hash function. The solutions to these puzzles are typically useless and have no inherent value beyond securing the network. Some blockchains, such as Ethereum (executing smart contract logic) and Primecoin (finding new prime numbers), attempt to put some of the computing power to useful work.

  • Wasted work

Solving puzzles can be very expensive in terms of machine and energy consumption costs, especially if these puzzles rely solely on computing power. When mining algorithms are embarrassingly parallel, the main factor in solving puzzles is computing power.

  • Attempts to reduce waste

Ideally, most of the network's resources should be spent on useful work. Some efforts require miners to use more energy-efficient solutions. For example, Spacemint requires miners to dedicate disk space rather than computation. Although this is more energy efficient, the disks are still "wasted" because they are filled with random data.

Other efforts replace hard puzzles with a traditional Byzantine agreement based on Proof-of-Stake, in which stakeholders vote on the next block in proportion to their share of the tokens in the system.

The work done by Filecoin miners is not wasteful proof-of-work computation: they generate Proofs-of-Spacetime (PoSt) to participate in consensus.

  • Useful work

We consider the work done by miners in a consensus protocol to be useful if the result of the computation is valuable to the network beyond securing the blockchain.

Filecoin proposes a useful-work consensus protocol in which the probability that the network elects a miner to create a new block is proportional to the miner's share of the storage currently in use in the network. The protocol is designed so that miners would rather invest in storage than in computing power to parallelize mining. Miners offer storage and reuse the computation that proves the data is being stored in order to participate in consensus.
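The proportionality claim can be illustrated with a weighted lottery. This is only a simulation of the stated property, not the election mechanism itself: a local seeded random number generator stands in for whatever randomness the protocol actually uses, and the miner names and storage figures are made up.

```python
import random
from collections import Counter


def elect_leader(storage_by_miner, rng):
    """Pick one miner with probability proportional to its storage in use."""
    miners = list(storage_by_miner)
    weights = [storage_by_miner[m] for m in miners]
    return rng.choices(miners, weights=weights, k=1)[0]


storage = {"alice": 60, "bob": 30, "carol": 10}
rng = random.Random(0)  # fixed seed so the run is repeatable
wins = Counter(elect_leader(storage, rng) for _ in range(10_000))
```

Over many rounds, each miner's share of the blocks approaches its share of the storage, so adding storage is the only way to win more often.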

Modeling mining power

  • Power fault tolerance

Power fault tolerance is an abstraction that reframes Byzantine faults in terms of participants' influence over the outcome of the protocol.

Each participant controls some power, where n is the total power in the network and f is the fraction of power controlled by faulty or adversarial participants.

  • Power in Filecoin

In Filecoin, the power p of miner M at time t is the sum of the storage allocations of M. The influence of M is the proportion of M's power to the total power of the entire network. In Filecoin, power has the following properties:

  • Public

The total amount of storage currently in use in the network is public. By reading the blockchain, anyone can compute the storage allocations of each miner, and hence the power of each miner and the total power of the network at any point in time. (Blue Fox Note: power here is akin to influence in the network, and it directly determines which miners are elected to create blocks.)

  • Publicly verifiable

For each storage allocation, miners are required to generate Proofs-of-Spacetime showing that the service is being provided. By reading the blockchain, anyone can verify that the power a miner claims is correct.

  • Variable

At any point in time, miners can add new storage to the network by pledging new sectors and filling them. In this way, miners can change their share of power over time.
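Since allocations are public, anyone can recompute power and influence. A minimal sketch, assuming the allocation table can be read as a list of (miner, allocated size) pairs; the function names and the sample data are ours.

```python
def power(allocations, miner):
    """Power of a miner: the sum of its storage allocations."""
    return sum(size for owner, size in allocations if owner == miner)


def influence(allocations, miner):
    """Fraction of the total network power held by `miner`."""
    total = sum(size for _, size in allocations)
    return power(allocations, miner) / total if total else 0.0


# (miner, allocated size) pairs as they might be read from the allocation table.
allocations = [("m1", 40), ("m2", 25), ("m1", 10), ("m3", 25)]
m1_power = power(allocations, "m1")
m1_influence = influence(allocations, "m1")
```

Here `m1` holds two allocations totaling 50 out of 100 units, so its influence is one half. Adding a new allocation for any miner changes everyone's influence, which is the "variable" property.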

We also need a mechanism that prevents three types of attack by which malicious miners could be rewarded for storage they do not actually provide: the Sybil attack, the outsourcing attack, and the generation attack.

  • Sybil attack

By creating multiple Sybil identities, a malicious miner can pretend to store more replicas than it physically does (and get paid for them) while actually storing the data only once.

  • Outsourcing attack

By relying on quickly fetching data from other storage providers, a malicious miner can commit to storing more data than it can physically store.

  • Generation attack

A malicious miner can claim to store a large amount of data that it instead generates efficiently on demand with a small program. If the program is smaller than the data it claims to store, this inflates the miner's chance of winning the block reward in Filecoin, which is proportional to the storage the miner currently has in use.

Storage providers must convince their customers that they are storing the data they were paid to store. In practice, storage providers generate proofs of storage (PoS) that the blockchain network, or the customers themselves, verify.

To make storage publicly verifiable, Filecoin introduces two new proofs of storage: Proof-of-Replication (PoRep) and Proof-of-Spacetime (PoSt).

Proof-of-Replication (PoRep) is a novel proof of storage that allows a server (the prover P) to convince a user (the verifier V) that some data D has been replicated to its own uniquely dedicated physical storage.

The scheme is an interactive protocol in which the prover P:

(a) commits to storing n distinct replicas (physically independent copies) of some data D, and

(b) convinces the verifier V, through a challenge/response protocol, that P is indeed storing each of the replicas.

PoRep improves on Proof-of-Retrievability (PoR) and Provable Data Possession (PDP) schemes by preventing Sybil, outsourcing, and generation attacks.
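The commit and challenge/response pattern can be sketched with a Merkle tree. This is a toy, not PoRep: the `seal` function here is a fast XOR with a key-derived stream, chosen only so that each miner's replica differs, whereas real sealing is deliberately slow and sequential. All function names are ours.

```python
import hashlib


def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()


def seal(data: bytes, miner_key: bytes, block: int = 4):
    """Toy 'sealing': encode data under the miner's key, split into blocks."""
    stream, counter = b"", 0
    while len(stream) < len(data):
        stream += h(miner_key, counter.to_bytes(8, "big"))
        counter += 1
    replica = bytes(d ^ s for d, s in zip(data, stream))
    return [replica[i:i + block] for i in range(0, len(replica), block)]


def merkle_levels(blocks):
    """All levels of a Merkle tree, leaves first; odd levels duplicate the last node."""
    levels = [[h(b) for b in blocks]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:
            cur = cur + [cur[-1]]
        levels.append([h(cur[i], cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def prove(blocks, index):
    """Response to challenge `index`: the block plus its Merkle path."""
    path, i = [], index
    for level in merkle_levels(blocks)[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        path.append(level[i ^ 1])  # sibling of the current node
        i //= 2
    return blocks[index], path


def verify(root, index, block, path):
    """Recompute the root from the block and its path, then compare."""
    node = h(block)
    for sibling in path:
        node = h(node, sibling) if index % 2 == 0 else h(sibling, node)
        index //= 2
    return node == root


data = b"a picture of a cat"
replica_a = seal(data, b"miner-a-public-key")
replica_b = seal(data, b"miner-b-public-key")
root_a = merkle_levels(replica_a)[-1][0]  # commitment published by miner A
block, path = prove(replica_a, 4)         # the verifier challenged block 4
```

The verifier only needs the published root to check a response, and a response built from another miner's replica does not verify against miner A's commitment.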

Proof-of-Spacetime: Proof-of-storage schemes let a user check whether a storage provider is storing the outsourced data at the time of the challenge. How can we use them to prove that some data was stored throughout a period of time?

A natural answer is to require the user to send challenges to the storage provider repeatedly (e.g., every minute). However, the communication required by each interaction can become a bottleneck in systems like Filecoin, where storage providers must submit their proofs to the blockchain.

To solve this problem, we introduce a new proof, Proof-of-Spacetime, with which a verifier can check whether a prover has stored her/his outsourced data for a period of time.

The intuition is to require the prover to:

  • Generate sequential proofs of storage (in Filecoin, proofs of replication) as a way to establish the passage of time;
  • Recursively compose the executions to generate a short proof.

Figure: Illustration of the PoSt scheme.

The prover receives a random challenge c from the verifier and generates proofs of replication in sequence, using the output of each proof as the input to the next, for a specified number of iterations t. In this way, all the work done is reused rather than wasted (as described above).
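The chaining idea can be sketched with a hash standing in for each proof. This is a deliberately naive model: `storage_proof` is a hypothetical placeholder for a real proof of replication, and the verifier here recomputes the whole chain from the replica, which the real scheme avoids by producing a short proof.

```python
import hashlib


def storage_proof(replica: bytes, challenge: bytes) -> bytes:
    """Stand-in for one proof of storage: a hash binding replica and challenge."""
    return hashlib.sha256(challenge + replica).digest()


def post_chain(replica: bytes, challenge: bytes, t: int):
    """Generate t proofs in sequence; each proof becomes the next challenge."""
    proofs = []
    for _ in range(t):
        challenge = storage_proof(replica, challenge)
        proofs.append(challenge)
    return proofs


def check_chain(replica: bytes, challenge: bytes, proofs) -> bool:
    """Naive verifier that recomputes the chain (it needs the replica itself)."""
    return proofs == post_chain(replica, challenge, len(proofs))


replica = b"sealed sector contents"
proofs = post_chain(replica, b"random challenge c", t=5)
```

Because proof i+1 takes proof i as its challenge, the proofs cannot be computed in parallel or ahead of time, which is what lets a chain of them stand for elapsed time.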

PoSt and PoRep use zk-SNARKs, which make the proofs succinct and easy to verify.

Smart contracts

Smart contracts enable Filecoin users to write stateful programs that can spend tokens, request storage and retrieval of data in the markets, and verify storage proofs. Users interact with a smart contract by sending transactions to the ledger that trigger function calls in the contract. We have extended the smart contract system to support Filecoin-specific operations, such as market operations and proof verification.

Filecoin supports specific data storage contracts as well as more general smart contracts.

  • File contracts

We allow users to program the conditions under which they offer or provide storage services. A few examples are worth mentioning:

  • Contracting miners

Customers can specify in advance the miners that will provide the service, without taking part in the market.

  • Payment strategies

Customers can design different reward strategies for miners. For example, a contract can pay the miner increasingly higher fees over time, or it can set the storage price according to a trusted oracle.

  • Ticketing services

A contract can allow a miner to deposit tokens and pay for storage and retrieval on behalf of its users.

  • More complex operations

Customers can create contracts that allow data to be updated.

  • Smart contracts

Users can associate programs with their transactions, as in other systems (such as Ethereum), that do not depend directly on the use of storage. We foresee applications such as decentralized naming systems (Blue Fox Note: a decentralized Domain Name System), asset tracking, and crowdfunding platforms.
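As an illustration of the payment strategy mentioned under file contracts, here is a minimal sketch of a fee that rises over time. The linear schedule, the function names, and the numbers are assumptions for the example; an actual contract would also check the miner's storage proofs before each payment.

```python
def escalating_fee(base_fee: int, step: int, period: int) -> int:
    """Fee owed for a given storage period; grows linearly with time."""
    return base_fee + step * period


def total_paid(base_fee: int, step: int, periods: int) -> int:
    """Total paid to the miner after `periods` consecutive periods."""
    return sum(escalating_fee(base_fee, step, p) for p in range(periods))


schedule = [escalating_fee(base_fee=10, step=2, period=p) for p in range(4)]
paid = total_paid(base_fee=10, step=2, periods=4)
```

A rising schedule like this gives the miner a growing incentive to keep the data for the whole term rather than dropping it early.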

Cross-chain interaction

Bridges are tools designed to connect different blockchains. We plan to support cross-chain interactions to bring Filecoin storage to other blockchain-based platforms and bring the functionality of other platforms to Filecoin.

  • Filecoin on other platforms: Other blockchain systems, such as Bitcoin, Zcash, and in particular Ethereum and Tezos, allow developers to write smart contracts; however, these platforms offer very little storage capacity, and at a very high cost.

We plan to provide a bridge that brings storage and retrieval support to these platforms. We note that IPFS is already used by several smart contracts as a way to reference and distribute content. Adding Filecoin support would allow these systems to guarantee that IPFS content is stored, in exchange for Filecoin tokens.

  • Other platforms in Filecoin: We plan to provide bridges that connect other blockchain services with Filecoin. For example, integration with Zcash would support requests to store data privately.

Other issues (not covered in the white paper)

    Here we list some potential issues that are not fully discussed in the white paper:

    • Retrieval market scalability

    The micropayment system of the retrieval market adds significant overhead to the retrieval protocol. To reach retrieval speeds that match today's centralized infrastructure, Filecoin and IPFS would need wide adoption in order to form a dense network of state channels. (Blue Fox Note: if the retrieval market is large, its micropayments will need high-throughput support.)

    • Censorship (illegal content)

    As we have seen with Napster and The Pirate Bay, a lack of censorship eventually leads to illegal content on the network, bringing material from the dark web into the open. One possible solution is an AI-driven protocol that learns over time to detect illegal content automatically and take the necessary action.

    But for the network to be self-governing, this protocol would have to be managed by the users themselves (which introduces Byzantine behavior) to decide whether content needs to be taken down.

    • Open source?

    Following from the issue above, the network could be protected from illegal content at the outset by having Protocol Labs manage it. That could mean closed-source software that is free to use but not publicly available for modification.

    But even that might not help, because people could still run an uncensored version of the network (by modifying the original software).

    • Token volatility

    Given that Filecoin will be listed on exchanges, how feasible is a micropayment system in such a market (storage and retrieval both involve small payments)? (Blue Fox Note: the author means that Filecoin tokens will be volatile on the market and therefore hard to use as a micropayment currency; a stablecoin would be needed.)

    Given the current maturity of the market and of the decentralized space, tokens are more investment vehicles than utility tools. This is one of the biggest reasons why so few token-based projects have been widely adopted so far.

    Possible improvements to the Filecoin protocol

    Here we list possible improvements to the Filecoin protocol.

    • Tahoe-LAFS encryption scheme

    When adding a value, the client first encrypts it (with a symmetric key), then splits it into segments of manageable size, and then erasure-codes the segments for redundancy. (Blue Fox Note: an erasure code (EC) adds m redundant shares to n shares of original data, so that the original data can be recovered from any n of the n + m shares.)

    Thus, for example, a "2-of-3" erasure code produces 3 shares in total, any 2 of which are enough to reconstruct the original data. These shares are then distributed to storage nodes. Storage nodes are simply shared data stores; users do not rely on them for data integrity or confidentiality.

    Finally, the encryption key and some information that helps locate the right storage nodes become part of a "capability string". Importantly, the capability string is both necessary and sufficient to retrieve the value from the grid. If too many nodes become unavailable or go offline, not enough shares can be fetched and the retrieval fails.
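The "2-of-3" case is small enough to sketch by hand with XOR parity. This is the simplest possible instance and only a stand-in: production systems use more general codes such as Reed-Solomon, and the function names and padding rule below are our own.

```python
def encode_2_of_3(data: bytes):
    """Split data into two halves plus an XOR parity share."""
    if len(data) % 2:
        data += b"\x00"  # pad to even length; the caller keeps the true size
    half = len(data) // 2
    a, b = data[:half], data[half:]
    parity = bytes(x ^ y for x, y in zip(a, b))
    return [a, b, parity]


def decode_2_of_3(shares, size: int) -> bytes:
    """Rebuild the data from any two shares; a missing share is None."""
    if sum(s is None for s in shares) > 1:
        raise ValueError("need at least two shares")
    a, b, parity = shares
    if a is None:
        a = bytes(x ^ y for x, y in zip(b, parity))
    elif b is None:
        b = bytes(x ^ y for x, y in zip(a, parity))
    return (a + b)[:size]


secret = b"encrypted value"
shares = encode_2_of_3(secret)
```

Losing any one of the three shares is harmless, since the missing half can be rebuilt from the other half and the parity. Losing two shares makes retrieval fail, which is the failure case described above.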

    • Write, read, and verify capabilities

    A capability can be converted offline into a "less authoritative" one: someone holding a write capability can convert it into a read capability without interacting with a server.

    Verify capabilities confirm the existence and integrity of a value but cannot decrypt its content. Both mutable and immutable values can be placed in the grid. Naturally, an immutable value cannot have a write capability.
