The "Millionaire" problem raised by Yao Zhizhi has been cracked. What kind of ghost is the multi-party safe computing MPC?

In an increasing number of concerns about data privacy, the government began to act to develop a data use compliance bill. On the other hand, the protection of data creates a contradiction: a large amount of data cannot be combined and calculated because it needs legal protection .

Conversely, if the world's genetic data can be combined and analyzed, humans may be able to find the antidote to cancer faster. This makes us boldly think about whether there is a way to protect data and use it effectively.

In the 1980s, Academician Yao Zhizhi raised the issue of "Millionaires": two millionaires were on the streets, they all wanted to show off the rich, more than anyone else, but because of privacy, they did not want to let the other party know how much wealth they have. How to let them know who is richer between them without using a third party?

Under this classic problem, the cryptography branch of Multiparty Computation (MPC) was born. MPC technology enables data to be jointly calculated by combining multiple data without leaking, and the result of plain text calculation is obtained, and finally the ownership of data and the separation of data usage rights are realized .

Today we will introduce the background and application scenarios of MPC. Before we get started, let's take a look at what data means for us today.

We are living in the era of data protection

Personal privacy and data privacy

The EU General Data Protection Regulations (GDPR), which came into effect on May 25, 2018, have attracted worldwide attention. This data protection law, known as "the strictest in history", will have a profound impact on the technology industry and personal life because it is The first rule in human history that defines the ownership of personal data, which legally stipulates that personal data is an individual-owned data asset .

This law will guarantee people more control over personal data. For example, social networking companies must obtain your consent before using your data. The law has undoubtedly affected the tech giants such as Facebook, because these companies are counting on user data to make money.

Data protection in business interests

Data is the core value and important asset of modern business and individuals. Data is reshaping all aspects of human life, including finance, advertising, retail, healthcare, logistics, energy and industry.

With the advent of the artificial intelligence era, data has become the most important competitive resource in modern business activities. Giant companies have taken advantage of the monopoly of data to build industry barriers.

For example, taxi software companies have data on people's daily travel, including the starting and ending points of passengers. They can use this data to optimize their products and business , and even use the data to make some predictions, such as a real estate price index or a government. Road optimization program.

The fusion of data can increase its value, and the cross-use of data can produce synergies . However, because of the reproducibility and ease of dissemination of data itself, sharing and collaborative development of data assets are severely constrained by the inability to track usage.

In this case, what is the value of data protection?

How does protected data generate value?

Although personal protection of privacy and data protection of commercial companies are legitimate interests, they have created a data island. Small and medium-sized companies with data sources cannot safely share or cash out data.

For data users, big data companies, developers, and scientists have limited access to limited data sets and are expensive. Cooperation with large data sources such as operators requires developers to deploy models on the server of the data source. The model algorithms are risky and inefficient.

On the one hand, data needs to be protected and isolated; on the other hand, the value of data to human society lies in the joint calculation and analysis . Is this an unbreakable contradiction?

Ideally, we can entrust a secure, trusted third party to calculate the data. However, in reality, either the data is too important and there is no third party, or the third party will have too many rights because of the data , such as credit card companies and e-commerce companies. If Party B can have the other party's data, it will be very scary.

To solve this problem, I returned to the "Multi-party Secure Computing Technology" (MPC) originally mentioned in this article. Through MPC , we can achieve joint multi-party privacy data, and without a trusted third party, calculate and get the results of the analysis without worrying about their data being leaked .

MPC is a set of protocols based on modern cryptography. This tool set consists of many components.

Simply put, this toolkit includes Zero Knowledge Proof (ZKP), Probabilistic Encryption, Information Theory Message Authentication (MAC), various distributed communication protocols and casual transfer (OT), and the most important underlying technologies. : Secret sharing and secret fragmentation are the basis for implementing secure multiparty computing .

In particular, in the case of passive opponents, Shamir's polynomial secret sharing is the cornerstone of multiparty computing, while the verifiable secret sharing of Chor, Goldwasser, Micali, and Awerbuch plays a similar role in the Byzantine opponent problem.

Over the past 35 years, MPC algorithms and engineering have been substantially improved, and the level of performance that does not require consideration of protocol performance as a major obstacle to use has been achieved.

The MPC community uses a de facto benchmark for performing AES encryption between two participants, one with an encrypted message and the other with a key . AES includes a variety of arithmetic and Boolean operators, making it ideal for calculations directly in hardware and MPC. In the past decade, safety calculations have increased by 4-5 orders of magnitude.

For comparison purposes and considering the effects of Moore's Law, the figure below shows the performance of native AES calculations over the same time period.

Performance improvement through MPC

Next, let's take a more clear understanding of the implementation of MPC through an example. Please see the picture below:

According to the above figure, let's assume that our goal is to jointly calculate the sum of the secret data of all parties, which can be achieved through secret sharing.

First , each party randomly divides its secret number into three parts and shares two of them separately with the other parts.

Each party then locally shares all three from the other peers and themselves to disclose the final result, and each party's local sum is exposed to the peers (Peers).

Finally , either party can know the end result by summing all three public local sums.

The key to secret sharing is that by knowing secret sharing, one party does not know about private data . For example, in a three-party calculation by revealing secret sharing 5, the secret data may be a random number such as 10, 79, or -11. Even if you know secret sharing, the party can guess private data instead of guessing random numbers.

Since private data is not displayed throughout the process, secret sharing calculations can protect privacy. The opponent cannot find the secret information.

Officially because of this feature, MPC has received more and more attention in the real world and has been adopted by more fields. For example, the following three types of scenarios.

Joint credit

MPC can enable financial and insurance companies to conduct joint analysis of risk indicators such as customer debt ratio. At present, various financial, insurance, and asset management organizations only have some data on customers, which leads to risk assessment errors. The joint analysis does not disclose the data of each participant, and has an overall assessment of the customer's risk, which can effectively reduce the default risk in the scenario of long-term borrowing.

Multi-dimensional health analysis

MPC-enabled medical institutions analyze the patient's medical records and intelligent hardware biodata in multiple hospitals to provide a more accurate diagnosis of patients without leaking data from patients, hospitals, and smart hardware manufacturers. At the same time, joint data analysis for medical institutions allows drug research institutions to have a more comprehensive understanding of specific diseases in a particular region.

Joint precision marketing

MPC empowers merchants to analyze the multi-dimensional information of potential customers to more accurately advertise. Advertisers can model customer purchase intentions from more data dimensions, and data sources do not reveal personal privacy data.

MPC and the real world

The above are a few examples that are too simple, and the real world situation is more complicated than this.

For example, MPC for adding is easy because the addition operation can be calculated locally on the secret share. However, multiplication is more difficult because it can't be calculated on a local share alone without the help of other tools . However, with the use of homomorphic encryption (Somewhat Homomorphic Encryption, SHE), a more complex MPC protocol can achieve secure multiplication.

The good news is that any function can be converted to a combination of addition and multiplication, so MPC based on secret sharing can perform any type of general-purpose calculations, just like modern PCs.

Another example is Actively Malicious . Active malice is defined as a node that will deviate from the protocol, as opposed to passive malice, where a node attempts to learn other peer secret data but always follows the protocol.

In the above secret sharing example, although no node can learn other private data, the malicious node can publish the wrong local shared sum, so that all other peers learn the wrong end result.

There are various ways to discover this malicious behavior and even prevent it from happening. The most popular type is called Message Authentication Code (MAC) , where each operation is associated with a number to verify its correctness. Once the node issues an error message, this error will be easily verified by other nodes .

It is extremely difficult to forge a erroneous data that can be verified. This is so difficult that the cost of fraud is greater than the benefit of the data .

Comparison of MPC and other implementation technologies

In addition to MPC, there are technologies that enable similar functions, including homomorphic encryption, zero-knowledge proof, and trusted execution environments.

However, these technologies have certain deficiencies compared with MPC. Let's take a look at them one by one.

Homomorphic encryption

Homomorphic Encryption (HE) is an encrypted form that allows ciphertext calculations to be generated to produce encrypted results. The encrypted results match the results of the operation as if they were executed on plaintext.

With such tools, storage and/or computing can be outsourced without compromising data privacy. Since HE allows the calculation of encrypted data while maintaining encryption, it has been widely studied as a candidate for secure computing.

However, even the most advanced homomorphic encryption scheme cannot provide an effective computational depth arithmetic circuit.

First, "bootstrapping" adds extra cost to already very heavy processes. At present, the practical application of HE mainly focuses on the optimization of the evaluation function, which avoids expensive processes by limiting the circuit multiplication depth.

In addition, depending on the solution and target security level, using the HE scheme will result in a huge ciphertext extension (from 2,000 to 500,000 or even 1,000,000 times the overhead). This is because the homomorphic scheme must be probabilistic to ensure semantic security and a specific underlying mathematical structure . As we have seen, the SHE scheme is the most promising in the HE variant and will be used in the secure computing programs we will mention later.

Zero knowledge proof

Zero Knowledge Proof (ZKP) is one such method: one party (Peggy) can prove to the other party (Victor) that she knows the value x without conveying any information except that she knows the value x (reads well around) Mouth) .

Recently, many blockchain projects are trying to use ZKP as a trusted offline computing solution. In these protocols, the arithmetic module is compiled into a circuit and transmitted to a third-party execution environment where the data will be used to evaluate the data .

However, similar to the FHE solution, ZKP cannot prove the actual amount of work done in a remote environment. In addition, ZKP cannot guarantee that the calculation is obtained from the hacker of the malicious party.

Trusted execution environment

The Trusted Execution Environment (TEE) is a tamper-resistant processing environment that runs on an anti-separation kernel. The ideal TEE guarantees the authenticity of the execution code, runtime state, the integrity of registers, memory and sensitive I/O, and the confidentiality of code, data, and runtime state stored in persistent memory . In addition, it should be able to provide remote proof of its trustworthiness to third parties.

Hardware manufacturers are eager to come up with their own trusted hardware solutions, but lack common standards for different platforms. The most prominent process unit designers have embedded their hardware security modules into their products (such as Intel Software Protection Extensions (SGX), ARM TrustZone, AMD Secure Encryption Virtualization (SEV) and NVIDIA Trusted Small Kernel (TLK).

However, some recent hacking attacks have proven that SGX is not capable of carrying protocol-level data security. In fact, this seemingly secure protocol is not secure.

Remote authentication does not prevent a malicious cloud service provider from faithfully responding to remote proof queries first, and then emulating remote authentication protocols (such as KeyGen and CSR) outside of the enclave.

In other words, SGX is not an agreement designed for Universal Composition, where the true behavior of the protocol and the ideal world definition (function) are computationally indistinguishable from the environment controlled by each opponent. Simply put, with TEE, you can trust the hardware, but you can't trust the people who control the hardware . Therefore, SGX is best used in a licensed network where all nodes are pre-approved and the environment is certified and trusted .

After a basic understanding of homomorphic encryption, zero-knowledge proof, and trusted execution environments, we can conclude that while some technologies have the advantage of computational efficiency, they do not provide a priorless network (permissionless network). ) the required security and functionality .

Specifically, a good technical solution needs to be able to verify the security, correctness and privacy of the calculation:

  • Efficiency: efficiency of calculation
  • Privacy Retention: Refers here to the functional computing power on the data set without revealing details to any node. This is the core of secure computing.
  • Proof of correctness: Prove that the calculation work actually uses the specified function. In an untrusted network, it is important to prove that a function is executed in the right way.
  • Proof of security: Prove that the calculations are actually carried out in a secure environment.

Then from the comparison between MPC and the above 3 technologies, we can get the following conclusions:

Finally, MPC is a large field of cryptography that combines many concepts and tools in cryptography and distributed systems . It is an evolving and active area of ​​research.

I hope this article will bring you information on what is MPC and why privacy calculations are possible. In future articles, we will bring more and more in-depth topics on cryptography. Please look forward to your friends!

* About the author:

Zhang Lei, ARPA co-founder & chief scientist, holds a master's degree in financial engineering from George Washington University in the United States. He has 10 years of deep learning, AI algorithm and risk modeling experience, and has deep research on cryptography. He has worked as a senior data scientist at CircleUp, the largest equity crowdfunding company in Silicon Valley. He previously worked for large financial institutions such as the World Bank, AIG, and PineBridge, and is proficient in artificial intelligence and quantitative strategies. At the same time, Zhang Lei founded Stardust Data in 2017 to provide data empowerment for the AI ​​industry.

ARPA is a company focused on the development of secure cryptographic computing and blockchain underlying technologies. Its core product is a secure multi-party computing-based privacy computing platform with a full blockchain + secure computing solution. At the same time, ARPA, as a member of the industry, participated in the drafting of the safe multi-party computing standard to be issued by the China Information and Communication Research Institute of the Ministry of Industry and Information Technology.

Author | Zhang Lei

Editor | Aholiab

Produced | CSDN, ARPA

(Source: Blockchain Base Camp)