An Analysis of Google's Private Intersection-Sum Technology, Part 1: Application Scenarios

Privacy has long been a concern in multi-party collaboration.

How can it be addressed? The cryptography community has proposed a number of solutions, such as fully homomorphic encryption and secure multi-party computation.

These technologies are general-purpose and can, in principle, be applied wherever they are needed. In practice, however, each carries its own drawbacks.

Google's Private Join and Compute is an example of how industry can harvest the fruits of academic research: it solves the problem for one specific scenario.

Academia, industry, and business form a kind of food chain. They appear to be doing different things, but they supply one another — this is what we usually call "industry-academia-research" collaboration. Yet the roles often get muddled: instead of playing our own part well, we insist on playing several parts at once.

Google is an eager early adopter. I remember a few years ago, when lattice-based cryptography was generating excitement in the theoretical community for its resistance to quantum attacks, Google took the lead in deploying a Ring-LWE-based key-exchange protocol on its own servers to test its performance and prepare for the post-quantum era. For reasons I do not know, it reverted to the original protocol a few months later.

◆◆ Private Intersection-Sum Technology ◆◆

With the rise of big data, machine learning is everywhere, and data privacy has drawn great attention. For example, earlier this year Google released "Password Checkup": users can submit a username and password for a query, and the system compares them against data sets of breached credentials to judge whether the password is still safe — without revealing the user's secrets (including the password itself). The system is built on private set intersection (PSI), a secure multi-party computation technique that enables collaboration without exposing private inputs.
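The core idea behind this kind of private membership query can be sketched with blinded exponentiation: the client blinds the hash of its password before sending it, so the server never sees the password or even its hash. The sketch below is illustrative only — it is not Google's implementation, and the modulus, function names, and breach set are all made up for the example; real deployments use carefully chosen elliptic-curve groups.

```python
import hashlib
import math
import secrets

# Toy modulus: the Curve25519 field prime, used here only as a big prime
# for modular exponentiation. Illustration only, not a secure deployment.
P = 2**255 - 19

def hash_to_group(s: str) -> int:
    """Hash a string into the multiplicative group mod P."""
    return int.from_bytes(hashlib.sha256(s.encode()).digest(), "big") % P

def keygen() -> int:
    """Random exponent invertible mod P-1, so masking is a permutation."""
    while True:
        k = secrets.randbelow(P - 1)
        if k > 1 and math.gcd(k, P - 1) == 1:
            return k

# --- Server side: publish breached passwords masked with a secret key ---
server_key = keygen()
breached = ["123456", "password", "qwerty"]
masked_breached = {pow(hash_to_group(pw), server_key, P) for pw in breached}

# --- Client side: blind the password before the server ever sees it ---
def check_password(pw: str) -> bool:
    r = keygen()                            # client's one-time blinding factor
    blinded = pow(hash_to_group(pw), r, P)  # this is all the server receives
    # Server raises the blinded value to its key; it never sees H(pw) itself.
    response = pow(blinded, server_key, P)
    # Client removes the blinding: ((H(pw)^r)^k)^(r^-1 mod P-1) = H(pw)^k.
    unblinded = pow(response, pow(r, -1, P - 1), P)
    return unblinded in masked_breached

print(check_password("password"))   # in the breach set
print(check_password("hunter2!"))   # not in the breach set
```

Because exponentiation commutes, the client recovers exactly the server-masked value of its own password and nothing else, and the server learns only a blinded group element.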

The open-source library Google has released is based on a technique called "Private Intersection-Sum." It keeps the intersection of two data sets hidden while revealing only aggregate statistics over that intersection, such as the number of elements it contains.
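The cardinality part of this idea can be sketched with commutative "double masking": each party masks both sets with its own secret exponent, and doubly-masked elements collide exactly when the underlying elements match. This is a toy sketch under assumed parameters, not the library's actual protocol — in particular, the full protocol also shuffles the masked sets and attaches additively homomorphically encrypted values (Paillier) so that only the *sum* over the intersection is revealed; that part is omitted here.

```python
import hashlib
import math
import secrets

P = 2**255 - 19  # toy prime modulus; illustration only

def hash_to_group(s: str) -> int:
    return int.from_bytes(hashlib.sha256(s.encode()).digest(), "big") % P

def keygen() -> int:
    while True:
        k = secrets.randbelow(P - 1)
        if k > 1 and math.gcd(k, P - 1) == 1:
            return k

def psi_cardinality(set_a, set_b) -> int:
    """Count the intersection without either side seeing raw elements.

    Masking commutes: (h^a)^b == (h^b)^a, so doubly-masked values match
    exactly when the original items match.
    """
    a_key, b_key = keygen(), keygen()
    # Party A masks its items and sends them to B, who masks them again.
    a_twice = {pow(pow(hash_to_group(x), a_key, P), b_key, P) for x in set_a}
    # Party B masks its items and sends them to A, who masks them again.
    b_twice = {pow(pow(hash_to_group(y), b_key, P), a_key, P) for y in set_b}
    # Either side can now count matches among masked values only.
    return len(a_twice & b_twice)

ad_clickers = {"alice", "bob", "carol"}
purchasers = {"bob", "carol", "dave"}
print(psi_cardinality(ad_clickers, purchasers))  # 2
```

In the real protocol the two parties exchange these masked sets over the network and shuffle them before returning, so neither side can link a masked value back to a position in the other's list.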

This technique is very useful in certain scenarios. Consider two companies, an advertiser and a merchant. They want to know how many users clicked through the advertiser's ads and then made a purchase from the merchant, along with the total sales amount. If the two companies simply pooled their data sets, they could take the intersection and then count the users and sum the transaction amounts. But neither company is willing to hand its data set to the other.
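To be concrete about what the two companies want to compute (before any privacy protection is applied), here is the ideal functionality in plain code. The user names and amounts are invented for illustration:

```python
# Ideal functionality: given the advertiser's list of ad clickers and the
# merchant's purchase records, output ONLY the count and total spend of
# users appearing in both — this is what Private Intersection-Sum computes
# without either side revealing its full data set.
clicks = {"alice", "bob", "carol", "dave"}          # advertiser's data
purchases = {"bob": 120.0, "dave": 80.0, "erin": 45.0}  # merchant's data

converted = clicks & purchases.keys()
count = len(converted)
total = sum(purchases[u] for u in converted)
print(count, total)  # 2 200.0
```

The cryptographic protocol must produce exactly `count` and `total` while keeping `converted` — the identities in the intersection — hidden from both parties.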

◆◆ What should I do? ◆◆

Let us look at the traditional solution first.

The traditional solution relies on the law. The two companies first sign a legal agreement stipulating that they will share their data sets and destroy them once the aggregate result is obtained. If a data set is not destroyed and a leak results, the offender faces legal sanctions.

Cryptography, however, can achieve the same goal without relying on the law.

With private intersection-sum technology, the two parties can obtain aggregate information about the intersection without revealing their data sets to each other. Both sides get the best of both worlds.

Private intersection-sum technology provides a function similar to a legal contract: without any contract, you learn exactly what you are entitled to learn, and nothing more. All of this is guaranteed by a cryptographic protocol.

This is similar to blockchain technology (which provides trust in an untrusted environment).

For example, in the application scenario above, the final aggregate might be: 10,000 customers purchased the product after seeing the advertisement, for a total purchase amount of 1,000,000. Nothing else is learned — the parties know the totals, but not which individual users they come from.

The scenario above is ad-conversion measurement: Google's private intersection-sum technology can compute ad conversion rates while protecting privacy. Of course, it can also be applied to research in other fields such as health care, vehicle safety, public policy, and diversity and inclusion.

◆◆ Security model and deficiencies ◆◆

Google's open-source library assumes an "honest-but-curious" adversary in its security model: participants are expected to follow the protocol, and a participant who deviates from it may learn more than the prescribed output. The protocol also does not ensure that parties supply legitimate inputs, nor does it prevent arbitrary inputs. These weaknesses can lead to leaks.

For example, if one user's spending is unusually large, it is easy to determine whether that user is in the intersection simply by examining the aggregate results.
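One way to see the input weakness concretely is a toy differencing attack (illustrative only, not taken from the library): an advertiser who can run the protocol twice, once with and once without a target user in its input, learns that user's membership and exact spend from the difference of the two aggregates.

```python
# Differencing attack sketch against an honest-but-curious aggregate.
# All names and amounts are invented for illustration.
def intersection_sum(clickers, purchases):
    """The aggregate the protocol reveals: (count, total) over the overlap."""
    hit = clickers & purchases.keys()
    return len(hit), sum(purchases[u] for u in hit)

purchases = {"bob": 120, "dave": 80, "erin": 45}   # merchant's private data

# The advertiser runs the protocol twice, varying only the target "bob":
with_target = {"alice", "bob", "dave"}
without_target = {"alice", "dave"}

n1, s1 = intersection_sum(with_target, purchases)
n2, s2 = intersection_sum(without_target, purchases)

# count diff 1 => bob purchased; sum diff 120 => bob's exact spend
print(n1 - n2, s1 - s2)  # 1 120
```

Nothing in an honest-but-curious protocol prevents this, because each individual run is perfectly legitimate; defenses require limiting or auditing inputs, adding noise to the aggregates, or a stronger (malicious-secure) protocol.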

The open-source library is not an official Google product, so it is intended for learning purposes only. Ge Mi Chain Network Technology Co., Ltd. has begun studying the project and is designing a more robust scheme that incorporates homomorphic encryption. Companies interested in this technology are welcome to contact us for cooperation and exchange.

Source: Gemi chain

Author: Dr. Zhiyuan