Interview | Weizhong Bank Blockchain Security Scientist: Hold the Privacy Data Red Line, Keep Business Innovation Compliant

On March 24th, "Chain Talk" was fortunate to host Dr. Yan Qiang, blockchain security scientist at Weizhong Bank, as a guest in its live-broadcast room to discuss the topic "Strictly adhere to the privacy data red line and keep business innovation compliant."

Yan Qiang:

○ PhD in Information Security from SMU; winner of a Best Paper Award at a top international information security conference;

The following is the transcript of the AMA session:

Moderator: In recent years, led by the European Union's General Data Protection Regulation (GDPR), governments around the world have kept refining privacy protection legislation and strengthening penalties. Earlier this month, the official release of the 2020 edition of the national standard "Information Security Technology Personal Information Security Specification" sparked a new round of heated discussion in the industry. Which changes in the new standard deserve our attention?

Yan Qiang: Thank you, host. This national standard supports the Cybersecurity Law from the perspective of technical standards. For personal privacy, it means that protection at the national level keeps getting stronger.

Several changes in this specification deserve careful study:

1. It sets out clear provisions on the collection, storage, and use of personal information. It gives personal information subjects the right to query, correct, and delete their personal information, to withdraw authorization, to close their accounts, and to obtain copies of their personal information;

2. It imposes stricter and more detailed requirements on the collection and storage of biometric information (including facial features, fingerprints, irises, etc.). During collection, the individual must be notified separately and must give explicit consent;

During storage, biometric information must be kept separately from personal identity information. In principle, only summary information should be stored, or the original image should be deleted once the authentication function has been completed.
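To make the separate-storage rule concrete, here is a minimal Python sketch that keeps identity data and biometric data in two different stores. It assumes the collection terminal has already converted the raw image into a feature template, so the raw image never reaches the server. The class and method names (`SeparatedStores`, `enroll`, `delete_biometrics`) are hypothetical. In a real deployment the two dictionaries would be separate databases, each with its own access rights.

```python
import secrets


class SeparatedStores:
    """Identity data and biometric templates kept in two separate stores.

    Hypothetical sketch: `template` is assumed to be a feature template already
    extracted on the collection terminal, so the raw image is never stored here.
    """

    def __init__(self) -> None:
        self.identity_store: dict[str, dict] = {}    # user_id -> identity fields + bio_ref
        self.biometric_store: dict[str, bytes] = {}  # bio_ref -> template only

    def enroll(self, user_id: str, name: str, id_number: str, template: bytes) -> None:
        # A random handle links the two stores without embedding identity data.
        bio_ref = secrets.token_hex(16)
        self.biometric_store[bio_ref] = template
        self.identity_store[user_id] = {"name": name, "id_number": id_number, "bio_ref": bio_ref}

    def template_for(self, user_id: str) -> bytes:
        return self.biometric_store[self.identity_store[user_id]["bio_ref"]]

    def delete_biometrics(self, user_id: str) -> None:
        # E.g. after consent is withdrawn: remove the template, keep the account.
        bio_ref = self.identity_store[user_id].pop("bio_ref")
        self.biometric_store.pop(bio_ref, None)
```

Neither store on its own links a name to a template. Someone who copies only `biometric_store` gets templates keyed by random tokens.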

Penalty standards for privacy violations differ at home and abroad. We have compiled some domestic and foreign penalty provisions here for your reference.

Moderator: Increasingly fine-grained privacy supervision is undoubtedly good news for ordinary members of the public like our little assistant here. But for data-driven business innovation, how can a business effectively meet the strict requirements of privacy compliance?

Yan Qiang: Companies are at different stages of development, and laws and regulations differ between regional markets. So before discussing how to respond effectively to privacy risks, the first task is to clarify the goals of privacy compliance.

I wonder whether everyone here feels the same way: as legislation has deepened in recent years, disputes over "what counts as private data" have been decreasing.

Each region's laws and regulations define private data differently, but they all provide specific categories and sensitivity classifications. For example, KYC identity data and financial data sit at the most sensitive level.

The benefit is that we can now avoid the past ambiguity over rights and clarify the goals of privacy compliance.

For businesses operating in a region, the goals of privacy compliance can be summarized as:

Protect customers' legal rights by protecting the private data defined in the laws and regulations of the current regional market, and by providing the corresponding features in product design.

Moderator: How should we interpret this goal of corporate privacy compliance?

Yan Qiang: In short, we can extract the following two sets of keywords from it:

① Data content protection;

② Data rights protection.

These two sets of keywords also represent the two main lines of privacy compliance. Along these two lines, we can identify privacy compliance risks across nine dimensions.

The compliance requirements in these nine dimensions are like nine levels of barriers to privacy compliance. For ordinary companies, focusing on the most basic dimensions may be enough to meet compliance needs.

However, two kinds of companies may need to meet compliance requirements in all dimensions: those in highly regulated industries, such as fintech, and those operating multinational information services, such as online social networks and cross-border e-commerce.

Moderator: Can you explain in detail the compliance requirements of the nine dimensions you mentioned and the specific technical means of implementing them?

Yan Qiang: OK, let's go through the specific compliance requirements and related technologies of each dimension one by one.

First dimension: interface data hiding

Hide data in the user interface, so that malicious third parties nearby cannot see customers' private data while they use the product.

This is one of the easiest requirements to satisfy in data content protection compliance. Direct interface rendering operations, such as simple display masking and data truncation, are all effective technical means.

However, it is also one of the requirements most prone to privacy incidents through neglect. This is especially true when several sensitive data fields are displayed at the same time: if masking is applied improperly, it may be no better than no masking at all.
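The point about several fields shown together can be illustrated with a short Python sketch. The `mask` and `overlay` helpers and the phone number are hypothetical. The sketch shows how two individually "safe" masks of the same value, rendered on different screens, can be lined up to recover the whole value.

```python
def mask(value: str, keep_prefix: int, keep_suffix: int, fill: str = "*") -> str:
    """Show only the first `keep_prefix` and last `keep_suffix` characters."""
    hidden = len(value) - keep_prefix - keep_suffix
    if keep_prefix < 0 or keep_suffix < 0 or hidden <= 0:
        raise ValueError("mask must hide at least one character")
    return value[:keep_prefix] + fill * hidden + value[len(value) - keep_suffix:]


def overlay(*renderings: str, fill: str = "*") -> str:
    """What an observer learns by lining up several masked views of one value."""
    if len({len(r) for r in renderings}) != 1:
        raise ValueError("renderings must be non-empty and of equal length")
    recovered = []
    for column in zip(*renderings):
        visible = [c for c in column if c != fill]
        recovered.append(visible[0] if visible else fill)
    return "".join(recovered)


phone = "13812345678"           # made-up number
page_a = mask(phone, 3, 4)      # "138****5678" on one page
page_b = mask(phone, 7, 0)      # "1381234****" on another page
print(overlay(page_a, page_b))  # 13812345678
```

A simple design rule follows from this: fix one masking policy per field type and apply it on every screen and in every export, rather than choosing masks page by page.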

Moderator: Sorry to interrupt: does the hide-account-balance feature we often see belong to this dimension? There are similar features for masking mobile phone numbers and ID card numbers.

Yan Qiang: Yes. Speaking of ID card masking, I especially want to point out that the conventional masking method is very easy to get wrong.

Let's look at a bad example.

Here, the first 14 digits of the 18-digit ID number are hidden, which at first glance looks very private.

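But with a little additional information, we can easily recover the hidden digits. Those 14 digits are simply the six-digit administrative region code followed by the eight-digit date of birth.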

This information is not hard to find, especially for public figures or for users whose personal information has already leaked in earlier data theft incidents.

At the same time, displaying the last few digits leaks additional information. For example, the user's gender can be inferred from whether the penultimate digit is odd (men) or even (women).
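The recovery described above works because the 18-digit number has a public structure (GB 11643-1999):

- a 6-digit administrative region code;
- an 8-digit date of birth;
- a 3-digit sequence code whose last digit is odd for men and even for women;
- a check character computed with ISO 7064 MOD 11-2.

The Python sketch below uses a made-up number and hypothetical function names. It shows how the visible tail `1233` leaks gender, and how the check character helps an attacker who knows only the region and birth year filter the candidate birth dates.

```python
from datetime import date, timedelta

# GB 11643-1999: weights for the first 17 digits, and the MOD 11-2 remainder -> check character map.
WEIGHTS = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2]
CHECK_CHARS = "10X98765432"


def check_char(first17: str) -> str:
    """Check character for the first 17 digits of an 18-digit ID number."""
    total = sum(int(d) * w for d, w in zip(first17, WEIGHTS))
    return CHECK_CHARS[total % 11]


def infer_gender(visible_tail: str) -> str:
    """The 17th digit (second to last) is odd for men and even for women."""
    return "male" if int(visible_tail[-2]) % 2 == 1 else "female"


def candidate_birthdates(region: str, year: int, visible_tail: str) -> list[str]:
    """Birth dates in `year` consistent with a masked ID whose last 4 characters are shown."""
    sequence, shown_check = visible_tail[:3], visible_tail[3]
    matches = []
    day = date(year, 1, 1)
    while day.year == year:
        birth = day.strftime("%Y%m%d")
        if check_char(region + birth + sequence) == shown_check:
            matches.append(birth)
        day += timedelta(days=1)
    return matches


# Made-up example: region 110101, born 1990-03-07, sequence 123, check character 3.
masked = "*" * 14 + "1233"
tail = masked[-4:]
print(infer_gender(tail))  # male
candidates = candidate_birthdates("110101", 1990, tail)
print("19900307" in candidates, len(candidates))
```

Only a small fraction of the year's dates pass the check. Any further clue, such as a birthday post, then pins the number down completely.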

A final tip: in the past, the default password when opening an account was often set to the last 6 digits of the ID card number, so displaying those digits is also risky.

Moderator: Suppose we have fully completed interface data hiding. What do we need to do next?

Yan Qiang: OK, it depends on the compliance requirements of the next dimension.

Second dimension: network data hiding

Hiding data at the network level ensures that attackers cannot intercept private data in plaintext while it is being transmitted.

The classic Transport Layer Security protocol family, TLS/SSL, can meet this requirement.

However, note that the security of these protocols depends on digital certificate services working correctly. If those services are attacked, the result may be forged certificates, expired certificates, and so on, which ultimately affects the security and availability of existing services.

We cannot assume that data in transit is absolutely hidden just because TLS/SSL is used. Properly checking certificate validity is essential.
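Below is a minimal Python sketch of a client that keeps certificate verification switched on, using only the standard `ssl` module. `ssl.create_default_context()` enables both certificate-chain and hostname verification. The helper names and the idea of an expiry alert are illustrative assumptions, not a prescribed design.

```python
import socket
import ssl
import time
from typing import Optional


def make_client_context() -> ssl.SSLContext:
    # create_default_context() loads the system's trusted CAs and turns on
    # both certificate-chain verification and hostname checking.
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse legacy protocol versions
    return ctx


def fetch_peer_cert(host: str, port: int = 443) -> dict:
    """Connect with verification on; a bad or mismatched certificate raises ssl.SSLCertVerificationError."""
    ctx = make_client_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        # server_hostname enables SNI and the hostname check.
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter time (negative if already expired)."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400
```

For example, a monitoring job might alert when `days_until_expiry(fetch_peer_cert("example.com")["notAfter"])` drops below a chosen threshold. A classic implementation mistake is to "fix" a handshake error by setting `check_hostname = False` or `verify_mode = ssl.CERT_NONE`. The connection still looks encrypted, but it no longer authenticates the server.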

Many papers at top academic conferences discuss this practical issue. A correct design does not guarantee a correct implementation, and the gap between them can end in an unexpected privacy incident.

The third dimension brings more challenging compliance requirements.

Third dimension: in-domain data hiding

Within a single computing domain, such as a cloud computing environment fully controlled and deployed by the enterprise, no plaintext of private data may leave the secure isolated environment during computation or storage. This prevents malicious insiders from accessing private data without authorization.

Private data can be decrypted into plaintext only inside the secure isolated computing environment. Outside that environment, only ciphertext operations can be performed, and data is stored on media only as ciphertext. The isolated environment can be built with trusted hardware or with software isolation. The two rely on different security assumptions and should be chosen according to the characteristics of the business.
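The following Python sketch illustrates the rule "plaintext only inside the isolated environment; ciphertext everywhere else". `IsolatedDomain` is a hypothetical stand-in for a trusted execution environment. Its keys never leave the object, data crosses the boundary only as ciphertext, and only whitelisted aggregate results come back in the clear. The HMAC-SHA256 counter-mode cipher with encrypt-then-MAC is used only because it needs nothing outside the standard library. A production system would use a vetted AEAD such as AES-GCM inside real trusted hardware.

```python
import hashlib
import hmac
import os


class IsolatedDomain:
    """Toy stand-in for a secure isolated computing environment (illustration only)."""

    APPROVED_OPS = {"count": len, "sum": sum}  # hypothetical whitelist of releasable results

    def __init__(self) -> None:
        master = os.urandom(32)  # key material is generated and kept inside the boundary
        self._enc_key = hmac.new(master, b"enc", hashlib.sha256).digest()
        self._mac_key = hmac.new(master, b"mac", hashlib.sha256).digest()

    def _keystream(self, nonce: bytes, length: int) -> bytes:
        # HMAC-SHA256(key, nonce || counter) used as a pseudorandom keystream.
        blocks = []
        for counter in range((length + 31) // 32):
            msg = nonce + counter.to_bytes(8, "big")
            blocks.append(hmac.new(self._enc_key, msg, hashlib.sha256).digest())
        return b"".join(blocks)[:length]

    def seal(self, plaintext: bytes) -> bytes:
        """Encrypt-then-MAC; the output nonce || ciphertext || tag is safe to store outside."""
        nonce = os.urandom(16)
        body = bytes(p ^ k for p, k in zip(plaintext, self._keystream(nonce, len(plaintext))))
        tag = hmac.new(self._mac_key, nonce + body, hashlib.sha256).digest()
        return nonce + body + tag

    def _open(self, blob: bytes) -> bytes:
        nonce, body, tag = blob[:16], blob[16:-32], blob[-32:]
        expected = hmac.new(self._mac_key, nonce + body, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("ciphertext was modified outside the isolated domain")
        return bytes(c ^ k for c, k in zip(body, self._keystream(nonce, len(body))))

    def aggregate(self, blobs: list[bytes], op: str) -> int:
        """Decrypt inside the boundary, run an approved operation, release only the result."""
        if op not in self.APPROVED_OPS:
            raise PermissionError(f"operation {op!r} is not on the approved list")
        values = [int(self._open(b)) for b in blobs]
        return self.APPROVED_OPS[op](values)
```

Callers outside the domain can store, copy, and pass around the sealed blobs. They can only learn an approved aggregate, never an individual value.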

Moderator: There is usually a lot of collaboration within an enterprise, so data exchange is inevitable. How can we achieve the compliance goal of this dimension?

Yan Qiang: Insider risk in an enterprise generally has three types of causes:

The first type: an insider's computer is taken over by an external attacker and becomes a zombie machine used for intrusion;

The second type: insiders themselves act with malicious intent;

The third type: insiders make mistakes.

Whichever the cause, technology can minimize or even prevent the corresponding risk. The key is to give people inside the domain only the minimum privileges they need. The data isolation scheme mentioned above then keeps insiders from operating on private data outside the prescribed process and creating unnecessary privacy risks.
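One way to express "minimum privileges" in code is a deny-by-default gate in front of private data. This Python sketch uses hypothetical roles, actions, and an approval ticket (`analyst`, `read_plaintext`, `T-1`); it is not a product API. Plaintext access requires both the right role and an approved ticket, and every decision is written to an audit log. Together these address the three causes listed above:

- a hijacked account inherits only narrow rights;
- a malicious insider cannot skip the approval step;
- mistakes leave a trail.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical role -> permitted actions; anything not listed is denied.
POLICY = {
    "analyst": {"run_approved_job"},
    "dba": {"backup_ciphertext"},
    "privacy_officer": {"run_approved_job", "read_plaintext"},
}
# Actions that touch plaintext additionally require an approved ticket.
NEEDS_TICKET = {"read_plaintext"}


@dataclass
class AccessGate:
    approved_tickets: set[str] = field(default_factory=set)
    audit_log: list[tuple] = field(default_factory=list)

    def check(self, user: str, role: str, action: str, ticket: Optional[str] = None) -> bool:
        allowed = action in POLICY.get(role, set())
        if allowed and action in NEEDS_TICKET:
            allowed = ticket in self.approved_tickets
        # Every decision, allowed or not, is recorded for later review.
        self.audit_log.append((user, role, action, ticket, allowed))
        return allowed
```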

Moderator: I see. Building data-processing infrastructure on data isolation and access control is critical to keeping data private within the enterprise. So what role can cryptography, which everyone has been talking about, play in privacy protection?

Yan Qiang: Good question. Privacy compliance requirements cover a wide range. Cryptography is a very important core technology area, but it is not the only core technology we need to understand.

Different scenarios call for different technologies. Let's look at the compliance requirements of the next dimension, where the cryptography you just mentioned comes into play.

Fourth dimension: cross-domain computing data hiding

The plaintext of private data appears only within its own computing domain. During joint computation with other computing domains, the controllers of those domains can neither directly access nor indirectly infer the plaintext. This prevents partners from obtaining sensitive private data beyond what the cooperation agreement authorizes.

This is the most challenging requirement in data content protection compliance, and it is particularly important for businesses handling highly sensitive data, such as medical and financial data. Failing it usually means the business cannot operate or faces huge fines. Penalties may even apply in both directions. A company can be penalized for leaking private data through vulnerabilities in its own systems, and also for exploiting vulnerabilities in a partner's systems to obtain sensitive private data it was not authorized to access.

Common technical solutions for this dimension include data desensitization, secure multi-party computation, outsourced computation, and zero-knowledge proofs. In scenarios involving machine learning, emerging technologies such as federated learning can provide more effective solutions.
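Additive secret sharing is one of the simplest building blocks of secure multi-party computation. Here is a minimal Python sketch of it, assuming a hypothetical scenario in which several institutions want the total of their private figures without revealing any single figure. It assumes semi-honest parties and simulates all of them in one process, with no networking. The modulus choice and function names are illustrative.

```python
import secrets

PRIME = 2**61 - 1  # Mersenne prime used as the field modulus; inputs and totals must stay below it


def share(value: int, n_parties: int) -> list[int]:
    """Split `value` into n random additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def joint_sum(private_inputs: list[int]) -> int:
    """Simulate n semi-honest parties computing the sum of their private inputs."""
    n = len(private_inputs)
    # Party i splits its input and sends share j to party j.
    dealt = [share(x, n) for x in private_inputs]
    # Party j adds the shares it received; each share on its own looks uniformly random.
    partials = [sum(dealt[i][j] for i in range(n)) % PRIME for j in range(n)]
    # Publishing the partial sums reveals only the total.
    return sum(partials) % PRIME


print(joint_sum([120, 340, 75]))  # 535
```

Protocols that tolerate malicious parties, or that compute more than sums, add considerable machinery on top of this idea.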

To decide which type of technology suits which type of requirement, you can refer to the decision chart below.

With data content protection covered, we can turn to data rights protection, which begins with the fifth dimension.

Fifth dimension: data access notice

A data access notice tells customers the details of the private data circulation life cycle: what private data the current business collects, why it is needed, how it will be used, how it will be stored, and how long it will be kept. As the most basic requirement in data rights protection compliance, it guarantees the customer's right to know.

The difficulty lies in helping customers understand obscure technical language and the consequences of the related privacy risks. Otherwise, regulators may treat customers' confusion as grounds for finding the company in violation.

Research on user experience and human-computer interaction is key to meeting this requirement.

People with different backgrounds focus on different aspects of the same issue.

Developers may care more about whether the wording is machine-readable, whether it compiles, and whether the unit and functional tests pass.

Designers may care more about whether the wording meets business requirements and gives developers enough technical detail to implement it smoothly.

Customers most likely do not care at all about either of these concerns. They want to know what rights they have and what risks come with them.

Clearly, we need to speak to customers in different terms, expressing their rights and risks in language they can follow.
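A minimal Python sketch of keeping one source of truth for a data access notice and rendering it for two audiences: JSON for developers and plain sentences for customers. The field names follow the questions listed for the fifth dimension (what, why, how it is used, how it is stored, how long it is kept). The class name and sample wording are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CollectionItem:
    data: str            # what is collected
    purpose: str         # why it is needed
    usage: str           # how it will be used
    storage: str         # how it will be stored
    retention_days: int  # how long it will be kept


def to_machine_readable(items: list[CollectionItem]) -> str:
    """Developer-facing form: structured, diff-able, easy to test."""
    return json.dumps([asdict(i) for i in items], ensure_ascii=False, indent=2)


def to_customer_text(items: list[CollectionItem]) -> str:
    """Customer-facing form: one plain sentence group per item."""
    lines = []
    for i in items:
        lines.append(
            f"We collect your {i.data} to {i.purpose}. "
            f"It is used for {i.usage}, stored {i.storage}, "
            f"and deleted after {i.retention_days} days."
        )
    return "\n".join(lines)


item = CollectionItem("phone number", "verify your identity at login",
                      "sending one-time codes", "encrypted in our data center", 180)
print(to_customer_text([item]))
```

Because both renderings come from the same records, the customer-facing text cannot silently drift away from what the system actually declares.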

Moderator: If we have done our best to communicate and the customer still doesn't understand, what should we do?

Yan Qiang: Good question. In recent years, the industry has increasingly favored a moderate use of machine-learning-based automation for this. It lowers customers' cost of understanding and helps them assess the potential risks of the business more rationally.

Let's move on to the sixth dimension.

Sixth dimension: data collection control

Data collection control lets customers choose which private data the business system may collect, and lets them adjust those choices later. Collection is the starting point of the private data circulation life cycle, so this requirement gives customers global control over how their own private data circulates.

With a data collection control mechanism in place, private data that customers are unwilling to share cannot enter the business system without authorization, which gives customers peace of mind. Traditional access control technology can meet this requirement well. However, if the original system architecture was not designed to be extensible, retrofitting legacy systems becomes a huge engineering challenge.
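Below is a minimal Python sketch of consent-gated collection. It assumes a single hypothetical ingestion point where every submitted record passes through `filter_collection` before it is stored. The default is deny: a user with no recorded choice contributes nothing. A later call to `set_choices` replaces the earlier selection.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentRegistry:
    # user_id -> names of fields the user agreed to share (hypothetical schema)
    grants: dict[str, set[str]] = field(default_factory=dict)

    def set_choices(self, user_id: str, allowed_fields: set[str]) -> None:
        """Record the initial choice or a later adjustment; the new set replaces the old one."""
        self.grants[user_id] = set(allowed_fields)

    def filter_collection(self, user_id: str, submitted: dict) -> dict:
        """Drop every field the user has not consented to before it is stored."""
        allowed = self.grants.get(user_id, set())  # no record means nothing is collected
        return {k: v for k, v in submitted.items() if k in allowed}


registry = ConsentRegistry()
registry.set_choices("u1", {"phone", "city"})
form = {"phone": "13812345678", "city": "Shenzhen", "contacts": ["..."]}
print(registry.filter_collection("u1", form))  # {'phone': '13812345678', 'city': 'Shenzhen'}
registry.set_choices("u1", {"city"})           # the user later narrows consent
print(registry.filter_collection("u1", form))  # {'city': 'Shenzhen'}
```

Having one choke point like this is what keeps the requirement cheap. If collection is scattered across many services, each one has to be retrofitted separately.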

Moderator: On the sixth dimension, I'm curious: if users stop authorizing companies to collect private data, will the original business model be affected?

Yan Qiang: Obviously it will. The impact on the business model is one side of it; the cost of upgrading the related technical architecture may deserve even more attention.

As we all know, a computer system is extremely exacting. If the original design is tightly coupled to private data, suddenly removing that key input may bring the whole system to a halt.

The impact of privacy compliance on business models will be industry-wide. A company that adjusts and upgrades its business systems faster to achieve technical compliance can actually seize market opportunities for itself.

Moderator: This reminds me of the EU GDPR's impact on the information industry: only companies with enough accumulated capability in technical compliance can safely enter the EU market.

Yan Qiang: Compared with the data collection control we just discussed, the compliance requirement of the next dimension is even more challenging.

Seventh dimension: data usage control

Data usage control lets customers adjust or restrict how their private data is used in specific business systems.

This was originally one of the GDPR's distinctive compliance requirements, known as the right to restriction of processing. The latest version of the "Information Security Technology Personal Information Security Specification" now has related provisions, though they apply only to some types of business. At present they mainly target personalized recommendation services related to online advertising. The original intention is to keep overly personalized recommendations from seriously intruding on personal privacy.

Given the GDPR's heavy fines, this is important for the steady growth of related businesses in regions that place high value on personal privacy. The key to meeting this requirement is whether the enterprise leaves room for changes to private data early in system architecture design, which reduces the cost of later retrofits.
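The following Python sketch shows one way to "leave room" for restricted processing: personalization is an optional layer over a non-personal default. When the user restricts processing, the history is not read at all and the system still returns results. The catalog format, categories, and ranking rule are hypothetical and deliberately naive.

```python
from collections import Counter

# Catalog entries are (item_id, category), ordered by global popularity.
Item = tuple[str, str]


def recommend(history: list[str], catalog: list[Item],
              personalization_allowed: bool, k: int = 3) -> list[str]:
    """Return k item ids; `history` lists the categories the user has browsed."""
    if not personalization_allowed:
        # Restricted processing: fall back to the non-personalized ranking.
        return [item for item, _ in catalog[:k]]
    affinity = Counter(history)  # how often the user browsed each category
    # sorted() is stable, so global popularity breaks ties.
    ranked = sorted(catalog, key=lambda entry: -affinity[entry[1]])
    return [item for item, _ in ranked[:k]]


catalog = [("p1", "phones"), ("b1", "books"), ("s1", "shoes"), ("b2", "books")]
history = ["books", "books", "shoes"]
print(recommend(history, catalog, True))   # ['b1', 'b2', 's1']
print(recommend(history, catalog, False))  # ['p1', 'b1', 's1']
```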

Moderator: It is generally believed that private data is an enterprise's core competitive asset. In that sense, compared with stopping the collection of private data, doesn't this requirement to stop using data sound like crippling your own martial arts?

Yan Qiang: I wouldn't see it that way. Legally, the rights to the data belong to the user. Respecting the user's wishes is the precondition for companies to serve users and unlock greater value from their private data.

In fact, as with the previous dimension's compliance requirements, the successive release of fine-grained privacy protection laws and regulations will reshape the market order. In the long run, I hope this will establish a healthier market environment and a sustainable industry ecosystem.

This transformation is unavoidable. For an enterprise, the best choice is to prepare actively for privacy compliance. Where conditions permit, it should complete the corresponding technical investment and strategic planning as early as possible, so that it can adapt better to changes in the market environment.

Eighth dimension: derived data control

Derived data control gives customers a degree of control over the derived data produced when their original private data is transformed and aggregated.

This is also one of the GDPR's distinctive compliance requirements, currently manifested mainly in two areas:

1) The right to be forgotten: after the customer deletes their account, their individual historical data and any aggregated data that includes them are cleaned up;

2) The right to data portability: when a customer intends to leave the current business platform, they can extract all of their relevant historical data, such as e-mails, comments, and cloud host data.

Meeting this requirement also usually means high system retrofit costs. Companies are advised to design a comprehensive private data traceability mechanism into the system architecture early, to reduce the compliance cost of later retrofits.
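A minimal Python sketch of the traceability idea: every stored record remembers which users' data it was derived from. That makes both rights mechanical to serve:

- Portability exports the records derived solely from one user.
- Forgetting deletes every record the user contributed to, including aggregates, and returns their ids so that a hypothetical batch job can recompute the aggregates from the remaining users.

`LineageStore` and its record ids are hypothetical.

```python
import json
from collections import defaultdict


class LineageStore:
    """Every record remembers which users' data it was derived from."""

    def __init__(self) -> None:
        self.records = {}                # record_id -> payload
        self.sources = {}                # record_id -> set of contributing user_ids
        self.by_user = defaultdict(set)  # user_id -> record_ids touching that user

    def put(self, record_id: str, payload, user_ids: set) -> None:
        self.records[record_id] = payload
        self.sources[record_id] = set(user_ids)
        for uid in user_ids:
            self.by_user[uid].add(record_id)

    def export_user(self, user_id: str) -> str:
        """Data portability: records derived only from this user, as JSON."""
        own = {rid: self.records[rid] for rid in sorted(self.by_user.get(user_id, ()))
               if self.sources[rid] == {user_id}}
        return json.dumps(own, ensure_ascii=False, sort_keys=True)

    def forget_user(self, user_id: str) -> list:
        """Right to be forgotten: delete every record this user contributed to.

        Returns the affected record ids; aggregates among them need recomputing."""
        affected = sorted(self.by_user.pop(user_id, set()))
        for rid in affected:
            for uid in self.sources.pop(rid, set()) - {user_id}:
                self.by_user[uid].discard(rid)
            self.records.pop(rid, None)
        return affected
```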

Moderator: At this point, I feel privacy compliance is no longer a problem that one or two technical solutions can solve. For industries with compliance requirements, the entire information system architecture and data governance must fully account for privacy compliance needs.

Yan Qiang: That's exactly right. The newly released "Information Security Technology Personal Information Security Specification" also introduces the concept of "personal information security engineering", which interested readers can look into.

For the scope of derived data, you can refer to the following figure, which also covers the machine learning models that many people care about.

Moderator: Speaking of machine learning, AI systems really do bring a lot of convenience to our daily lives. But I also worry: will human society end up controlled by AI?

Yan Qiang: There is indeed a compliance requirement in privacy protection that protects people from discrimination by AI.

Ninth dimension: data impact review

Data impact review lets customers request a review of business decisions made from their private data. Unfair judgments by automated decision-making systems can then be corrected, and negative effects such as data discrimination eliminated.

This is probably the most challenging requirement in data rights protection compliance. It focuses on the interpretability of data-driven decision-making systems, and it restricts the use of hard-to-explain machine learning models in key areas such as people's livelihoods and medical care. When designing automated decision-making systems, companies therefore need to develop highly interpretable decision models or provide alternative technical solutions, to reduce the compliance costs caused by misjudgments.
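An additive scorecard is a classic example of a decision model that explains itself. Here is a minimal Python sketch of one that returns per-feature reasons with every decision and routes borderline cases to a human reviewer. The feature names, weights, threshold, and review margin are all hypothetical and hand-set for illustration; they are not a real credit policy.

```python
from dataclasses import dataclass

# Hypothetical, hand-set weights for illustration only.
WEIGHTS = {"on_time_payment_ratio": 40.0, "debt_to_income": -30.0, "years_of_history": 2.0}
BASE = 10.0
APPROVE_AT = 40.0
REVIEW_MARGIN = 5.0  # scores this close to the threshold go to a human reviewer


@dataclass
class Decision:
    outcome: str                      # "approve", "decline" or "manual_review"
    score: float
    reasons: list[tuple[str, float]]  # per-feature contributions, largest impact first


def decide(features: dict[str, float]) -> Decision:
    contributions = {name: WEIGHTS[name] * features[name] for name in WEIGHTS}
    score = BASE + sum(contributions.values())
    reasons = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    if abs(score - APPROVE_AT) < REVIEW_MARGIN:
        outcome = "manual_review"
    elif score >= APPROVE_AT:
        outcome = "approve"
    else:
        outcome = "decline"
    return Decision(outcome, score, reasons)


d = decide({"on_time_payment_ratio": 0.5, "debt_to_income": 0.5, "years_of_history": 5})
print(d.outcome, d.score)  # decline 25.0
print(d.reasons)           # [('on_time_payment_ratio', 20.0), ('debt_to_income', -15.0), ('years_of_history', 10.0)]
```

When a customer asks for a review, the `reasons` list shows exactly which inputs drove the result. A reviewer can then check those inputs for errors or unfair proxies.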

Moderator: In other words, whatever decision the AI makes, the privacy compliance framework always requires a manual intervention path in reserve, so that any necessary review and correction can be made. The new "Information Security Technology Personal Information Security Specification" also seems to mention related content.

Yan Qiang: Yes. Privacy protection whose core goal is respect for human nature is the ultimate safeguard.

It is worth emphasizing, however, that when facing huge volumes of heterogeneous private data, appropriate technical methods can greatly improve the efficiency of privacy protection and effectively reduce what it costs companies to implement it.

Moderator: Thank you very much, Dr. Yan, for this wonderful sharing!