Popular science | Worried about privacy? Encrypted data vaults show their strength (core use cases and requirements analysis)

This article is adapted from the second part of the paper "Encrypted Data Vaults", produced at Rebooting the Web of Trust IX (RWOT IX, Prague, 2019). The previous installment introduced the current approaches and architectures of encrypted data vaults, the requirements derived from them, design goals, and the risks developers should be aware of when implementing data storage. This installment describes common use cases for data storage systems, analyzes their requirements, and sets out some guidelines and design goals for building an encrypted data vault. In the next installment, we will present the final part of Encrypted Data Vaults, which discusses the architecture of encrypted data vaults along with some security and privacy considerations.

Original: https://github.com/WebOfTrustInfo/rwot9-prague/blob/master/final-documents/encrypted->

Authors (in alphabetical order): Amy Guy, David Lamers, Tobias Looker, Manu Sporny, and Dmitri Zagidulin

Contributors (in alphabetical order): Daniel Bluhm and Kim Hamilton Duffy

First, core use cases

The following four use cases represent common usage patterns for data storage systems, but they are by no means the only ones.

1. Storing and using data

A user wants to store data in a secure location but does not want the storage provider to be able to see any of the data stored there. In other words, only the user should be able to view and use the data.

2. Searching for data

Over time, users accumulate large amounts of data. They need to be able to search it, but they do not want the service provider to learn what they have stored or what they are searching for.

3. Sharing data with one or more entities

Users often share their data with other entities, such as other people or services. When first storing data, or at any later time, a user can decide to grant other entities access to data in their storage area. Others can access the user's storage space and data only with the user's explicit consent.

Users want to be able to revoke others' access at any time and, when sharing data, to set an expiration date on a third party's access.
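To make the revocation and validity-period requirements concrete, here is a minimal Python sketch of how a vault could check a shared grant. The `Grant` and `AccessList` types, their fields, and the identifiers in the demo are hypothetical, introduced only for illustration; a real system would also need to authenticate the grantee and protect the grants themselves.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class Grant:
    """One user-issued permission: who may access which document, and until when."""
    grantee: str          # identifier of the entity receiving access
    document_id: str      # the stored document the grant covers
    expires_at: datetime  # end of the validity period chosen by the user


@dataclass
class AccessList:
    """Hypothetical per-user record of issued grants and revoked grant ids."""
    grants: dict[str, Grant] = field(default_factory=dict)
    revoked: set[str] = field(default_factory=set)

    def grant(self, grant_id: str, grant: Grant) -> None:
        self.grants[grant_id] = grant

    def revoke(self, grant_id: str) -> None:
        # Revocation takes effect immediately, even before the grant expires.
        self.revoked.add(grant_id)

    def is_allowed(self, grant_id: str, grantee: str, document_id: str,
                   now: datetime) -> bool:
        grant = self.grants.get(grant_id)
        if grant is None or grant_id in self.revoked:
            return False
        return (grant.grantee == grantee
                and grant.document_id == document_id
                and now < grant.expires_at)


now = datetime(2019, 9, 1, tzinfo=timezone.utc)
acl = AccessList()
acl.grant("g1", Grant("https://example.com/bob", "doc-42", now + timedelta(days=30)))
print(acl.is_allowed("g1", "https://example.com/bob", "doc-42", now))  # True
print(acl.is_allowed("g1", "https://example.com/bob", "doc-42",
                     now + timedelta(days=31)))                       # False: expired
acl.revoke("g1")
print(acl.is_allowed("g1", "https://example.com/bob", "doc-42", now))  # False: revoked
```

Keeping revocation separate from expiry lets a user cut off access immediately without rewriting the original grant.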

4. Storing the same data in multiple places

To prevent data loss, users need the system to back up their data across multiple storage locations. These locations may be hosted by different storage providers and accessed through different protocols; one might be the user's phone, another cloud storage. The locations should also synchronize with one another, so that however the user creates or updates data, every location stays up to date automatically, without any action from the user.
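One simple way to keep locations in sync is a per-document sequence number where the newest copy wins. The sketch below assumes a hypothetical replica layout (a document id mapped to a sequence number and an encrypted payload) and a last-writer-wins rule; it deliberately ignores concurrent edits, which a production system would need to detect and resolve.

```python
# document id -> (sequence number, encrypted payload); the layout is hypothetical
Replica = dict[str, tuple[int, bytes]]


def sync(a: Replica, b: Replica) -> None:
    """Bring two replicas to the same state, in place.

    For each document the copy with the higher sequence number wins. This assumes
    every update increments the sequence number and ignores concurrent edits,
    which a real system would have to detect and resolve.
    """
    for doc_id in set(a) | set(b):
        copies = [replica[doc_id] for replica in (a, b) if doc_id in replica]
        newest = max(copies, key=lambda entry: entry[0])
        a[doc_id] = newest
        b[doc_id] = newest


phone = {"doc-1": (2, b"<ciphertext v2>")}
cloud = {"doc-1": (1, b"<ciphertext v1>"), "doc-2": (1, b"<ciphertext>")}
sync(phone, cloud)
print(phone == cloud)     # True
print(cloud["doc-1"][0])  # 2
```

Because replicas exchange only encrypted payloads and sequence numbers, synchronization can run without any location ever holding the decryption keys.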

Second, requirements analysis

From the four core use cases above, we can derive a number of requirements for the storage system.

1. Privacy and multi-party encryption

One of the main goals of the system is to keep an entity's data private, preventing access to it by unauthorized parties, including the storage provider.

To achieve this, data must be encrypted both in transit (over the network) and at rest (on the storage system).

Because data can be shared with multiple entities, the encryption mechanism must also support encrypting data for multiple parties, so that each authorized party can decrypt it.

2. Sharing and authorization

The system needs to provide an authorization mechanism that allows encrypted information to be shared with one or more entities.

The system may specify one mandatory authorization scheme while also allowing alternative schemes. Candidate schemes include OAuth 2.0, Web Access Control, and ZCAPs (Authorization Capabilities for Linked Data).
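The requirement of one mandatory scheme plus optional alternatives maps naturally onto a registry. In this sketch, treating ZCAPs as the mandatory scheme is an assumption (the text does not say which scheme is mandatory), and `toy_capability_check` is a placeholder rather than real ZCAP verification. `AuthorizationRegistry` and `Authorizer` are hypothetical names.

```python
from typing import Callable

# Hypothetical signature: (proof presented with the request, requested action) -> allowed?
Authorizer = Callable[[dict, str], bool]

# Assumption for illustration: the article does not say which scheme is mandatory.
MANDATORY_SCHEME = "zcap"


class AuthorizationRegistry:
    def __init__(self, mandatory: Authorizer) -> None:
        # A deployment cannot be constructed without the mandatory scheme.
        self._schemes: dict[str, Authorizer] = {MANDATORY_SCHEME: mandatory}

    def register(self, name: str, authorizer: Authorizer) -> None:
        """Add an optional alternative, e.g. OAuth 2.0 or Web Access Control."""
        if name == MANDATORY_SCHEME:
            raise ValueError("the mandatory scheme cannot be replaced")
        self._schemes[name] = authorizer

    def authorize(self, scheme: str, proof: dict, action: str) -> bool:
        authorizer = self._schemes.get(scheme)
        # Unknown schemes are denied rather than silently allowed.
        return authorizer is not None and authorizer(proof, action)


def toy_capability_check(proof: dict, action: str) -> bool:
    """Placeholder only: real ZCAP verification would check signatures and delegation."""
    return action in proof.get("allowed_actions", [])


registry = AuthorizationRegistry(toy_capability_check)
print(registry.authorize("zcap", {"allowed_actions": ["read"]}, "read"))  # True
print(registry.authorize("oauth2", {"token": "..."}, "read"))             # False
```

Denying unknown schemes by default keeps a client from gaining access through a scheme the vault never enabled.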

3. Identity

The system should be identity-agnostic. In general, identifiers in the form of URNs or URLs are preferred. It is expected that the system will use Decentralized Identifiers (DIDs) in some way, but hard-coding a dependency on DIDs is not a good pattern.
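Being identity-agnostic can be as simple as treating every identifier as an opaque absolute URI and never branching on its scheme. The sketch below uses the standard library's `urllib.parse.urlparse` to split off the scheme; `parse_identifier` is a hypothetical helper, and the example identifiers are illustrative.

```python
from urllib.parse import urlparse


def parse_identifier(identifier: str) -> tuple[str, str]:
    """Accept any absolute URI (URL, URN, DID, ...) and split off its scheme.

    Nothing here special-cases 'did:'; DIDs are simply one more URI scheme.
    """
    scheme = urlparse(identifier).scheme
    if not scheme:
        raise ValueError(f"not an absolute URI: {identifier!r}")
    return scheme, identifier[len(scheme) + 1:]


for example in ("https://example.com/alice",
                "urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
                "did:example:123456789abcdefghi"):
    print(parse_identifier(example))
```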

4. Versioning and replication

In general, we expect the system to back up information continuously. The system therefore needs to support at least one mandatory versioning strategy and one mandatory replication strategy, while also allowing other versioning and replication strategies.
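As an illustration of what a mandatory versioning strategy might look like, the sketch below keeps every revision of a document in an append-only list and numbers revisions from 1. `VersionedStore` and its methods are hypothetical; a real strategy would also need to bound storage and handle conflicting writes.

```python
from typing import Optional


class VersionedStore:
    """Hypothetical mandatory versioning strategy: append-only revisions.

    Every update adds a new revision with the next sequence number (starting at 1);
    earlier revisions are kept rather than overwritten.
    """

    def __init__(self) -> None:
        self._history: dict[str, list[bytes]] = {}

    def put(self, doc_id: str, encrypted_doc: bytes) -> int:
        revisions = self._history.setdefault(doc_id, [])
        revisions.append(encrypted_doc)
        return len(revisions)  # sequence number of the new revision

    def get(self, doc_id: str, sequence: Optional[int] = None) -> bytes:
        revisions = self._history.get(doc_id, [])
        if sequence is None:
            sequence = len(revisions)  # default to the latest revision
        if not 1 <= sequence <= len(revisions):
            raise KeyError((doc_id, sequence))
        return revisions[sequence - 1]

    def since(self, doc_id: str, sequence: int) -> list[bytes]:
        """Revisions newer than `sequence`: what a lagging location needs to catch up."""
        return self._history.get(doc_id, [])[sequence:]


store = VersionedStore()
store.put("doc-1", b"<v1>")
store.put("doc-1", b"<v2>")
print(store.get("doc-1"))       # b'<v2>'
print(store.since("doc-1", 1))  # [b'<v2>']
```

The `since` method hints at how versioning supports replication: a lagging location only needs the revisions after the last sequence number it holds.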

5. Metadata and search

The system will typically store a large amount of data, and users need to be able to retrieve it efficiently and selectively. An encrypted search mechanism is therefore a required feature of the system.

It is important for the client to be able to associate metadata with the data so that the data can be searched. At the same time, because the privacy of both data and metadata must be preserved, metadata must be stored in encrypted form. The service provider must also be able to perform these searches in an opaque, privacy-preserving way, without being able to see the metadata.
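One common technique for searching encrypted metadata is a keyed-hash (HMAC) "blind index": the client turns each metadata name and value into an opaque token with a key the server never sees, and the server matches tokens without learning what they stand for. This is a minimal sketch under that assumption, not necessarily the mechanism the paper specifies; `blind_index`, `server_find`, and the metadata values are hypothetical.

```python
import hashlib
import hmac
import json
import secrets


def blind_index(index_key: bytes, name: str, value: str) -> str:
    """Client side: map a metadata (name, value) pair to an opaque token.

    Only holders of index_key can compute tokens. JSON-encoding the pair keeps
    ("a=b", "c") and ("a", "b=c") from producing the same token.
    """
    message = json.dumps([name, value]).encode("utf-8")
    return hmac.new(index_key, message, hashlib.sha256).hexdigest()


def server_find(index: dict[str, set[str]], token: str) -> set[str]:
    """Server side: exact-match lookup; the server sees tokens, never names or values."""
    return index.get(token, set())


index_key = secrets.token_bytes(32)  # stays on the client
index: dict[str, set[str]] = {}
index.setdefault(blind_index(index_key, "type", "passport"), set()).add("doc-1")
index.setdefault(blind_index(index_key, "type", "diploma"), set()).add("doc-2")

print(server_find(index, blind_index(index_key, "type", "passport")))  # {'doc-1'}
```

This approach supports only exact matches, and identical values always produce identical tokens, so the server can still observe how often a token occurs; richer queries require more elaborate schemes.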

6. Communication protocol

Because the system must work in a wide variety of environments, at least one communication protocol must be mandatory. It is also important, however, that the design allow the system to use other protocols, such as HTTP, gRPC, Bluetooth, and other wire protocols.
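Keeping the protocol pluggable mostly means the client codes against an interface rather than a concrete transport. In this sketch, `Transport`, `LoopbackTransport`, and `VaultClient` are hypothetical names; a mandatory binding (for example, one over HTTP) would be one more class with the same `send` method.

```python
from typing import Protocol


class Transport(Protocol):
    """Any wire protocol the client can speak: HTTP, gRPC, Bluetooth, ..."""

    def send(self, request: bytes) -> bytes: ...


class LoopbackTransport:
    """Hypothetical in-process transport for tests; acknowledges every request."""

    def send(self, request: bytes) -> bytes:
        return b"stored:" + request


class VaultClient:
    def __init__(self, transport: Transport) -> None:
        # The client depends only on the Transport interface, never on a concrete protocol.
        self._transport = transport

    def store(self, encrypted_doc: bytes) -> bytes:
        return self._transport.send(encrypted_doc)


client = VaultClient(LoopbackTransport())
print(client.store(b"<ciphertext>"))  # b'stored:<ciphertext>'
```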

Third, design goals

This section details some guidelines and design goals for building an encrypted data vault.

1. Layered and modular architecture

A layered architecture keeps the system's basic functionality easy to implement while allowing more complex functionality to be layered on top of the lower layers.

For example, layer 1 may contain mandatory, basic functionality; layer 2 may contain functionality that is useful for most deployments; layer 3 may contain advanced functionality needed by a small number of projects in the ecosystem; and layer 4 may contain highly complex functionality needed by only a very small subset of the ecosystem.
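The stacking rule, under which each layer builds on everything below it, can be expressed as a conformance check. The assignment of features to layers below is hypothetical, since the text gives only the general shape of each layer, and `conformance_layer` is an illustrative name.

```python
# Hypothetical assignment of features to layers; the article does not fix one.
LAYERS: dict[int, set[str]] = {
    1: {"store_encrypted", "retrieve_encrypted"},  # mandatory basics
    2: {"encrypted_search", "sharing"},            # useful for most deployments
    3: {"replication"},                            # advanced, needed by fewer projects
    4: {"cross_protocol_sync"},                    # highly complex, rarely needed
}


def conformance_layer(implemented: set[str]) -> int:
    """Highest layer N such that every feature of layers 1..N is implemented.

    Layers stack: a deployment cannot claim layer 3 while missing layer 2.
    """
    level = 0
    for layer in sorted(LAYERS):
        if not LAYERS[layer] <= implemented:
            break
        level = layer
    return level


print(conformance_layer({"store_encrypted", "retrieve_encrypted", "sharing"}))  # 1
print(conformance_layer({"store_encrypted", "retrieve_encrypted",
                         "encrypted_search", "sharing", "replication"}))       # 3
print(conformance_layer({"replication"}))                                       # 0
```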

2. Prioritize privacy

Building an encrypted data vault must put the privacy of the entity first. When exploring new features, always consider their impact on privacy. Features that negatively affect privacy will be closely scrutinized to determine whether they are worth implementing.

3. Push implementation complexity to the client

The server side of the system should focus on storing and retrieving encrypted data. The more the server knows about the data, the greater the privacy risk to the entity storing it, and the greater the liability the service provider takes on for hosting it. Pushing complexity to the client lets service providers offer stable server-side implementations while leaving room for innovation on the client side.


To be continued. In the next installment, we will present the final part of Encrypted Data Vaults, which discusses the architecture of encrypted data vaults and some security and privacy considerations. Stay tuned! If you are interested in data privacy protection and related topics, you are welcome to join the Ontology technical community and discuss them with us.