Technology Sharing | "Garbage In, Garbage Out" in Blockchain Technology

Foreword

A few days ago, we published an article on the value that blockchain technology brings to different types of data, that is, the blockchain's ability to guarantee the provenance, immutability, and authenticity of data. In this article, we continue with another question that is often (intentionally) ignored: how data gets onto the blockchain in the first place. Like many other systems, blockchain technology suffers from "garbage in, garbage out" (GIGO).

Lying to the blockchain

In a previously published article on data, we observed that for data that is neither generated natively on the blockchain nor publicly available, a blockchain system cannot ensure authenticity. Unfortunately, most of the world's data falls into this class. So if someone (or some device) submits fraudulent data to the blockchain, we have no way to determine its authenticity, and the result is fake data permanently recorded in the blockchain's history. Put garbage on the chain, and the blockchain will faithfully return garbage.

Applications that ignore this problem abound today. They usually add some technical layers that appear to ensure data accuracy. Here are a few cases:

> Decentralized data marketplaces: tokens incentivize companies to sell their data. But how do you know the data you buy is real?

> Privacy-preserving queries: a service uses zero-knowledge proofs to count the high-net-worth individuals at a bank, so you get a number without the bank submitting any customer data. But how do you confirm the bank has not fabricated the entire customer database?

For publicly available data, you can design a game in which players with stakes at risk challenge the authenticity of other players' data, as Chainlink has done. But as we said before, the vast majority of the world's data is not publicly available.
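The idea behind such a challenge game can be sketched in a few lines. This is a toy model, not Chainlink's actual protocol: a reporter stakes a deposit on a claim, a challenger stakes against it, and because the data is public, the dispute can be settled against ground truth and the loser's stake awarded to the winner.

```python
# Toy sketch of a staking challenge game for *public* data (hypothetical,
# not Chainlink's real mechanism). Both sides put money at risk; whoever
# disagrees with the publicly verifiable ground truth loses their stake.

def settle(claim, ground_truth, reporter_stake, challenger_stake):
    """Return (reporter_payout, challenger_payout) after a challenge."""
    if claim == ground_truth:        # reporter was honest
        return reporter_stake + challenger_stake, 0
    return 0, reporter_stake + challenger_stake

# An honest reporter wins the challenger's stake.
assert settle("rain", "rain", 10, 10) == (20, 0)
# A lying reporter forfeits their stake to the challenger.
assert settle("sun", "rain", 10, 10) == (0, 20)
```

Note that the entire scheme hinges on `ground_truth` being publicly checkable, which is exactly what most of the world's data lacks.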

So what should we do? The key is to secure the data at its source.

Keep your data sources secure

If we obtain data not from its source but through a third party or intermediary, then we cannot trust the data's authenticity without trusting that intermediary. The more intermediaries involved in handling the data, the more parties we have to trust; past a certain number of intermediaries, the data might as well have come from a random number generator.

So, our goal is to get data as close to the source as possible. For example:

  • Rather than pulling sales data from a retailer's database, read it from the point-of-sale hardware;
  • Rather than subscribing to a weather forecast website, go to the weather sensors that collect the data;
  • Rather than reading a bridge operator's PDF reports, obtain raw data from the cameras and sensors installed on the bridge itself.

But how do we secure data at the source? Since most of the world's data is generated or captured by devices, we can restate the problem as: how do we protect the data a device generates? We face three potential points of failure:

  • Identity: How do you know which device is generating the data? Is it the temperature sensor you expect, or an attacker's random number generator?
  • Processing and transmission: Even if the data's source can be reliably determined, how do you know the data has not been altered, corrupted, or outright replaced as it moves through the device (for example, from a sensor to a communication module)?
  • Digital/analog interface: Even if identity, processing, and transmission are all secure, how do you prevent someone from physically tampering with the point where the device collects its data, for instance by feeding it a fake input signal?

Let's tackle these problems one by one.

A practical approach

  • Identity:

To establish the identity of the device generating the data, we can embed a public/private key pair in the device, publish the public key, and have the device sign its output, so that anyone can check on the spot which hardware actually produced it. This is the relatively easy step.

The tricky part is ensuring this identity cannot be stolen, i.e. that the private key is known only to the device. A hardware module called a Secure Element (SE) helps here: it generates the public/private key pair on-chip and is highly tamper-resistant. Normally, this module does only one thing: sign messages. That makes it an excellent way to provide proof of identity. If you have ever held a chip credit card or used a modern smartphone, you have already benefited from a secure element.
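The sign-and-verify flow can be sketched as follows. This is a hypothetical simplification: a real secure element signs with an asymmetric scheme such as ECDSA or Ed25519, but since Python's standard library offers no asymmetric signing, we use HMAC with a per-device secret as a stand-in for the same flow.

```python
import hashlib
import hmac
import json

# Hypothetical sketch of device identity attestation. A real SE would hold
# an asymmetric private key on-chip; here an HMAC secret plays that role.

DEVICE_SECRET = b"burned-into-secure-element"  # assumed; never leaves the chip

def device_sign(reading: dict) -> dict:
    """Simulates the SE: sign a sensor reading and attach the signature."""
    payload = json.dumps(reading, sort_keys=True).encode()
    sig = hmac.new(DEVICE_SECRET, payload, hashlib.sha256).hexdigest()
    return {"reading": reading, "sig": sig}

def verify(message: dict) -> bool:
    """Verifier side: recompute the signature and compare in constant time."""
    payload = json.dumps(message["reading"], sort_keys=True).encode()
    expected = hmac.new(DEVICE_SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["sig"])

signed = device_sign({"device_id": "temp-sensor-7", "celsius": 2.4})
assert verify(signed)                  # a genuine reading verifies
signed["reading"]["celsius"] = -1.0    # data altered in transit
assert not verify(signed)              # the forgery is detected
```

The key property carries over to the asymmetric case: anyone holding the public key can verify the reading, but only the chip holding the private key could have produced the signature.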

  • Processing and transmission:

To protect the data processing and transmission logic, we use a microcontroller (MCU) with secure boot (SB). You can think of the microcontroller as a very simple computer.

Secure boot ensures that only entities holding the correct private key can load an application onto the MCU. The application's logic and the data needed to validate it can be shared with stakeholders in advance (or simply open-sourced), so that what was actually loaded can be verified.

More critically, once the application has been thoroughly tested, we disable all modification functionality on both the application and the MCU. This guarantees that the application's logic is immutable from then on; even the manufacturer can no longer change it.
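The boot-time check described above can be sketched like this. It is a minimal model: real secure boot verifies a manufacturer signature over the image, whereas here a known-good digest stands in for that signed reference.

```python
import hashlib

# Hypothetical sketch of a secure-boot check. At power-on, the boot ROM
# hashes the application image and compares it against a trusted digest
# (standing in for a signature from the manufacturer's private key).
# Any modified image is refused, so the loaded logic stays immutable.

def boot(image: bytes, trusted_digest: str) -> str:
    if hashlib.sha256(image).hexdigest() != trusted_digest:
        return "halt: image rejected"     # refuse to run tampered firmware
    return "running application"

firmware = b"read_sensor(); sign_reading(); transmit();"
good_digest = hashlib.sha256(firmware).hexdigest()

assert boot(firmware, good_digest) == "running application"
assert boot(firmware + b" patch();", good_digest) == "halt: image rejected"
```

Combined with a fused-off update path, every stakeholder who audited the original image knows exactly what logic the device will run forever after.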

This approach has some obvious disadvantages, such as the inability to update the application afterwards. In exchange, combined with the SE, we get a device that is truly independent of outside interference, with a degree of certainty and immutability we can trust.

  • Digital/analog interface:

This problem is harder and cannot be solved with hardware embedded in the data-collection and relay equipment. Usually, creative mechanisms must be designed to ensure that the interface cannot be subverted, and these depend on each application's circumstances. Let's look at an example.

Suppose a cold-chain logistics company operates a refrigerated truck that delivers fresh fish to a local supermarket every day. To stay fresh, the fish must be kept within a certain temperature range: if the temperature is too high, the fish spoil; if it is too low, the taste and texture suffer. To confirm that the logistics company complies with the contracted temperature range, the supermarket installs a temperature sensor on the truck.

But what if the driver, to save on electricity, turns up the temperature of the refrigeration unit, then removes the sensor and puts it in a cooler in the cab? The sensor does not know it has been moved; it keeps collecting and reporting readings that stay within the contracted range. In other words, the sensor has been fooled.

One way to mitigate this risk is to integrate the sensor into the refrigeration unit's hardware so that it is nearly impossible to move. Even this countermeasure can be defeated, however, for example by packing a bag of ice around the sensor while the rest of the truck stays warmer than the contract allows.

Another potentially better (and somewhat more expensive) solution is to attach a tamper-evident seal and a temperature sensor to each package of fish. Now, if the driver wants to remove a sensor, he has to break a seal, which is easily discovered and constitutes a breach of a key clause of the contract.
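The supermarket's side of this scheme reduces to a simple check over the per-package telemetry. A minimal sketch, assuming a contracted range of 0-4 °C and a seal-status bit reported alongside each reading (both assumptions for illustration):

```python
# Hypothetical contract check for the per-package scheme: a package is in
# breach if its seal was broken OR any reading left the contracted range.

CONTRACT_RANGE = (0.0, 4.0)  # assumed contracted temperature range in degrees C

def breaches(readings):
    """readings: list of (package_id, celsius, seal_intact) tuples.
    Returns the package ids that violate the contract."""
    low, high = CONTRACT_RANGE
    bad = []
    for pkg, temp, seal_intact in readings:
        if not seal_intact or not (low <= temp <= high):
            bad.append(pkg)
    return bad

trip = [
    ("pkg-1", 2.1, True),   # fine
    ("pkg-2", 6.5, True),   # too warm: refrigeration turned down
    ("pkg-3", 1.2, False),  # in range, but the seal was broken
]
assert breaches(trip) == ["pkg-2", "pkg-3"]
```

Treating a broken seal as a breach in its own right is what closes the loophole: the driver cannot move a sensor without leaving evidence, even if the reported temperatures look fine.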

As mentioned earlier, solving digital/analog interface problems takes a great deal of creativity, and each solution must be tailored to the specific problem.