Zero Knowledge Machine Learning (zkML): Privacy and Technology Coexist in the Era of Artificial Intelligence

In this era of rapid technological advancement, generative artificial intelligence tools such as ChatGPT and Midjourney have opened up new possibilities in design and art, software development, publishing, and even finance. Generative AI promises to push the boundaries of human creativity, greatly enhance our productivity, and lead us to a higher level of innovation.

Bringing software such as ChatGPT and Midjourney to its current level took years of research and extensive training on large amounts of data to build the artificial intelligence models behind it. For example, ChatGPT is reported to have been trained on a dataset of around 570GB of text from web pages, books, and other sources. Some of this data may come from users, who may be completely unaware that their personal data is being used to train AI software. Although most of the collected data may be harmless to the users themselves, some sensitive or private data may inevitably be mixed in and fed to the model without the user’s consent.

Given the privacy issues that such systems raise, people are becoming more aware of, and placing more emphasis on, data privacy and security. Some are calling for a balance between using the advantages of AI and protecting individual privacy rights. Fortunately, there is a promising technology that can help bridge this gap: zero-knowledge proofs (ZKPs).

What is zkML?

A zero-knowledge protocol is a method by which one party (the prover) can prove to another party (the verifier) that a certain proposition is true, without revealing any information other than the fact that the proposition is true. Although the idea dates back to the 1980s, zero-knowledge technology has developed rapidly in recent years, particularly since 2022, with projects in the ZK field making significant progress in scalability and privacy protection.
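
To make the prover/verifier roles concrete, here is a minimal sketch of one classic zero-knowledge-style protocol, a Schnorr proof of knowledge made non-interactive with the Fiat-Shamir heuristic. It is not taken from any zkML system mentioned in this article. The group parameters P, Q, G and the function names are illustrative assumptions, and the parameters are far too small for real use.

```python
import hashlib
import secrets

# Toy group parameters (assumption, far too small for real use):
# P = 2*Q + 1 is a safe prime and G generates the subgroup of prime order Q.
P, Q, G = 2039, 1019, 4


def _challenge(*values: int) -> int:
    """Fiat-Shamir challenge: hash the public transcript into Z_Q."""
    data = b"|".join(str(v).encode() for v in values)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def prove(secret_x: int) -> tuple[int, int, int]:
    """Prove knowledge of x such that y = G^x mod P.

    Returns the public key y, the commitment t and the response s.
    """
    y = pow(G, secret_x, P)
    k = secrets.randbelow(Q - 1) + 1   # fresh random nonce in [1, Q-1]
    t = pow(G, k, P)                   # commitment to the nonce
    c = _challenge(G, y, t)
    s = (k + c * secret_x) % Q         # response; k masks the secret
    return y, t, s


def verify(y: int, t: int, s: int) -> bool:
    """Check G^s == t * y^c (mod P) without ever seeing x."""
    c = _challenge(G, y, t)
    return pow(G, s, P) == (t * pow(y, c, P)) % P
```

The verifier learns that the prover knows the secret exponent behind y, but the transcript (y, t, s) does not hand over the exponent itself. zkML systems prove far richer statements (for example, "this output came from this model"), but the division of roles is the same.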

Machine learning is a branch of artificial intelligence that focuses on developing systems that can learn from past data, recognize patterns, and make logical decisions, reducing the need for significant human involvement. It is a data analysis technique that automatically creates analytical models by utilizing various types of digital information, such as numerical data, text content, user interactions, and visual data.

In supervised machine learning, we provide input to a pre-trained model with preset parameters, and the model generates output that can be used by other systems. It is important to keep both the input data and the model parameters confidential. Input data may contain sensitive personal financial or biometric information, while model parameters may themselves be sensitive, for example the parameters of a confidential biometric authentication model.

The fusion of zero-knowledge technology and artificial intelligence has given birth to zero-knowledge machine learning (zkML), a powerful new technology with strong ethical implications that has the potential to completely disrupt the way we work.

The Modulus Labs team recently published a paper called “The Cost of Intelligence,” in which they benchmarked a range of existing zero-knowledge proof systems against models of different sizes. Currently, the main application of ZK in on-chain machine learning is verifying that a computation was carried out correctly. As the technology matures, especially Succinct Non-interactive Arguments of Knowledge (SNARKs), ZKPs are expected to also protect users’ privacy by keeping inputs hidden from overly curious verifiers.

zkML essentially integrates ZK technology into AI software to overcome its limitations in privacy protection, data integrity verification, and other aspects.

Use cases of zkML

Although zkML is still an emerging technology, it has already attracted widespread attention and has many remarkable applications. Some of the notable zkML applications include:

  • Computational integrity (validity ML): Validity proofs such as SNARKs and STARKs can verify the correctness of a computation. This extends to machine learning tasks by verifying that a specific input leads to a specific model output. Being able to prove that an output is the result of a specific model and input makes it practical to run machine learning models off-chain on specialized hardware and verify the ZKPs on-chain. For example, Giza is assisting Yearn, a decentralized finance (DeFi) yield aggregator protocol, in proving that complex yield strategies driven by machine learning are executed correctly on-chain.
  • Fraud detection: Anomaly detection models can be trained on smart contract data and then adopted by DAOs as trusted signals for automated security procedures. This proactive, preventive approach allows actions such as pausing contracts to be executed automatically when potentially malicious activity is identified.
  • Transparency in ML as a service (MLaaS): With multiple companies providing machine learning models through their APIs, users have difficulty determining whether the service provider is actually serving the claimed model, because the API is opaque. Providing validity proofs alongside the machine learning model API gives users transparency, enabling them to verify which model they are using.
  • Filtering in Web3 social media: The decentralized nature of Web3 social applications is expected to lead to an increase in spam and malicious content. An ideal approach for social media platforms is to use an open-source machine learning model that is agreed upon by the community. The platform can then provide a proof of model inference when it selects posts for filtering. Daniel Kang’s analysis of the use of zkML for Twitter algorithms explores this topic further.
  • Protecting privacy: The healthcare industry prioritizes the privacy and confidentiality of patient data. By using zkML, medical researchers and institutions can develop models on patient data that remains confidential, protecting individual records. This enables collaborative analysis without the need to share sensitive information, promoting progress in disease diagnosis, treatment effectiveness, and public health research.
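
Several of the use cases above come down to the same public statement: "this output was produced by the committed model on this input". The sketch below shows only that statement, not a proof of it. All names (commit_model, infer, PublicStatement, make_statement) are hypothetical, the dot-product "model" is a stand-in, and SHA-256 is used purely for illustration.

```python
import hashlib
import json
from dataclasses import dataclass


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def commit_model(weights: list[float], salt: bytes) -> str:
    """Salted hash commitment: binds the provider to one set of weights
    without publishing them."""
    return sha256_hex(salt + json.dumps(weights).encode())


def infer(weights: list[float], features: list[float]) -> float:
    """Hypothetical stand-in model: a plain dot product."""
    return sum(w * x for w, x in zip(weights, features))


@dataclass(frozen=True)
class PublicStatement:
    model_commitment: str  # published once by the provider
    input_hash: str        # binds the response to one request
    output: float          # the claimed result


def make_statement(weights: list[float], salt: bytes,
                   features: list[float]) -> PublicStatement:
    return PublicStatement(
        model_commitment=commit_model(weights, salt),
        input_hash=sha256_hex(json.dumps(features).encode()),
        output=infer(weights, features),
    )
```

On its own, a commitment only shows that the provider has not swapped models between requests; it does not show that the output was computed correctly. That gap is what a validity proof is meant to close: the prover would show, in zero knowledge, that it knows weights and a salt matching model_commitment for which infer yields the stated output.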

Overview of zkML Exploration Projects

Many applications of zkML are still in the experimental stage and often first appear as new projects at hackathons. zkML opens up new avenues for designing smart contracts, and several projects are actively exploring its applications:

Image source @bastian_wetzel

  • Modulus Labs: demonstrates the application of zkML through real-world use and related research, with projects like RockyBot (an on-chain trading robot) and Leela vs. the World (a chess game that pits the entire human population against a verified on-chain version of the Leela chess engine).
  • Giza: a protocol supported by StarkWare that enables the deployment of AI models on-chain in a fully trustless manner.
  • Worldcoin: a zkML-based identity proof protocol. Worldcoin uses custom hardware to process detailed iris scans, which are incorporated into its Semaphore implementation. These iris scans enable important functionalities like membership proof and voting.
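
The "membership proof" mentioned for Worldcoin can be illustrated with a plain Merkle inclusion proof, which is the structure that group-membership schemes of this kind are commonly built on. This is a simplified sketch under stated assumptions, not Worldcoin's or Semaphore's implementation: it uses SHA-256 instead of a circuit-friendly hash, pads odd levels by duplicating the last node, and reveals the leaf, whereas a zero-knowledge membership proof would keep the leaf hidden. All function names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Return every tree level, hashed leaves first, root level last."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                 # pad odd levels (simplification)
            level = level + [level[-1]]
            levels[-1] = level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_proof(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag says if the sibling is on the right."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling > index))
        index //= 2
    return proof


def verify_membership(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the path to the root using only the leaf and its siblings."""
    node = h(leaf)
    for sibling, sibling_is_right in proof:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == root
```

A verifier who holds only the root can check membership with a proof whose length grows logarithmically with the group size. Wrapping this check inside a ZKP is what lets a member prove "I am in the group" without saying which leaf is theirs.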

Conclusion

Just as ChatGPT and Midjourney had countless iterations to reach their current state, zkML is still being improved and optimized, undergoing iteration after iteration to overcome various challenges ranging from technical to practical:

  • Quantization with minimized accuracy loss
  • Managing circuit size, especially in multi-layer networks
  • Efficient matrix multiplication proof
  • Addressing adversarial attacks
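
Two of the challenges above, quantization and cheap checking of matrix products, can be sketched in a few lines. Proof systems generally work over integers in a finite field, so floating-point weights have to be mapped to fixed-point integers first. The second half shows Freivalds' algorithm, a classical randomized check that a claimed product is correct in O(n^2) rather than O(n^3) time. It is offered here as an illustration of the idea; the article does not say which techniques any particular zkML project uses. SCALE, PRIME and the function names are assumptions.

```python
import random

PRIME = 2**61 - 1   # field modulus for the check (assumption)
SCALE = 2**8        # fixed-point scale: 8 fractional bits (assumption)


def quantize(x: float) -> int:
    """Map a float to the nearest fixed-point integer."""
    return round(x * SCALE)


def dequantize(q: int, scale: int = SCALE) -> float:
    return q / scale


def matmul(a: list[list[int]], b: list[list[int]]) -> list[list[int]]:
    """Plain integer matrix product, O(n^3)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matvec_mod(m: list[list[int]], v: list[int]) -> list[int]:
    """Matrix-vector product reduced modulo PRIME."""
    return [sum(x * y for x, y in zip(row, v)) % PRIME for row in m]


def freivalds_check(a, b, c, rng=random) -> bool:
    """Probabilistic check that A @ B == C (mod PRIME).

    Compares A(Br) with Cr for a random vector r; a wrong C passes
    with probability at most 1/PRIME per run.
    """
    r = [rng.randrange(PRIME) for _ in range(len(c[0]))]
    return matvec_mod(a, matvec_mod(b, r)) == matvec_mod(c, r)
```

Note that the product of two quantized matrices carries a scale of SCALE squared, so results are recovered with dequantize(value, SCALE**2). The rounding error per weight is at most half of 1/SCALE, which is the accuracy-versus-circuit-size trade-off the first bullet refers to: more fractional bits mean less error but larger numbers to prove over.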

Progress in the zkML field is happening at an accelerated pace, with the potential to reach levels comparable to the wider machine learning field in the near future, especially with the continuous development of hardware acceleration technology.

Integrating ZKPs into AI systems can provide users and organizations utilizing these systems with higher levels of security and privacy protection. As such, we eagerly anticipate further product innovation in the zkML field, where the combination of ZKPs and blockchain technology creates a secure and reliable environment for AI/machine learning operations in the Web3 permissionless world.
