Is blockchain + genetic testing feasible? Sequencing the genome without exposing personal information!

Source: Wired | Compilation: Fire Sauce | Editor: Carol | Producer: Blockchain Base Camp (blockchain_camp)

Using blockchain technology, Nebula Genomics has designed a way for customers to have their genomes sequenced without revealing personally identifiable data.

When biological researchers fall asleep at night, they usually dream about genomes: yours, mine, and those of everyone within "six degrees of separation" of us (the six degrees of separation theory holds that no more than five people stand between you and any stranger, so you can reach any stranger through at most five intermediaries).

Think about all the genetic information contained in the 6 billion letters of your genetic code; it is this information that makes you unique. With enough of it, scientists could find ways to defeat the deadly diseases hidden in DNA.

So far, the genomes of at least 26 million people worldwide have been decoded, mostly by companies like 23andMe and Ancestry. Most of these are only partial decodings, however, and only a small fraction have been fully sequenced.

In 2009, a decade ago, sequencing a complete genome cost about $100,000; today it can cost as little as $1,000. Some companies in the industry believe the price could fall below $100 by 2021. So where are all the genomes? Some say that would-be customers have been scared away by concerns about personal data privacy.

Nebula Genomics' chief technology officer, Kevin Quinn, said that the awakening around privacy protection began shortly after the Facebook/Cambridge Analytica scandal broke in 2018. “People are beginning to realize that the services they use every day are not what they expected,” he said. “This has had a big impact on the field of genetics.”

Anne Wojcicki, CEO of 23andMe, has also said that privacy concerns are the main reason for the decline in sales of DNA tests. Several startups, including Nebula, are trying to solve these problems by putting people's DNA on the blockchain.

The startup, co-founded by Harvard genomics pioneer George Church, launched a low-coverage genome sequencing service for $99 at the beginning of last year and recorded its data access controls on a public ledger.
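To make the idea of recording access controls on a ledger more concrete, here is a minimal Python sketch of an append-only, hash-chained log of access grants. It is an illustration only and not Nebula's implementation: the `AccessLedger` class, its field names, and the identifiers in the usage lines are all hypothetical, and a real blockchain would add distributed consensus on top of this kind of chaining.

```python
import hashlib
import json


def entry_hash(body: dict) -> str:
    """Hash an entry's canonical JSON form with SHA-256."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class AccessLedger:
    """Hypothetical append-only log of data-access decisions."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self):
        self.entries = []

    def record(self, genome_id: str, grantee: str, action: str) -> dict:
        """Append a grant or revoke entry that is chained to the previous one."""
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {"genome_id": genome_id, "grantee": grantee,
                "action": action, "prev": prev}
        entry = {**body, "hash": entry_hash(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash and check that each entry links to its predecessor."""
        prev = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or entry_hash(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


# Usage with made-up identifiers
ledger = AccessLedger()
ledger.record("genome-001", "pharma-partner", "grant")
ledger.record("genome-001", "pharma-partner", "revoke")
print(ledger.verify())  # True: the chain is intact
```

Because each entry stores the hash of the one before it, silently editing an old access decision changes its hash and breaks the chain, which is what makes such a log auditable.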

This summer, they added a “sponsored sequencing” model that will provide customers with free clinical-grade genomic testing if they allow Nebula to share certified DNA and other data with pharmaceutical partners. Later, the company launched the first “anonymous sequencing” program in the field, which was designed to achieve complete anonymization of personal information.

When you order a saliva collection kit from a company like 23andMe or Ancestry, you must pay by credit card and enter a shipping address. You need to register an account with your email address to view the results. All of this is done in a web browser.

All of that data is linked to the DNA swirling in the saliva tube, which soon becomes a data file made up of the letters A, C, T, and G. Before the company shares the data with researchers or pharmaceutical companies that want to mine it, it must strip out all of these personal identifiers.

Nebula does this too, but Quinn said that customers have to trust that everything is properly scrubbed and that no one will make a mistake. With anonymous sequencing, the genetic data is kept separate from personal information from the very beginning.

That's why the first step in anonymous sequencing is to clean up your e-commerce habits more thoroughly. Nebula recommends using encrypted email (through tools and services such as Enigmail, Mailvelope, and ProtonMail) and a VPN to mask your browsing behavior.

You also need an address that is not tied to your name, and a PO box comes in handy here. A secure cryptocurrency wallet or a prepaid credit card is also essential. After completing all of these steps, you can buy and receive the Nebula saliva collection kit anonymously. The company then sequences your genome and stores it in its secure cloud, and no one knows whom the genes belong to.

Quinn said: "We don't need to identify who it belongs to, because it is essentially independent. No one has ever done this before." The company said that although the process is built on "not trusting Nebula," it is actually building trust. I know this sounds a bit counterintuitive, but this is blockchain, after all.

There is only one small, double-helix-shaped problem. The genome itself is a unique identifier (whatever the intricacies of US genetic privacy law may say), and in recent years researchers have found that, with public databases such as the ones police used to catch the "Golden State Killer," it is increasingly possible to identify individuals from DNA alone. "If you have 6 billion base pairs, what does it matter what the person is called? It's a more unique identifier in itself," said bioinformatician Mark Gerstein, director of the Biomedical Data Science Center at Yale University.

To prevent hackers from stealing data from genome repositories and combining it with other data to re-identify people, the data should be encrypted, but this is only the first step in data security.

Gerstein noted that the problem is that reading a genome requires comparing it with other people's DNA, which is the only way to understand what the letters mean. Once a genome is encrypted, it is also locked away from all the software that would tell you "where your ancestors came from" or "your APOE4 variant makes you more susceptible to Alzheimer's disease."

"This process requires computation to understand, which means that the genome needs to move between the server and the database. It's tricky to do this without revealing the underlying sequence." It is all the harder because genomic data is huge. Bank account numbers, tax returns, and medical records are all small documents by comparison.

For such small files, companies that offer zero-knowledge storage can encrypt the data and give the customer a unique key. Encrypting an entire genome is far more computationally expensive, and running computations on an encrypted genome costs even more.

But this is exactly what Nebula is going to do next. In the past year, Nebula has been working with researchers to build and test a secure computing environment, and related publications are currently being reviewed.

The company plans to deploy the technology starting next year, first in its own genome interpretation service, which tells customers about their health and ancestry, and eventually with academic and pharmaceutical research partners. Currently, these calculations are performed on a distributed network holding the genomic data that Nebula stores.

Partners can submit queries (for example, whether a genome carries an APOE variant associated with Alzheimer's disease) and can see only the results of those queries. Only Nebula and the genome owners can access the plaintext data. The ultimate goal is that even Nebula will not have access, and only the genome owner will.
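The query-only model described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Nebula's actual system: the `GenomeVault` class, its methods, and the genome and variant labels are hypothetical, and in a real deployment the separation would be enforced by encryption and a service boundary rather than by a naming convention.

```python
class GenomeVault:
    """Hypothetical store that answers queries without handing out sequences."""

    def __init__(self):
        # genome_id -> set of variant labels; stands in for the private raw data
        self._variants = {}

    def add_genome(self, genome_id, variants):
        """Store the variant labels found in one customer's genome."""
        self._variants[genome_id] = set(variants)

    def count_carriers(self, variant):
        """Return only how many stored genomes carry `variant`."""
        return sum(1 for found in self._variants.values() if variant in found)


# Usage with made-up data: a partner learns a count, never the genomes themselves
vault = GenomeVault()
vault.add_genome("genome-001", ["APOE4", "BRCA1"])
vault.add_genome("genome-002", ["APOE4"])
vault.add_genome("genome-003", ["BRCA2"])
print(vault.count_carriers("APOE4"))  # 2
```

The design point is that the partner-facing interface exposes a single aggregate answer per query, so the raw data never has to leave the store.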

Although Gerstein loves to nitpick, he is still very excited about this progress. "This step is significant in terms of developing a truly private genome sequencing and storage option," he said. He expects sequencing to become routine in doctors' offices in the near future, so standardizing these safeguards now may help prevent stronger opposition and resistance later.

Scientists, sweet dreams!