Data isn’t just increasing in quantity; it’s also growing in complexity. Data lakes are now replacing data warehouses. All your online activities now generate unstructured “data exhaust” that’s being mined by sophisticated machine learning algorithms for insights. However, due to increasing data breaches and privacy concerns, regulators have adopted stringent rules and regulations on how data can be stored, moved, and used.
The advent of cloud computing has only made compliance more difficult. When data is sent to the cloud, where exactly does it reside? That’s now referred to as “data residency,” and it matters because it determines what regulations the data is subject to. Even something as simple as collecting emails on a website is subject to General Data Protection Regulation (GDPR) rules that govern what you need to disclose to people about what you plan to do with their data.
Companies can no longer just copy all their global data into a single data warehouse and run queries against it. There’s an increasing need to preserve data residency while manipulating data, which is a challenge for machine learning algorithms that can’t work across disparate data sets. One company working to modernize the privacy-preserving machine learning (ML) and data analytics experience for organizations is Inpher.
Founded in 2015, New York startup Inpher has taken in $14 million in disclosed funding, $10 million of which came in the form of a Series A led by JP Morgan about two years ago. Since then, Inpher’s team of world-leading cryptographers has built a solution called XOR Secret Computing® (or XOR for short, pronounced ‘ex-or’). Half of the team is in the States, while the other half is in Lausanne, Switzerland, at the Swiss Federal Institute of Technology (EPFL). That’s where Inpher co-founder Dr. Dimitar Jetchev works as a Professor of Mathematics, heading a research group in mathematical cryptology that has published over 300 papers in Cryptography, Math and Advanced Machine Learning. Around half of the team at Inpher have PhDs, but what they’ve accomplished goes well beyond academic theory.
Inpher first came across our radar last month in a piece we wrote about 6 Privacy Solutions for Big Data and Machine Learning. In that article, we talked about how techniques such as homomorphic encryption allow companies to leverage sensitive data without even having read access to the data. In other words, they’re able to train machine learning algorithms using data that remains totally private throughout the process. To get our heads around this up-and-coming technology, we met with the experts at Inpher to learn more about the data privacy problems companies face and the solutions being built to address them.
The Data Residency Problem
Companies are increasingly moving to store their big data in the cloud, where specialized chipsets power machine learning algorithms and produce valuable insights such as predictive analytics. With cloud computing growing at double-digit rates, more and more of the world’s sensitive data comes online for companies to analyze. This results in two critical needs:
- Being able to analyze data regardless of where it resides
- Keeping data encrypted and secure at all times, even when it’s being analyzed
A WSJ article from last month talks about how the European Union may soon require companies to increase privacy safeguards for data that are transferred outside the bloc. Being able to process data where it’s being stored reduces regulatory complexity. It also allows you to analyze data across multiple data sources, each having a different owner.
There’s an increasing need for partners and even competitors within a particular domain to share data to solve large industry-wide problems. For example, machine learning algorithms that detect fraud are much more effective if they’re allowed to train across diverse data sets from many organizations. Companies will be more than willing to share their data in return for a highly accurate model that can dramatically increase revenues or decrease costs.
The Data In-Use Problem
In October of 2019, Inpher co-founder Dr. Jordan Brandt testified before the “Task Force on Artificial Intelligence – U.S. House Committee on Financial Services” about something called “the data in-use problem.” While encrypting data is decades old, processing the encrypted data isn’t something we’ve been able to do because it’s just too resource-intensive.
There are three states data can be in while encrypted (example encryption methods in parentheses):
- Encryption for data in-transit (HTTPS)
- Encryption for data-at-rest (Advanced Encryption Standard)
- Encryption for data in-use (Privacy-Enhancing Computation)
The first two bullet points above describe encryption methods that are a common practice in most organizations. It’s the third which hasn’t been solved yet, which is why the majority of data manipulated by any program or algorithm today needs to be in raw form. Because of this, many valuable big data sets remain off-limits because companies don’t want to run the risk of exposing sensitive data. That’s where Inpher’s Secret Computing solution comes into play.
The Secret Computing Solution
Encryption is computationally intensive because it relies on heavy mathematical operations. Until now, performance has been the roadblock keeping encryption-in-use technologies from becoming widely adopted. That’s all changed thanks to advancements in mathematics (number theory and cryptography) and computing power (architecture, cloud, GPUs, and special-purpose hardware). The two have contributed in roughly equal proportion towards solving the problem, allowing Inpher to achieve multiple order-of-magnitude improvements in the performance of both homomorphic encryption and multiparty computation. Since we haven’t talked about “multiparty computation” yet, let’s dig into that a bit with a simple example of how secret computing works.
How Does Secret Computing Work?
Secret Computing makes it so that data can stay resident in each country it needs to be in, while still being analyzed as if it were all in the same place. Using mathematical techniques, Secret Computing allows researchers to use data from these privacy zones while also not having any access to the raw data. A short YouTube video by Inpher explains the process using the simple example of employee salaries.
Let’s say you have three employees who don’t want to disclose their salaries, but you need to figure out the average salary of all three.
Secret Computing uses a mathematical concept called “additive secret sharing.” Each salary is split into random-looking shares that add up to the original value, and the shares are distributed so that no single party holds enough of them to reconstruct any one salary. Each party sums the shares it holds, and those partial sums are then combined and averaged to arrive at the result without anyone seeing the actual data records being used.
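To make the salary example concrete, here is a minimal Python sketch of textbook additive secret sharing. It is an illustration under stated assumptions, not Inpher’s actual protocol or the XOR API: the function names, the modulus, and the salary figures are all hypothetical, and a real deployment would run each party on a separate machine with secure channels between them.

```python
import secrets

# Hypothetical modulus; it only needs to be larger than any possible sum of salaries.
MODULUS = 2**61 - 1


def split_into_shares(value, n_parties, modulus=MODULUS):
    """Split `value` into n_parties shares that sum to `value` modulo `modulus`."""
    # The first n-1 shares are uniformly random numbers.
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares add up to the value.
    last_share = (value - sum(shares)) % modulus
    return shares + [last_share]


def secure_average(private_values, modulus=MODULUS):
    """Average the values using only shares, never the raw values themselves."""
    n = len(private_values)
    # Each owner splits their own value; share j is handed to party j.
    all_shares = [split_into_shares(v, n, modulus) for v in private_values]
    # Each party adds up the shares it received (one from every owner).
    partial_sums = [
        sum(owner_shares[j] for owner_shares in all_shares) % modulus
        for j in range(n)
    ]
    # Only the partial sums are published; together they reveal just the total.
    total = sum(partial_sums) % modulus
    return total / n


if __name__ == "__main__":
    salaries = [52_000, 71_000, 64_000]  # made-up figures for illustration
    print(secure_average(salaries))
```

Each individual share is a uniformly random number, so it reveals nothing about the salary it came from. Only the combination of every party’s partial sum yields the total, and from that the average.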
This simple example shows how mathematics can be used to protect data so strongly that even a quantum computer couldn’t figure out the raw data being used. Machine learning scientists can now learn from data sets that they don’t even have read access to.
Who is Using Secret Computing?
Inpher is competing with big tech companies with research teams focused on deploying solutions for internal use, but nothing appears production-ready. Open source solutions all come with the usual “this is not hardened or ready for production” caveat. To date, nothing has been deployed as a product that an enterprise can purchase and use.
That’s where Inpher is ahead of the pack with a solution that’s already being used in some of the most demanding environments, such as large, regulated banks. While lots of companies are working on privacy-enhancing computation, only Inpher has a technology that’s enterprise-ready such that it’s able to meet the rigid requirements of compliance departments at multiple major banks.
Inpher’s initial focus is on industries where data science and analytics are mature and where the data being analyzed is subject to a great deal of regulatory scrutiny. It makes sense that financial services and life sciences are currently the fastest growing areas for Inpher. Clients in the finance sector include JP Morgan, ING, and BNY Mellon. On the life sciences side, they’re working with a consortium of healthcare companies championed by Philips.
In September of this year, Inpher announced XOR’s availability on Amazon Web Services (AWS) under a Software-as-a-Service (SaaS) model. They’re also participating in and leading multiple industry alliances that help introduce some standardization to the entire process. In the future, using machine learning algorithms to process encrypted data will be the norm, not the exception.
Many of today’s CTOs are tasked with using machine learning algorithms to glean insights from proprietary data sets located throughout their global organizations. Solutions like those built by Inpher are quickly becoming must-haves for any company that wants to enjoy the incredible benefits of machine learning while preserving data residency and adhering to the ever-growing list of data regulations.
XOR is available as software-as-a-service for both on-premises and hybrid cloud implementations. Contact Inpher now to sleep well at night knowing your data is secure with Secret Computing.