
Enabling AI on Personal Data with Privacy Preserving Analytics

by Sanjeev Kapoor 01 Apr 2019

Many commercial services now use their users' personal data, such as location, purchase and payment transactions, social security numbers, activity data, authentication data, social interactions, browsing history, and more. These data are commonly used to deliver personalized applications in a variety of sectors, including retail, finance, healthcare and social media. Nevertheless, the extensive collection and use of personal data raises important privacy and data protection concerns. For instance, consumers want to prevent the use of their data for purposes other than those specified when the data were collected. Likewise, they would like fine-grained control over their data, deciding when and with whom to share them. To mitigate these risks, citizens and businesses are in many cases reluctant to share their data. This leads to “data silos”, i.e. data analytics processes and data-intensive applications that serve only the data producers and offer no opportunities for repurposing and reusing data across different applications.

Although this reluctance protects privacy, it is also a setback to innovation. Breaking the data silos and reusing datasets across data-intensive applications could unlock significant opportunities for innovative applications. As a prominent example, combining data from different financial organizations makes it possible to create more accurate user profiles for highly personalized applications. As another example, integrating healthcare data collected by different people, devices and applications can facilitate the extraction of medical knowledge, opening new horizons for clinical research and observational studies. It is therefore important to lower the barriers to data sharing by alleviating risks and offering data providers incentives to share their data beyond a single application. One of the best ways to provide such incentives is to create a data market that connects the demand side (i.e. data buyers) with the supply side (i.e. data providers) through secure and trustworthy data exchange mechanisms. In a data market, multiple providers are willing to share their data in exchange for a monetary or other incentive. At the same time, data buyers can access data from multiple providers at a specified cost.

There are also technological measures that can encourage and facilitate data exchange, notably privacy enhancing and privacy awareness mechanisms. These mechanisms aim at safeguarding the privacy of data providers at all times. One of the most prominent privacy enhancing mechanisms for data-intensive applications is the use of privacy preserving analytics techniques.


Secure Computing for Privacy Preserving Analytics

Privacy preserving analytics schemes aim at analyzing data from a source without disclosing information about it, in a way that minimizes the risk of exposing the source’s data to malicious parties (e.g., enterprises that would like to exploit users’ personal data). In essence, privacy preserving analytics techniques enable application developers to analyze vast amounts of personal data without compromising the source data. In practice, this means that application developers and end users can see the outcomes of the analysis (e.g., the answers and results of a query) without learning anything else about the source data against which the query was executed.

When querying or analyzing data from a single source, there is a straightforward way of safeguarding the privacy of the source: moving the query to the data source, rather than moving the data over the network to the analytics. As long as the data remain within the data provider's system, they can be kept secure and private. Nevertheless, this tactic cannot work when data from multiple data providers need to be processed together, which is a common and useful scenario in a data market. In such cases some data have to be moved outside the data provider's organization, which makes them susceptible to eavesdropping and other forms of attack. To alleviate the relevant privacy concerns, the data that leave the organization can be encrypted. Furthermore, secure computing techniques can be employed to execute queries on data that remain encrypted at all times.
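To make the single-source case concrete, here is a minimal Python sketch of moving the query to the data. The `DataSource` class, the `run_query` method and the sample records are hypothetical names and values invented for this illustration; they do not come from any particular product. The idea is simply that the query function travels to the provider and only the aggregate answer travels back.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class DataSource:
    """Hypothetical data provider that keeps its raw records local."""
    _records: list = field(default_factory=list, repr=False)

    def run_query(self, query):
        # The query runs inside the provider's system; only its result leaves.
        return query(self._records)


def average_purchase(records):
    # An aggregate query: it returns one number, not the underlying rows.
    return mean(r["amount"] for r in records)


source = DataSource([
    {"user": "u1", "amount": 20.0},
    {"user": "u2", "amount": 40.0},
    {"user": "u3", "amount": 60.0},
])

result = source.run_query(average_purchase)
print(result)  # 40.0
```

Note that this sketch trusts the query to return an aggregate. A real deployment would presumably also need to restrict which queries a provider accepts, since nothing here stops a query from returning the raw rows.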

Secure computation solutions provide a means to query data without ever decrypting them. This addresses cases where data owners would otherwise lose control of the data analytics process, typically when their data need to be processed outside their organization. In particular, with secure computation, data analysts can obtain answers to a query without decrypting the source data, which allows data owners to retain control of their data at all times. Secure computing schemes are based on advanced mathematical and cryptographic formulations such as:

Multiparty computation (MPC), which provides methods for parties to jointly compute a function over their inputs while keeping those inputs private. Traditional cryptography assures the security and integrity of communication or storage and assumes that the adversary is outside the system of participants. MPC, by contrast, also works when the adversary is one of the participants directly engaged in the system, or indirectly controls some of the internal parties. Moreover, while the goal of traditional cryptography is to conceal content, MPC keeps each party's input concealed while still using data from the different sources (parties) to correctly produce outputs (i.e. the answers).
Homomorphic encryption (HE), which enables computational operations on ciphertexts (i.e. data encrypted using some cryptographic algorithm). These operations generate an encrypted result which, when decrypted, matches the result of the same operations performed on the plaintext. As such, it also provides the means for viewing the final answers to a query without ever decrypting the source data.
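Both ideas can be illustrated with a short Python sketch. The MPC part uses additive secret sharing, one common building block, and the HE part uses textbook Paillier encryption, an additively homomorphic scheme. These are illustrative choices made for this example, not the schemes used by any product mentioned in this article. All function names, the modulus and the tiny primes are assumptions, and keys of this size offer no real security. The code needs Python 3.8 or later for the modular inverse `pow(x, -1, n)`.

```python
import math
import secrets

# --- MPC sketch: additive secret sharing over a public modulus ---
MODULUS = 2**61 - 1  # assumed public parameter, larger than any possible sum


def share(secret, n_parties):
    """Split `secret` into n shares that add up to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_inputs):
    """Each party shares its input; only per-party partial sums are published."""
    n = len(private_inputs)
    all_shares = [share(x, n) for x in private_inputs]
    # Party j receives the j-th share of every input and adds them up locally.
    partial_sums = [sum(row[j] for row in all_shares) % MODULUS for j in range(n)]
    return sum(partial_sums) % MODULUS


# --- HE sketch: textbook Paillier with generator g = n + 1 (toy key size) ---
def paillier_keygen(p=1009, q=1013):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # inverse of lam mod n, valid because g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    n_sq = n * n
    while True:  # pick a random r that is invertible modulo n
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n_sq) * pow(r, n, n_sq)) % n_sq


def add_encrypted(n, c1, c2):
    # Multiplying ciphertexts adds the underlying plaintexts.
    return (c1 * c2) % (n * n)


def decrypt(n, private_key, c):
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


mpc_total = secure_sum([20, 40, 60])
public_n, private_key = paillier_keygen()
encrypted_total = add_encrypted(public_n, encrypt(public_n, 20), encrypt(public_n, 22))
he_total = decrypt(public_n, private_key, encrypted_total)
print(mpc_total, he_total)  # 120 42
```

In the MPC half, no single share reveals anything about an input, yet the published partial sums add up to the correct total. In the HE half, the party that adds the two ciphertexts never sees 20, 22 or 42; only the holder of the private key can read the result.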

Putting Secure Computing to Work: From Theory to Practice

These secure computing schemes have been known to the research community for several decades. However, their practical use in real-life applications beyond research prototypes has been quite limited, mainly due to the poor performance of the corresponding implementations. This situation has changed in recent years, as the maturity and performance of secure computing frameworks have improved. The change is clearly reflected in the emergence of start-up enterprises that offer cryptographic products and privacy preserving services based on secure computing techniques. As a prominent example, Unbound (MPC) applies secure computing to ensure that secrets such as cryptographic keys, credentials and private data are never processed in complete form. Likewise, Sepior leverages secure computing techniques to provide threshold key management solutions for blockchain, cryptocurrencies, and cloud service providers' infrastructures. There have also been remarkable academic projects on secure computing, such as MIT’s Enigma project, which offers a decentralized platform for secure computing based on blockchain technology. Beyond research efforts and start-ups, secure computing is nowadays integrated into several products of large IT providers.

The practical applicability of secure computing is largely due to the emergence of specialized high-performance hardware. Nevertheless, practical deployments remain challenging for performance reasons. According to some benchmarks, homomorphic encryption is still a million times slower than conventional computation, which is why secure computing techniques leverage specialized hardware mechanisms in modern CPUs, such as Intel’s Software Guard Extensions (SGX).

In the coming years, we expect to see a rising need for privacy preserving processing of datasets, as part of the wave of Big Data and Artificial Intelligence (AI) applications. Assuring end users that their data remain encrypted at all times can lower the barriers to data sharing and ease the relevant privacy concerns. Secure computing can make a great technical contribution on this front, especially as the evolution of hardware and computing makes it a viable alternative for privacy preserving analytics.