The privacy landscape has grown complex, with more countries rolling out new governance, standards, and requirements and a “hodge-podge and mishmash of laws” that people and organizations need to abide by when handling customer data. So said Dr. Chris Hazard, co-founder of Diveplane, provider of a cloud tool for creating AI-based decision models, in a recent talk at the artificial intelligence (AI) and machine learning (ML) conference Scale TransformX 2021.
Simply checking a box and thinking you’ve done your due diligence isn’t enough, he said.
Hazard gave examples of nefarious uses of data and how certain behaviors and decisions can lead to sharing personally identifiable information (PII) with people who should not have access to it, even if it's done innocently.
Figure. These are just a few of the more common ways privacy leaks can occur. Image credit: Chris Hazard, Diveplane
Hazard described some strategies that safeguard privacy. He said it’s important to balance the risks and rewards of using privacy-enhancing technologies.
Methods you can use to both convey information and mitigate the problem include differential privacy, homomorphic encryption, symmetric encryption, and the Laplace mechanism. There are also newer techniques, such as Bayesian networks, generative adversarial networks (GANs), and variational autoencoders, that you can apply to mitigate privacy issues. Here's a brief summary of the main approaches.
Differential privacy involves inserting a calibrated amount of randomness, or "noise," into a dataset or its query results so that no individual's contribution can be singled out. Researchers can still reach valid conclusions about the data as a whole.
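A common way to add that calibrated noise is the Laplace mechanism mentioned above. The sketch below is a minimal, illustrative implementation using only the Python standard library; the function names (`laplace_noise`, `private_count`) are my own, not from Hazard's talk.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Draw one sample from a Laplace(0, scale) distribution via inverse-CDF sampling."""
    u = random.random() - 0.5  # uniform on [-0.5, 0.5)
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def private_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Release a count with epsilon-differential privacy.

    For a counting query, one person joining or leaving the dataset changes
    the result by at most 1, so sensitivity defaults to 1. Smaller epsilon
    means more noise and stronger privacy.
    """
    scale = sensitivity / epsilon
    return true_count + laplace_noise(scale)
```

The noise averages out to zero, so aggregate statistics computed over many noisy releases remain accurate even though any single release hides the exact value.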
Local differential privacy moves the noise-adding step onto the user's browser, or any type of device, so that each response sent to the server is already randomized. In this way, "the server, the consumer of the data, really doesn't ever have the data,'' Hazard said.
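The classic local-DP technique is randomized response, which the browser-side "noise making" described above resembles. This is a hedged sketch of the standard coin-flip protocol, not code from the talk: each user reports truthfully only half the time, yet the server can still estimate the population-level rate.

```python
import random

def report(truth: bool) -> bool:
    """Randomized response: with probability 1/2 answer truthfully,
    otherwise answer with an independent fair coin flip."""
    if random.random() < 0.5:
        return truth
    return random.random() < 0.5

def estimate_true_fraction(reports: list[bool]) -> float:
    """Invert the randomization. If pi is the true 'yes' rate, then
    P(report yes) = 0.5 * pi + 0.25, so pi = 2 * (observed_fraction - 0.25)."""
    f = sum(reports) / len(reports)
    return 2.0 * (f - 0.25)
```

Any individual report is deniable (it may be a coin flip), which is exactly the property Hazard describes: the server never holds the raw data.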
This provides visibility into how much an attacker could learn from the data, and whether there is enough noise to obscure what's going on.
“Synthetic data is a very powerful technology” that can be used in many ways to enhance privacy, Hazard said. The idea behind synthetic data is to obtain new samples that resemble a dataset without reusing the original records. There are many techniques you can use to create synthetic data, including Bayesian networks, GANs, and variational autoencoders. Hazard discussed their pros and cons in his presentation (see above).
He also noted that data can be synthesized multiple times, which unlocks many different use cases.
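To make the idea concrete, here is a deliberately simple, toy synthesizer: it fits a normal distribution to one numeric column and samples fresh values. Real systems use the richer models named above (Bayesian networks, GANs, variational autoencoders); this sketch and its function name are assumptions for illustration only, and it offers no formal privacy guarantee on its own.

```python
import random
import statistics

def synthesize(column: list[float], n: int) -> list[float]:
    """Toy synthesizer: fit a normal distribution to one numeric column
    and draw n fresh samples. No original record is reused, and the
    synthesizer can be run multiple times to produce distinct datasets."""
    mu = statistics.mean(column)
    sigma = statistics.stdev(column)
    return [random.gauss(mu, sigma) for _ in range(n)]
```

Because the fitted model, not the raw data, is what gets sampled, you can generate as many synthetic datasets as you need, which is what enables the multiple use cases Hazard mentions.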
GANs use competing deep neural networks to generate synthetic data. But for every new advancement in synthetic data generation or GAN design, there is a corresponding way of attacking it, Hazard said.
Privacy techniques are beginning to be applied to unstructured data such as images and text. Hazard emphasized the importance of applying them correctly “because of things that could bite us in the future,” especially when it comes to customers.
Any approach a company takes should be “very mathematically powerful,” Hazard said, but companies should also layer multiple privacy techniques and apply checks and balances.
Synthetic data with differential privacy is one example of this. When methods are integrated tightly with your production systems and applied with other types of anonymity preservation and privacy measures, you can unlock the value of data while keeping it safe.
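One way to combine the two techniques, sketched below under stated assumptions: perturb the statistic the synthesizer learns (here, the mean) with Laplace noise before sampling. This is a toy illustration, not Diveplane's method; for simplicity the standard deviation is left un-noised, so only the mean is formally privatized.

```python
import math
import random
import statistics

def laplace_noise(scale: float) -> float:
    """One sample from a Laplace(0, scale) distribution."""
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def dp_synthesize(column: list[float], n: int, epsilon: float,
                  lower: float, upper: float) -> list[float]:
    """Toy DP-flavored synthesizer: clip values to [lower, upper], add
    Laplace noise to the mean (whose sensitivity is (upper-lower)/len),
    then sample fresh records from the perturbed model."""
    clipped = [min(max(x, lower), upper) for x in column]
    sensitivity = (upper - lower) / len(clipped)
    noisy_mu = statistics.mean(clipped) + laplace_noise(sensitivity / epsilon)
    sigma = statistics.stdev(clipped)  # left un-noised in this toy version
    return [random.gauss(noisy_mu, sigma) for _ in range(n)]
```

Layering the noise on the learned statistics, rather than on the released records alone, is one instance of the "checks and balances" approach: even if the synthesizer is inspected, the exact mean of the real data is never exposed.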
For more details on how to build an AI strategy for your business with real-world examples, watch Hazard’s talk, “How to Use Privacy to Prevent Adverse Customer Outcomes,” and read the full transcript here.