Differential Privacy for Privacy-Preserving Data Analysis: An Introduction to our Blog Series

This is fantastic...I've read several "non-technical" papers on differential privacy, and they're not easy to grasp! As I talk with interested clients and others, I'm going to direct them to this!

Do we guarantee that if the same data is analyzed more than once, it will result in the same answer?
So if database D1 goes through the analysis 3 times, will the answer be the same A each time?

Great first blog!

Looking forward to examples of architectural patterns that include privacy-by-design concepts.

Informative. Waiting for the next post.

As someone who is math impaired, I think it would be great if you built out the Joe example with illustrative data: for example, which specific data about Joe would be discussed, perhaps income or a medical condition. Also, what would 'noise' look like in reality in this account? Great start, and looking forward to more.

Hi, Anil. In general, no, differential privacy doesn't make this guarantee – if you run an analysis on the same data 3 times, you'll choose a new noise sample to add each time, and you'll get 3 different answers. But, each time you run the analysis, you incur a "privacy cost" (ε), and these add up. So if you run the analysis 3 times, your total privacy cost is 3⋅ε.

This kind of composition prevents an "averaging attack" where you run the same analysis many times and average away the noise. Differential privacy systems typically set an upper bound on privacy cost called the "privacy budget," and stop answering queries when it's exceeded.
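
To make the composition and budget ideas concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a production mechanism: the class name `BudgetedCounter`, the true count of 100, the per-query ε of 0.1, and the total budget of 0.3 are all hypothetical, and the query is assumed to be a simple count (sensitivity 1) answered with Laplace noise.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential samples with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class BudgetedCounter:
    """Hypothetical helper: answers a counting query until the budget is spent."""

    def __init__(self, true_count, total_budget, seed=None):
        self.true_count = true_count
        self.total_budget = total_budget
        self.spent = 0.0
        self.rng = random.Random(seed)

    def query(self, epsilon):
        # Refuse to answer once the accumulated cost would exceed the budget
        # (the small tolerance absorbs floating-point rounding).
        if self.spent + epsilon > self.total_budget + 1e-12:
            raise RuntimeError("privacy budget exceeded")
        self.spent += epsilon
        # A counting query has sensitivity 1, so the noise scale is 1 / epsilon.
        return self.true_count + laplace_noise(1.0 / epsilon, self.rng)


counter = BudgetedCounter(true_count=100, total_budget=0.3, seed=0)
answers = [counter.query(epsilon=0.1) for _ in range(3)]
print(answers)        # three different noisy answers for the same data
print(counter.spent)  # accumulated privacy cost, roughly 3 * 0.1
```

In this sketch, a fourth call to `counter.query(epsilon=0.1)` raises an error, which is the "stop answering queries" behavior described above.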

Thank you for sharing your question!

Hi... The idea is great, and the blog looks very interesting. But is it possible to get a high level of privacy just through anonymization? I ask because of the Netflix problem from 2007, where researchers re-identified around 99% of the records in a released data set that came from a collaborative filtering system.
I am very interested to see your next blog and how you analyse the security issues for differential privacy models.

Thanks for this question! The major challenge with traditional "anonymization" (or "de-identification") techniques is that it's often difficult to measure what privacy level you have obtained. In general, it's not possible to prove that a given strategy for de-identification yields a particular level of privacy, and de-identified datasets are often susceptible to linking attacks using auxiliary data that can lead to the re-identification of individuals. This is one of the main motivations behind the development of formal privacy notions like differential privacy.
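
As a rough illustration of a linking attack, the sketch below joins a "de-identified" table with auxiliary data on shared attributes. Everything in it is hypothetical and made up for illustration: the records, the choice of ZIP code, birth year, and sex as linking attributes, and the function name `link`.

```python
# Hypothetical "de-identified" records: names removed, other attributes kept.
deidentified = [
    {"zip": "20899", "birth_year": 1980, "sex": "M", "diagnosis": "asthma"},
    {"zip": "20899", "birth_year": 1975, "sex": "F", "diagnosis": "diabetes"},
    {"zip": "20850", "birth_year": 1980, "sex": "M", "diagnosis": "flu"},
]

# Hypothetical auxiliary data that includes names, such as a public listing.
auxiliary = [
    {"name": "Joe", "zip": "20899", "birth_year": 1980, "sex": "M"},
    {"name": "Ann", "zip": "20850", "birth_year": 1990, "sex": "F"},
]

# Attributes assumed to appear in both data sets.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def link(deidentified, auxiliary, keys=QUASI_IDENTIFIERS):
    """Return {name: record} for people who match exactly one record."""
    index = {}
    for row in deidentified:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    matches = {}
    for person in auxiliary:
        candidates = index.get(tuple(person[k] for k in keys), [])
        # A unique match re-identifies the person behind the record.
        if len(candidates) == 1:
            matches[person["name"]] = candidates[0]
    return matches


reidentified = link(deidentified, auxiliary)
print(reidentified)
```

Here the made-up person "Joe" matches exactly one record, so his diagnosis is exposed even though the table contains no names.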

We’re glad you’re interested in this series. In case you missed it, the next post in this series on threat models is now live, and you can check it out via the following link: https://www.nist.gov/blogs/cybersecurity-insights/threat-models-differe…


Cybersecurity Insights



About the author

Joseph Near

David Darais

Kaitlin Boeckl
