Team RMcKenna

The Differential Privacy Synthetic Data Challenge
1st Place - $25,000 Prize

*An additional $4,000 was awarded for posting their full code solution in an open source repository.

About the Team

Team member Ryan McKenna, from UMass Amherst, competed as a one-man team.

Team RMcKenna used the NIST Collaboration Space as their open source repository, which can be accessed here. *Note that other contestants' source code may also be found on this site.

The Solution

At a high level, Team RMcKenna's algorithm is quite simple and can be broken up into two main steps.

  1. Measure a carefully chosen set of 1-, 2-, and 3-way marginals of the private data using the Gaussian mechanism.
  2. Find a synthetic dataset that (approximately) has those marginals.
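
As a rough illustration of step 1, the sketch below counts a marginal over a toy dataset and adds independent Gaussian noise to every cell. It is not the team's code: the function name `noisy_marginal`, the toy records, and the value of `sigma` are all assumptions made for this example, and the calibration of `sigma` to a privacy budget is deliberately left out.

```python
import random
from collections import Counter
from itertools import product

# Hypothetical toy data; the attribute names and values are illustrative only.
records = [
    {"age": "young", "sex": "F"},
    {"age": "young", "sex": "F"},
    {"age": "young", "sex": "M"},
    {"age": "old", "sex": "M"},
]
domains = {"age": ["young", "old"], "sex": ["F", "M"]}


def noisy_marginal(records, attrs, domains, sigma, rng):
    """Count every value combination of `attrs`, then add N(0, sigma^2) noise per cell."""
    counts = Counter(tuple(r[a] for a in attrs) for r in records)
    # Enumerate the full domain so that empty cells also receive noise.
    cells = product(*(domains[a] for a in attrs))
    return {cell: counts[cell] + rng.gauss(0.0, sigma) for cell in cells}


# sigma is a placeholder: calibrating it to a privacy budget is not shown here.
rng = random.Random(0)
one_way = noisy_marginal(records, ("age",), domains, sigma=1.0, rng=rng)
two_way = noisy_marginal(records, ("age", "sex"), domains, sigma=1.0, rng=rng)
```

Noise is added to every cell of the domain, including cells with a true count of zero, because releasing noise only on the non-empty cells would reveal which combinations occur in the private data.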

More specifically, their algorithm combines three orthogonal ideas:

  1. Identify marginals to measure based on a mutual information criterion [1,2]. That is, measure marginals between attributes that are highly correlated. Rather than computing these statistics privately, the team computes them on the (public) provisional dataset, so choosing the marginals does not consume any of the privacy budget.
  2. Use the Gaussian mechanism with the moments accountant [3] to determine the magnitude of noise required for DP. This method leads to less noise on the measured marginals than the sequential or advanced composition theorems would require.
  3. Use graphical models to estimate the data distribution from the measured marginals [4], and obtain synthetic data.
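
The selection criterion in the first idea can be sketched as follows: estimate the mutual information of every attribute pair on a public dataset and rank the pairs from most to least dependent. This is a minimal sketch under stated assumptions, not the team's implementation. The names `mutual_information` and `rank_pairs` and the toy `public_rows` are hypothetical, and how many pairs (or triples) to keep is not specified in the description above.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical stand-in for the public provisional dataset.
public_rows = [
    {"age": "young", "income": "low", "sex": "F"},
    {"age": "young", "income": "low", "sex": "M"},
    {"age": "old", "income": "high", "sex": "F"},
    {"age": "old", "income": "high", "sex": "M"},
]
attrs = ["age", "income", "sex"]


def mutual_information(rows, a, b):
    """Empirical mutual information (in nats) between attributes a and b."""
    n = len(rows)
    joint = Counter((r[a], r[b]) for r in rows)
    count_a = Counter(r[a] for r in rows)
    count_b = Counter(r[b] for r in rows)
    # sum over observed cells of p(x,y) * log(p(x,y) / (p(x) * p(y)))
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def rank_pairs(rows, attrs):
    """Return attribute pairs sorted from highest to lowest mutual information."""
    scores = {
        (a, b): mutual_information(rows, a, b)
        for a, b in combinations(attrs, 2)
    }
    return sorted(scores, key=scores.get, reverse=True)


ranked = rank_pairs(public_rows, attrs)
```

In this toy data `age` and `income` determine each other while `sex` is independent of both, so the pair `("age", "income")` ranks first and would be the most useful 2-way marginal to measure.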

 

[1] Zhang, Jun, et al. "PrivBayes: Private data release via Bayesian networks." ACM Transactions on Database Systems (TODS) 42.4 (2017): 25.
[2] Chen, Rui, et al. "Differentially private high-dimensional data publication via sampling-based inference." Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2015.
[3] Abadi, Martin, et al. "Deep learning with differential privacy." Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2016. 
[4] McKenna, Ryan, Miklau, Gerome, and Sheldon, Daniel. "Graphical-model based estimation and inference for differential privacy." Proceedings of the 36th International Conference on Machine Learning. 2019.

 


Created July 30, 2019, Updated August 1, 2019