Team DPSyn

The Differential Privacy Synthetic Data Challenge
2nd Place - $15,000 Prize

*An additional $4,000 was awarded for posting their full code solution in an open source repository.
 

Purdue University Logo, which is text that reads "Purdue University"


Zhejiang University Logo, which comprises the school seal and the text reading "Zhejiang University"
      

About the Team
Photo of three men standing in front of a window

Team Members: Ninghui Li and Tianhao Wang from Purdue University; Zhikun Zhang from Zhejiang University

Team DPSyn used the NIST Collaboration Space as their open source repository, which can be accessed here. *Note that other contestants' source code may also be found on this site.

Ninghui Li is a professor of computer science at Purdue University. He received a Ph.D. in computer science from New York University in 2000. He has been doing research in security and privacy for over two decades and is currently Chair of the ACM Special Interest Group on Security, Audit, and Control (SIGSAC).

Tianhao Wang is a fifth-year Ph.D. student in computer science at Purdue University, advised by Professor Ninghui Li. He received a Bachelor of Engineering degree from the Software School of Fudan University in 2015. His research interests include differential privacy and local differential privacy. He is a recipient of the Bilsland Dissertation Fellowship and the Emil Stefanov Memorial Fellowship.

Zhikun Zhang is a fifth-year Ph.D. candidate at Zhejiang University. From October 2017 to May 2019, he was a visiting student at Purdue University, supervised by Professor Ninghui Li. He received his bachelor's degree from Shandong University in 2014. He is a recipient of the National Scholarship for Excellent Ph.D. Students. His research interests include mechanism design, differential privacy and its applications in marginal release, location privacy, machine learning, and crowdsensing systems.

The Solution
Diagram with icons depicting the DPSyn solution described in the text below.

DPSyn aims to generate a synthetic dataset while satisfying differential privacy.

Their algorithm builds on their previous work, PriView (published at SIGMOD'14). The synthetic dataset is constructed so that its low-degree marginals are similar to those of the original dataset. Each marginal is specified by a subset of attributes and can be viewed as a projection of the full contingency table onto those attributes. Given a dataset as input, the algorithm has four steps.
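To make the idea of a marginal concrete, the following is a minimal sketch in Python. It is not taken from the team's code; the function names `marginal` and `project`, and the toy attributes `age`, `sex`, and `city`, are assumptions chosen for illustration. It counts a marginal directly from records and shows that the same table results from projecting the full contingency table onto the chosen attributes.

```python
from collections import Counter

def marginal(records, attrs):
    """Count how often each combination of values of `attrs` occurs."""
    return Counter(tuple(r[a] for a in attrs) for r in records)

def project(table, table_attrs, keep_attrs):
    """Sum out every attribute of `table` that is not in `keep_attrs`."""
    idx = [table_attrs.index(a) for a in keep_attrs]
    out = Counter()
    for cell, count in table.items():
        out[tuple(cell[i] for i in idx)] += count
    return out

# Hypothetical toy dataset with three attributes.
records = [
    {"age": "young", "sex": "F", "city": "A"},
    {"age": "young", "sex": "M", "city": "A"},
    {"age": "old", "sex": "F", "city": "B"},
    {"age": "young", "sex": "F", "city": "B"},
]

all_attrs = ["age", "sex", "city"]
full_table = marginal(records, all_attrs)      # full contingency table
age_sex = marginal(records, ["age", "sex"])    # a 2-way marginal
projected = project(full_table, all_attrs, ["age", "sex"])
```

Here `age_sex` and `projected` hold the same counts, which is the sense in which a marginal is a projection of the full table.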

  1. The first step is to select the set of marginals to be published. The selection depends on many factors, including the size of the dataset, the total privacy budget, the relationships between the attributes, and any specific utility objective. Some of this information can come from a description of the data schema, as well as from other available datasets with a similar schema.
  2. The second step is to generate noisy marginals on the dataset. The algorithm computes these marginals and adds noise to them so that differential privacy is satisfied. Here we use noise drawn from the Gaussian distribution and recent results on the composition of private algorithms to prove the privacy of the algorithm.
  3. In the third step, we use techniques developed in PriView to make all noisy marginals consistent with each other.
  4. The last step is to generate a synthetic dataset given these consistent marginals. This step requires a new algorithm, which starts with a randomly generated dataset and iteratively changes it to be consistent with the noisy marginals.
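Steps 2 and 4 can be illustrated with a simplified sketch. This is not the team's implementation. The noise scale below uses the classical Gaussian-mechanism calibration (valid for 0 < epsilon < 1) as a stand-in for the composition results the team mentions, and it assumes that adding or removing one record changes a single marginal by at most 1 in L2 norm. The consistency step (step 3) is omitted, and `fit_to_marginal` is a greedy, single-marginal stand-in for the iterative update described in step 4. All function names are hypothetical.

```python
import math
import random
from collections import Counter

def gaussian_sigma(epsilon, delta, l2_sensitivity):
    """Classical Gaussian-mechanism noise scale (assumes 0 < epsilon < 1)."""
    return math.sqrt(2 * math.log(1.25 / delta)) * l2_sensitivity / epsilon

def noisy_marginal(records, attrs, domain, sigma, rng):
    """Step 2: count every cell of the marginal and add Gaussian noise."""
    counts = Counter(tuple(r[a] for a in attrs) for r in records)
    return {cell: counts[cell] + rng.gauss(0, sigma) for cell in domain}

def to_targets(noisy, n):
    """Post-process: clip negatives, rescale to n records, round to integers."""
    clipped = {c: max(v, 0.0) for c, v in noisy.items()}
    total = sum(clipped.values()) or 1.0
    return {c: round(v * n / total) for c, v in clipped.items()}

def fit_to_marginal(synthetic, attrs, targets):
    """Step 4 (one pass, one marginal): move records from over-full
    cells into under-full cells until the deficits are used up."""
    current = Counter(tuple(r[a] for a in attrs) for r in synthetic)
    deficit = [c for c, t in targets.items()
               for _ in range(max(t - current[c], 0))]
    for r in synthetic:
        if not deficit:
            break
        cell = tuple(r[a] for a in attrs)
        if current[cell] > targets.get(cell, 0):
            new_cell = deficit.pop()
            current[cell] -= 1
            current[new_cell] += 1
            r.update(zip(attrs, new_cell))

# Hypothetical toy data: two binary attributes, 100 records.
rng = random.Random(0)
attrs = ["age", "sex"]
domain = [(a, s) for a in ("young", "old") for s in ("F", "M")]
records = [dict(zip(attrs, cell))
           for cell, k in zip(domain, (40, 30, 20, 10)) for _ in range(k)]

sigma = gaussian_sigma(epsilon=0.5, delta=1e-5, l2_sensitivity=1.0)
targets = to_targets(noisy_marginal(records, attrs, domain, sigma, rng),
                     n=len(records))

# Start from a random dataset and nudge it toward the noisy marginal.
synthetic = [dict(zip(attrs, rng.choice(domain))) for _ in records]
fit_to_marginal(synthetic, attrs, targets)
```

With several overlapping marginals, a pass like `fit_to_marginal` would need to be repeated across marginals, since fitting one can disturb another; that is why the step is described as iterative. Because only the noisy marginals are used after step 2, the later steps are post-processing and do not consume additional privacy budget.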


Created July 31, 2019, Updated August 1, 2019