2018 The Unlinkable Data Challenge

Advancing Methods In Differential Privacy

In the first data challenge, PSCR's The Unlinkable Data Challenge: Advancing Methods in Differential Privacy, contestants submitted concept papers proposing a mechanism to enable the protection of personally identifiable information (PII) while maintaining a dataset's utility for analysis. In the second data challenge, the Differential Privacy Synthetic Data Challenge, participants implemented their designs and empirically evaluated their performance on real datasets.

 


Our increasingly digital world turns almost all our daily activities into data collection opportunities, from the more obvious entry into a webform to connected cars, cell phones, and wearables. Dramatic increases in computing power and innovation can also be used to the detriment of individuals through linkage attacks: auxiliary and possibly completely unrelated datasets, combined with records in a dataset that contains sensitive information, can be used to uniquely identify individuals.
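To make the idea of a linkage attack concrete, here is a minimal Python sketch. Everything in it is hypothetical and invented for illustration: the records, the names, the field names, and the choice of ZIP code, birth year, and sex as quasi-identifiers. It joins a "de-identified" release (names removed) against an auxiliary list that still carries names, and reports every released record that matches exactly one named person.

```python
# Hypothetical "de-identified" release: names removed, sensitive field kept.
released = [
    {"zip": "30332", "birth_year": 1985, "sex": "F", "incident": "domestic dispute"},
    {"zip": "47907", "birth_year": 1972, "sex": "M", "incident": "overdose"},
    {"zip": "47907", "birth_year": 1990, "sex": "M", "incident": "burglary"},
]

# Hypothetical auxiliary dataset (for example, a public roster) with names.
auxiliary = [
    {"name": "Alice Example", "zip": "30332", "birth_year": 1985, "sex": "F"},
    {"name": "Bob Example", "zip": "47907", "birth_year": 1972, "sex": "M"},
    {"name": "Carol Example", "zip": "20899", "birth_year": 1960, "sex": "F"},
]

# Assumed quasi-identifiers: fields that are harmless alone but identifying together.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def link(released, auxiliary, keys=QUASI_IDENTIFIERS):
    """Return (name, incident) pairs for released records that match
    exactly one person in the auxiliary data on all quasi-identifiers."""
    index = {}
    for person in auxiliary:
        key = tuple(person[k] for k in keys)
        index.setdefault(key, []).append(person["name"])

    matches = []
    for record in released:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append((names[0], record["incident"]))
    return matches


reidentified = link(released, auxiliary)
print(reidentified)
```

In this toy data, two of the three released records are re-identified even though the release contains no names, which is the risk described above.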

This valid privacy concern is unfortunately limiting the use of data for research, including datasets within the Public Safety sector that might otherwise be used to improve protection of people and communities. Due to the sensitive nature of the information contained in these types of datasets and the risk of linkage attacks, these datasets can't easily be made available to analysts and researchers. In order to make the best use of data that contains PII, it is important to disassociate the data from PII. There is a utility vs. privacy tradeoff, however: the more a dataset is altered, the more likely it is that the de-identified dataset will have reduced utility for analysis and research purposes.

Currently, popular de-identification techniques are not sufficient. Either PII is not sufficiently protected, or the resulting data no longer represents the original data. Additionally, it is difficult or even impossible to quantify the amount of privacy that is lost with current techniques.
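By contrast, differential privacy, the approach named in the challenge title, expresses privacy loss as an explicit parameter, conventionally written epsilon. The following is a minimal sketch of the textbook Laplace mechanism for a counting query, not any contestant's method. The function names, the sample records, and the choice of epsilon are assumptions made for illustration, and the sketch ignores floating-point subtleties that a production implementation would need to handle.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records satisfying `predicate`.

    Adding or removing one record changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon gives
    epsilon-differential privacy for this single query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical records, invented for illustration.
records = [{"type": "fire"}, {"type": "medical"}, {"type": "fire"}, {"type": "fire"}]
noisy = private_count(records, lambda r: r["type"] == "fire", epsilon=1.0)
print(noisy)  # the true count is 3; the printed value varies from run to run
```

A smaller epsilon means more noise and stronger privacy, and a larger epsilon means less noise and better utility, which makes the utility vs. privacy tradeoff something that can be stated numerically.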

In this Challenge, contestants focused on creating new methods or improving existing methods of data de-identification in a way that makes de-identification of privacy-sensitive datasets practical. The first phase, hosted on HeroX, asked for ideas and concepts, while the second phase will be executed on Topcoder and focus on the performance of developed algorithms to produce differentially private synthetic datasets. Total prizes for all stages are worth $190,000.


Congratulations to the Top 3 Concept Papers! 
 

 

A hexagon outlined in blue with the Georgia Tech logo in the center.

Grand Prize - $15,000
Team DPGans
Differentially Private Generative Adversarial Network (DP-GAN) to generate private synthetic data for analysis tasks.

 

Blue hexagon with the Purdue University logo in the center

Runner Up - $10,000
Team DPSyn
Generate a synthetic dataset that approximates many randomly-chosen marginal distributions of the input dataset.

 

A hexagon, outlined in blue with the Westat team logo in the center.

Honorable Mention - $5,000
Team WesTeam
Real solutions from the statistical community for differentially private and high-quality data releases by national statistical institutes.

*PEOPLE'S CHOICE AWARDS: $5,000 Georgia Tech Privacy Team, Purdue University

To find out more about the challenge and the winners, visit Challenge.gov.
Additional details about the Unlinkable Data Challenge can be found on the HeroX website.

 

Created February 13, 2018, Updated January 24, 2022