De-Identifying Government Datasets: Techniques and Governance
Simson Garfinkel, Joseph Near, Aref Dajani, Phyllis Singer, Barbara Guttman
De-identification is a general term for any process of removing the association between a set of identifying data and the data subject. This document describes the use of de-identification with the goal of preventing or limiting disclosure risks to individuals and establishments while still allowing for the production of meaningful statistical analysis. Government agencies can use de-identification to reduce the privacy risk associated with collecting, processing, archiving, distributing, or publishing government data. Previously, NIST IR 8053, De-Identification of Personal Information, provided a survey of de-identification and re-identification techniques. This document provides specific guidance to government agencies that wish to use de-identification. Before using de-identification, agencies should evaluate their goals for using de-identification and the potential risks that releasing de-identified data might create. Agencies should decide upon a data-sharing model, such as publishing de-identified data, publishing synthetic data based on identified data, providing a query interface that incorporates de-identification, or sharing data in non-public protected enclaves. Agencies can create a Disclosure Review Board to oversee the process of de-identification. They can also adopt a de-identification standard with measurable performance levels and perform re-identification studies to gauge the risk associated with de-identification. Several specific techniques for de-identification are available, including removing identifiers, transforming quasi-identifiers, and generating synthetic data using models. People who perform de-identification generally use special-purpose software tools to perform the data manipulation and calculate the likely risk of re-identification. However, not all tools that merely mask personal information provide sufficient functionality for performing de-identification.
This document also includes an extensive list of references, a glossary, and a list of specific de-identification tools, which is only included to convey the range of tools currently available and is not intended to imply a recommendation or endorsement by NIST.
data life cycle, de-identification, differential privacy, direct identifiers, Disclosure Review Board, k-anonymity, privacy, pseudonymization, quasi-identifiers, re-identification, synthetic data, The Five Safes.
Garfinkel, S., Near, J., Dajani, A., Singer, P. and Guttman, B. De-Identifying Government Datasets: Techniques and Governance, Special Publication (NIST SP), National Institute of Standards and Technology, Gaithersburg, MD, [online], https://doi.org/10.6028/NIST.SP.800-188, https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=936829 (Accessed October 1, 2023)