qDAR - Quality of Data at Rest

Improving Data Quality in Immunization Information Systems

The quality of the data residing in Immunization Information Systems (IIS) is critical to their role in disseminating accurate and timely immunization data that support quality healthcare and patient safety, for example by helping to improve vaccine coverage, supporting outbreak management, and enhancing public health decision making. IIS data may be inaccurate for many reasons. Often, the problematic data originate in submissions from independent entities outside the control of the IIS. Assessing existing data and continuously monitoring incoming data provide insight into the sources of poor-quality data. Methods and tools are needed to assess the quality of consolidated immunization records.

To address these needs, the National Institute of Standards and Technology (NIST), the Centers for Disease Control and Prevention (CDC), and the American Immunization Registry Association (AIRA) are collaborating on a joint project that aims to improve the quality of the hundreds of millions of records stored by IISs nationwide by providing a standardized way to perform testing and analysis. The analysis tools target “data-at-rest”, i.e., a snapshot of the patient immunization data stored in an IIS database, and also provide an assessment of the quality of patient immunization data over time.

The goal of the project, Quality of Data at Rest (qDAR), is to improve the following data quality metrics:

Data Validity is the degree to which the data conform to the syntax (format, type, range) of their definitions (i.e., to the rules for what the IIS accepts or expects). This measure seeks to assess the consistency and validity of the data stored in the IIS so that the data can be used reliably.
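To make the definition concrete, here is a minimal sketch of a validity check. The rules (a YYYYMMDD birth date between 1900-01-01 and today, and a sex code drawn from F, M, U), the field names, and the helper names are hypothetical illustrations, not the rules qDAR actually applies; they only show how format, type, and range checks combine into a per-field validity rate.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical rules for illustration only; they are NOT the qDAR rule set.
# Each rule returns True when a value conforms to its definition.

def is_valid_birth_date(value: Optional[str]) -> bool:
    """Format/type check (YYYYMMDD) plus a range check (1900-01-01 to today)."""
    try:
        parsed = datetime.strptime(value, "%Y%m%d").date()
    except (TypeError, ValueError):
        return False  # missing, wrong type, wrong format, or impossible date
    return date(1900, 1, 1) <= parsed <= date.today()

def is_valid_sex(value: Optional[str]) -> bool:
    """Range check against an assumed code set."""
    return value in {"F", "M", "U"}

# Hypothetical mapping from field name to its validity rule.
RULES = {"birth_date": is_valid_birth_date, "sex": is_valid_sex}

def validity_rate(records: list[dict], field: str) -> float:
    """Fraction of records whose value for `field` passes that field's rule."""
    if not records:
        return 0.0
    rule = RULES[field]
    return sum(rule(r.get(field)) for r in records) / len(records)

# Made-up records: one valid date, one badly formatted, one impossible (Feb 30).
sample = [
    {"birth_date": "20230115", "sex": "F"},
    {"birth_date": "2023-01-15", "sex": "X"},
    {"birth_date": "20230230", "sex": "M"},
]
print(validity_rate(sample, "birth_date"), validity_rate(sample, "sex"))
```

Keeping each rule as a small function makes it easy to report validity per field, which is the granularity needed to trace problems back to specific data elements.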

Data Completeness is the degree to which full information about a data set, record, or individual data element is captured in the IIS (i.e., the proportion of stored data with complete information, measured against a potential of “100%”). This measure seeks to assess the completeness of the data and to identify commonly missing values, which will inform the creation of guidelines for improving data quality.
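Completeness as a proportion measured against a potential of 100% can be sketched per field. The field names and the choice to treat absent keys, None, and blank or whitespace-only strings as missing are assumptions for illustration; ranking the least complete fields mirrors the stated aim of identifying commonly missing values.

```python
# Field names below are hypothetical; the actual extract layout may differ.

def is_missing(value) -> bool:
    """Treat None and blank or whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and not value.strip())

def completeness(records: list[dict], fields: list[str]) -> dict[str, float]:
    """Per-field proportion of records with a usable value (1.0 means 100%)."""
    n = len(records)
    if n == 0:
        return {f: 0.0 for f in fields}
    # r.get(f) returns None for absent keys, so absent fields count as missing.
    return {f: sum(not is_missing(r.get(f)) for r in records) / n for f in fields}

def most_often_missing(report: dict[str, float], k: int = 3) -> list[str]:
    """The k least complete fields: candidates for data quality guidelines."""
    return sorted(report, key=report.get)[:k]

# Made-up patient records with various kinds of missing values.
patients = [
    {"first_name": "Ana", "mother_maiden_name": "", "phone": "555-0100"},
    {"first_name": "Ben", "mother_maiden_name": None, "phone": None},
    {"first_name": "Cy", "mother_maiden_name": "Diaz", "phone": " "},
    {"first_name": "Di", "phone": "555-0101"},
]
report = completeness(patients, ["first_name", "mother_maiden_name", "phone"])
print(report)
print(most_often_missing(report, k=2))
```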

Data Timeliness is the amount of time between the occurrence of a real-world event and its documentation in the IIS (i.e., the time lag between the date of vaccination or birth and the date the record is fully processed and ready to use in the IIS). This measure seeks to assess how long after an event the data are reported to the IIS. Doing so helps detect anomalies at the policy level or at the level of individual providers, and improving timeliness builds confidence that the data held by the IIS are accurate at all times.
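A minimal sketch of the timeliness lag, assuming each record carries hypothetical `provider_id`, `vaccination_date`, and `processed_date` fields. It computes the median lag per provider and flags providers above an assumed 30-day threshold, one simple way to surface provider-level anomalies. The median is a design choice here: a few very late submissions would otherwise dominate a mean. The same lag function applies to birth dates.

```python
from collections import defaultdict
from datetime import date
from statistics import median

def lag_days(event_date: date, processed_date: date) -> int:
    """Days between the real-world event and the record being ready to use."""
    return (processed_date - event_date).days

def median_lag_by_provider(records: list[dict]) -> dict[str, float]:
    """Median vaccination-to-processing lag for each reporting provider."""
    lags: dict[str, list[int]] = defaultdict(list)
    for r in records:
        lags[r["provider_id"]].append(lag_days(r["vaccination_date"], r["processed_date"]))
    return {provider: median(values) for provider, values in lags.items()}

def slow_providers(medians: dict[str, float], threshold_days: int = 30) -> list[str]:
    """Providers whose median lag exceeds an assumed (hypothetical) threshold."""
    return sorted(p for p, m in medians.items() if m > threshold_days)

# Made-up records: provider P1 reports within days, P2 about three months late.
records = [
    {"provider_id": "P1", "vaccination_date": date(2024, 3, 1), "processed_date": date(2024, 3, 2)},
    {"provider_id": "P1", "vaccination_date": date(2024, 3, 5), "processed_date": date(2024, 3, 8)},
    {"provider_id": "P2", "vaccination_date": date(2024, 3, 1), "processed_date": date(2024, 5, 30)},
]
medians = median_lag_by_provider(records)
print(medians, slow_providers(medians))
```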

Data Uniqueness indicates the uniqueness of patient records and vaccination records, i.e., the extent to which the IIS contains duplicate records. Patient records can be reported from various sources, and the lack of a unique global patient identifier makes it challenging to merge related records. This problem is often mitigated with rule-based engines and AI-driven algorithms. HIT vendors have their own proprietary implementations of such algorithms; qDAR uses an open-source machine learning algorithm that detects “possible” duplicate records. This analysis finds records that likely should have been merged by the IIS and thus indicates how well the IIS’s record-matching algorithm is performing.
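The open-source machine learning algorithm qDAR uses is not named here, so the sketch below does not reproduce it. It is a deliberately simple, rule-based stand-in that shows the general shape of possible-duplicate detection: block records on exact birth date, then compare full names with `difflib.SequenceMatcher` from the Python standard library. The 0.85 threshold, the field names, and the function names are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def possible_duplicates(patients: list[dict], threshold: float = 0.85) -> list[tuple]:
    """Pairs of record IDs that share a birth date and have similar full names."""
    # Blocking: only compare records with the same birth date, which avoids
    # comparing every record with every other record.
    blocks = defaultdict(list)
    for p in patients:
        blocks[p["birth_date"]].append(p)
    pairs = []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            name_a = f"{a['first_name']} {a['last_name']}"
            name_b = f"{b['first_name']} {b['last_name']}"
            if name_similarity(name_a, name_b) >= threshold:
                pairs.append((a["id"], b["id"]))
    return pairs

def share_flagged(patients: list[dict], pairs: list[tuple]) -> float:
    """Fraction of records that appear in at least one possible-duplicate pair."""
    flagged = {record_id for pair in pairs for record_id in pair}
    return len(flagged) / len(patients) if patients else 0.0

# Made-up records: 1 and 2 differ by a one-letter spelling variant.
patients = [
    {"id": 1, "first_name": "Jonathan", "last_name": "Smith", "birth_date": "20230410"},
    {"id": 2, "first_name": "Jonathon", "last_name": "Smith", "birth_date": "20230410"},
    {"id": 3, "first_name": "Maria", "last_name": "Lopez", "birth_date": "20230410"},
    {"id": 4, "first_name": "Jonathan", "last_name": "Smith", "birth_date": "20220101"},
]
pairs = possible_duplicates(patients)
print(pairs, share_flagged(patients, pairs))
```

Blocking keeps the number of comparisons manageable, but it is a trade-off: two records for the same child whose birth dates differ by a typo are never compared, which is one reason record-linkage systems typically use several blocking keys or learned models instead of a single exact key.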

qDAR Approach

Figure 1 provides an overview of the approach for the qDAR tooling.

  • The IIS generates a patient (P) and immunization (V) extract for ages 0-2 years.
  • The data extract is transformed into an aggregate detections file (ADF) using the command-line interface (CLI) tool.
  • The ADF is uploaded to the AIRA/NIST tool known as qDAR via the Aggregate Analysis Reporting Tool (AART).
  • qDAR produces two data quality reports: the IIS-Wide Report and the Provider Site Breakdown Report.
  • AART displays the results, showing how the IIS aligns with the measures and test cases.
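The first two steps (extract, then transform into an aggregate detections file) can be pictured with a minimal sketch. The ADF layout, the detection codes, and the CLI's internals are not described here, so everything below is hypothetical: the function names, detection codes such as `PHONE_MISSING`, and the shape of the output dictionary. Interpreting “ages 0-2 years” as 0 to 2 completed years on an as-of date is also an assumption.

```python
from collections import Counter
from datetime import date

def age_in_years(birth_date: date, as_of: date) -> int:
    """Completed years of age on the `as_of` date."""
    years = as_of.year - birth_date.year
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1  # birthday not yet reached this year
    return years

def aggregate_detections(birth_dates: dict[str, date],
                         detections: list[tuple[str, str]],
                         as_of: date) -> dict:
    """Restrict to the assumed 0-2 year cohort, then keep only per-code counts."""
    cohort = {pid for pid, born in birth_dates.items()
              if 0 <= age_in_years(born, as_of) <= 2}
    counts = Counter(code for pid, code in detections if pid in cohort)
    return {"as_of": as_of.isoformat(),
            "patients_in_cohort": len(cohort),
            "detection_counts": dict(counts)}

# Made-up extract: patient "c" is 4 years old and falls outside the cohort.
birth_dates = {"a": date(2023, 1, 15), "b": date(2021, 6, 2), "c": date(2020, 1, 1)}
detections = [("a", "PHONE_MISSING"), ("b", "PHONE_MISSING"),
              ("c", "PHONE_MISSING"), ("b", "BIRTH_DATE_INVALID")]
summary = aggregate_detections(birth_dates, detections, as_of=date(2024, 6, 1))
print(summary)
```

The output of this sketch holds only counts, not patient-level rows, which matches the idea of an aggregate file rather than a copy of the extract.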

Figure 1: Overview of the qDAR Tool (credit: Robert Snelick)

See the NIST presentation at the 2023 AIRA National Meeting.

Additional Information from AIRA:
https://repository.immregistries.org/files/resources/66a7c731b7c8d/dar_participation_one-pager_final.pdf
https://repository.immregistries.org/files/resources/5e31beef2ef2d/2-data_at_rest_faq_2024.pdf

Better Immunization Data Leads to Better AI

High-quality IIS data provide clean, structured, and comprehensive datasets, which are essential for training reliable AI models for public health. Improving data quality directly improves the performance and accuracy of the AI models that rely on it. Such models can be used to run AI-driven simulations that evaluate the impact of policy decisions and to generate highly realistic test data. When IIS data are correlated with other public health datasets, such as syndromic surveillance data, AI models can also be built to detect early signs of vaccine-preventable disease outbreaks and provide a clear path to prevention.

Created March 5, 2025, Updated March 14, 2025