The development and utility of trustworthy AI products and services depend heavily on reliable measurements and evaluations of underlying technologies and their use. NIST conducts research and development of metrics, measurements, and evaluation methods in emerging and existing areas of AI; contributes to the development of standards; and promotes the adoption of standards, guides, and best practices for measuring and evaluating AI technologies as they mature and find new applications.
NIST has a long history of AI measurement and evaluation activities, starting in the late 1960s with the measurement and evaluation of automated fingerprint identification systems. Since then, NIST has designed and conducted hundreds of evaluations of thousands of AI systems. While these activities typically have focused on measures of accuracy and robustness, other types of AI-related measurements and evaluations under investigation include bias, interpretability, and transparency. Working collaboratively with others, NIST aims to expand these efforts, driving AI research and enabling progress by:
NIST projects are carried out by researchers from a variety of disciplines across the NIST laboratories, frequently in collaboration with industry, other government agencies, and academia. These activities are part of NIST’s efforts to build a strong and active community around the measurement and evaluation of AI technologies, and they complement NIST’s establishment of forums dedicated to the advancement of AI metrology research. Together, these efforts spur collaboration among those who design, develop, deploy, test, and evaluate AI technologies and help to meet the needs of a broad and diverse AI community. Events convened by NIST to strengthen the AI measurement and evaluation community include:
For more information about how to engage with NIST on AI, see: Engage
NIST has been engaged in focused efforts to establish common terminologies, definitions, and taxonomies of concepts pertaining to characteristics of AI technologies in order to form the necessary underpinnings for trustworthy AI systems. Those characteristics include accuracy, explainability and interpretability, privacy, reliability, robustness, safety, security (resilience), and mitigation of harmful bias. Each requires its own portfolio of measurements and evaluations, and context is crucial: how a given characteristic is measured and evaluated can change based on the context in which the AI system operates.
For each characteristic, NIST has produced – or aims to document and improve – the definitions, applications, tasks, and strengths and limitations of metrics and measurement methods in use or being proposed. NIST has also developed – or may prepare and curate – meaningful data sets for select attributes of interest, and may apply chosen metrics and measurement methods to various AI systems.
A selection of related projects is displayed here. A more complete list of NIST’s AI measurement and evaluation projects will be posted in the future.