
NIST AI Measurement and Evaluation Projects

Biometrics

Biometrics are human measurements that can be used to identify a person for a variety of applications, e.g., to grant access to devices, systems, or data. NIST has been testing and evaluating biometric recognition technologies and assisting in determining where and how biometric recognition technology can best be deployed. 

Face Recognition

Face Recognition research and Face Recognition Vendor Tests (FRVT) provide independent government evaluations of face recognition technologies and assist in determining where and how facial recognition technology can best be deployed. 

Fingerprint

ITL evaluates fingerprint matching technologies by developing datasets to support standards, measurement and evaluation methods, and technology capabilities. 

Biometric Quality 

Performance of biometric systems is dependent on the quality of the acquired input samples. If quality can be improved, either by sensor design, by user interface design, or by standards compliance, better performance can be realized. For those aspects of quality that cannot be designed-in, an ability to analyze the quality of a live sample is needed. 
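
Below is a minimal sketch of quality-gated acquisition, assuming a grayscale sample held in a NumPy array and a simple sharpness proxy (variance of a Laplacian response) as the quality score; the score and threshold are illustrative assumptions, not an actual NIST quality metric such as NFIQ 2.

    # Toy quality gate: re-capture the sample when a sharpness proxy is too low.
    # The Laplacian-variance score and the threshold are illustrative assumptions.
    import numpy as np

    def sharpness_score(image: np.ndarray) -> float:
        """Variance of a 3x3 Laplacian response; higher means sharper."""
        k = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)
        h, w = image.shape
        resp = np.zeros((h - 2, w - 2))
        for i in range(3):
            for j in range(3):
                resp += k[i, j] * image[i:i + h - 2, j:j + w - 2]
        return float(resp.var())

    def accept_sample(image: np.ndarray, threshold: float = 100.0) -> bool:
        # Accept only samples whose quality score clears the threshold.
        return sharpness_score(image) >= threshold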

Minutiae Interoperability Exchange (MINEX) III

Minutiae Interoperability Exchange (MINEX) III is a continuing test of INCITS 378 fingerprint templates. The test is used to establish compliance of template generators and template matchers for the U.S. Government's Personal Identity Verification (PIV) program. MINEX III evaluates template generation and template matching software submitted to NIST in the form of a software library compliant with the MINEX III API. Participants submit a template generator and, optionally, a template matcher. 
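
The sketch below illustrates, in Python, the shape of the two components a participant supplies; it is a hypothetical illustration only, not the actual MINEX III API (which is a compiled software-library interface), and every name and signature here is an assumption.

    # Hypothetical interface sketch -- not the real MINEX III library API.
    from dataclasses import dataclass

    @dataclass
    class Minutia:
        x: int        # pixel column
        y: int        # pixel row
        angle: int    # ridge direction in degrees
        quality: int  # 0-100

    class TemplateGenerator:
        """Required submission: converts a fingerprint image to a minutiae template."""
        def make_template(self, image: bytes, width: int, height: int) -> list[Minutia]:
            raise NotImplementedError

    class TemplateMatcher:
        """Optional submission: compares two templates and returns a similarity score."""
        def match(self, probe: list[Minutia], reference: list[Minutia]) -> float:
            raise NotImplementedError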

Iris Recognition

ITL/IAD conducts and manages the Iris Challenge Evaluation projects. 

Iris recognition uses visible and near-infrared light to capture images of a person's iris, which are then used for recognition in the same category as face recognition and fingerprinting. 

Tattoo Recognition

The Tattoo Recognition Technology Program is a family of activities designed to evaluate and measure image-based tattoo recognition technology. 

Speaker Recognition

The NIST Speaker Recognition Evaluation (SRE) is an ongoing series of speaker recognition evaluations conducted by NIST since 1996. The objective of the evaluation series is to measure the performance of the current state of speaker recognition technology. 

Computer Vision

The NIST computer vision program includes several activities contributing to the development of technologies that extract information from image and video streams through systematic, targeted annual evaluations and metrology advances. 

Open Media Forensics Challenge

The Open Media Forensics Challenge (OpenMFC) Evaluation is an annual evaluation, open to public participants worldwide, that supports research and helps advance the state of the art for image and video forensics technologies. OpenMFC releases a series of media forensics development and evaluation datasets to support different evaluation tasks. Participants can visualize system performance on an online leaderboard evaluation platform. 

TrojAI

The IARPA-sponsored TrojAI challenge focuses on detecting Trojans in AI models. An adversary that can disrupt the training pipeline can insert Trojan behaviors into an AI model. For example, an AI learning to distinguish traffic signs can be given just a few additional examples of stop signs with yellow squares on them, each labeled “speed limit sign.” If the AI were deployed in a self-driving car, an adversary could cause the car to run through a stop sign just by putting a sticky note on it. The goal of the TrojAI program is to combat such Trojan attacks by inspecting AIs for Trojans. TrojAI challenge information can be found at https://pages.nist.gov/trojai/docs/about.html and https://pages.nist.gov/trojai/, along with the AI models NIST generated for image and natural language processing tasks. A baseline Trojan detection method developed by NIST is available at https://arxiv.org/abs/2101.12016
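
The following is a minimal sketch of the data-poisoning attack described above, assuming a labeled image dataset already loaded as NumPy arrays; the class indices, patch size, and poisoning rate are illustrative assumptions, not TrojAI specifics.

    # Toy poisoning sketch: stamp a small yellow square on a fraction of
    # stop-sign images and relabel them as "speed limit".
    import numpy as np

    STOP, SPEED_LIMIT = 0, 1  # hypothetical class indices

    def poison(images: np.ndarray, labels: np.ndarray,
               rate: float = 0.02, patch: int = 8, seed: int = 0):
        """images: (N, H, W, 3) uint8; labels: (N,) int. Returns poisoned copies."""
        rng = np.random.default_rng(seed)
        images, labels = images.copy(), labels.copy()
        stop_idx = np.flatnonzero(labels == STOP)
        chosen = rng.choice(stop_idx, size=max(1, int(rate * len(stop_idx))),
                            replace=False)
        for i in chosen:
            images[i, :patch, :patch] = (255, 255, 0)  # yellow trigger patch
            labels[i] = SPEED_LIMIT                    # mislabel the poisoned example
        return images, labels

A model trained on such a poisoned set behaves normally on clean stop signs but predicts “speed limit” whenever the trigger patch appears, which is the behavior Trojan detectors try to uncover.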

ActEV

The Activities in Extended Video (ActEV) series of evaluations is designed to accelerate development of robust, multi-camera, automatic activity detection systems in known/unknown facilities for forensic and real-time alerting applications. 

TRECVID

The TREC Video Retrieval Evaluation (TRECVID) is an ongoing series of evaluations to promote progress in content-based video analysis and retrieval via open, metrics-based evaluation. 

Multimedia Event Detection 

The goal of Multimedia Event Detection (MED) is to assemble core detection technologies into a system that can search multimedia recordings for user-defined events based on pre-computed metadata. The metadata stores developed by the systems are expected to be sufficiently general to permit re-use for subsequent user-defined ad-hoc events. 

Surveillance Event Detection

The objective of the Surveillance Event Detection (SED) evaluation is to promote the development of technologies that detect activities occurring in the surveillance video domain. 

Video Surveillance Technologies for Retail Security

The Video Surveillance Technologies for Retail Security (VISITORS) project aims to advance predictive analysis technologies and methodologies able to detect persons engaged in suspicious activities in surveillance video. VISITORS is being applied in the retail domain. 

CLEAR

Classification of Events, Activities and Relationships (CLEAR) was a multi-national evaluation series that brought together researchers from the US ARDA VACE Program and the European Union Computers in the Human Interaction Loop Program to focus research on detecting and tracking people, faces, and vehicles, as well as on acoustic event detection. 

VACE 

The Video Analysis and Content Extraction (VACE) program was established to develop novel algorithms for automatic video content extraction, multi-modal fusion, and event understanding. During the program, progress was made in the automated detection and tracking of moving objects, including faces, hands, people, vehicles, and text. 

Handwriting Recognition and Translation Evaluation

The NIST Open Handwriting Recognition and Translation Evaluation (OpenHaRT) is an evaluation of transcription and translation technologies for document images. The evaluation seeks to break new ground in document image recognition and translation toward the goal of document understanding capabilities. The objective is to assess the current state of the art and to build the critical mass required to solve the challenges posed in these areas, so that technologies developed through OpenHaRT can be used to distill, in a timely manner, the vast amount of information available only in foreign-language documents. 

SD19

Special Database 19 contains NIST's entire corpus of training materials for handprinted document and character recognition. 

Forensics

NIST is strengthening forensic practice through research and improved standards. Efforts involve three key components: science, policy, and practice. 

Media Forensics Challenge

The Media Forensics Challenge (MFC) Evaluation is an annual evaluation to support research and help advance the state of the art for image and video forensics technologies.

General Frameworks for AI Measurement and Evaluation 

NIST has developed software to support the AI system evaluation and analysis workflow for several of its evaluations, including TrojAI and ActEV, among others. Future plans involve increasing the generality of these frameworks to increase code sharing and reuse. Toward that end, the National Cybersecurity Center of Excellence (NCCoE) is building the NCCoE AI Software-Testbed, a modular test bed for organizing and running machine learning experiments; it is currently focused on security concerns but was built with the goal of supporting additional test and evaluation use cases. 

TrojAI

The IARPA-sponsored TrojAI challenge focuses on detecting Trojans in AI models. An adversary that can disrupt the training pipeline can insert Trojan behaviors into an AI model. For example, an AI learning to distinguish traffic signs can be given just a few additional examples of stop signs with yellow squares on them, each labeled “speed limit sign.” If the AI were deployed in a self-driving car, an adversary could cause the car to run through a stop sign just by putting a sticky note on it. The goal of the TrojAI program is to combat such Trojan attacks by inspecting AIs for Trojans. TrojAI challenge information can be found at https://pages.nist.gov/trojai/docs/about.html and https://pages.nist.gov/trojai/, along with the AI models NIST generated for image and natural language processing tasks. A baseline Trojan detection method developed by NIST is available at https://arxiv.org/abs/2101.12016

Fingerprint

ITL evaluates fingerprint matching technologies by developing datasets to support standards, measurement and evaluation methods, and technology capabilities. 

Material Science

JARVIS-ML

JARVIS-ML introduced Classical Force-field Inspired Descriptors (CFID) as a universal framework to represent a material's chemistry-structure-charge related data. With the help of CFID and JARVIS-DFT data, several high-accuracy classification and regression ML models were developed.
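
A minimal sketch of that descriptor-plus-regressor pattern follows, assuming precomputed CFID-style descriptor vectors X and target property values y are already available as NumPy arrays; scikit-learn's GradientBoostingRegressor stands in here for whatever models JARVIS-ML actually uses.

    # Fit a property-prediction model on precomputed material descriptors.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.metrics import mean_absolute_error
    from sklearn.model_selection import train_test_split

    def fit_property_model(X: np.ndarray, y: np.ndarray):
        """X: (n_materials, n_descriptors); y: (n_materials,) property values."""
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
        model = GradientBoostingRegressor(random_state=0)
        model.fit(X_tr, y_tr)
        mae = mean_absolute_error(y_te, model.predict(X_te))
        print(f"MAE on held-out materials: {mae:.4f}")
        return model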

Information Retrieval

NIST's information retrieval research uses large collections of human-generated text, speech, and video to create test collections by organizing the TREC, TRECVID, and TAC conferences. NIST continues to create new test collections, focusing mainly on collections that support specific information retrieval sub-tasks such as cross-language retrieval and multimedia retrieval. We also develop better evaluation methodology for information access, including improved evaluation measures for comparing systems using test collections and new evaluation measures for interactive searching and browsing operations. 
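
As one concrete example of such a measure, the sketch below computes average precision for a single topic from a system's ranked list and the judged-relevant documents (qrels); this is only one of the many measures NIST's scoring tools (e.g., trec_eval) report.

    # Average precision for one topic, given a ranked list of document IDs
    # and the set of documents judged relevant for that topic.
    def average_precision(ranked_docs: list[str], relevant: set[str]) -> float:
        hits, precision_sum = 0, 0.0
        for rank, doc_id in enumerate(ranked_docs, start=1):
            if doc_id in relevant:
                hits += 1
                precision_sum += hits / rank  # precision at each relevant hit
        return precision_sum / len(relevant) if relevant else 0.0

    # Two of three relevant documents retrieved, at ranks 1 and 3.
    print(average_precision(["d1", "d5", "d2"], {"d1", "d2", "d9"}))  # ~0.556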

Text REtrieval Conference

The Text REtrieval Conference (TREC) is an ongoing series of evaluation workshops focusing on a range of information retrieval (IR) research areas. 

Text Analysis Conference

The Text Analysis Conference (TAC) is a series of evaluation workshops organized to encourage research in Natural Language Processing and related applications, by providing a large test collection and common evaluation procedures. 

TREC Video Retrieval Evaluation

The TREC Video Retrieval Evaluation (TRECVID) is an ongoing series of evaluations to promote progress in content-based video analysis and retrieval via open, metrics-based evaluation. 

Spoken Document Retrieval

The Spoken Document Retrieval (SDR) evaluation designs and implements evaluations of SDR technology within the broadcast news domain. SDR involves the search and retrieval of excerpts from spoken audio recordings using a combination of automatic speech recognition and information retrieval technologies.

Manufacturing and Robotics

NIST has research efforts in measurement science, standards, and technology evaluations related to manufacturing and robotics.

Manufacturing Robotics Testbed

The Manufacturing Robotics Testbed consists of several labs located in three buildings on the main NIST campus. Combined, these serve as a resource for research in robotics for advanced manufacturing and material handling. The test bed contains representative state-of-the-art manufacturing robots, including ones that have been designed specifically for safe interactions with human workers in shared environments. The testbed also includes advanced multi-fingered grippers, sensors, conveyors, and an industrial robot arm that can be mounted on a linear rail or on a pedestal. 

Robotics Test Facility

The Robotics Test Facility is a laboratory for developing standard methods of measuring robot performance. The facility houses artifacts and equipment for measuring how well robots perform under a variety of tasks that abstract real-world challenges. The application domains supported by this facility include urban search and rescue, bomb-disposal, military ground operations, and manufacturing.

Smart Manufacturing Systems Test Bed

The goal of the Smart Manufacturing Systems Test Bed is to extend existing production-focused concepts by designing and architecting a test bed that enables smart manufacturing research and development across the product lifecycle. This process should highlight the challenges and requirements for introducing cyber-physical infrastructure in manufacturing, as well as create opportunities to provide a tangible source of data that other researchers may use to develop and validate smart manufacturing technologies.

Natural Language Processing

NIST's Natural Language Processing (NLP) research involves developing large corpora of human-generated text as well as common metrics and evaluation procedures. NIST's research in NLP supports broad technology areas, including information retrieval, machine translation, and low-resource language applications.

Text REtrieval Conference

The Text REtrieval Conference (TREC) is an ongoing series of evaluation workshops focusing on a range of information retrieval (IR) research areas. 

Text Analysis Conference

The Text Analysis Conference (TAC) is a series of evaluation workshops organized to encourage research in Natural Language Processing and related applications, by providing a large test collection and common evaluation procedures. 

Machine Translation

The Multimodal Information Group's machine translation (MT) program includes several activities contributing to machine translation technology and metrology advancements, primarily through systematic and targeted annual evaluations. 

LORELEI/LOREHLT

Low Resource Languages for Emergent Incidents (LORELEI) was a DARPA-sponsored program. The goal of the program was to dramatically advance the state of computational linguistics and human language technology to enable rapid, low-cost development of capabilities for low-resource languages. The Low Resource HLT (LoReHLT) open evaluations serve to evaluate component technologies relevant to LORELEI. 

Speech Processing

NIST's Speech Processing program has a long history of activities supporting the development of technologies that extract content from recordings of spoken language and of metrology advancements, primarily through systematic and targeted annual evaluations. NIST's research in speech processing supports broad technology areas, including speech recognition, speaker recognition, diarization, speech activity detection, language recognition, keyword spotting, rich transcription, and speech-to-speech translation. 

Automatic Speech Recognition

NIST has a long history of conducting evaluations in Automatic Speech Recognition (ASR). Recently, NIST's ASR evaluations have focused on assessing the state of the art of ASR technologies for low-resource languages. 
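
ASR evaluations typically score system output against reference transcripts with word error rate (WER). The sketch below is a bare-bones WER computation over word lists; NIST's actual scoring tools (e.g., SCTK) handle alignment, normalization, and scoring rules in far more detail.

    # Word error rate: (substitutions + deletions + insertions) / reference length,
    # computed by Levenshtein alignment over words.
    def wer(reference: list[str], hypothesis: list[str]) -> float:
        R, H = len(reference), len(hypothesis)
        d = [[0] * (H + 1) for _ in range(R + 1)]
        for i in range(R + 1):
            d[i][0] = i  # deletions
        for j in range(H + 1):
            d[0][j] = j  # insertions
        for i in range(1, R + 1):
            for j in range(1, H + 1):
                cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # match or substitution
        return d[R][H] / R if R else 0.0

    print(wer("the cat sat".split(), "the cat sat down".split()))  # 1 insertion / 3 words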

Speaker Recognition

The NIST Speaker Recognition Evaluation (SRE) is an ongoing series of speaker recognition evaluations conducted by NIST since 1996. The objective of the evaluation series is to measure the performance of the current state of speaker recognition technology. 

Open Speech Analytic Technologies

The Open Speech Analytic Technologies (OpenSAT) Evaluation Series focuses on the following tasks: Automatic Speech Recognition (ASR), Speech Activity Detection (SAD), and Keyword Search (KWS). 

Language Recognition

The NIST Language Recognition Evaluation (LRE) series evaluates language recognition performance on conversational telephone speech and lays the groundwork for further research efforts in the field. 

Speech Activity Detection

The purpose of a Speech Activity Detection (SAD) system is to find regions of speech in an audio file. The NIST Open Speech-Activity-Detection evaluation (OpenSAD) is intended to provide Speech-Activity-Detection system developers with an independent evaluation of performance on a variety of audio data. 
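
A minimal sketch of the task, assuming a mono waveform already loaded as a NumPy array; a simple frame-energy threshold stands in for the far more robust detectors evaluated in OpenSAD, and every parameter here is an assumption.

    # Toy speech-activity detector: flag frames whose short-time energy rises
    # well above the estimated noise floor, then merge them into regions.
    import numpy as np

    def detect_speech(samples: np.ndarray, rate: int = 16000,
                      frame_ms: int = 25, margin_db: float = 12.0):
        """Return (start_sec, end_sec) regions flagged as speech."""
        frame = int(rate * frame_ms / 1000)
        n = len(samples) // frame
        frames = samples[:n * frame].reshape(n, frame).astype(float)
        energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-10)
        threshold = np.percentile(energy_db, 10) + margin_db  # noise floor + margin
        active = energy_db > threshold
        regions, start = [], None
        for i, is_speech in enumerate(active):
            if is_speech and start is None:
                start = i
            elif not is_speech and start is not None:
                regions.append((start * frame / rate, i * frame / rate))
                start = None
        if start is not None:
            regions.append((start * frame / rate, n * frame / rate))
        return regions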

Rich Transcription

The Rich Transcription evaluation series promotes and gauges advances in the state of the art of several automatic speech recognition technologies. The goal of the evaluation series is to create recognition technologies that will produce transcriptions that are more readable by humans and more useful for machines. 

Keyword Spotting

An annual evaluation of technologies that perform keyword search, featuring a new language each year. The evaluation is an outgrowth of the 2006 Spoken Term Detection evaluation. 

Spoken Document Retrieval

The Spoken Document Retrieval (SDR) evaluation designs and implements evaluations of SDR technology within the broadcast news domain. SDR involves the search and retrieval of excerpts from spoken audio recordings using a combination of automatic speech recognition and information retrieval technologies. 

Created March 15, 2022, Updated April 5, 2022