Skip to main content
U.S. flag

An official website of the United States government

Official websites use .gov
A .gov website belongs to an official government organization in the United States.

Secure .gov websites use HTTPS
A lock ( ) or https:// means you’ve safely connected to the .gov website. Share sensitive information only on official, secure websites.

Test, Evaluation & Red-Teaming

Launch an initiative to create guidance and benchmarks for evaluating and auditing AI capabilities (EO Sec. 4.1(a)(i)(C)) and Develop and help to ensure the availability of testing environments in coordination with DoE and NSF (EO Sec. 4.1(a)(ii)(B)) and Develop guidelines for AI red teaming (EO Sec. 4.1(a)(ii)) 

On July 26, 2024, NIST released an Initial Public Draft of Managing Misuse Risk for Dual-Use Foundation Models

On July 26, 2024, NIST released open-source software, Dioptra, available for free download, which could help AI systems users and developers measure how certain types of attacks can degrade the performance of an AI system.  

On May 28, 2024, NIST launched ARIA, a new program to advance sociotechnical testing and evaluation for AI. On April 29, 2024, the agency announced the NIST GEN AI challenge

NIST advances the science and practice of AI safety and trust. As AI develops and becomes more woven into our daily lives, testing and evaluation of AI models needs to go beyond performance and accuracy to reflect today’s increasing risks and impacts on people and society. 

Under the Executive Order, NIST will create guidelines and benchmarks for evaluating AI capabilities which could cause harm such as cybersecurity and biosecurity. Additionally, the Executive Order directs NIST, in coordination with the Department of Energy (DOE) and the National Science Foundation (NSF), to ensure the availability of testing environments. 

NIST is developing guidelines for AI evaluation and red-teaming. On July 26, 2024, NIST released an Initial Public Draft of Managing Misuse Risk for Dual-Use Foundation Models This draft document provides guidelines for managing the risks to security, national economic security, and national public health or safety posed by dual-use foundation models. Examples of such risks include helping to develop chemical, biological, radiological, or nuclear weapons, enabling offensive cyber attacks, and helping bad actors to carry out malicious activity in a deceptive and obfuscated manner. Read More. The work of NIST Generative AI Public Working Group’s (PWG) supported this assignment. 

NIST is also establishing testing environments for test, evaluation, verification, validation (TEVV) of AI systems’ safety and trustworthiness. On July 26, 2024, NIST released open-source software, available for free download. Dioptra could help AI systems users and developers measure how certain types of attacks can degrade the performance of an AI system.  

NIST will launch evaluation efforts to measure risk and impacts of AI systems reflecting the context of real-world deployment. Test environments will be created in which risks and impacts of both individual user and collective behavior can be examined. Community evaluations or challenges will spur innovation in development of tools, methods, and techniques for trustworthy AI as well as to advance interoperable testing methodologies and testbeds across domains and industries. The outcome and output of these evaluations will accelerate the development of standards and guidelines to inform policy makers. 

On May 28, 2024, NIST launched ARIA, a new program to advance sociotechnical testing and evaluation for AI. On April 29, 2024, the agency announced the NIST GEN AI challenge

 

Created December 21, 2023, Updated July 26, 2024