Launch an initiative to create guidance and benchmarks for evaluating and auditing AI capabilities (EO Sec. 4.1(a)(i)(C)); develop and help ensure the availability of testing environments in coordination with DOE and NSF (EO Sec. 4.1(a)(ii)(B)); and develop guidelines for AI red-teaming (EO Sec. 4.1(a)(ii))
NIST advances the science and practice of AI safety and trust. As AI develops and becomes more woven into our daily lives, testing and evaluation of AI models need to go beyond performance and accuracy to reflect today’s increasing risks and impacts on people and society.
Under the Executive Order, NIST will create guidelines and benchmarks for evaluating AI capabilities that could cause harm, such as those related to cybersecurity and biosecurity. Additionally, the Executive Order directs NIST, in coordination with the Department of Energy (DOE) and the National Science Foundation (NSF), to ensure the availability of testing environments.
NIST will develop guidelines for AI evaluation and red-teaming. NIST will also establish testing environments for the test, evaluation, verification, and validation (TEVV) of AI systems’ safety and trustworthiness. The work of the NIST Generative AI Public Working Group (PWG) supports this assignment. NIST will launch evaluation efforts to measure the risks and impacts of AI systems in the context of real-world deployment, and will create test environments in which the risks and impacts of both individual user and collective behavior can be examined.
NIST plans to organize community evaluations or challenges to spur innovation in the development of tools, methods, and techniques for trustworthy AI, as well as to advance interoperable testing methodologies and testbeds across domains and industries. The outcomes of these evaluations will accelerate the development of standards and guidelines that inform policymakers.
NIST welcomes input on its EO assignments related to TEVV through February 2, 2024, via a Request for Information (RFI). NIST will combine the work of the Generative AI PWG with responses to the RFI in a draft document that will be available for public comment.