Consistent practices to support the validity, transparency, and reproducibility of AI evaluations are only beginning to emerge. To further the development and voluntary adoption of such practices, the Center for AI Standards and Innovation (CAISI) at the National Institute of Standards and Technology (NIST) is requesting public comment on a draft document, NIST AI 800-2 Practices for Automated Evaluations of Language Models, through March 31, 2026.
NIST AI 800-2 documents preliminary best practices for evaluating language models and AI agent systems. The document addresses a particular type of evaluation: automated benchmark evaluations.
The primary audience for NIST AI 800-2 is technical staff at organizations evaluating AI systems, including AI deployers, developers, and third-party evaluators. However, all potential consumers of AI evaluation reports can benefit from robust, well-communicated evaluation practices that advance gold-standard science and inform AI procurement and implementation decisions. While such evaluations cannot meet all AI evaluation objectives, they are a common measurement instrument that may be particularly useful when organizations face constraints on time, expertise, or resources. CAISI anticipates producing additional voluntary guidelines for further types of evaluations in the future.
The practices in this initial public draft reflect CAISI’s experience partnering with leading U.S. AI organizations to evaluate frontier AI models, as well as ongoing measurement science research at NIST and beyond. The draft organizes practices into three sections and a glossary: (1) defining evaluation objectives and selecting benchmarks, (2) implementing and running evaluations, and (3) analyzing and reporting results.
CAISI invites input on any aspect of this draft document.
A 60-day comment period is now open, closing March 31, 2026. Feedback can be emailed to AI800-2@nist.gov in any form, including markup of the draft, bulleted lists of comments, etc. CAISI does not plan to publish this feedback, but all emails, including attachments and other supporting materials, may be subject to public disclosure.
CAISI encourages all stakeholders to provide input, including organizations with experience conducting AI evaluations as well as users of AI evaluation reports – for instance, business decision-makers, procurement specialists, and technical integrators.
The draft document is available on the NIST website.