With a long history of devising and revising metrics, measurement tools, standards, and test beds, NIST is increasingly focusing on evaluating the technical characteristics of trustworthy AI.
Advancing Measurements and Evaluating AI Technologies
- Benchmarks – made up of data, tests, and evaluations – provide quantitative measures for developing standards and assessing conformance with them, as well as for examining the limits and capabilities of AI technologies. Benchmarks drive innovation through measurable advancements aimed at strategically selected scenarios; they also provide objective data for tracking the evolution of AI science and technology.
- NIST benchmarks are among the key elements of the testing methodologies and metrics the agency develops to evaluate AI technologies effectively. These include testing methods that prescribe protocols and procedures for assessing, comparing, and managing the performance and functionality of AI technologies. NIST is defining quantifiable measures to characterize AI technologies – including accuracy, complexity, explainability and interpretability, privacy, reliability, robustness, safety, security, and bias – and to enable comparisons with human performance.
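As a minimal illustration of what "quantifiable measures" of this kind can look like, the sketch below computes two of them – accuracy and a simple bias gap (a demographic-parity-style difference in positive-prediction rates between groups) – for a hypothetical classifier's outputs. The data, group labels, and function names are illustrative assumptions, not a NIST-defined benchmark or metric specification.

```python
# Illustrative sketch only: two simple quantitative measures of the kind
# discussed above. Not a NIST benchmark; data and names are hypothetical.

def accuracy(labels, predictions):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(l == p for l, p in zip(labels, predictions))
    return correct / len(labels)

def bias_gap(predictions, groups):
    """Difference between the highest and lowest positive-prediction
    rates across groups (a demographic-parity-style measure)."""
    rates = {}
    for g in set(groups):
        member_preds = [p for p, grp in zip(predictions, groups) if grp == g]
        rates[g] = sum(member_preds) / len(member_preds)
    return max(rates.values()) - min(rates.values())

# Hypothetical binary predictions for six examples from two groups.
labels      = [1, 0, 1, 1, 0, 0]
predictions = [1, 0, 1, 0, 0, 1]
groups      = ["A", "A", "A", "B", "B", "B"]

print(accuracy(labels, predictions))   # 4 of 6 predictions correct
print(bias_gap(predictions, groups))   # gap between group A and group B rates
```

Real evaluations of properties such as robustness, privacy, or explainability require far richer protocols than point metrics like these, which is why the testing methodologies above pair metrics with prescribed procedures.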