Measurement Science for Automated Vehicles

Summary

Automated vehicles rely on artificial intelligence to make driving decisions every fraction of a second, such as when to brake, merge, change lanes, or yield to other drivers. These decisions are the most important factor in vehicle safety, yet no standard way exists to measure whether an automated vehicle makes good decisions. Research shows that two-thirds of crashes result from poor decision-making rather than failure to see hazards. Automated vehicles must solve this same problem.

NIST is developing measurement tools to evaluate how well automated vehicles make driving decisions. The approach combines computer simulations with a physical research vehicle. In simulation, an automated vehicle navigates thousands of driving scenarios built from real crash data. Four levels of evaluation assess whether the vehicle’s actions were safe, whether outcomes could be predicted, whether better choices were available, and whether repeated mistakes point to deeper problems. Importantly, none of these evaluations require manufacturers to share their proprietary software.

The result will be a publicly available set of tools and methods that regulators and industry can use to evaluate automated vehicle decision-making before deployment on public roads, supporting safer transportation for everyone.

Description

Objective
Develop measurement science for evaluating AI-based decision-making systems in automated vehicles.

Technical Idea
This project produces measurement methods, reference baselines, test tools, and scenario datasets that close the measurement gap described above. Current evaluation methods answer only “Did a crash happen?” This project’s products answer the harder question: “Did the decision-making system make the best available choice?”

NIST will integrate its physical autonomous research vehicle with the open-source CARLA simulator to create an implementation-agnostic test harness. A server-client architecture allows any automated driving system (ADS) to connect as a plug-and-play client, receiving simulated world state and returning vehicle actions for evaluation. Working with USDOT and NHTSA, NIST will develop a curated scenario suite drawn from NHTSA crash data and the Safety Pool™ library. Decision-making algorithms from open-source implementations (e.g., Autoware) will serve as reference systems. The physical research vehicle validates that simulation-derived metrics are meaningful on real systems.
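The server-client exchange described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names WorldState, VehicleAction, ADSClient, CautiousFollower, and run_scenario are hypothetical, the world is reduced to one dimension, and no CARLA or Autoware API is used. It shows only the black-box pattern in which the server sends state, the client returns an action, and the server logs both for later evaluation.

```python
from dataclasses import dataclass
from typing import List, Protocol, Tuple


@dataclass(frozen=True)
class WorldState:
    """Simplified 1-D world state (hypothetical; not the CARLA data model)."""
    ego_position_m: float
    ego_speed_mps: float
    lead_position_m: float
    lead_speed_mps: float


@dataclass(frozen=True)
class VehicleAction:
    """Longitudinal command returned by the system under test."""
    acceleration_mps2: float


class ADSClient(Protocol):
    """Any driving system that maps a world state to an action can plug in."""
    def act(self, state: WorldState) -> VehicleAction: ...


class CautiousFollower:
    """Toy stand-in for a black-box ADS: brake when the gap is short."""
    def act(self, state: WorldState) -> VehicleAction:
        gap = state.lead_position_m - state.ego_position_m
        return VehicleAction(-3.0 if gap < 20.0 else 0.0)


def run_scenario(client: ADSClient, initial: WorldState, steps: int,
                 dt: float = 0.1) -> List[Tuple[WorldState, VehicleAction]]:
    """Server loop: send state, receive action, advance the world, log both."""
    log = []
    state = initial
    for _ in range(steps):
        action = client.act(state)  # the harness never inspects client internals
        log.append((state, action))
        ego_speed = max(0.0, state.ego_speed_mps + action.acceleration_mps2 * dt)
        state = WorldState(
            ego_position_m=state.ego_position_m + state.ego_speed_mps * dt,
            ego_speed_mps=ego_speed,
            lead_position_m=state.lead_position_m + state.lead_speed_mps * dt,
            lead_speed_mps=state.lead_speed_mps,
        )
    return log
```

Because the harness depends only on the act interface, swapping CautiousFollower for another client would not change the logging or any downstream metric code.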

Using this testbed, NIST will develop measurement science organized across four progressive tiers:

  • Tier 1: Tracks surrogate safety metrics (e.g., Time-to-Collision) from vehicle actions in real time during each scenario.
  • Tier 2: Uses Tier 1 data to predict likely outcomes through forward simulation, providing real-time decision quality indicators.
  • Tier 3: Compares selected actions against NIST-defined optimal baselines computed through counterfactual simulation after each scenario.
  • Tier 4: Diagnoses systematic weaknesses by analyzing Tier 3 deviation patterns across many scenarios.

No tier requires access to proprietary algorithms, enabling evaluation of any ADS as a black box. Together, the four tiers define the ADS’s safe operating envelope and expose where decision-making falls short.
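As a rough illustration of how Tiers 2 through 4 could build on one another, the sketch below uses a one-dimensional car-following model. It is not NIST's method: the function names, the constant-acceleration rollout, the use of minimum gap as the score, and the finite candidate set standing in for an "optimal baseline" are all assumptions chosen to keep the example small.

```python
from collections import defaultdict
from statistics import mean


def min_gap_after(gap_m, ego_speed, lead_speed, accel, horizon_s=3.0, dt=0.1):
    """Tier 2 style: forward-simulate a constant ego acceleration and
    return the smallest gap to the lead vehicle over the horizon."""
    smallest = gap_m
    for _ in range(int(round(horizon_s / dt))):
        ego_speed = max(0.0, ego_speed + accel * dt)
        gap_m += (lead_speed - ego_speed) * dt
        smallest = min(smallest, gap_m)
    return smallest


def deviation_from_best(gap_m, ego_speed, lead_speed, chosen_accel, candidates):
    """Tier 3 style: counterfactual comparison. Returns how much worse the
    chosen action scores than the best candidate (0.0 means no shortfall)."""
    chosen = min_gap_after(gap_m, ego_speed, lead_speed, chosen_accel)
    best = max(min_gap_after(gap_m, ego_speed, lead_speed, a) for a in candidates)
    return max(0.0, best - chosen)


def mean_deviation_by_category(records):
    """Tier 4 style: average Tier 3 deviations per scenario category to
    surface categories where the system is consistently suboptimal."""
    grouped = defaultdict(list)
    for category, deviation in records:
        grouped[category].append(deviation)
    return {category: mean(values) for category, values in grouped.items()}
```

Only observed states and actions enter these functions, which is consistent with treating the ADS as a black box. A realistic baseline would need a richer action space and scoring than the minimum-gap proxy assumed here.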

Tier outputs feed into an operational effectiveness analysis that produces actionable insights for regulators (pre-deployment evaluation), industry (benchmarking without IP exposure), and standards bodies (metrics proposed for ISO and SAE adoption). NIST’s role is to develop the measurement methods and reference baselines; determinations of acceptable performance thresholds remain with regulators and industry stakeholders.

Research Plan
The research plan describes how the team will deliver the products described above. This work directly supports America’s AI Action Plan: “Solidify American Dominance in AI Innovation” by building capacity to evaluate AI capabilities, and “Bolster American Leadership in Standards” through ISO and SAE contributions.

Foundation: Build the CARLA/Autoware simulation testbed and server-client architecture. Survey and curate an initial scenario suite from NHTSA crash data and the Safety Pool™ library. Develop and validate the Tier 1 surrogate safety metric library (Time-to-Collision, Post-Encroachment Time, Deceleration Rate to Avoid a Crash). Begin scoping Tier 2 forward simulation methodology. Validate Tier 1 metrics on the physical research vehicle. Release open-source Tier 1 tools and initial scenario library.
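The three surrogate safety metrics named for Tier 1 have simple kinematic forms in a car-following or crossing conflict. The sketch below uses commonly cited textbook definitions, not the project's metric library; the function names and the constant-speed assumption are illustrative, and the gap is assumed to be positive.

```python
import math


def time_to_collision(gap_m, follower_speed_mps, leader_speed_mps):
    """Time-to-Collision: seconds until contact if both speeds stay constant.
    Returns infinity when the follower is not closing on the leader."""
    closing = follower_speed_mps - leader_speed_mps
    if closing <= 0.0:
        return math.inf
    return gap_m / closing


def deceleration_rate_to_avoid_crash(gap_m, follower_speed_mps, leader_speed_mps):
    """DRAC: constant deceleration (m/s^2) the follower needs to match the
    leader's speed just as the gap closes. One common form: dv^2 / (2 * gap)."""
    closing = follower_speed_mps - leader_speed_mps
    if closing <= 0.0:
        return 0.0
    return closing ** 2 / (2.0 * gap_m)


def post_encroachment_time(first_exit_s, second_entry_s):
    """PET: time between the first road user leaving a conflict area and
    the second road user entering it. Smaller values mean a nearer miss."""
    return second_entry_s - first_exit_s
```

For example, a follower at 15 m/s that is 20 m behind a leader at 5 m/s has a Time-to-Collision of 2.0 s and a DRAC of 2.5 m/s^2.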

Depth: Develop Tier 2 forward simulation for outcome prediction. Begin Tier 3 counterfactual simulation and baseline comparison methodology. Expand the scenario suite. Continue physical vehicle validation. Propose initial decision-quality metrics to ISO TC 22/SC 33 and SAE.

Maturity: Complete Tier 3 and Tier 4 development. Validate the full four-tier framework across the scenario library and on the physical vehicle. Build operational effectiveness analysis capability. Incorporate mature methodologies into ISO and SAE standards. Release the complete publicly available measurement framework.

The team will contribute to standards throughout: for ISO TC 22/SC 33, proposing decision-quality evaluation metrics extending ISO 34502’s scenario-based safety framework and addressing gaps in ISO 21448 (SOTIF); for SAE, contributing evaluation specifications aligned with J3131 and J3164. Physical testing will be conducted at an external collaborator’s facility.

By the end of the project period, the team plans to establish a publicly available measurement framework for AI decision-making systems, develop testbed capabilities bridging simulation and real-world testing, create reference scenario datasets, and expand collaborative relationships with industry partners and regulatory bodies.

Image credit: Zeid Kootbally/NIST
Created March 24, 2026, Updated April 2, 2026