
New Report: Challenges to the Monitoring of Deployed AI Systems

As artificial intelligence (AI) systems are increasingly integrated into commercial and government applications, there is growing demand to monitor these systems in real-world settings. While monitoring digital systems for quality assurance is not new, particularly in cybersecurity and software continuous monitoring, the practice remains vast and fragmented in the AI sector. Because AI systems have novel properties that introduce variability and manifest in unpredictable ways, post-deployment monitoring – from incident monitoring to field studies – is a crucial practice for confident, widespread AI adoption.

To address this pressing need, in 2025 the Center for AI Standards and Innovation (CAISI) held three practitioner workshops and conducted an in-depth literature review to map the landscape, focusing on current challenges to robust and effective post-deployment monitoring of AI systems.

Our findings are outlined in the new report, NIST AI 800-4: Challenges to the Monitoring of Deployed AI Systems, in which we identify monitoring categories and detail challenges (gaps, barriers, and open questions) to inform and spur future research in the field. The primary contribution of this report is the identification, organization, and documentation of monitoring challenges, along with the reporting of views expressed by experts in the field.

Six common categories of monitoring, developed via thematic coding, are listed in the table below. See Appendix B of the report for the full methodology, and Appendix C for the associated codebook.

Monitoring Category | Guiding Question | Definition

Functionality Monitoring | Does the system continue to work as intended? | Measuring system functions, capabilities, and features to ensure the system works as intended

Operational Monitoring | Does the system maintain consistent service across its infrastructure? | Measuring system infrastructure components, for example to ensure the system maintains consistent levels of service

Human Factors Monitoring | Is the system transparent to humans and high quality? | Measuring human-system interactions, for example to ensure the system produces high-quality outputs and is transparent

Security Monitoring | Is the system secure against attacks and misuse? | Measuring where the system is potentially vulnerable to adversarial attacks and misuse

Compliance Monitoring | Does the system adhere to relevant regulations and directives? | Measuring system components for adherence to relevant laws, regulations, standards, controls, and guidelines

Large-Scale Impacts Monitoring | Does the system promote human flourishing? | Measuring system properties that have wide downstream impacts, for example to ensure the system promotes human flourishing

To manageably synthesize the many challenges reported by practitioners and subject matter experts, we organized the database of workshop quotes and literature excerpts in two ways: (1) by monitoring category, since some challenges apply more to one category than another (e.g., the overhead of collecting and gauging user feedback is more relevant to human factors than to security), and (2) by challenges shared across categories (e.g., poor incident-sharing mechanisms). Finally, we sorted open questions on AI system monitoring into “who”, “what”, “when”, “why”, and “how” to monitor.

The table below highlights a sampling of post-deployment monitoring challenges. See the report for the full list.


Highlighted Gaps, Barriers, and Open Questions

Category-Specific Challenges

Gaps:

  • Insufficient research on human-AI feedback loops
  • Underexplored methods to detect deceptive behavior
  • Defining metrics for beneficial impacts to humans

Barriers:

  • Detecting performance degradation and drift
  • Fragmented logging across distributed infrastructure
  • Navigating the complexity of the policy landscape

Cross-Cutting Challenges

Gaps:

  • Lack of trusted guidelines or standards for methods and tools
  • Immature information sharing ecosystem

Barriers:

  • Scaling human-driven monitoring alongside rapid rollouts
  • Balancing competitive pressures with necessary oversight
  • Hiring and training qualified AI experts

Open Questions

  • How to reduce monitoring burden on the end user or customer?
  • Should monitoring be based on risk-level? Tailored to the use case?
  • What is the right cadence for monitoring?
  • What is the relationship between monitoring and auditing?
  • How to balance and integrate automated monitoring and human-validated monitoring?
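To make one of these challenges concrete, the barrier of detecting performance degradation and drift can be illustrated with a minimal functionality-monitoring sketch. This is a hypothetical example, not a method from the report: the `DriftMonitor` class, its window size, and its tolerance threshold are all illustrative assumptions.

```python
# Hypothetical sketch of a single functionality-monitoring check:
# flag drift when a deployed model's rolling accuracy falls below a
# fixed fraction of its baseline accuracy. Names and thresholds are
# illustrative assumptions, not drawn from NIST AI 800-4.
from collections import deque


class DriftMonitor:
    """Tracks recent prediction outcomes and flags performance drift."""

    def __init__(self, baseline_accuracy, window=100, tolerance=0.9):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        # Rolling window of outcomes: 1 = correct, 0 = incorrect.
        self.outcomes = deque(maxlen=window)

    def record(self, correct):
        """Log whether the latest prediction was correct."""
        self.outcomes.append(1 if correct else 0)

    def drifted(self):
        """Return True once rolling accuracy drops below tolerance * baseline."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough observations yet
        rolling = sum(self.outcomes) / len(self.outcomes)
        return rolling < self.baseline * self.tolerance


# Usage: a model with 95% baseline accuracy that suddenly fails.
monitor = DriftMonitor(baseline_accuracy=0.95, window=10)
for _ in range(10):
    monitor.record(correct=False)
print(monitor.drifted())  # True: rolling accuracy 0.0 is below 0.855
```

Even this toy check surfaces the open questions above: the window size is a monitoring cadence decision, and choosing who reviews a drift alert is a human-validation question.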

The identified gaps, barriers, and open questions highlight impactful opportunities for further investigation and innovation. The monitoring categories can offer a common language for describing sub-fields within AI system monitoring, and the challenges identified point to areas where additional solutions are needed.

We welcome your engagement as we evaluate how best to support stakeholders in post-deployment monitoring of AI systems. You can share comments via email to caisi-metrology [at] nist.gov.

Released March 9, 2026