
Explaining poisoned AI models

Published

2025

Author(s)

Peter Bajcsy, Antonio Cardone, Philippe Dessauw, Chenyi Ling, Michael Majurski, Timothy Blattner, Derek Juba, Walid Keyrouz

Abstract

This work presents a hierarchical approach to explaining poisoned artificial intelligence (AI) models. The motivation comes from the use of AI models in security- and safety-critical applications, for instance, AI models that classify road traffic signs in self-driving cars. Adversaries can poison training images of traffic signs to encode malicious triggers that, when a physically realizable trigger (e.g., a sticky note or an Instagram filter) is present, change the trained AI model's prediction from the correct traffic sign to another sign. We also address the lack of AI model explainability by (a) designing utilization measurements of trained AI models and (b) explaining, based on those measurements, how training data are encoded in AI models at three hierarchical levels. The three levels are defined over graph-node (computation-unit), subgraph, and graph representations of poisoned and clean AI models from the TrojAI Challenge.
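The abstract does not define the utilization measurement concretely; as a minimal sketch of the node-level idea, the hypothetical PyTorch snippet below estimates how often each post-ReLU unit fires over a dataset. Comparing these per-unit profiles between a clean model and its poisoned counterpart is one plausible way to surface computation units that a trigger disproportionately exercises; the function name relu_utilization and the choice of activation frequency as the utilization proxy are assumptions, not the authors' method.

```python
import torch
import torch.nn as nn

def relu_utilization(model: nn.Module, loader, device: str = "cpu"):
    """Fraction of inputs for which each post-ReLU output element is
    non-zero -- a hypothetical proxy for node-level utilization."""
    counts, totals = {}, {}

    def make_hook(name):
        def hook(module, args, output):
            # Flatten everything but the batch dim: one column per unit.
            active = (output > 0).float().flatten(start_dim=1)
            counts[name] = counts.get(name, 0) + active.sum(dim=0)
            totals[name] = totals.get(name, 0) + active.shape[0]
        return hook

    # Attach a hook to every ReLU so each forward pass records activations.
    handles = [m.register_forward_hook(make_hook(n))
               for n, m in model.named_modules() if isinstance(m, nn.ReLU)]
    model.to(device).eval()
    with torch.no_grad():
        for x, _ in loader:
            model(x.to(device))
    for h in handles:
        h.remove()
    # Per-unit firing rate in [0, 1] for each ReLU layer.
    return {name: counts[name] / totals[name] for name in counts}

# Toy usage with random data standing in for traffic-sign images.
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 64),
                      nn.ReLU(), nn.Linear(64, 10))
data = [(torch.randn(8, 1, 28, 28), torch.zeros(8)) for _ in range(4)]
util = relu_utilization(model, data)
```

Aggregating such per-unit profiles over subgraphs, and then over the whole graph, would mirror the three hierarchical levels named in the abstract, though the actual measurements and aggregation used by the authors are not described here.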
Published In

Bi-directionality in Human-AI Collaborative Systems

Publisher Info

Academic Press, San Diego, CA

Keywords

Explainable AI, Cybersecurity, Adversarial attacks

Citation

Bajcsy, P., Cardone, A., Dessauw, P., Ling, C., Majurski, M., Blattner, T., Juba, D. and Keyrouz, W. (2025), Explaining poisoned AI models, Academic Press, San Diego, CA, [online], https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=935784 (Accessed March 25, 2026)
