
Explaining poisoned AI models

Published

2025

Author(s)

Peter Bajcsy, Antonio Cardone, Philippe Dessauw, Chenyi Ling, Michael Majurski, Timothy Blattner, Derek Juba, Walid Keyrouz

Abstract

This work presents a hierarchical approach to explaining poisoned artificial intelligence (AI) models. The motivation comes from the use of AI models in security- and safety-critical applications, for instance, AI models that classify road traffic signs in self-driving cars. Adversaries can poison training images of traffic signs to encode malicious triggers that, when a physically realizable trigger (e.g., a sticky note or an Instagram filter) is present, change the trained AI model's prediction from the correct traffic sign to another sign. We also address the lack of AI model explainability by (a) designing utilization measurements of trained AI models and (b) explaining, based on those measurements, how training data are encoded in AI models at three hierarchical levels. The three levels are defined over graph-node (computation-unit), subgraph, and graph representations of poisoned and clean AI models from the TrojAI Challenge.
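The abstract does not define the utilization measurement concretely; as a minimal sketch of the node-level idea, the hypothetical PyTorch snippet below estimates how often each post-ReLU unit fires over a dataset. Comparing these per-unit profiles between a clean model and its poisoned counterpart is one plausible way to surface computation units that a trigger disproportionately exercises; the function name relu_utilization and the choice of activation frequency as the utilization proxy are assumptions, not the authors' method.

```python
import torch
import torch.nn as nn

def relu_utilization(model: nn.Module, loader, device: str = "cpu"):
    """Fraction of inputs for which each post-ReLU output element is
    non-zero -- a hypothetical proxy for node-level utilization."""
    counts, totals = {}, {}

    def make_hook(name):
        def hook(module, args, output):
            # Flatten everything but the batch dim: one column per unit.
            active = (output > 0).float().flatten(start_dim=1)
            counts[name] = counts.get(name, 0) + active.sum(dim=0)
            totals[name] = totals.get(name, 0) + active.shape[0]
        return hook

    # Attach a hook to every ReLU so each forward pass records activations.
    handles = [m.register_forward_hook(make_hook(n))
               for n, m in model.named_modules() if isinstance(m, nn.ReLU)]
    model.to(device).eval()
    with torch.no_grad():
        for x, _ in loader:
            model(x.to(device))
    for h in handles:
        h.remove()
    # Per-unit firing rate in [0, 1] for each ReLU layer.
    return {name: counts[name] / totals[name] for name in counts}

# Toy usage with random data standing in for traffic-sign images.
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 64),
                      nn.ReLU(), nn.Linear(64, 10))
data = [(torch.randn(8, 1, 28, 28), torch.zeros(8)) for _ in range(4)]
util = relu_utilization(model, data)
```

Aggregating such per-unit profiles over subgraphs, and then over the whole graph, would mirror the three hierarchical levels named in the abstract, though the actual measurements and aggregation used by the authors are not described here.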
Published In

Bi-directionality in Human-AI Collaborative Systems

Publisher Info

Academic Press, San Diego, CA

Keywords

Explainable AI, Cybersecurity, Adversarial attacks

Citation

Bajcsy, P., Cardone, A., Dessauw, P., Ling, C., Majurski, M., Blattner, T., Juba, D. and Keyrouz, W. (2025), Explaining poisoned AI models, Academic Press, San Diego, CA, [online], https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=935784 (Accessed March 25, 2026)
