Baseline Pruning-Based Approach to Trojan Detection in Neural Networks

Peter Bajcsy; Michael Paul Majurski

Published

May 7, 2021

Author(s)

Peter Bajcsy, Michael Paul Majurski

Abstract

This paper addresses the problem of detecting trojans in neural networks (NNs) by analyzing how NN accuracy responds to systematic pruning. This study leverages the NN models generated for the TrojAI challenges. Our pruning-based approach (1) detects any deviations from the reference NN models, (2) measures the accuracy of a set of systematically pruned NN models using multiple pruning configurations, and (3) classifies each NN model as clean or poisoned by learning a mapping between accuracy measurements and reference clean or poisoned NN model labels. This work outlines a theoretical and experimental framework for finding the optimal mapping over a large search space of pruning parameters. Based on our experiments using Rounds 1 - 4 TrojAI Challenge datasets, the approach achieves average classification accuracy between 68.51 % and 91.06 %. Reference model graphs and source code are available from GitHub.
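Steps (2) and (3) of the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy linear model, the magnitude-based pruning rule, the nearest-centroid classifier, and every function name and number below are assumptions chosen only to show the shape of the pipeline (prune with several configurations, record accuracy per configuration, then learn a mapping from those accuracy vectors to clean or poisoned labels). Step (1), checking a model against reference model graphs, is not covered.

```python
def prune_by_magnitude(weights, fraction):
    """Return a copy of `weights` with the smallest-magnitude `fraction` set to zero.

    Magnitude pruning is an assumed stand-in for the paper's pruning methods.
    """
    k = int(len(weights) * fraction)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def accuracy(weights, samples):
    """Fraction of (features, label) pairs that a toy linear threshold model gets right."""
    correct = 0
    for features, label in samples:
        score = sum(w * x for w, x in zip(weights, features))
        correct += int((score > 0) == bool(label))
    return correct / len(samples)


def accuracy_profile(weights, samples, fractions):
    """Step (2): accuracy of the pruned model under each pruning configuration."""
    return [accuracy(prune_by_magnitude(weights, f), samples) for f in fractions]


def fit_centroids(profiles, labels):
    """Step (3), assumed form: mean accuracy profile per label (nearest-centroid model)."""
    centroids = {}
    for label in set(labels):
        rows = [p for p, l in zip(profiles, labels) if l == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def classify(profile, centroids):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(profile, centroids[label]))
    return min(centroids, key=dist)


if __name__ == "__main__":
    # Hypothetical data: two-feature samples and one toy model.
    samples = [([1.0, 0.0], 1), ([0.0, 1.0], 0)]
    profile = accuracy_profile([1.0, -1.0], samples, fractions=[0.0, 0.5])
    # Made-up reference profiles with known labels, for illustration only.
    centroids = fit_centroids([[1.0, 0.9], [1.0, 0.5]], ["clean", "poisoned"])
    print(profile, classify(profile, centroids))
```

In practice the accuracy vectors would come from real pruned networks evaluated on held-out data, and the nearest-centroid rule could be replaced by any classifier trained on labeled reference models. The sketch only fixes the interface between the measurement step and the classification step.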

Proceedings Title

Proceedings of the International Conference on Learning Representations (ICLR) 2021, Security and Safety in Machine Learning Systems Workshop

Conference Dates

May 3-7, 2021

Conference Location

virtual, MD, US

Conference Title

Security and Safety in Machine Learning Systems Workshop

Pub Type

Conferences

Keywords

artificial intelligence, trojan attacks, AI model pruning

Data and informatics, Computational science and Artificial intelligence

Citation

Bajcsy, P. and Majurski, M. (2021), Baseline Pruning-Based Approach to Trojan Detection in Neural Networks, Proceedings of the International Conference on Learning Representations (ICLR) 2021, Security and Safety in Machine Learning Systems Workshop, virtual, MD, US (Accessed July 11, 2025)

Created May 7, 2021, Updated January 6, 2023
