Baseline Pruning-Based Approach to Trojan Detection in Neural Networks
Peter Bajcsy, Michael Paul Majurski
This paper addresses the problem of detecting trojans in neural networks (NNs) by analyzing how NN accuracy responds to systematic pruning. The study uses the NN models generated for the TrojAI challenges. Our pruning-based approach (1) detects any deviations from the reference NN models, (2) measures the accuracy of a set of systematically pruned NN models using multiple pruning configurations, and (3) classifies each NN model as clean or poisoned by learning a mapping between accuracy measurements and reference clean or poisoned NN model labels. This work outlines a theoretical and experimental framework for finding the optimal mapping over a large search space of pruning parameters. In our experiments on the TrojAI Challenge Round 1-4 datasets, the approach achieves average classification accuracy between 68.51% and 91.06%. Reference model graphs and source code are available from GitHub.
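Steps (2) and (3) of the approach can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a toy linear classifier in place of a deep NN, magnitude-based weight pruning in place of the paper's pruning configurations, and a nearest-centroid rule in place of the learned mapping. All function names are hypothetical.

```python
import numpy as np


def prune_by_magnitude(weights, fraction):
    """Return a copy with the given fraction of smallest-magnitude weights set to zero.

    Assumption: magnitude pruning stands in for the paper's pruning configurations.
    """
    pruned = weights.copy()
    k = int(round(fraction * weights.size))
    if k > 0:
        # Flat indices of the k weights with the smallest absolute value.
        idx = np.argsort(np.abs(weights), axis=None)[:k]
        pruned.flat[idx] = 0.0
    return pruned


def accuracy(weights, x, y):
    """Accuracy of a toy linear classifier that predicts argmax(x @ weights)."""
    predictions = np.argmax(x @ weights, axis=1)
    return float(np.mean(predictions == y))


def accuracy_profile(weights, x, y, fractions):
    """Step (2): accuracy of the pruned model at each pruning fraction."""
    return np.array(
        [accuracy(prune_by_magnitude(weights, f), x, y) for f in fractions]
    )


def fit_centroids(features, labels):
    """Step (3), simplified: mean accuracy profile per label (0 = clean, 1 = poisoned)."""
    return {c: features[labels == c].mean(axis=0) for c in np.unique(labels)}


def classify(centroids, feature):
    """Assign the label whose centroid is closest to the accuracy profile."""
    return min(centroids, key=lambda c: np.linalg.norm(feature - centroids[c]))
```

In use, one would compute `accuracy_profile` for each reference model with a known clean or poisoned label, pass the stacked profiles to `fit_centroids`, and then call `classify` on the profile of an unlabeled model. The nearest-centroid rule is chosen here only for brevity; any classifier over accuracy profiles fits the same pattern.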
Proceedings of the International Conference on Learning Representations (ICLR) 2021, Security and Safety in Machine Learning Systems Workshop
May 3-7, 2021
virtual, MD, US
Security and Safety in Machine Learning Systems Workshop
artificial intelligence, trojan attacks, AI model pruning
Bajcsy, P. and Majurski, M.
Baseline Pruning-Based Approach to Trojan Detection in Neural Networks, Proceedings of the International Conference on Learning Representations (ICLR) 2021, Security and Safety in Machine Learning Systems Workshop, virtual, MD, US
(Accessed November 30, 2023)