October 14, 2025
Authors
Divyam Goel, Yufei Wang, Tiancheng Wu, Guixiu Qiao, Pavlo Piliptchak, David Held, Zackory Erickson
Standard evaluation protocols in robotic manipulation typically assess policy performance over curated, in-distribution test sets, offering limited insight into how systems fail under plausible variation. We introduce a red-teaming framework that probes