The intended applications of automatic face recognition systems include venues that vary widely in demographic diversity. Formal evaluations of algorithms do not commonly consider the effects of population diversity on performance. We document the effects of racial and gender demographics on estimates of the accuracy of algorithms that match identity in pairs of face images. In particular, we focus on the effects of the "background" population distribution of non-matched identities against which identity matches are compared. The algorithm we tested was created by fusing three of the top performers from a recent US Government competition. First, we demonstrate the variability of algorithm performance estimates when the population of non-matched identities is demographically "yoked" by race and/or gender (i.e., "yoking" constrains non-matched pairs to be of the same race or gender). We also report differences in the match threshold required to obtain a false alarm rate of 0.001 when the demographic controls on the non-matched identity pairs are varied. In a second experiment, we explore the effect on algorithm performance of progressively increasing population diversity. We find systematic, but non-general, effects when the balance between majority and minority populations of non-matched identities shifts. Third, we show that identity match accuracy differs substantially when the non-match identity population varies by race. Finally, we demonstrate the impact on performance when the non-match distribution consists of faces chosen to resemble a target face. The results from all experiments indicate the importance of the demographic composition and modeling of the background population in predicting the accuracy of face recognition algorithms.
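To make the link between yoking and the match threshold concrete, the following is a minimal sketch rather than the procedure used in the study. It assumes that a false alarm is a non-matched pair whose similarity score exceeds the threshold, so the threshold for a target false alarm rate is a high quantile of the non-match scores. The function names, the synthetic scores, and the size of the same-race and same-gender score shifts are all hypothetical and chosen only for illustration.

```python
import numpy as np


def threshold_at_far(nonmatch_scores, far=0.001):
    """Return the score threshold whose false alarm rate is roughly `far`.

    Assumes higher scores mean "more similar", so the threshold is the
    (1 - far) quantile of the non-match score distribution.
    """
    scores = np.asarray(nonmatch_scores, dtype=float)
    return float(np.quantile(scores, 1.0 - far))


def yoked_mask(race_a, race_b, gender_a, gender_b, by_race=True, by_gender=True):
    """Boolean mask keeping non-matched pairs that share race and/or gender."""
    mask = np.ones(len(race_a), dtype=bool)
    if by_race:
        mask &= np.asarray(race_a) == np.asarray(race_b)
    if by_gender:
        mask &= np.asarray(gender_a) == np.asarray(gender_b)
    return mask


# Synthetic non-matched pairs (hypothetical data, not from the study).
rng = np.random.default_rng(0)
n_pairs = 200_000
race_a, race_b = rng.integers(0, 2, n_pairs), rng.integers(0, 2, n_pairs)
gender_a, gender_b = rng.integers(0, 2, n_pairs), rng.integers(0, 2, n_pairs)

# Assumption: pairs sharing a demographic attribute score slightly higher.
scores = (
    rng.normal(0.0, 1.0, n_pairs)
    + 0.5 * (race_a == race_b)
    + 0.3 * (gender_a == gender_b)
)

# Threshold with no demographic control versus race-and-gender yoking.
t_all = threshold_at_far(scores)
mask = yoked_mask(race_a, race_b, gender_a, gender_b)
t_yoked = threshold_at_far(scores[mask])

print(f"threshold, all non-matched pairs:  {t_all:.3f}")
print(f"threshold, yoked non-matched pairs: {t_yoked:.3f}")
```

Under these assumed score shifts, the yoked background needs a higher threshold to hold the same false alarm rate, which is the kind of dependence on background composition that the abstract describes.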
Citation: Image and Vision Computing
Pub Type: Journals
Keywords: face recognition, algorithm evaluation, demographics