Publication Citation: Further Studies of Bootstrap Variability for ROC Analysis on Large Datasets

Author(s): Jin Chu Wu; Alvin Martin; Raghu N. Kacker
Title: Further Studies of Bootstrap Variability for ROC Analysis on Large Datasets
Published: October 11, 2010
Abstract: The nonparametric two-sample bootstrap has been successfully applied to computing the measurement uncertainties in receiver operating characteristic (ROC) analysis on large datasets in areas such as biometrics and speaker recognition. To determine the number of bootstrap replications in our applications, the bootstrap variability of the standard error and of the two bounds of the 95% confidence interval was studied in a scenario where the statistic of interest was the true accept rate (TAR) of the genuine scores at a specified false accept rate (FAR) of the impostor scores. From the operational perspective, three more scenarios are of interest, in which the statistics are the TAR at a given threshold value, the FAR at a specified threshold value, and the equal error rate, respectively. Regarding the ROC analysis, the area under the ROC curve is also of interest. In this article, the bootstrap variability was studied in all five of these scenarios for both high- and low-accuracy matching algorithms. With a tolerance of 0.02 on the coefficient of variation, which can be applied to all cases investigated, it is found that 2000 bootstrap replications are appropriate for ROC analysis on large datasets in order to reduce the bootstrap variance and ensure the accuracy of the computation.
Citation: NIST Interagency/Internal Report (NISTIR) - 7730
Pages: 27 pp.
Keywords: Bootstrap; variability; ROC analysis; biometrics; speaker recognition; standard error; confidence interval; large datasets
Research Areas: Data and Informatics, Information Technology, Statistics
PDF version: PDF document (566KB)
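
Illustrative sketch: The procedure summarized in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`tar_at_far`, `bootstrap_tar`, `coefficient_of_variation`), the Gaussian synthetic scores, the simple percentile rule for the 95% confidence bounds, and the reading of "bootstrap variability" as the coefficient of variation of the standard error across repeated bootstrap runs are all assumptions made for the example. Small sample and replication counts are used only to keep the demo fast.

```python
import random
import statistics


def tar_at_far(genuine, impostor, far):
    """Empirical TAR at the threshold whose empirical FAR does not exceed `far`."""
    ranked = sorted(impostor, reverse=True)
    k = int(far * len(ranked))  # number of impostor scores allowed above threshold
    # Scores strictly greater than the (k+1)-th largest impostor score are accepted.
    threshold = ranked[k] if k < len(ranked) else float("-inf")
    return sum(s > threshold for s in genuine) / len(genuine)


def bootstrap_tar(genuine, impostor, far, n_replications=2000, seed=0):
    """Two-sample bootstrap: resample genuine and impostor scores independently.

    Returns (standard error, lower bound, upper bound) using a simple
    percentile rule for the 95% interval (an assumption of this sketch).
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_replications):
        g = rng.choices(genuine, k=len(genuine))    # resample with replacement
        i = rng.choices(impostor, k=len(impostor))  # resample with replacement
        estimates.append(tar_at_far(g, i, far))
    estimates.sort()
    se = statistics.stdev(estimates)
    lower = estimates[int(0.025 * n_replications)]
    upper = estimates[min(int(0.975 * n_replications), n_replications - 1)]
    return se, lower, upper


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


if __name__ == "__main__":
    # Synthetic scores, for illustration only.
    data_rng = random.Random(42)
    genuine = [data_rng.gauss(2.0, 1.0) for _ in range(500)]
    impostor = [data_rng.gauss(0.0, 1.0) for _ in range(500)]

    # Repeat the whole bootstrap with different seeds and look at how much
    # the estimated standard error itself varies from run to run.
    ses = [
        bootstrap_tar(genuine, impostor, far=0.05, n_replications=200, seed=s)[0]
        for s in range(5)
    ]
    print("standard errors:", ses)
    print("coefficient of variation of SE:", coefficient_of_variation(ses))
```

In this sketch, one would increase `n_replications` until the coefficient of variation falls below the chosen tolerance (0.02 in the abstract). The same loop could be applied to the lower and upper confidence bounds, or to another statistic such as the equal error rate, by swapping out `tar_at_far`.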