The nonparametric two-sample bootstrap has been successfully applied to computing measurement uncertainties in receiver operating characteristic (ROC) analysis on large datasets in areas such as biometrics and speaker recognition. To determine the number of bootstrap replications in our applications, the bootstrap variability of the standard error and of the two bounds of the 95% confidence interval was studied in a scenario where the statistic of interest was the true accept rate (TAR) of the genuine scores at a specified false accept rate (FAR) of the impostor scores. From the operational perspective, three more scenarios are of interest, in which the statistics are the TAR at a given threshold value, the FAR at a given threshold value, and the equal error rate, respectively. For ROC analysis, the area under the ROC curve is also of interest. In this article, the bootstrap variability is studied in all five scenarios for both high- and low-accuracy matching algorithms. With a tolerance of 0.02 on the coefficient of variation, which can be applied to all cases investigated, it is found that 2000 bootstrap replications are appropriate for ROC analysis on large datasets in order to reduce the bootstrap variance and ensure the accuracy of the computation.
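The first scenario described above (TAR at a specified FAR) can be illustrated with a minimal Python sketch. It is not the report's implementation. The synthetic Gaussian scores, the sample sizes, the convention that scores above the threshold are accepted, the percentile confidence interval, the choice of 10 repeated runs with 200 replications each, and all function names are assumptions made for illustration only. The sketch resamples genuine and impostor scores independently, computes the standard error and 95% bounds per run, and then takes the coefficient of variation of the standard error across runs as a measure of bootstrap variability.

```python
import numpy as np

def tar_at_far(genuine, impostor, far):
    """TAR at a specified FAR, assuming scores above the threshold are accepted."""
    # Threshold is the (1 - far) quantile of the impostor scores (assumed convention).
    threshold = np.quantile(impostor, 1.0 - far)
    return float(np.mean(genuine > threshold))

def bootstrap_tar(genuine, impostor, far, n_reps, rng):
    """One two-sample bootstrap run: returns (SE, lower, upper) for the TAR."""
    stats = np.empty(n_reps)
    for b in range(n_reps):
        # Resample each score set independently, with replacement.
        g = rng.choice(genuine, size=genuine.size, replace=True)
        i = rng.choice(impostor, size=impostor.size, replace=True)
        stats[b] = tar_at_far(g, i, far)
    se = stats.std(ddof=1)
    # Percentile 95% interval (an assumed interval type).
    lower, upper = np.quantile(stats, [0.025, 0.975])
    return se, lower, upper

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    values = np.asarray(values, dtype=float)
    return values.std(ddof=1) / values.mean()

# Synthetic scores standing in for real matcher output (hypothetical data).
rng = np.random.default_rng(0)
genuine = rng.normal(2.0, 1.0, size=2000)
impostor = rng.normal(0.0, 1.0, size=5000)

# Repeat the whole bootstrap several times to see how much the SE itself varies.
runs = [bootstrap_tar(genuine, impostor, far=0.01, n_reps=200, rng=rng)
        for _ in range(10)]
se_values = [r[0] for r in runs]
cv_se = coefficient_of_variation(se_values)
print(f"CV of the standard error across runs: {cv_se:.4f}")
```

Increasing `n_reps` and comparing `cv_se` against a tolerance such as 0.02 mirrors the kind of criterion the abstract describes. The same loop can be applied to the lower and upper bounds by replacing `r[0]` with `r[1]` or `r[2]`.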
Citation: NIST Interagency/Internal Report (NISTIR) - 7730
NIST Pub Series: NIST Interagency/Internal Report (NISTIR)
Pub Type: NIST Pubs
Keywords: Bootstrap, variability, ROC analysis, biometrics, speaker recognition, standard error, confidence interval, large datasets