The National Institute of Standards and Technology (NIST) conducts an ongoing series of Speaker Recognition Evaluations (SRE). Speaker detection performance is measured with a detection cost function, defined as a weighted sum of the probabilities of type I and type II errors. Sampling variability introduces measurement uncertainty, so the uncertainties of the detection cost functions must be taken into account in SRE. In our prior study, the standard errors (SE) of detection cost functions were computed with nonparametric two-sample bootstrap methods, grounded in our extensive bootstrap variability studies on large datasets, under the assumption that the data were independent. In this article, the data dependency caused by multiple uses of the same subjects is taken into account. The data are therefore grouped into target sets and non-target sets, each containing multiple scores. One-layer and two-layer bootstrap methods are proposed: in the one-layer method, the two-sample bootstrap resampling takes place only on the target sets and the non-target sets, respectively; in the two-layer method, it is followed by resampling of the target scores and non-target scores within the sets. The SEs of the detection cost function obtained with these two methods are compared with those obtained under the assumption of data independence. Taking the data dependency into account is found to increase the estimated SEs. Thus, to obtain more accurate measures in SRE, the data should be sampled as randomly as possible. Based on our research, some suggestions regarding the test design are provided.
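The two resampling schemes described in the abstract can be illustrated with a minimal Python sketch. It is not the report's implementation. The function names, the fixed decision threshold, and the cost parameters (`c_miss=10`, `c_fa=1`, `p_target=0.01`) are assumptions chosen for illustration, as is the convention that a target score below the threshold counts as a miss and a non-target score at or above it counts as a false alarm.

```python
import numpy as np

def detection_cost(target_scores, nontarget_scores, threshold,
                   c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Weighted sum of the miss and false-alarm probabilities.

    The weights are illustrative assumptions, not values from the report.
    """
    p_miss = np.mean(np.asarray(target_scores) < threshold)
    p_fa = np.mean(np.asarray(nontarget_scores) >= threshold)
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa

def resample_sets(sets, rng, two_layer):
    """Resample whole sets with replacement (layer 1); if two_layer is
    True, also resample the scores inside each drawn set (layer 2)."""
    idx = rng.integers(0, len(sets), size=len(sets))
    picked = [np.asarray(sets[i]) for i in idx]
    if two_layer:
        picked = [rng.choice(s, size=len(s), replace=True) for s in picked]
    return np.concatenate(picked)

def bootstrap_se(target_sets, nontarget_sets, threshold,
                 n_boot=1000, two_layer=False, seed=0):
    """Two-sample bootstrap estimate of the SE of the detection cost.

    Target sets and non-target sets are resampled separately, so the
    two samples stay distinct in every replication.
    """
    rng = np.random.default_rng(seed)
    costs = np.empty(n_boot)
    for b in range(n_boot):
        t = resample_sets(target_sets, rng, two_layer)
        n = resample_sets(nontarget_sets, rng, two_layer)
        costs[b] = detection_cost(t, n, threshold)
    return costs.std(ddof=1)

if __name__ == "__main__":
    # Synthetic scores: each set stands for one subject used several times.
    rng = np.random.default_rng(1)
    target_sets = [rng.normal(2.0 + rng.normal(0, 0.5), 1.0, size=8)
                   for _ in range(40)]
    nontarget_sets = [rng.normal(0.0 + rng.normal(0, 0.5), 1.0, size=20)
                      for _ in range(60)]
    print("one-layer SE:", bootstrap_se(target_sets, nontarget_sets, 1.0))
    print("two-layer SE:", bootstrap_se(target_sets, nontarget_sets, 1.0,
                                        two_layer=True))
```

Treating every score as independent would correspond to resampling the pooled target scores and pooled non-target scores directly, which is the baseline the two set-based methods are compared against.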
Citation: NIST Interagency/Internal Report (NISTIR) - 7810
NIST Pub Series: NIST Interagency/Internal Report (NISTIR)
Pub Type: NIST Pubs
Keywords: Speaker recognition evaluation, Biometrics, Bootstrap, Uncertainty, Standard error, Confidence interval, Data dependency