ROC analysis involving two large datasets is an important method for analyzing statistics of interest for decision making of a classifier in many disciplines. And data dependency due to multiple use of the same subjects exists ubiquitously in order to generate more samples because of limited resources. Hence, a two-layer data structure is constructed and the nonparametric two-sample two-layer bootstrap is employed to estimate standard errors of statistics of interest derived from two sets of data, such as a weighted sum of two probabilities. In this article, to reduce the bootstrap variance and ensure the accuracy of computation, Monte Carlo studies of bootstrap variability were carried out to determine the appropriate number of bootstrap replications in ROC analysis with data dependency. It is suggested that with a tolerance 0.02 of the coefficient of variation, 2,000 bootstrap replications be appropriate under such circumstances.
Communications in Statistics Part B-Simulation and Computation
Bootstrap variability, Bootstrap replications, ROC analysis, Data dependency, Large datasets, Standard error