Computing confidence intervals for common IR measures

Published: December 09, 2014


Ian M. Soboroff


Confidence intervals quantify the uncertainty in an average and offer a robust alternative to hypothesis testing. We measure the performance of standard and bootstrapped confidence intervals on a number of common IR measures using several TREC and NTCIR collections. The performance of an interval is its empirical coverage of the estimated statistic. We find that both standard and bootstrapped intervals give excellent coverage for all measures except in situations of abysmal retrieval performance. We recommend using standard confidence intervals when statistical software is handy, and bootstrap percentile intervals as equivalent when no statistical libraries are available.
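As a rough illustration of the library-free option the abstract mentions, the following is a minimal sketch of a bootstrap percentile interval for a mean score, using only the Python standard library. It is not taken from the paper: the function name `bootstrap_percentile_ci`, its parameters, the number of resamples, and the per-topic scores are all hypothetical and chosen only for the example.

```python
import random
import statistics


def bootstrap_percentile_ci(scores, level=0.95, n_resamples=2000, seed=0):
    """Percentile bootstrap interval for the mean of per-topic scores.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(scores)
    # Resample topics with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_resamples)
    )
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the resampled means.
    alpha = 1.0 - level
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return means[lo_idx], means[hi_idx]


# Made-up per-topic average precision values, for illustration only.
ap_scores = [0.12, 0.45, 0.33, 0.08, 0.51, 0.27, 0.39, 0.19, 0.62, 0.30]
low, high = bootstrap_percentile_ci(ap_scores)
print(f"mean = {statistics.fmean(ap_scores):.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

When a statistics library is available, the standard interval the abstract recommends would typically be computed from the sample mean, the standard error, and a t quantile instead.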
Proceedings Title: Proceedings of the Workshop on Evaluation for Information Access (EVIA 2014)
Conference Dates: December 8, 2014
Conference Location: Tokyo
Pub Type: Conferences



information retrieval, statistics
Created December 09, 2014, Updated February 19, 2017