The Effect of Topic Set Size on Retrieval Experiment Error

Ellen M. Voorhees; C E. Buckley

Published

August 1, 2002

Author(s)

Ellen M. Voorhees, C E. Buckley

Abstract

Retrieval mechanisms are frequently compared by computing the respective average scores for some effectiveness metric across a common set of information needs or topics. Since retrieval system behavior is known to be highly variable across topics, good experimental design requires that a sufficient number of topics be used in the test. This paper uses TREC results to empirically derive error rates based on the number of topics used in a test and the observed difference in the average scores. The error rates quantify the likelihood that a different set of topics of the same size would lead to a different conclusion. We directly compute error rates for topic sets up to size 25, and extrapolate those rates for larger topic set sizes. The error rates found are larger than anticipated, indicating researchers need to take care when concluding one method is better than another, especially if few topics are used.
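The idea of an error rate tied to topic set size can be illustrated with a minimal sketch. It is not the authors' exact procedure, which also groups comparisons by the observed difference in average scores; that step is omitted here. The function name `swap_rate`, its parameters, and the per-topic score lists are hypothetical and chosen only for illustration. The sketch repeatedly draws two disjoint topic sets of the same size and counts how often they disagree about which of two systems has the higher average score.

```python
import random
from statistics import mean


def swap_rate(scores_a, scores_b, set_size, trials=1000, seed=0):
    """Estimate how often two disjoint topic sets of `set_size` topics
    disagree about which system has the higher mean score.

    scores_a, scores_b: per-topic effectiveness scores for two systems,
    aligned by topic index (hypothetical inputs for illustration).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems need a score for every topic")
    n_topics = len(scores_a)
    if set_size < 1 or 2 * set_size > n_topics:
        raise ValueError("need at least 2 * set_size topics")

    rng = random.Random(seed)
    topics = list(range(n_topics))
    swaps = 0
    for _ in range(trials):
        # Shuffle, then take two non-overlapping topic sets.
        rng.shuffle(topics)
        first = topics[:set_size]
        second = topics[set_size:2 * set_size]
        diff_first = mean(scores_a[t] for t in first) - mean(scores_b[t] for t in first)
        diff_second = mean(scores_a[t] for t in second) - mean(scores_b[t] for t in second)
        # A swap occurs when the two sets rank the systems in opposite order.
        if diff_first * diff_second < 0:
            swaps += 1
    return swaps / trials
```

Under these assumptions, calling `swap_rate` with increasing `set_size` values on the same score lists would show how the disagreement rate changes as more topics are used, which is the relationship the abstract describes.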

Proceedings Title

SIGIR Conference

Pub Type

Conferences

Keywords

evaluation, information retrieval, TREC

Citation

Voorhees, E. and Buckley, C. (2002), The Effect of Topic Set Size on Retrieval Experiment Error, SIGIR Conference, CA (Accessed May 15, 2026)

Created August 1, 2002, Updated February 17, 2017
