Evaluating Evaluation Measure Stability

C E. Buckley; Ellen M. Voorhees

Published

July 1, 2000

Author(s)

C E. Buckley, Ellen M. Voorhees

Abstract

This paper presents a novel way of examining the accuracy of the evaluation measures commonly used in information retrieval experiments. It validates several of the rules of thumb experimenters use, such as that a good experiment needs at least 25 queries and that 50 is better, while challenging other beliefs, such as that the common evaluation measures are equally reliable. As an example, we show that Precision at 30 documents has about twice the average error rate of Average Precision. These results can help information retrieval researchers design experiments that provide a desired level of confidence in their results. In particular, we suggest that researchers using Web measures such as Precision at 10 documents will need to use many more than 50 queries, or will have to require a very large difference in evaluation scores between two methods before concluding that the methods are actually different.
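For readers unfamiliar with the two measures the abstract compares, the following is a minimal sketch of their standard textbook definitions: Precision at k documents and (non-interpolated) Average Precision for a single query. It is not the paper's experimental procedure for estimating error rates; the function names, document IDs, and relevance judgments are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Set


def precision_at_k(ranking: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranking[:k] if doc in relevant)
    return hits / k


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Sum of the precision at each rank holding a relevant document,
    divided by the total number of relevant documents for the query."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


# Hypothetical ranked list and relevance judgments for one query.
ranking = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d3", "d1", "d8"}  # "d8" is relevant but never retrieved

p_at_3 = precision_at_k(ranking, relevant, 3)
ap = average_precision(ranking, relevant)
print(f"P@3 = {p_at_3:.4f}, AP = {ap:.4f}")
```

Precision at k depends only on how many relevant documents fall in the top k, whereas Average Precision also reflects where in the ranking each relevant document appears and how many relevant documents were missed. That difference in how much of the ranking each measure uses is one plausible reason the two could differ in stability, though the sketch itself does not demonstrate this.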

Proceedings Title

Research and Development in Information Retrieval; SIGIR, 23rd, Association for Computing Machinery

Conference Dates

July 1, 2000

Conference Location

Athens, GR

Conference Title

Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

Pub Type

Conferences

Keywords

evaluation measure, information retrieval, search engines

Citation

Buckley, C. and Voorhees, E. (2000), Evaluating Evaluation Measure Stability, Research and Development in Information Retrieval; SIGIR, 23rd, Association for Computing Machinery, Athens, GR (Accessed February 26, 2026)

Issues

If you have any questions about this publication or are having problems accessing it, please contact [email protected].

Created June 30, 2000, Updated October 12, 2021
