Variations in Relevance Judgments and the Measurement of Retrieval Effectiveness

Published

Author(s)

Ellen M. Voorhees

Abstract

Test collections have traditionally been used by information retrieval researchers to improve their retrieval strategies. To be viable as a laboratory tool, a collection must reliably rank different retrieval variants according to their true effectiveness. In particular, the relative effectiveness of two retrieval strategies should be insensitive to modest changes in the relevant document set, since individual relevance assessments are known to vary widely. The test collections developed in the TREC workshops have become the collections of choice in the retrieval research community. To verify their reliability, NIST investigated the effect that changes in the relevance assessments have on the evaluation of retrieval results. Very high correlations were found among the rankings of systems produced using different relevance judgment sets. The high correlations indicate that the comparative evaluation of retrieval performance is stable despite substantial differences in relevance judgments, and thus reaffirm the use of the TREC collections as laboratory tools.
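The comparison described in the abstract (score every system under each relevance judgment set, rank the systems, then correlate the rankings) can be sketched in a few lines. This summary does not name the effectiveness measure or the correlation statistic, so the sketch assumes non-interpolated average precision for a single topic and Kendall's tau between system orderings, both common choices in TREC-style evaluation. The run names, document IDs and judgment sets are hypothetical, and a real evaluation would average scores over many topics rather than use one.

```python
from itertools import combinations


def average_precision(ranked_docs, relevant):
    """Non-interpolated average precision of one ranked list against a set of relevant docs."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant document retrieved
    return precision_sum / len(relevant)


def rank_systems(runs, relevant):
    """Order system names by decreasing average precision under one judgment set."""
    scores = {name: average_precision(docs, relevant) for name, docs in runs.items()}
    # sorted() is stable, so tied systems keep their insertion order
    return sorted(scores, key=lambda name: scores[name], reverse=True)


def kendall_tau(order_a, order_b):
    """Kendall's tau between two orderings of the same items (assumes no ties)."""
    if len(order_a) < 2:
        raise ValueError("need at least two systems to compare rankings")
    pos_b = {item: i for i, item in enumerate(order_b)}
    concordant = discordant = 0
    for x, y in combinations(order_a, 2):
        # x precedes y in order_a; the pair is concordant if x also precedes y in order_b
        if pos_b[x] < pos_b[y]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical runs for one topic and two assessors' relevance judgments.
runs = {
    "sysA": ["d1", "d2", "d3", "d4"],
    "sysB": ["d2", "d5", "d1", "d6"],
    "sysC": ["d7", "d8", "d1", "d2"],
}
assessor1 = {"d1", "d2"}
assessor2 = {"d1", "d2", "d5"}

order1 = rank_systems(runs, assessor1)
order2 = rank_systems(runs, assessor2)
print(order1, order2, kendall_tau(order1, order2))
```

With these toy judgments the first assessor orders the systems sysA, sysB, sysC and the second orders them sysB, sysA, sysC. One of the three system pairs swaps, giving a tau of 1/3. Identical orderings give a tau of 1.0, and the stability the abstract reports corresponds to taus close to 1.0 across judgment sets.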

Journal
Information Processing and Management, Volume 36, No. 5

Keywords

information retrieval, test collections

Citation

Voorhees, E. M. (2000). Variations in Relevance Judgments and the Measurement of Retrieval Effectiveness. Information Processing and Management, 36(5). (Accessed October 14, 2025)

Created January 1, 2000, Updated February 17, 2017