Too many Relevants: Whither Cranfield Test Collections?

Ellen M. Voorhees; Nick Craswell; Jimmy Lin

Published

July 11, 2022

Author(s)

Ellen M. Voorhees, Nick Craswell, Jimmy Lin

Abstract

This paper presents the lessons regarding the construction and use of large Cranfield-style test collections learned from the TREC 2021 Deep Learning track. The corpus used in the 2021 edition of the track was much bigger than the corpus used in previous years and contains many more relevant documents. The process used to select documents to judge that had been used in earlier years of the track failed to produce a reliable collection because most topics have too many relevant documents. Judgment budgets were exceeded before an adequate sample of the relevant set could be found, so there are likely many unknown relevant documents in the unjudged portion of the corpus. As a result, the collection is not reusable, and furthermore, recall-based measures are unreliable, even for the retrieval system results used in building it. Yet, early-precision measures cannot distinguish among system results because the maximum score is easily obtained for many topics. And since the existing tools for appraising the quality of test collections depend on systems' scores, they also fail when there are too many relevant documents. Collection builders will need new strategies and tools for building reliable test collections for continued use of the Cranfield paradigm on ever-larger corpora. Ensuring that the definition of 'relevant' truly reflects the desired systems' rankings is a provisional strategy for continued collection building.
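The two failure modes named in the abstract, saturated early-precision scores and unreliable recall, can be illustrated with a toy sketch. Everything in it is a hypothetical assumption made for illustration and is not taken from the paper or the track data: the document ids, the sizes of the relevant sets, the two system rankings, and the helper functions `precision_at_k` and `recall`.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked documents that are in `relevant`."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def recall(ranking, relevant):
    """Fraction of `relevant` that appears anywhere in the ranking."""
    return sum(1 for doc in ranking if doc in relevant) / len(relevant)


# Hypothetical topic: 200 documents are truly relevant, but the judgment
# budget ran out after only 50 of them had been found and judged.
true_relevant = {f"d{i}" for i in range(200)}
judged_relevant = {f"d{i}" for i in range(50)}

# Hypothetical system A retrieves exactly the 50 judged relevant documents.
system_a = [f"d{i}" for i in range(50)]
# Hypothetical system B retrieves 10 judged relevant documents followed by
# 90 relevant documents that were never judged.
system_b = [f"d{i}" for i in range(10)] + [f"d{i}" for i in range(50, 140)]

# Early precision saturates: both systems reach the maximum score.
p10_a = precision_at_k(system_a, judged_relevant, 10)
p10_b = precision_at_k(system_b, judged_relevant, 10)

# Recall against the judged set prefers A; recall against the (normally
# unknown) true relevant set prefers B.
judged_recall_a = recall(system_a, judged_relevant)
judged_recall_b = recall(system_b, judged_relevant)
true_recall_a = recall(system_a, true_relevant)
true_recall_b = recall(system_b, true_relevant)

print(p10_a, p10_b)
print(judged_recall_a, judged_recall_b)
print(true_recall_a, true_recall_b)
```

In this sketch P@10 cannot separate the two systems, and recall computed from the incomplete judgments reverses the ordering that the full relevant set would give, which is the sense in which a collection with many unknown relevant documents is not reusable.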

Proceedings Title

Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

Conference Dates

July 11-15, 2022

Conference Location

Madrid, ES

Conference Title

45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2022)

Pub Type

Conferences

Download Paper

https://doi.org/10.1145/3477495.3531728

Keywords

Cranfield, information retrieval, reusability, score saturation, test collections, TREC

Information retrieval

Citation

Voorhees, E., Craswell, N. and Lin, J. (2022), Too many Relevants: Whither Cranfield Test Collections?, Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, ES, [online], https://doi.org/10.1145/3477495.3531728, https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=934359 (Accessed October 14, 2025)

Created July 11, 2022, Updated February 14, 2023
