On the Quality of the TREC-COVID IR Test Collections

Author(s)

Ellen M. Voorhees, Kirk Roberts

Abstract

Shared text collections continue to be vital infrastructure for IR research. The COVID-19 pandemic offered an opportunity to create a test collection that captured the rapidly changing information space during a pandemic, and the TREC-COVID effort was created to build such a collection using the TREC framework. This paper examines the quality of the resulting TREC-COVID test collections and, in doing so, offers a critique of the state of the art in building reusable IR test collections. The largest of the collections, called 'TREC-COVID Complete', is found to be on par with previous TREC ad hoc collections, with existing quality tests uncovering no apparent problems. Yet the lack of any way to definitively demonstrate the collection's quality, together with its violation of previously used quality heuristics, suggests that much work remains to be done to understand the factors affecting collection quality.
Proceedings Title
Proceedings of ACM Special Interest Group on Information Retrieval (ACM SIGIR 2021)
Conference Dates
July 11-15, 2021
Conference Location
virtual, originally Montreal, CA

Keywords

covid-19, datasets, test collections, TREC

Citation

Voorhees, E. and Roberts, K. (2021), On the Quality of the TREC-COVID IR Test Collections, Proceedings of ACM Special Interest Group on Information Retrieval (ACM SIGIR 2021), virtual, originally Montreal, CA, [online], https://doi.org/10.1145/3404835.3463244, https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=932293 (Accessed October 24, 2025)

Issues

If you have any questions about this publication or are having problems accessing it, please contact [email protected].

Created July 11, 2021, Updated February 14, 2023