On the Quality of the TREC-COVID IR Test Collections
Ellen M. Voorhees, Kirk Roberts
Shared text collections continue to be vital infrastructure for IR research. The COVID-19 pandemic offered an opportunity to create a test collection that captured the rapidly changing information space during a pandemic, and the TREC-COVID effort was created to build such a collection using the TREC framework. This paper examines the quality of the resulting TREC-COVID test collections and, in doing so, offers a critique of the state of the art in building reusable IR test collections. The largest of the collections, called 'TREC-COVID Complete', is found to be on par with previous TREC ad hoc collections, with existing quality tests uncovering no apparent problems. Yet the lack of any way to definitively demonstrate the collection's quality, together with the collection's violation of previously used quality heuristics, suggests that much work remains to be done to understand the factors affecting collection quality.
Proceedings of ACM Special Interest Group on Information Retrieval (ACM SIGIR 2021)
Voorhees, E. and Roberts, K., On the Quality of the TREC-COVID IR Test Collections, Proceedings of ACM Special Interest Group on Information Retrieval (ACM SIGIR 2021), virtual, originally Montreal, CA, [online], https://doi.org/10.1145/3404835.3463244, https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=932293
(Accessed March 1, 2024)