
On Building Fair and Reusable Test Collections using Bandit Techniques

Published

Author(s)

Ellen M. Voorhees

Abstract

While test collections are a vital piece of the research infrastructure for information retrieval, constructing fair, reusable test collections for large data sets is challenging because of the number of human relevance assessments required. Various approaches for minimizing the number of judgments required have been proposed, including a suite of methods based on multi-armed bandit optimization techniques. However, most of these approaches look to maximize the total number of relevant documents found, which is not necessarily fair, and they have only been demonstrated in simulation on existing test collections. The TREC 2017 Common Core track provided the opportunity to build a collection de novo using a bandit method. Doing so required addressing two problems not encountered in simulation: giving the human judges time to learn a topic and allocating the overall judgment budget across topics. The resulting modified bandit technique was used to build the 2017 Common Core test collection consisting of approximately 1.8 million news articles, 50 topics, and 30,030 judgments. Unfortunately, the constructed collection is of lower quality than anticipated: a large percentage of the known relevant documents were retrieved by only one team, and for 21 topics, more than a third of the judged documents are relevant. As such, the collection is less reusable than desired. Further analysis demonstrates that the greedy approach common to most bandit methods can be unfair even to the runs participating in the collection-building process when the judgment budget is small relative to the (unknown) number of relevant documents.
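To make the fairness concern concrete, here is a rough Python sketch of a generic greedy bandit for ordering relevance judgments on a single topic. It is not the modified technique used to build the Common Core collection. Each run is treated as an arm, and "pulling" an arm means judging that run's highest-ranked unjudged document. The arm with the highest estimated relevance rate is always chosen. The function name `greedy_bandit_judging`, the dictionary-based data layout, the Beta(1,1) prior, the `qrels` oracle standing in for human assessors, and the toy data are all illustrative assumptions.

```python
from typing import Dict, List, Optional, Set


def greedy_bandit_judging(
    runs: Dict[str, List[str]],  # hypothetical: run id -> ranked doc ids for one topic
    qrels: Set[str],             # hypothetical oracle standing in for a human assessor
    budget: int,                 # maximum number of judgments for this topic
) -> Dict[str, bool]:
    """Generic greedy bandit judgment ordering (illustrative sketch only).

    Each run is an arm. Pulling an arm judges the highest-ranked document in
    that run that has not yet been judged. Each step picks the arm with the
    highest posterior mean relevance rate under an assumed Beta(1,1) prior,
    so an unexplored run starts at 0.5. Only the pulled run is credited with
    the outcome.
    """
    judged: Dict[str, bool] = {}
    hits = {r: 0 for r in runs}   # relevant docs judged via this run
    pulls = {r: 0 for r in runs}  # total docs judged via this run
    pos = {r: 0 for r in runs}    # next index to inspect in each ranking

    def next_unjudged(r: str) -> Optional[str]:
        # Skip documents already judged (possibly via another run).
        ranking = runs[r]
        while pos[r] < len(ranking) and ranking[pos[r]] in judged:
            pos[r] += 1
        return ranking[pos[r]] if pos[r] < len(ranking) else None

    while len(judged) < budget:
        live = [r for r in runs if next_unjudged(r) is not None]
        if not live:
            break  # every ranked document has been judged
        # Greedy choice: highest posterior mean; ties go to the smallest run id.
        best = min(live, key=lambda r: (-(hits[r] + 1) / (pulls[r] + 2), r))
        doc = next_unjudged(best)
        relevant = doc in qrels
        judged[doc] = relevant
        pulls[best] += 1
        hits[best] += int(relevant)
    return judged


if __name__ == "__main__":
    # Toy topic: run "A" is rich in relevant docs; run "B" holds one
    # relevant doc ("d8") that no other run retrieved.
    runs = {"A": ["d1", "d2", "d3", "d4", "d5"], "B": ["d6", "d7", "d8"]}
    qrels = {"d1", "d2", "d3", "d4", "d5", "d8"}
    print(sorted(greedy_bandit_judging(runs, qrels, budget=4)))
    # prints ['d1', 'd2', 'd3', 'd4']
```

With a budget of 4, the greedy policy keeps exploiting run "A" because every judgment from it is relevant. Run "B" is never sampled, so its uniquely retrieved relevant document "d8" stays unjudged. This mirrors, in miniature, the abstract's point that a small budget relative to the number of relevant documents can leave some runs' relevant documents unjudged. With a large enough budget (for example 10), all eight documents are judged and "d8" is found.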
Proceedings Title
Proceedings of the 27th ACM International Conference on Information and Knowledge Management (CIKM'18)
Conference Dates
October 22-26, 2018
Conference Location
Torino, IT

Keywords

information retrieval evaluation, test collection building, TREC

Citation

Voorhees, E. (2018), On Building Fair and Reusable Test Collections using Bandit Techniques, Proceedings of the 27th ACM International Conference on Information and Knowledge Management (CIKM'18), Torino, IT, [online], https://doi.org/10.1145/3269206.3271766 (Accessed May 7, 2024)
Created October 17, 2018, Updated February 14, 2023