On Building Fair and Reusable Test Collections using Bandit Techniques
Published
2018
Author(s)
Ellen Voorhees
Abstract
While test collections are a vital piece of the research infrastructure for information retrieval, constructing fair, reusable test collections for large data sets is challenging because of the number of human relevance assessments required. Various approaches for minimizing the number of judgments required have been proposed, including a suite of methods based on multi-armed bandit optimization techniques. However, most of these approaches seek to maximize the total number of relevant documents found, which is not necessarily fair, and they have been demonstrated only in simulation on existing test collections. The TREC 2017 Common Core track provided the opportunity to build a collection de novo using a bandit method. Doing so required addressing two problems not encountered in simulation: giving the human judges time to learn a topic, and allocating the overall judgment budget across topics. The resulting modified bandit technique was used to build the 2017 Common Core test collection, consisting of approximately 1.8 million news articles, 50 topics, and 30,030 judgments. Unfortunately, the constructed collection is of lower quality than anticipated: a large percentage of the known relevant documents were retrieved by only one team, and for 21 topics more than a third of the judged documents are relevant. As such, the collection is less reusable than desired. Further analysis demonstrates that the greedy approach common to most bandit methods can be unfair even to the runs participating in the collection-building process when the judgment budget is small relative to the (unknown) number of relevant documents.
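The abstract describes the bandit approach only at a high level. As an illustration, the following Python sketch shows one generic way such a method can work; it is not the paper's modified technique, and the names (bandit_pooling, judge, budget) are hypothetical. Each retrieval run submitted to the track is treated as a bandit arm, the reward for pulling an arm is whether that run's next unjudged document turns out to be relevant, and a Thompson-sampling rule spends the per-topic judgment budget:

import random
from collections import defaultdict

def bandit_pooling(runs, judge, budget, prior=(1.0, 1.0)):
    """Spend up to `budget` relevance judgments on one topic.

    runs   : dict mapping run id -> ranked list of document ids
    judge  : callable doc_id -> bool (stands in for the human assessor)
    budget : total judgments allowed for this topic
    """
    alpha = defaultdict(lambda: prior[0])   # relevant docs seen per run (+ prior)
    beta = defaultdict(lambda: prior[1])    # non-relevant docs seen per run (+ prior)
    judged = {}                             # doc id -> relevance
    cursor = defaultdict(int)               # next unconsumed rank per run

    while len(judged) < budget:
        # Skip documents already judged through another run.
        for r in runs:
            while cursor[r] < len(runs[r]) and runs[r][cursor[r]] in judged:
                cursor[r] += 1
        candidates = [r for r in runs if cursor[r] < len(runs[r])]
        if not candidates:
            break  # every run exhausted before the budget was spent
        # Thompson sampling: draw a plausible precision for each run
        # from its Beta posterior and pull the arm with the best draw.
        chosen = max(candidates,
                     key=lambda r: random.betavariate(alpha[r], beta[r]))
        doc = runs[chosen][cursor[chosen]]
        rel = judge(doc)
        judged[doc] = rel
        if rel:
            alpha[chosen] += 1.0
        else:
            beta[chosen] += 1.0
    return judged

# Toy usage with two ranked runs and an oracle assessor:
runs = {"runA": ["d1", "d2", "d3"], "runB": ["d3", "d4", "d5"]}
relevant = {"d1", "d4"}
qrels = bandit_pooling(runs, judge=lambda d: d in relevant, budget=4)

Replacing the posterior draw with the posterior mean yields the purely greedy selection the abstract critiques: judgments concentrate on the runs with the highest observed precision, so when the budget is small relative to the number of relevant documents, a run that retrieves different relevant documents may never get them judged, which is the kind of unfairness reported for the 2017 collection.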
Proceedings Title
Proceedings of the 27th ACM International Conference on Information and Knowledge Management (CIKM'18)
Voorhees, E. (2018), On Building Fair and Reusable Test Collections using Bandit Techniques, Proceedings of the 27th ACM International Conference on Information and Knowledge Management (CIKM'18), Torino, IT, [online], https://doi.org/10.1145/3269206.3271766, https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=926509 (Accessed October 14, 2025)