Human Preferences as Dueling Bandits

Xinyi Yan; Chengxi Luo; Charles Clarke; Nick Craswell; Ellen M. Voorhees; Pablo Castells

Published

July 11, 2022

Author(s)

Xinyi Yan, Chengxi Luo, Charles Clarke, Nick Craswell, Ellen M. Voorhees, Pablo Castells

Abstract

The dramatic improvements in core information retrieval tasks engendered by neural rankers create a need for novel evaluation methods. If every ranker returns highly relevant items in the top ranks, it becomes difficult to recognize meaningful differences between them and to build reusable test collections. Several recent papers explore pairwise preference judgments as an alternative to traditional graded relevance assessments. Rather than viewing items one at a time, assessors view items side by side and indicate the one that provides the better response to a query, allowing fine-grained distinctions. If we employ preference judgments to identify the probably best items for each query, we can measure rankers by their ability to place these items as high as possible. We frame the problem of finding best items as a dueling bandits problem. While many papers explore dueling bandits for online ranker evaluation via interleaving, dueling bandits have not been considered as a framework for offline evaluation via human preference judgments. We review the literature for possible solutions. For human preference judgments, any usable algorithm must tolerate ties, since two items may appear nearly equal to assessors, and it must minimize the number of judgments required for any specific pair, since each such comparison requires an independent assessor. Since the theoretical guarantees provided by most algorithms depend on assumptions that are not satisfied by human preference judgments, we simulate selected algorithms on representative test cases to provide insight into their practical utility. Based on these simulations, one algorithm stands out for its potential. Our simulations suggest modifications to further improve its performance. Using the modified algorithm, we collect over 10,000 preference judgments for pools derived from submissions to the TREC 2021 Deep Learning Track, confirming its suitability. We test the idea of best-item evaluation and suggest ideas for further theoretical and practical progress.

Proceedings Title

Proceedings of ACM SIGIR 2022

Conference Dates

July 11-15, 2022

Conference Location

Madrid, ES

Conference Title

ACM Special Interest Group on Information Retrieval 2022

Pub Type

Conferences

Download Paper

https://doi.org/10.1145/3477495.3531991

Keywords

offline evaluation, preferences, dueling bandits

Information retrieval

Citation

Yan, X., Luo, C., Clarke, C., Craswell, N., Voorhees, E. and Castells, P. (2022), Human Preferences as Dueling Bandits, Proceedings of ACM SIGIR 2022, Madrid, ES, [online], https://doi.org/10.1145/3477495.3531991, https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=934631 (Accessed July 30, 2025)

Issues

If you have any questions about this publication or are having problems accessing it, please contact [email protected].

Created July 11, 2022, Updated February 14, 2023
