Publications

LLM-Assisted Relevance Assessments

July 13, 2025

Author(s)

Rikiya Takehi, Ellen Voorhees, Tetsuya Sakai, Ian Soboroff

Test collections are information retrieval tools that allow researchers to quickly and easily evaluate ranking algorithms. While test collections have become an integral part of IR research, the process of data creation involves significant efforts of

Human Preferences as Dueling Bandits

July 11, 2022

Author(s)

Xinyi Yan, Chengxi Luo, Charles Clarke, Nick Craswell, Ellen M. Voorhees, Pablo Castells

The dramatic improvements in core information retrieval tasks engendered by neural rankers create a need for novel evaluation methods. If every ranker returns highly relevant items in the top ranks, it becomes difficult to recognize meaningful differences

Too Many Relevants: Whither Cranfield Test Collections?

July 11, 2022

Author(s)

Ellen M. Voorhees, Nick Craswell, Jimmy Lin

This paper presents the lessons regarding the construction and use of large Cranfield-style test collections learned from the TREC 2021 Deep Learning track. The corpus used in the 2021 edition of the track was much bigger than the corpus used in previous

Can Old TREC Collections Reliably Evaluate Modern Neural Retrieval Models?

January 26, 2022

Author(s)

Ellen M. Voorhees, Ian Soboroff, Jimmy Lin

Neural retrieval models are generally regarded as fundamentally different from the retrieval techniques used in the late 1990s when the TREC ad hoc test collections were constructed. They thus provide the opportunity to empirically test the claim that poo

Searching for Answers in a Pandemic: An Overview of TREC-COVID

September 1, 2021

Author(s)

Ellen Voorhees, Ian Soboroff, Kirk Roberts, Tasmeer Alam, Steven Bedrick, Dina Demner-Fushman, Kyle Lo, Lucy L. Wang, William Hersh

We present an overview of the TREC-COVID Challenge, an information retrieval (IR) shared task to evaluate search on scientific literature related to COVID-19. The goals of TREC-COVID include the construction of a pandemic search test collection and the

On the Quality of the TREC-COVID IR Test Collections

July 11, 2021

Author(s)

Ellen M. Voorhees, Kirk Roberts

Shared text collections continue to be vital infrastructure for IR research. The COVID-19 pandemic offered an opportunity to create a test collection that captured the rapidly changing information space during a pandemic, and the TREC-COVID effort was

TREC Deep Learning Track: Reusable Test Collections in the Large Data Regime

July 11, 2021

Author(s)

Ellen M. Voorhees, Ian Soboroff, Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos

The TREC Deep Learning (DL) Track studies ad hoc search in the large data regime, meaning that a large set of human-labeled training data is available. Results so far indicate that the best models with large data are likely deep neural networks. This paper

System Explanations: A Cautionary Tale

May 8, 2021

Author(s)

Ellen M. Voorhees

There are increasing calls for systems that are able to explain themselves to their end users to increase transparency and help engender trust. But, what should such explanations contain, and how should that information be presented? A pilot study of

TREC-COVID: Constructing a Pandemic Information Retrieval Test Collection

February 19, 2021

Author(s)

Ellen Voorhees, Ian Soboroff, Tasmeer Alam, William Hersh, Kirk Roberts, Dina Demner-Fushman, Kyle Lo, Lucy L. Wang, Steven Bedrick

TREC-COVID is a community evaluation designed to build a test collection that captures the information needs of biomedical researchers using the scientific literature during a pandemic. One of the key characteristics of pandemic search is the accelerated

Overview of the TREC 2019 Deep Learning Track

July 27, 2020

Author(s)

Ellen Voorhees, Nick Craswell, Bhaskar Mitra, Daniel Campos, Emine Yilmaz

The Deep Learning Track is a new track for TREC 2019, with the goal of studying ad hoc ranking in a large data regime. It is the first track with large human-labeled training sets, introducing two sets corresponding to two tasks, each with rigorous TREC

TREC-COVID: Rationale and Structure of an Information Retrieval Shared Task for COVID-19

July 8, 2020

Author(s)

Ellen Voorhees, Ian Soboroff, Tasmeer Alam, Kirk Roberts, William Hersh, Dina Demner-Fushman, Steven Bedrick, Kyle Lo, Lucy L. Wang

TREC-COVID is an information retrieval (IR) shared task initiated to support clinicians and clinical research during the COVID-19 pandemic. IR for pandemics breaks many normal assumptions, which can be seen by examining nine important basic IR research

The Evolution of Cranfield

August 14, 2019

Author(s)

Ellen Voorhees

This chapter examines how the test collection paradigm, the dominant evaluation methodology in information retrieval, has been adapted to meet the changing requirements for information retrieval research in the era of community evaluation conferences such

On Building Fair and Reusable Test Collections using Bandit Techniques

October 17, 2018

Author(s)

Ellen Voorhees

While test collections are a vital piece of the research infrastructure for information retrieval, constructing fair, reusable test collections for large data sets is challenging because of the number of human relevance assessments required. Various

Using Replicates in Information Retrieval Evaluation

August 2, 2017

Author(s)

Ellen M. Voorhees, Daniel V. Samarov, Ian M. Soboroff

This paper explores a method for more accurately estimating the main effect of the system in a typical test-collection-based evaluation of information retrieval systems, and thus increasing the sensitivity of system comparisons. Randomly partitioning the

Promoting Repeatability Through Open Runs

June 7, 2016

Author(s)

Ellen M. Voorhees, Shahzad K. Rajput, Ian M. Soboroff

TREC 2015 introduced the concept of Open Runs in response to the increasing focus on repeatability of information retrieval experiments. An Open Run is a TREC submission backed by a software repository such that the software in the repository reproduces

On the Behavior of PRES Using Incomplete Judgment Sets

September 30, 2015

Author(s)

Ellen M. Voorhees

PRES, the Patent Retrieval Evaluation Score, is a family of retrieval system evaluation measures that combines recall and user effort to better reflect the quality of a retrieval run with respect to recall-oriented search tasks. Previous analysis of the

Overview of the TREC 2014 Clinical Decision Support Track

April 22, 2015

Author(s)

Ellen M. Voorhees

The Text REtrieval Conference (TREC) Clinical Decision Support Track fosters research on systems that link electronic health records with information that is relevant for patient care. TREC 2014 is the initial year of the track. The focus of the first year

The Twenty-Second Text REtrieval Conference Proceedings (TREC 2013)

October 24, 2014

Author(s)

Ellen M. Voorhees

On Run Diversity in "Evaluation as a Service"

July 6, 2014

Author(s)

Ellen M. Voorhees

"Evaluation as a service" (EaaS) is a new methodology that enables community-wide evaluations and the construction of test collections on documents that cannot be distributed. The basic idea is that evaluation organizers provide a service API through which

The Effect of Sampling Strategy on Inferred Measures

July 6, 2014

Author(s)

Ellen M. Voorhees

Using the inferred measures framework is a popular choice for constructing test collections when the target document set is too large for pooling to be a viable option. Within the framework, different amounts of assessing effort are placed on different

Building Better Search Engines by Measuring Search Quality

March 3, 2014

Author(s)

Ellen M. Voorhees, Paul D. Over, Ian Soboroff

Search engines help users locate particular information within large stores of content developed for human consumption. For example, users expect web search engines to direct searchers to web sites based on the content of the site rather than the site

The TREC Medical Records Track

September 25, 2013

Author(s)

Ellen M. Voorhees

The Text REtrieval Conference (TREC) is a series of annual workshops designed to build the infrastructure for large-scale evaluation of search systems and thus improve the state of the art. Each workshop is organized around a set of "tracks", challenge

Overview of the TREC 2012 Medical Records Track

June 28, 2013

Author(s)

Ellen M. Voorhees, William Hersh

The TREC Medical Records track fosters research that allows electronic health records to be retrieved based on the semantic content of free-text fields. The ability to find records by matching semantic content will enhance clinical care and support the

Text Retrieval Conference (TREC)

December 9, 2009

Author(s)

Ellen M. Voorhees

This article summarizes the history, results, and impact of the Text Retrieval Conference (TREC), a workshop series designed to support the information retrieval community by building the infrastructure necessary for large-scale evaluation of retrieval

I Come Not To Bury Cranfield, but to Praise It

October 26, 2009

Author(s)

Ellen M. Voorhees
