On Run Diversity in "Evaluation as a Service"

Published

Author(s)

Ellen M. Voorhees

Abstract

"Evaluation as a service" (EaaS) is a new methodology that enables community-wide evaluations and the construction of test collections on documents that cannot be distributed. The basic idea is that evaluation organizers provide a service API through which the evaluation task can be completed. This concept, however, violates some of the premises of traditional pool-based collection building, and, as a result, the quality of the resulting test collection may be compromised. In particular, the service API might restrict the diversity of runs that contribute to the pool: not only may this hamper innovation by researchers, but the lack of diversity might lead to incomplete judgment pools that affect the reusability of the collection. This paper shows that the distinctiveness of the retrieval runs used to construct the first test collection built using EaaS, the TREC 2013 Microblog collection, is not substantially different from that of the TREC-8 ad hoc collection, a high-quality collection built using traditional pooling. An additional test of collection reusability, the "leave out uniques" test, suggests that the Microblog 2013 collection's pools are less complete than those of the TREC-8 collection, though both collections strongly benefit from the presence of a set of distinctive and effective manual runs. Although we cannot yet generalize to all EaaS evaluations, our analyses reveal no obvious flaws in the test collection built using the methodology in the TREC 2013 Microblog track.
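
For readers unfamiliar with the "leave out uniques" test named in the abstract, the following is a minimal sketch of the general idea, not the paper's actual procedure. It assumes a single topic, a fixed pool depth, and precision at k as a stand-in metric. All function names, the toy runs, and the group assignments are hypothetical and chosen only for illustration. Each run is scored twice: once with the full relevance judgments, and once after removing the documents that only its own group contributed to the pool.

```python
from collections import defaultdict


def build_pool(runs, depth):
    """Union of the top-`depth` documents from every run (single topic)."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool


def unique_contributions(runs, run_group, depth):
    """Map each group to the pooled documents that only its runs contributed."""
    contributors = defaultdict(set)
    for run_id, ranking in runs.items():
        for doc in ranking[:depth]:
            contributors[doc].add(run_group[run_id])
    uniques = defaultdict(set)
    for doc, groups in contributors.items():
        if len(groups) == 1:
            uniques[next(iter(groups))].add(doc)
    return uniques


def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k documents that are judged relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def leave_out_uniques(runs, run_group, relevant, depth, k):
    """Score each run with full judgments and with its group's uniques removed."""
    uniques = unique_contributions(runs, run_group, depth)
    results = {}
    for run_id, ranking in runs.items():
        # Pretend this run's group never took part in building the pool.
        reduced = relevant - uniques[run_group[run_id]]
        results[run_id] = (
            precision_at_k(ranking, relevant, k),
            precision_at_k(ranking, reduced, k),
        )
    return results


# Hypothetical toy data: three runs from two groups, one topic.
runs = {
    "A1": ["d1", "d2", "d3"],
    "B1": ["d2", "d4", "d5"],
    "B2": ["d2", "d5", "d6"],
}
run_group = {"A1": "A", "B1": "B", "B2": "B"}
relevant = {"d1", "d2", "d5"}

results = leave_out_uniques(runs, run_group, relevant, depth=2, k=2)
```

A large drop between the two scores for a run suggests that the pool depends heavily on that group's participation, which is one sign that the judgments may be unfair to runs that did not contribute to the pool.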
Proceedings Title
Proceedings of SIGIR 2014
Conference Dates
July 6-11, 2014
Conference Location
Gold Coast
Conference Title
SIGIR 2014

Keywords

information retrieval, test collection, TREC

Citation

Voorhees, E. (2014), On Run Diversity in "Evaluation as a Service", Proceedings of SIGIR 2014, Gold Coast, [online], https://doi.org/10.1145/2600428.2609484 (Accessed October 9, 2024)

Issues

If you have any questions about this publication or are having problems accessing it, please contact reflib@nist.gov.

Created July 6, 2014, Updated November 10, 2018