The BETTER Cross-Language Information Retrieval Datasets

Author(s)

Ian Soboroff

Abstract

The IARPA BETTER (Better Extraction from Text Through Enhanced Retrieval) program held three evaluations of information retrieval (IR) and information extraction (IE). For both tasks, the only training data available was in English, but systems had to perform retrieval or extraction from Arabic, Farsi, Chinese, Russian, and Korean. Pooled assessment and information extraction annotation were used to create reusable IR test collections. These datasets are freely available to researchers working in cross-language retrieval, information extraction, or the conjunction of IR and IE. This paper describes the datasets, how they were constructed, and how they might be used by researchers.
Conference Dates
July 23-27, 2023
Conference Location
Taipei, TW
Conference Title
ACM Conference on Research and Development in Information Retrieval (SIGIR 2023)

Keywords

cross-language information retrieval

Citation

Soboroff, I. (2023), The BETTER Cross-Language Information Retrieval Datasets, ACM Conference on Research and Development in Information Retrieval (SIGIR 2023), Taipei, TW, [online], https://doi.org/10.1145/3539618.3591910, https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=936449 (Accessed April 27, 2024)
Created July 27, 2023, Updated October 2, 2023