The BETTER Cross-Language Information Retrieval Datasets

Published

Author(s)

Ian Soboroff

Abstract

The IARPA BETTER (Better Extraction from Text Through Enhanced Retrieval) program held three evaluations of information retrieval (IR) and information extraction (IE). For both tasks, the only training data available was in English, but systems had to perform retrieval or extraction from Arabic, Farsi, Chinese, Russian, and Korean. Pooled assessment and information extraction annotation were used to create reusable IR test collections. These datasets are freely available to researchers working in cross-language retrieval, information extraction, or the conjunction of IR and IE. This paper describes the datasets, how they were constructed, and how they might be used by researchers.
Conference Dates
July 23-27, 2023
Conference Location
Taipei, TW
Conference Title
ACM Conference on Research and Development in Information Retrieval (SIGIR 2023)

Keywords

cross-language information retrieval

Citation

Soboroff, I. (2023), The BETTER Cross-Language Information Retrieval Datasets, ACM Conference on Research and Development in Information Retrieval (SIGIR 2023), Taipei, TW, [online], https://doi.org/10.1145/3539618.3591910, https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=936449 (Accessed October 1, 2025)

Created July 27, 2023, Updated October 2, 2023