The BETTER Cross-Language Information Retrieval Datasets



Ian Soboroff


The IARPA BETTER (Better Extraction from Text Through Enhanced Retrieval) program held three evaluations of information retrieval (IR) and information extraction (IE). For both tasks, the only training data available was in English, but systems had to perform retrieval or extraction from Arabic, Farsi, Chinese, Russian, and Korean. Pooled assessment and information extraction annotation were used to create reusable IR test collections. These datasets are freely available to researchers working in cross-language retrieval, information extraction, or the conjunction of IR and IE. This paper describes the datasets, how they were constructed, and how they might be used by researchers.
Conference Dates
July 23-27, 2023
Conference Location
Taipei, TW
Conference Title
ACM Conference on Research and Development in Information Retrieval (SIGIR 2023)


cross-language information retrieval


Soboroff, I. (2023), The BETTER Cross-Language Information Retrieval Datasets, ACM Conference on Research and Development in Information Retrieval (SIGIR 2023), Taipei, TW, [online], (Accessed June 21, 2024)


Created July 27, 2023, Updated October 2, 2023