The BETTER Cross-Language Information Retrieval Datasets

Ian Soboroff

Published

July 27, 2023

Author(s)

Ian Soboroff

Abstract

The IARPA BETTER (Better Extraction from Text Through Enhanced Retrieval) program held three evaluations of information retrieval (IR) and information extraction (IE). For both tasks, the only training data available was in English, but systems had to perform retrieval or extraction from Arabic, Farsi, Chinese, Russian, and Korean. Pooled assessment and information extraction annotation were used to create reusable IR test collections. These datasets are freely available to researchers working in cross-language retrieval, information extraction, or the conjunction of IR and IE. This paper describes the datasets, how they were constructed, and how they might be used by researchers.

Conference Dates

July 23-27, 2023

Conference Location

Taipei, TW

Conference Title

ACM Conference on Research and Development in Information Retrieval (SIGIR 2023)

Pub Type

Conferences

Download Paper

https://doi.org/10.1145/3539618.3591910

Keywords

cross-language information retrieval

Natural language processing, Information retrieval and AI measurement and evaluation

Citation

Soboroff, I. (2023), The BETTER Cross-Language Information Retrieval Datasets, ACM Conference on Research and Development in Information Retrieval (SIGIR 2023), Taipei, TW, [online], https://doi.org/10.1145/3539618.3591910, https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=936449 (Accessed July 23, 2026)

Created July 27, 2023, Updated October 2, 2023
