Abstract
The IARPA BETTER (Better Extraction from Text Through Enhanced Retrieval) program held three evaluations of information retrieval (IR) and information extraction (IE). For both tasks, the only training data available was in English, but systems had to perform retrieval or extraction from Arabic, Farsi, Chinese, Russian, and Korean. Pooled assessment and information extraction annotation were used to create reusable IR test collections. These datasets are freely available to researchers working in cross-language retrieval, information extraction, or the conjunction of IR and IE. This paper describes the datasets, how they were constructed, and how they might be used by researchers.
Conference Dates
July 23-27, 2023
Conference Location
Taipei, TW
Conference Title
ACM Conference on Research and Development in Information Retrieval (SIGIR 2023)
Keywords
cross-language information retrieval
Citation
Soboroff, I. (2023), The BETTER Cross-Language Information Retrieval Datasets, ACM Conference on Research and Development in Information Retrieval (SIGIR 2023), Taipei, TW, [online], https://doi.org/10.1145/3539618.3591910, https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=936449 (Accessed April 25, 2026)