The BETTER Cross-Language Information Retrieval Datasets
The IARPA BETTER (Better Extraction from Text Through Enhanced Retrieval) program held three evaluations of information retrieval (IR) and information extraction (IE). For both tasks, the only training data available was in English, but systems had to perform retrieval or extraction from Arabic, Farsi, Chinese, Russian, and Korean. Pooled assessment and information extraction annotation were used to create reusable IR test collections. These datasets are freely available to researchers working in cross-language retrieval, information extraction, or the conjunction of IR and IE. This paper describes the datasets, how they were constructed, and how they might be used by researchers.
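The abstract names pooled assessment without describing it. The sketch below assumes a generic depth-k pooling scheme, which may differ from the procedure BETTER actually used. BETTER's pool depths and run sets are not stated here. Under this scheme, the judgment pool for each topic is the union of the top-k documents retrieved by every submitted run. The function name `depth_k_pool` and all run, topic, and document IDs are hypothetical.

```python
from typing import Dict, List, Set


def depth_k_pool(runs: Dict[str, Dict[str, List[str]]], k: int) -> Dict[str, Set[str]]:
    """Hypothetical helper: build a judgment pool per topic as the union of
    the top-k ranked documents from every run (generic depth-k pooling)."""
    pool: Dict[str, Set[str]] = {}
    for run in runs.values():
        for topic, ranked_docs in run.items():
            # Only the first k documents of each ranking enter the pool.
            pool.setdefault(topic, set()).update(ranked_docs[:k])
    return pool


# Hypothetical runs: run name -> topic ID -> ranked document IDs.
runs = {
    "run_a": {"t1": ["d1", "d2", "d3"], "t2": ["d7", "d8"]},
    "run_b": {"t1": ["d2", "d4", "d5"], "t2": ["d8", "d9"]},
}
pool = depth_k_pool(runs, k=2)
print({t: sorted(docs) for t, docs in sorted(pool.items())})
```

Assessors judge only the pooled documents. Unpooled documents are typically treated as non-relevant, and this convention is what makes a collection reusable for scoring later systems.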
July 23-27, 2023
ACM Conference on Research and Development in Information Retrieval (SIGIR 2023)
The BETTER Cross-Language Information Retrieval Datasets, ACM Conference on Research and Development in Information Retrieval (SIGIR 2023), Taipei, TW, [online], https://doi.org/10.1145/3539618.3591910, https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=936449
(Accessed December 2, 2023)