The goal of the IARPA MATERIAL (Machine Translation for English Retrieval of Information in Any Language) program is to develop methods that use domain-contextualized English queries to locate relevant content in “documents” (speech or text) in low-resource languages, and then display, in English, a summary of the information of interest found in those documents. This capability is expected to enable effective triage and analysis of large volumes of data in a variety of less-studied languages, while taking into account an analyst’s domains of interest. The program will require that the capability be built using only limited amounts of ground-truth bitext (parallel text) data and no domain adaptation data. Successful systems will be able to adapt to new domains and new genres.
The queries will be in English, the material to be searched will be in other languages, and the summaries must be displayed in English. Note that in real-world use, the system’s output would include documents in multiple languages, intermingled in a single output “queue.”
A summary could be a word cloud, an extractive summary, or an abstractive summary. The summary must be formatted as static text, possibly using multiple colors, sizes, spatial alignments, and orientations, but with no animations and no lines, arrows, or other graphic elements. The central requirement is that the summary be sufficient for the user to judge the relevance of each retrieved item to the domain-contextualized query. Research under MATERIAL must therefore include work on effective summarization.
A central aspect of a MATERIAL system is that a real information need arises within a context characterized by domains of interest. One example is seeking information about Ebola, but only in the context of epidemiology. Another is wheat in the context of agriculture, versus wheat in the context of nutrition and food availability, versus wheat in the context of the cultural norms that govern what the population in a given location normally chooses to eat. MATERIAL will systematically address this association between an information need and its context (the combination of which we will call a “query in a domain”).