ITL Researchers Document Speech Recognition Benchmark Tests

ITL's Spoken Natural Language Processing Group presented the results of several recent automatic speech recognition (ASR) benchmark tests at the DARPA Broadcast News Workshop. Held in Washington, D.C., Feb. 28 - Mar. 3, 1999, the workshop brought together approximately 125 researchers to focus on the development of automatic technologies for accessing broadcast news. ITL presented four technical papers at the workshop, documenting the benchmark tests implemented by the group.

The first presentation reported on the research community's success in automatically transcribing a three-hour broadcast news test set prepared by the NIST group. Researchers at IBM T.J. Watson Laboratories reported the lowest word error rate, 13.5 percent. A system developed by researchers at Cambridge University's Engineering Department in collaboration with Entropics Ltd. ran in less than ten times real time, considerably faster than the IBM system, and achieved a word error rate of 16.1 percent. A second paper reported on the success of researchers in "tagging" information-carrying expressions (e.g., names, dates/times, and numbers) in the broadcast news transcriptions generated by ASR systems, a first step toward information extraction. Word error rates are approximately 20 percentage points higher for these expressions. The third presentation reported on the results of research in Topic Detection and Tracking, which involved processing a corpus of some 54,000 stories collected over a six-month period. The fourth presentation summarized the results of Spoken Document Retrieval studies reported at the NIST-sponsored Text REtrieval Conference (TREC-7), held at NIST in November 1998.
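The word error rates quoted above follow the usual convention: the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that calculation under simplifying assumptions. It is not the scoring software used in the benchmark tests; the function name word_error_rate is hypothetical, and the sketch assumes plain whitespace tokenization with no text normalization.

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / reference length.

    Illustrative only: assumes whitespace-separated words and exact,
    case-sensitive matching, with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions align i reference words with an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For instance, a hypothesis that drops one word from a six-word reference scores 1/6, or roughly 16.7 percent.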

The NIST-developed Recognizer Output Voting Error Reduction (ROVER) software was incorporated in five of the nine systems developed by participants in the primary systems comparisons. The ROVER software uses a sub-optimal, high-dimensional string alignment process to combine the outputs of multiple recognizers into a single network, and then applies a "voting" process to select a one-best hypothesis string. Using the ROVER approach to re-score the hypothesis files submitted by the test participants, NIST researchers demonstrated an overall word error rate of 10.6 percent. That is 2.9 percentage points, or about 21 percent in relative terms, below the 13.5 percent lowest word error rate achieved by the test participants.
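The voting step and the reported improvement can be illustrated with a short Python sketch. This is a simplified illustration, not the ROVER software itself: it skips the string alignment step entirely and assumes the hypotheses have already been aligned into slots of equal length, with a placeholder marking a gap. The names NULL, vote, and relative_reduction are hypothetical, the vote counts word frequency only, and breaking ties in favor of the earliest-listed system is an assumption made for the example.

```python
from collections import Counter

NULL = "@"  # hypothetical placeholder meaning "no word in this slot"


def vote(aligned_hypotheses):
    """Pick one word per slot by simple majority across systems.

    Assumes every hypothesis is a list of the same length, already
    aligned slot by slot (the alignment step is not shown here).
    """
    if len({len(h) for h in aligned_hypotheses}) != 1:
        raise ValueError("hypotheses must be non-empty and equally long")

    result = []
    for slot in zip(*aligned_hypotheses):
        counts = Counter(slot)
        top = max(counts.values())
        # Assumed tie-break: the first system (in input order) whose
        # candidate reaches the top count wins the slot.
        winner = next(word for word in slot if counts[word] == top)
        if winner != NULL:
            result.append(winner)
    return result


def relative_reduction(baseline, improved):
    """Fractional drop from a baseline error rate to an improved one."""
    return (baseline - improved) / baseline


aligned = [
    ["the", "cat", "sat", NULL],
    ["the", "cat", "sat", "down"],
    ["a", "cat", "sat", "down"],
]
combined = vote(aligned)
reduction = relative_reduction(13.5, 10.6)
```

Here two of the three systems agree on each slot, so the combined hypothesis keeps the majority word in every position. Applied to the figures in this article, relative_reduction(13.5, 10.6) comes to about 0.21, which matches the 21 percent improvement cited.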

Since 1987, the Spoken Natural Language Processing Group has provided the ASR research community with benchmark test material by which the development of ASR technology can be monitored. Participants include DARPA and other Department of Defense-sponsored researchers as well as others from the international research community.

CONTACT: David Pallett, ext. 2935