The broadcast news benchmark tests have potential as a source of ideas for improving continuous speech recognition systems. This paper presents a data analysis method for uncovering such ideas and applies the method to the 1996 and 1997 DARPA CSR Hub-4 results. The method is based on a latent variables model rather than the more familiar regression model. It identifies portions of the test material that result in wide performance differences among systems. Because some systems could handle these portions and others could not, they are worth examining to determine which system features lead to the performance differences. Identifying the specific system differences responsible for the performance differences may lead to system improvements.
February 8-11, 1998
DARPA Broadcast News Transcription and Understanding Workshop