Using data streams from acoustic, video, proximity, location, and even physiological sensors to recognize user intent and respond appropriately is one of the grand challenges facing the multimodal research community. We describe our sensor-net middleware, the NIST Meeting Room Recognition Corpus, and the metrology and data-management tools we provide to the speech, biometric, and multimodal research communities. To illustrate the operational concepts, we present a simple example of audiovisual sensor fusion that determines which of multiple faces in a video stream is speaking. We also propose the collaborative development of a data-flow application to integrate, engineer, and evaluate future pervasive perceptive interfaces.
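As a minimal sketch of this kind of audiovisual fusion (not the authors' implementation), one common approach correlates short-time audio energy with per-face mouth-region motion and attributes speech to the face whose motion best tracks the audio. The sketch below assumes a face tracker has already produced a synchronized mouth-motion signal per face at the video frame rate; the function names and signal representations are hypothetical.

```python
import numpy as np

def short_time_energy(audio: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Mean-squared energy of successive audio frames (one value per video frame)."""
    n_frames = 1 + (len(audio) - frame_len) // hop
    return np.array([
        np.mean(audio[i * hop : i * hop + frame_len] ** 2)
        for i in range(n_frames)
    ])

def active_speaker(audio_energy: np.ndarray, mouth_motion: np.ndarray) -> int:
    """
    Pick the face whose mouth motion best correlates with the audio energy.

    audio_energy : shape (T,), short-time audio energy, one value per video frame
    mouth_motion : shape (F, T), mouth-region motion magnitude for each of F faces
    Returns the index of the most likely active speaker.
    """
    a = audio_energy - audio_energy.mean()
    scores = []
    for m in mouth_motion:
        v = m - m.mean()
        denom = np.linalg.norm(a) * np.linalg.norm(v)
        # Normalized (Pearson-style) correlation; 0 if either signal is flat.
        scores.append(float(a @ v / denom) if denom > 0 else 0.0)
    return int(np.argmax(scores))
```

A production system would of course add voice-activity detection, temporal smoothing, and robustness to tracking dropouts, but the core idea of scoring each visual track against the acoustic stream carries over.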
Citation: IEEE Pervasive Computing