
Example of a multimodal application

In this example, we describe an application that recognizes words after first identifying the speaker. The application has real-time constraints, and its processing requirements make it impractical to run on a single computer. We therefore implement it as several logical blocks (client nodes) that communicate by exchanging data, and we distribute these client nodes over two computers. The NDFS-II transports the data buffers, encapsulated in flows, between client nodes.
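To picture the flow-based design, here is a small in-process sketch. Everything in it is hypothetical: `Flow`, `FlowRegistry`, `publish` and `subscribe` are illustrative names, not the NDFS-II API, and Python `queue.Queue` objects stand in for the middleware's transport. It only shows the publish/subscribe idea: a producer client node puts buffers into a named flow, and every subscribed consumer receives its own copy of each buffer.

```python
import queue
from typing import Any, Dict, List


class Flow:
    """A named stream of data buffers (hypothetical stand-in for an NDFS-II flow)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._subscribers: List[queue.Queue] = []

    def subscribe(self) -> queue.Queue:
        # Each subscriber gets its own queue, so every consumer sees every buffer.
        inbox: queue.Queue = queue.Queue()
        self._subscribers.append(inbox)
        return inbox

    def publish(self, buffer: Any) -> None:
        # Fan the buffer out to every current subscriber.
        for inbox in self._subscribers:
            inbox.put(buffer)


class FlowRegistry:
    """Lets client nodes look up flows by name (hypothetical)."""

    def __init__(self) -> None:
        self._flows: Dict[str, Flow] = {}

    def get(self, name: str) -> Flow:
        # Create the flow on first lookup, then always return the same object.
        if name not in self._flows:
            self._flows[name] = Flow(name)
        return self._flows[name]


registry = FlowRegistry()
audio = registry.get("Single Audio Channel")
# Two client nodes subscribe to the same flow, as Recognize Speaker and Recognize Words do.
to_speaker_id = audio.subscribe()
to_word_recognizer = audio.subscribe()
audio.publish([0.1, 0.2, 0.3])
print(to_speaker_id.get_nowait(), to_word_recognizer.get_nowait())
```

In this sketch a subscriber only receives buffers published after it subscribes; nothing is replayed for late subscribers.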

Figure 2.1. A simple multimodal application composed of four client nodes, allocated on two hosts, that exchange data.



Because the computation and data acquisition in this application together may exceed the capacity of a single PC, the client nodes are spread over two hosts and the NDFS-II transports data between them. As shown in Figure 2.1, the application is pipelined into four steps:

  • First, the Read Audio Array client captures data from the microphone array and makes them available on the NDFS-II network as a Multichannel Audio flow.

  • The Beamform Multichannel Audio client subscribes to the Multichannel Audio flow and applies a beamforming algorithm to steer the array toward the speaker. The result is a Single Audio Channel flow, which is made available for subscription.

  • The Recognize Speaker client subscribes to the Single Audio Channel flow and performs speaker identification on these data. If a match is found, the client node puts the speaker's ID in a Speaker ID flow.

  • Finally, the Recognize Words client performs the word recognition. To operate properly, it subscribes to both the Speaker ID flow and the Single Audio Channel flow. With the speaker's ID, the client node loads that speaker's trained profile and uses it to recognize words in the audio; the recognized words are encapsulated in the Words flow.
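The four steps can be sketched as threads connected by in-memory flows. This is a toy under loudly stated assumptions: `Flow`, `END`, `PROFILES` and the four stage functions are hypothetical names; Python threads and `queue.Queue` stand in for NDFS-II client nodes and transport; beamforming is reduced to delay-and-sum with zero delays (a plain channel average); speaker identification matches an average signal level against made-up enrolled values; and the word decoder is a placeholder string. One deliberate deviation from the text: to keep the two inputs of Recognize Words aligned, this sketch publishes one Speaker ID buffer per audio buffer, using `None` when no enrolled speaker matches.

```python
import queue
import threading
from typing import Any, Dict, List

END = object()  # sentinel marking the end of a flow


class Flow:
    """Hypothetical stand-in for an NDFS-II flow: fans each buffer out to subscribers."""

    def __init__(self) -> None:
        self._subs: List[queue.Queue] = []

    def subscribe(self) -> queue.Queue:
        inbox: queue.Queue = queue.Queue()
        self._subs.append(inbox)
        return inbox

    def publish(self, buf: Any) -> None:
        for inbox in self._subs:
            inbox.put(buf)


flows: Dict[str, Flow] = {n: Flow() for n in (
    "Multichannel Audio", "Single Audio Channel", "Speaker ID", "Words")}

# Hypothetical enrolled speakers; each "profile" is just an average signal level.
PROFILES = {"alice": 0.5, "bob": 0.9}


def read_audio_array(frames: List[List[List[float]]]) -> None:
    # Step 1: publish each captured frame (one sample list per microphone).
    for frame in frames:
        flows["Multichannel Audio"].publish(frame)
    flows["Multichannel Audio"].publish(END)


def beamform(inbox: queue.Queue) -> None:
    # Step 2: toy delay-and-sum with zero delays, i.e. average the channels per sample.
    while (frame := inbox.get()) is not END:
        mono = [sum(s) / len(s) for s in zip(*frame)]
        flows["Single Audio Channel"].publish(mono)
    flows["Single Audio Channel"].publish(END)


def recognize_speaker(inbox: queue.Queue) -> None:
    # Step 3: nearest enrolled level within a tolerance; None means "no match".
    while (mono := inbox.get()) is not END:
        level = sum(abs(x) for x in mono) / len(mono)
        name = min(PROFILES, key=lambda n: abs(PROFILES[n] - level))
        flows["Speaker ID"].publish(name if abs(PROFILES[name] - level) < 0.1 else None)
    flows["Speaker ID"].publish(END)


def recognize_words(audio_in: queue.Queue, id_in: queue.Queue) -> None:
    # Step 4: consume both flows in lockstep, one Speaker ID per audio buffer.
    while (mono := audio_in.get()) is not END:
        speaker = id_in.get()
        if speaker is not None:  # no identified speaker: nothing to decode here
            profile = PROFILES[speaker]  # "load" the speaker's trained profile
            # Placeholder decoder: a real one would run speaker-adapted recognition.
            flows["Words"].publish(f"<words from {len(mono)} samples, profile {speaker}={profile}>")
    id_in.get()  # consume the Speaker ID END marker
    flows["Words"].publish(END)


# Subscribe every consumer before any producer starts publishing.
mc_in = flows["Multichannel Audio"].subscribe()
spk_audio_in = flows["Single Audio Channel"].subscribe()
words_audio_in = flows["Single Audio Channel"].subscribe()
id_in = flows["Speaker ID"].subscribe()
words_out = flows["Words"].subscribe()

workers = [
    threading.Thread(target=beamform, args=(mc_in,)),
    threading.Thread(target=recognize_speaker, args=(spk_audio_in,)),
    threading.Thread(target=recognize_words, args=(words_audio_in, id_in)),
]
for t in workers:
    t.start()

# Two frames from a 2-microphone array: one near alice's level, one matching nobody.
read_audio_array([[[0.4, 0.6], [0.6, 0.4]], [[0.1, 0.1], [0.1, 0.1]]])
for t in workers:
    t.join()

words = []
while (w := words_out.get()) is not END:
    words.append(w)
print(words)
```

Each stage blocks on its own input queue, so a downstream stage can work on one buffer while the upstream stage produces the next; that overlap is what makes the chain a pipeline. In CPython the GIL limits true parallelism for pure-Python work, so the sketch illustrates the structure rather than the speedup.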

In this application, the middleware transports data between client nodes whether they run on the same machine or on different machines connected by the network.
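When two client nodes sit on different hosts, each buffer has to cross the network as part of a byte stream, so the receiver needs a way to find buffer boundaries. The sketch below shows one common approach, a 4-byte length prefix per buffer, with audio samples packed as 32-bit floats. This framing and the helper names (`send_buffer`, `recv_buffer`, `pack_samples`, `unpack_samples`) are assumptions for illustration; the NDFS-II wire format is not described here. `socket.socketpair()` stands in for a TCP connection between the two hosts.

```python
import socket
import struct
from typing import List

HEADER = struct.Struct("!I")  # hypothetical framing: 4-byte big-endian payload length


def send_buffer(sock: socket.socket, payload: bytes) -> None:
    # Prefix each buffer with its length so the receiver can split the stream.
    sock.sendall(HEADER.pack(len(payload)) + payload)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    # recv() may return fewer bytes than requested, so loop until n bytes arrive.
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-buffer")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_buffer(sock: socket.socket) -> bytes:
    (length,) = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return _recv_exact(sock, length)


def pack_samples(samples: List[float]) -> bytes:
    # Encode audio samples as big-endian 32-bit floats.
    return struct.pack(f"!{len(samples)}f", *samples)


def unpack_samples(payload: bytes) -> List[float]:
    return list(struct.unpack(f"!{len(payload) // 4}f", payload))


# A connected socket pair stands in for the network link between the two hosts.
host_a, host_b = socket.socketpair()
send_buffer(host_a, pack_samples([0.5, -0.25, 1.0]))  # e.g. a Single Audio Channel buffer
send_buffer(host_a, b"alice")                         # e.g. a Speaker ID buffer
print(unpack_samples(recv_buffer(host_b)), recv_buffer(host_b))
host_a.close()
host_b.close()
```

A length prefix keeps the receiver simple and works for buffers of any type; the cost is that the sender must know each buffer's size before writing it.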

Created on 2008-06-18 by Antoine Fillinger - Last updated on 2008-11-23 by Antoine Fillinger