The NDFS-II has been designed to support applications requiring high data transfer rates. It takes advantage of gigabit networks and avoids redundant copying of data to minimize bandwidth usage and thus speed up communication between client nodes. A goal from the beginning of the project was to transport the data as efficiently as possible. In order to optimize the transport, we introduce the concept of duplicators. Duplicators are stand-alone programs handling flows within a host. There is one duplicator per flow instance per host.
The figure above describes a simple NDFS-II application composed of four client nodes allocated to two hosts. There is one producing client node feeding data to three consumers. The producer and one consumer run on the same host, and the two remaining consumers run on another one. On each host, a duplicator handling the flow manages the shared memory used to store data blocks and communicates with other remote duplicators if necessary.
The data is transported as follows. The producer writes a data block into the shared memory and notifies the duplicator that data is available for reading. The duplicator then notifies Consumer 1, running on the same host, that the block can be read, and also sends the data block to the remote duplicator. Upon receiving the block, the remote duplicator copies it into its own shared memory and notifies its consumers (Consumer 3 and Consumer 4) that data is available for reading. Once the duplicator on host A has received confirmation from Consumer 1 that it has read the data block, and confirmation from the duplicator on host B that every consumer on its host has read it as well, the duplicator on host A notifies the producer that it can write a new data block into the shared memory.
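The hand-shake above can be sketched in a few lines. The following Python model is an illustration only, not the NDFS-II implementation: the class and method names (Duplicator, notify_data_ready, ack_read, and so on) are assumptions made for the example. It shows the key accounting: the local duplicator waits for one read confirmation per local consumer plus one from the remote duplicator, which in turn waits for its own consumers before acknowledging, and the data block crosses the network exactly once.

```python
import threading

class Duplicator:
    """Illustrative sketch of the duplicator hand-shake (names are assumptions)."""
    def __init__(self, name, local_consumers, remote=None):
        self.name = name
        self.local_consumers = local_consumers
        self.remote = remote                  # peer duplicator on the other host, if any
        self.pending = 0                      # confirmations still expected
        self.lock = threading.Lock()
        self.all_read = threading.Event()     # set when every consumer has read the block

    def notify_data_ready(self, block):
        # Expect one ack per local consumer, plus one from the remote duplicator.
        self.pending = len(self.local_consumers) + (1 if self.remote else 0)
        self.all_read.clear()
        for consumer in self.local_consumers:
            consumer.on_data_ready(self, block)       # concurrent read of shared memory
        if self.remote:
            # Single network transfer, regardless of how many remote consumers exist.
            self.remote.receive_block(block, reply_to=self)

    def receive_block(self, block, reply_to):
        # Remote side: copy into local shared memory, fan out, ack once all have read.
        self.pending = len(self.local_consumers)
        self.all_read.clear()
        for consumer in self.local_consumers:
            consumer.on_data_ready(self, block)
        self.all_read.wait()
        reply_to.ack_read()

    def ack_read(self):
        with self.lock:
            self.pending -= 1
            if self.pending == 0:
                self.all_read.set()           # producer may now reuse the slot

class Consumer:
    def __init__(self):
        self.received = []
    def on_data_ready(self, duplicator, block):
        self.received.append(block)           # read the block from shared memory
        duplicator.ack_read()                 # confirm the read

# Host A: producer and Consumer 1; host B: Consumer 3 and Consumer 4.
c1, c3, c4 = Consumer(), Consumer(), Consumer()
dup_b = Duplicator("B", [c3, c4])
dup_a = Duplicator("A", [c1], remote=dup_b)

dup_a.notify_data_ready("block-0")
dup_a.all_read.wait()   # producer blocks here until every consumer has read
```

In this toy model the calls are synchronous for readability; in a real system the notifications and acknowledgments would travel over sockets and shared-memory signals.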
Introducing the duplicator concept has several advantages. It allows concurrent reading of the shared memory within a host and avoids transferring the same data block over the network multiple times when several consumers on a remote host are connected to the same flow.
When a producer provides a data block to the system, the block is not sent directly; instead, it is enqueued in the internal flow queue of the client node. A separate thread running in the client node dequeues the data block in the background and passes it to the duplicator. Symmetrically for consumers, a thread fetches data blocks from the shared memory when the duplicator signals and enqueues them in the internal flow queue. This method of transporting data avoids maintaining a reference count at the producer level and moves the data as fast as possible, independently of consumer requests; therefore, when a consumer requests a new data block, it may already be in the internal flow queue and available immediately. The behavior of the flow queue is customizable, allowing blocking or non-blocking flows.
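A minimal sketch of such a client-side flow queue follows. This is an assumption-laden illustration, not the NDFS-II API: the class name FlowQueue, the transport callback, and the capacity/blocking parameters are all invented for the example. It shows the two pieces described above: a background thread that drains the queue toward the duplicator, and a blocking/non-blocking policy when the queue is full (block the producer, or drop the oldest block).

```python
import queue
import threading

class FlowQueue:
    """Illustrative sketch of a client node's internal flow queue (names are assumptions).

    blocking=True  : put() blocks the producer while the queue is full.
    blocking=False : put() drops the oldest block to make room for the new one.
    """
    def __init__(self, capacity, blocking=True, transport=None):
        self.q = queue.Queue(maxsize=capacity)
        self.blocking = blocking
        # transport stands in for handing the block to the duplicator.
        self.transport = transport or (lambda block: None)
        threading.Thread(target=self._drain, daemon=True).start()

    def put(self, block):
        if self.blocking:
            self.q.put(block)                 # producer waits if the queue is full
            return
        while True:                           # non-blocking flow: never stall the producer
            try:
                self.q.put_nowait(block)
                return
            except queue.Full:
                try:
                    self.q.get_nowait()       # drop the oldest block
                except queue.Empty:
                    pass

    def _drain(self):
        # Background thread: dequeue blocks and pass them to the duplicator.
        while True:
            self.transport(self.q.get())

# Usage: collect what the background thread hands to the (mock) duplicator.
sent = []
got_all = threading.Event()

def to_duplicator(block):
    sent.append(block)
    if len(sent) == 3:
        got_all.set()

fq = FlowQueue(capacity=8, transport=to_duplicator)
for b in ("b0", "b1", "b2"):
    fq.put(b)          # returns immediately; the drain thread does the sending
got_all.wait(timeout=2)
```

The same structure works on the consumer side: the drain direction is reversed, with the thread filling the queue from shared memory so a consumer's read request can often be satisfied immediately.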
Created on 2008-06-18 by Antoine Fillinger - Last updated on 2008-11-23 by Antoine Fillinger