Information Technology Laboratory

This page is no longer being updated and the information may be out of date.

6.3 Fault-Tolerant Cloud Group

****WORKING DOCUMENT****

Actors: cloud-subscriber, cloud-provider-1, cloud-provider-2, cloud-provider-n

Goals: Synthesize a highly-reliable service using the facilities of multiple cloud-providers.

Assumptions: The cloud-subscriber has already opened accounts with N cloud-providers (See Use Case "Open An Account"). When data or output results from the N cloud-providers are compared, a majority of the data or results will be found to be equivalent. The metadata about data objects includes time stamps or sequence numbers.

Success Scenario 1 (write data, IaaS, PaaS): The cloud-subscriber attempts to copy a data object onto all N of the cloud-providers using the data object APIs that each cloud-provider publishes (See Use Case "Copy Data Objects Into A Cloud"). Each cloud-provider returns a message indicating whether or not the copy operation succeeded. The cloud-subscriber records the number of successes M. If M < N, the cloud-subscriber may re-issue the request or evaluate whether or not the data has been stored with sufficient redundancy. If not, the cloud-subscriber may optionally open accounts with new cloud-providers.
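The write scenario above can be sketched as follows. This is only an illustration under stated assumptions: InMemoryProvider, put_object, replicated_write, and the min_copies threshold are hypothetical names invented for the example and do not correspond to any cloud-provider's published API.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryProvider:
    """Hypothetical stand-in for one cloud-provider's data object API."""
    name: str
    available: bool = True
    objects: dict = field(default_factory=dict)

    def put_object(self, key: str, data: bytes, seq: int) -> bool:
        # Return True on success and False on failure, mirroring the
        # success/failure message described in the scenario.
        if not self.available:
            return False
        self.objects[key] = (seq, data)
        return True


def replicated_write(providers, key, data, seq, min_copies, retries=1):
    """Copy one object to every provider; return (M, enough_redundancy)."""
    succeeded = set()
    for _ in range(retries + 1):
        for p in providers:
            # Re-issue the request only to providers that have not succeeded yet.
            if p.name not in succeeded and p.put_object(key, data, seq):
                succeeded.add(p.name)
        if len(succeeded) == len(providers):
            break  # M == N, nothing left to re-issue
    m = len(succeeded)
    return m, m >= min_copies


providers = [
    InMemoryProvider("cloud-provider-1"),
    InMemoryProvider("cloud-provider-2", available=False),
    InMemoryProvider("cloud-provider-3"),
]
m, ok = replicated_write(providers, "report.csv", b"v1", seq=1, min_copies=2)
print(m, ok)  # 2 True
```

Here M = 2 of N = 3 copies succeed. Whether that counts as sufficient redundancy is a policy decision for the cloud-subscriber, which the sketch models with the assumed min_copies parameter.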

Success Scenario 2 (read data, IaaS, PaaS): Assume the cloud-subscriber issues a number K of concurrent object read requests using the data object APIs that each cloud-provider publishes. The cloud-subscriber will choose K to be large enough so that at least one of the responses from the responding cloud-providers will contain data from the object's most recent update. The cloud-subscriber compares responses from the responding cloud-providers, and chooses the response representing the latest version of the object.
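A minimal sketch of the read scenario, assuming each provider's read call returns a (sequence number, data) pair. The read_latest function and the fetcher callables are hypothetical. One way to choose K, assuming the latest write succeeded on M of the N providers and all K reads respond, is K >= N - M + 1: at most N - M providers hold stale data, so at least one response must come from an up-to-date copy.

```python
from concurrent.futures import ThreadPoolExecutor


def read_latest(fetchers, k):
    """Issue k concurrent reads and return the (seq, data) pair with the highest seq.

    Each fetcher is a hypothetical zero-argument callable that returns
    (seq, data) or raises an exception if its provider fails.
    """
    def safe_call(fetch):
        try:
            return fetch()
        except Exception:
            return None  # treat any error as a non-responding provider

    with ThreadPoolExecutor(max_workers=k) as pool:
        responses = list(pool.map(safe_call, fetchers[:k]))

    answered = [r for r in responses if r is not None]
    if not answered:
        raise RuntimeError("no cloud-provider responded")
    # The sequence number in the metadata identifies the latest version.
    return max(answered, key=lambda r: r[0])


def failing_fetch():
    raise ConnectionError("provider unreachable")


fetchers = [lambda: (1, b"old"), failing_fetch, lambda: (2, b"new")]
latest = read_latest(fetchers, k=3)
print(latest)  # (2, b'new')
```

If providers may fail to respond, K has to be raised accordingly, since a missing response could have been the only up-to-date one.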

Success Scenario 3 (redundant batch jobs, IaaS, PaaS): The cloud-subscriber starts a processing job on each of the N cloud-providers (e.g., see Use Case "VM Control: Manage Virtual Machine Instances"). Each cloud-provider runs exactly the same job, on the same input data, and produces output data. The cloud-subscriber retrieves the output data from the first-completing cloud-provider, checksums it, and then checksums the output of each subsequently returning cloud-provider, comparing each checksum with the first for equality. If any of the equality checks fail, the cloud-subscriber can rerun the job, perhaps allocating it onto a different set of cloud-providers, or simply take a majority vote and treat the majority output as the result.
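The checksum comparison and majority vote can be sketched as below. The function name vote_on_outputs, the choice of SHA-256, and the rule that a strict majority is required are assumptions made for illustration; the use case does not prescribe a checksum algorithm or a voting rule.

```python
import hashlib
from collections import Counter


def vote_on_outputs(outputs):
    """Return (winning_output_or_None, all_equal).

    outputs maps a provider name to the output bytes it returned.
    """
    digests = {name: hashlib.sha256(data).hexdigest()
               for name, data in outputs.items()}
    counts = Counter(digests.values())
    top_digest, top_votes = counts.most_common(1)[0]
    all_equal = len(counts) == 1
    if top_votes * 2 <= len(outputs):
        return None, all_equal  # no strict majority: the caller should rerun
    winner = next(data for name, data in outputs.items()
                  if digests[name] == top_digest)
    return winner, all_equal


outputs = {
    "cloud-provider-1": b"42\n",
    "cloud-provider-2": b"42\n",
    "cloud-provider-3": b"41\n",
}
result, all_equal = vote_on_outputs(outputs)
print(result, all_equal)  # b'42\n' False
```

A return value of None signals that no output won a strict majority, which corresponds to the "rerun the job" branch of the scenario.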

Success Scenario 4 (state machine replication, IaaS, PaaS): The cloud-subscriber starts a long-running server process in each of the N cloud-providers. Iteratively, the cloud-subscriber sends a service request to each server process in the N cloud-providers, receives each server's results, and compares the results. If the results are not all equal, the cloud-subscriber reinitializes the servers that are determined to have failed, perhaps by migrating them to new cloud-providers. If a server fails to respond to requests for a timeout period, the cloud-subscriber reinitializes the server, bringing it up to the state of the others.
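One iteration of this loop might look like the following sketch. CounterReplica is a hypothetical deterministic server (a running total) whose result happens to equal its state, and replicated_request is an assumed helper. Timeout handling and migration to a new cloud-provider are omitted: a divergent replica is simply restored from a replica in the majority.

```python
from collections import Counter


class CounterReplica:
    """Hypothetical deterministic server process: keeps a running total."""

    def __init__(self, faulty=False):
        self.total = 0
        self.faulty = faulty

    def handle(self, amount):
        self.total += amount
        if self.faulty:
            self.total += 1  # simulate a failure by corrupting the state
        return self.total

    def snapshot(self):
        return self.total

    def restore(self, state):
        # Reinitialize the server, bringing it up to the state of the others.
        self.total = state
        self.faulty = False


def replicated_request(replicas, amount):
    """Send one request to all replicas, vote, and repair the minority."""
    results = [r.handle(amount) for r in replicas]
    majority, votes = Counter(results).most_common(1)[0]
    if votes * 2 <= len(replicas):
        raise RuntimeError("no majority among replica results")
    good = next(r for r, res in zip(replicas, results) if res == majority)
    repaired = 0
    for r, res in zip(replicas, results):
        if res != majority:
            r.restore(good.snapshot())
            repaired += 1
    return majority, repaired


replicas = [CounterReplica(), CounterReplica(faulty=True), CounterReplica()]
first = replicated_request(replicas, 5)
second = replicated_request(replicas, 3)
print(first, second)  # (5, 1) (8, 0)
```

The first request exposes and repairs the faulty replica; the second shows all three replicas agreeing again. This only works if every server is deterministic and receives the same requests in the same order.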

Failure Conditions: The requested action or process performed at one or more of the N cloud-providers fails or returns incorrect data to the cloud-subscriber.

Failure Handling: The cloud-subscriber either reinitiates the requested action or considers performing the action with new cloud-provider(s).

Requirements File:

Credit: Note: There is a large literature on how to implement replication in network services using protocols such as two-phase commit, quorum consensus, timestamps, or transactions; this is just a sketch. The n-version programming literature is a good source of information on how to compare results (termed "voting").

Created November 2, 2010, Updated August 12, 2025