Dynamic Job Replication for Balancing Fault Tolerance, Latency, and Economic Efficiency: Work in Progress

Published

Author(s)

Vladimir V. Marbukh

Abstract

Recent research has demonstrated the benefits of request replication with canceling, which initiates multiple concurrent replicas of a request, uses the first successful result, and immediately removes the remaining replicas of the completed request from the system. This paper suggests that the benefits of replication may come at the risk of an abrupt system transition to an undesirable, highly congested equilibrium. To expose, evaluate, and ultimately manage these risk/benefit trade-offs, we generalize the replication strategy by: (a) accounting for possible inefficiency of “remote” service, (b) allowing replication only when static routing fails to identify an idle “local” server, and (c) requiring one or more replicas of the same request to be completed, which improves fault tolerance through a majority-rule decision. Because the Markov performance model is intractable, our analysis is based on mean-field and fluid approximations. Future research should evaluate the accuracy of assertions based on these approximations and ultimately develop practical solutions for optimizing various performance trade-offs in distributed systems with replication.
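
As a rough illustration of the latency side of this trade-off, the sketch below computes the mean time until k of d concurrent replicas complete, with the remaining replicas canceled at that point. It is not the paper's model: it assumes independent, exponentially distributed service times with rate mu and ignores queueing, congestion, and any "remote" service inefficiency. The function names and parameters are hypothetical.

```python
def expected_completion_time(d: int, k: int, mu: float = 1.0) -> float:
    """Mean time until k of d replicas finish.

    Assumption (not from the paper): service times are i.i.d.
    exponential with rate mu, and all d replicas start at once.
    """
    if not 1 <= k <= d:
        raise ValueError("need 1 <= k <= d")
    # After j completions, d - j replicas are still running, so the
    # next completion arrives at rate (d - j) * mu.
    return sum(1.0 / ((d - j) * mu) for j in range(k))


def majority(d: int) -> int:
    """Smallest number of replicas that forms a strict majority of d."""
    return d // 2 + 1


if __name__ == "__main__":
    for d in (1, 2, 3, 5):
        first = expected_completion_time(d, 1)
        maj = expected_completion_time(d, majority(d))
        print(f"d={d}: first result {first:.3f}, majority {maj:.3f}")
```

Under these assumptions, waiting for the first of d replicas cuts mean service time to 1/(d * mu), while waiting for a majority takes longer than waiting for the first result. The sketch leaves out the extra load that replicas place on the system, which is the source of the congestion risk the abstract describes.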
Proceedings Title
IEEE SERVICES 2018
Conference Dates
July 2-7, 2018
Conference Location
San Francisco, CA

Keywords

Dynamic job replication, fault tolerance, latency, economic efficiency, risk/benefit trade-offs.

Citation

Marbukh, V. (2018), Dynamic Job Replication for Balancing Fault Tolerance, Latency, and Economic Efficiency: Work in Progress, IEEE SERVICES 2018, San Francisco, CA, [online], https://doi.org/10.1109/SCC.2018.00043 (Accessed October 10, 2025)


Created September 6, 2018, Updated May 14, 2020