Predicting Failure in Complex Systems by Perturbing Markov Chain Models

Published

Author(s)

Christopher E. Dabrowski, Fern Y. Hunt

Abstract

In recent years, substantial research has been devoted to monitoring and predicting performance degradations in real-world complex systems within large entities such as nuclear power plants, electrical grids, and distributed computing systems. Special challenges are posed by the fact that such systems operate in uncertain environments, are highly dynamic, and can exhibit emergent behaviors that can lead to catastrophic failure. Discrete Time Markov chains (DTMCs) have been an important focus of this research because they represent dynamic behavior succinctly, provide a means to measure uncertainty, and, when extended to be time-inhomogeneous, can model long-term system evolution. Moreover, DTMCs provide a means to measure potential changes to system performance. To date, DTMCs have been proposed for tasks such as fault detection and long-term equipment condition monitoring in real-world complex systems. However, the scope of these models has generally been restricted to describing states that directly concern fault conditions. Less work has been done on using DTMCs to represent a more complete range of states that a complex system may enter during normal operation. Such comprehensive, detailed models allow a system to be analyzed in the context of normal operation in order to understand more precisely how evolution into undesirable states occurs. This paper describes progress made on developing an approach for using larger, more detailed DTMC models to find potential failure scenarios in operational complex systems. The approach uses a combination of methods to perturb a DTMC, simulate alternative system evolutions, and identify scenarios in which a system may descend into failure. Key to the approach is the use of graph theory techniques to reduce the size of the search space of potential alternative behaviors to be explored. An example is provided of using a DTMC of significant size to predict failure in a distributed resource allocation system.
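
The perturb-and-simulate idea summarized in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the three-state chain, its transition probabilities, the helper names `perturb_row` and `evolve`, and the choice of a time-homogeneous chain. It also omits the graph theory step that the abstract describes for narrowing the search space. It shows only that shifting probability mass within one row of a transition matrix, while keeping the row stochastic, changes the probability of reaching an absorbing failure state.

```python
def perturb_row(P, row, src, dst, delta):
    """Return a copy of P with `delta` probability moved from P[row][src] to P[row][dst].

    Hypothetical helper: the row still sums to 1 after the move.
    """
    if not 0 <= delta <= P[row][src]:
        raise ValueError("delta must lie in [0, P[row][src]]")
    Q = [list(r) for r in P]  # copy so the original matrix is untouched
    Q[row][src] -= delta
    Q[row][dst] += delta
    return Q


def evolve(P, dist, steps):
    """Propagate a state distribution through `steps` transitions of the chain."""
    n = len(P)
    for _ in range(steps):
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return dist


# Assumed toy chain: 0 = normal, 1 = degraded, 2 = failed (absorbing).
P = [
    [0.95, 0.05, 0.00],
    [0.30, 0.65, 0.05],
    [0.00, 0.00, 1.00],
]
start = [1.0, 0.0, 0.0]  # the system begins in the normal state

# Probability of having failed after 50 steps, without perturbation.
baseline = evolve(P, start, 50)[2]

# Perturbation: in the degraded state, recovery becomes less likely
# and failure correspondingly more likely.
Q = perturb_row(P, row=1, src=0, dst=2, delta=0.10)
perturbed = evolve(Q, start, 50)[2]

print(f"P(failed after 50 steps), baseline:  {baseline:.4f}")
print(f"P(failed after 50 steps), perturbed: {perturbed:.4f}")
```

Repeating this comparison over many candidate perturbations is one simple way to rank which transitions most affect the failure probability. For a model of significant size, exhaustive repetition becomes expensive, which is the motivation the abstract gives for reducing the search space first.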
Proceedings Title
Proceedings of the 2011 American Society of Mechanical Engineers (ASME) Pressure Vessels & Piping Division (PVPD) Conference
Conference Dates
July 7, 2011
Conference Location
Baltimore, MD
Conference Title
American Society of Mechanical Engineers (ASME) 2011 Pressure Vessels & Piping Division (PVPD) Conference

Keywords

Complex system, Discrete Time Markov chain, time-inhomogeneous Markov chain, matrix perturbation
Created July 21, 2011, Updated February 19, 2017