In large-scale grid systems with decentralized control, the interactions of many service providers and consumers will likely lead to emergent global system behaviours that result in unpredictable, often detrimental outcomes. This possibility argues for developing analytical tools that allow understanding and prediction of complex system behaviour, in order to ensure the availability and reliability of grid computing services. This paper presents an approach that uses piece-wise homogeneous discrete-time Markov chains to provide rapid, potentially scalable simulation of large-scale grid systems. The approach, previously used in other domains, is applied here to model the dynamics of large-scale grid systems. A Markov chain model of a grid system is first represented in a reduced, compact form. This model can then be perturbed to produce alternative system execution paths and to identify scenarios in which system performance is likely to degrade or anomalous behaviours are likely to occur. The rapid generation of these scenarios allows prediction of how a larger system will react to failures or high-stress conditions. Although computational effort increases in proportion to the number of paths modelled, this cost is shown to be far less than the cost of using detailed simulation or testbeds. Moreover, the cost is unaffected by the size of the system being modelled, expressed in terms of workload and number of computational resources, and the approach can be adapted to systems that are non-homogeneous with respect to time. The paper provides detailed examples of the application of this approach and discusses future work.
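The following is a minimal sketch, not the report's implementation, of the general technique the abstract describes. A piece-wise homogeneous discrete-time Markov chain keeps its transition matrix fixed within each time segment and switches matrices only at segment boundaries. Perturbed copies of a matrix then generate alternative execution paths. The three aggregate states (`normal`, `congested`, `degraded`), the matrices `CALM` and `STRESS`, and the perturbation scheme (bounded uniform noise followed by row re-normalization) are all hypothetical and chosen only for illustration. Because the chain works over a small, fixed set of aggregate states rather than individual grid nodes, the cost of each path depends on the number of steps and paths, not on the number of resources being modelled.

```python
import random

# Hypothetical aggregate states of a grid system (illustrative only).
STATES = ["normal", "congested", "degraded"]

# Hypothetical transition matrices; each row sums to 1.
CALM = [[0.90, 0.08, 0.02],
        [0.50, 0.40, 0.10],
        [0.30, 0.30, 0.40]]
STRESS = [[0.60, 0.30, 0.10],
          [0.20, 0.50, 0.30],
          [0.10, 0.20, 0.70]]


def step(state, matrix, rng):
    """Draw the next state index using row `state` of the transition matrix."""
    return rng.choices(range(len(matrix)), weights=matrix[state])[0]


def simulate_path(segments, start, rng):
    """Simulate one path of a piece-wise homogeneous DTMC.

    `segments` is a list of (matrix, n_steps) pairs: the matrix is constant
    within a segment and changes only at segment boundaries.
    """
    path = [start]
    state = start
    for matrix, n_steps in segments:
        for _ in range(n_steps):
            state = step(state, matrix, rng)
            path.append(state)
    return path


def propagate(dist, segments):
    """Evolve a state distribution analytically: pi_{t+1} = pi_t * P_k."""
    n = len(dist)
    for matrix, n_steps in segments:
        for _ in range(n_steps):
            dist = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return dist


def perturb(matrix, eps, rng):
    """Return a perturbed copy of `matrix` (assumed scheme: uniform noise in
    [-eps, eps], clipped at 0, then each row re-normalized to sum to 1)."""
    out = []
    for row in matrix:
        noisy = [max(p + rng.uniform(-eps, eps), 0.0) for p in row]
        total = sum(noisy)
        if total == 0.0:  # degenerate case: keep the original row
            noisy, total = list(row), sum(row)
        out.append([p / total for p in noisy])
    return out


if __name__ == "__main__":
    rng = random.Random(42)
    n_paths = 1000
    degraded = 0
    for _ in range(n_paths):
        # Alternative execution path: normal regime, then a perturbed stress regime.
        segments = [(CALM, 50), (perturb(STRESS, 0.05, rng), 20)]
        path = simulate_path(segments, STATES.index("normal"), rng)
        degraded += path[-1] == STATES.index("degraded")
    print(f"Estimated P(degraded at end): {degraded / n_paths:.3f}")
    exact = propagate([1.0, 0.0, 0.0], [(CALM, 50), (STRESS, 20)])
    print("Unperturbed distribution:", [round(p, 3) for p in exact])
```

Sampling many perturbed paths approximates the spread of outcomes under uncertain stress conditions. `propagate` gives the exact distribution for a single unperturbed schedule, which serves as a cheap baseline against which anomalous paths can be compared.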
Citation: NIST Interagency/Internal Report (NISTIR) 7566
NIST Pub Series: NIST Interagency/Internal Report (NISTIR)
Pub Type: NIST Pubs
Keywords: Markov chain, transition probability matrix, discrete-time Markov chain, piece-wise homogeneous Markov chain