Large infrastructures, such as clouds, can exhibit substantial outages, sometimes due to failure scenarios that were not considered during system design. We define a method that uses a genetic algorithm (GA) to search system simulations for parameter combinations that result in system failures, so that designers can take mitigation steps before deployment. We apply the method to study an existing infrastructure-as-a-service cloud simulator. We characterize the dynamics, quality, effectiveness and cost of GA search, when applied to seek a known failure scenario. Further, we iterate the GA search to reveal unknown failure scenarios. We find that, when schedule permits and failure costs are high, combining GA search with simulation proves useful for exploring and improving system designs.
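The search idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `simulate` function is a toy stand-in for a real system simulator, `KNOWN_FAILURE` is a planted failure scenario, and the parameter space, operators, and settings (tournament selection, one-point crossover, per-gene mutation, elitism) are generic GA choices, not the configuration used in the paper.

```python
import random

# Hypothetical parameter space: six parameters, four settings each (assumed).
LEVELS = [4, 4, 4, 4, 4, 4]
# Planted failure scenario for this toy example (assumed, not from the paper).
KNOWN_FAILURE = (3, 0, 2, 1, 3, 0)


def simulate(params):
    """Toy stand-in for a simulator run: returns a failure score in [0, 1]."""
    matches = sum(p == k for p, k in zip(params, KNOWN_FAILURE))
    return matches / len(KNOWN_FAILURE)


def random_individual(rng):
    return tuple(rng.randrange(n) for n in LEVELS)


def crossover(a, b, rng):
    # One-point crossover between two parent parameter combinations.
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(ind, rng, rate=0.1):
    # Each gene is resampled with probability `rate`.
    return tuple(
        rng.randrange(n) if rng.random() < rate else g
        for g, n in zip(ind, LEVELS)
    )


def tournament(pop, scores, rng, k=3):
    # Pick k distinct candidates at random and return the fittest.
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: scores[i])]


def ga_search(generations=40, pop_size=30, seed=0):
    """Search for parameter combinations that maximize the failure score."""
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    history = []  # best failure score seen in each generation
    for _ in range(generations):
        scores = [simulate(ind) for ind in pop]
        best_i = max(range(pop_size), key=lambda i: scores[i])
        history.append(scores[best_i])
        children = [pop[best_i]]  # elitism: carry the best forward unchanged
        while len(children) < pop_size:
            parent_a = tournament(pop, scores, rng)
            parent_b = tournament(pop, scores, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        pop = children
    scores = [simulate(ind) for ind in pop]
    best_i = max(range(pop_size), key=lambda i: scores[i])
    history.append(scores[best_i])
    return pop[best_i], scores[best_i], history


if __name__ == "__main__":
    best, score, history = ga_search()
    print("most failure-prone parameters found:", best, "score:", score)
```

In this sketch each fitness evaluation is one simulator run, so the search cost scales with population size times number of generations; with a real simulator that cost is what has to be weighed against schedule and the expense of an undetected failure.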
Proceedings Title: Proceedings of the Fifth International Conference on Advances in System Simulation
Conference Dates: October 27-November 1, 2013
Conference Location: Venice, Italy
Conference Title: SIMUL 2013, The Fifth International Conference on Advances in System Simulation
Pub Type: Conferences
Keywords: failure prediction, genetic algorithms, simulation methodology, system design