The recent explosion of affordable multicore, multichip systems, coupled with cluster management software, encourages the development of novel distributed applications for exploring large parameter spaces. We expect many such applications to appear soon. For example, we recently applied a genetic algorithm to steer a population of cloud-computing simulators toward low-probability, costly failure scenarios, with the aim of providing a design-time tool that system engineers can use to identify and mitigate such scenarios. The idea proved much simpler in theory than in practice, largely because of the implementation challenges that arose. In this paper, we describe the design and deployment of our application, identify and discuss the practical challenges that we faced, and outline the pragmatic solutions that we adopted to overcome them. We believe many near-future applications will face similar challenges, so we hope that our experiences prove instructive.
ICSE 2013 SEIP proceedings
May 18-26, 2013
San Francisco, CA
International Conference on Software Engineering
Computational steering, cloud computing, cluster computing, discrete event simulation, distributed systems, fault tolerance, genetic algorithms, software for parallel and distributed systems