A Method Level Test Generation Framework for Debugging Big Data Applications
Huadong Feng, Jagan Chandrasekaran, Yu Lei, Raghu N. Kacker, David R. Kuhn
When a failure occurs in a big data application, debugging with the original dataset can be difficult due to the large amount of data being processed. This paper introduces a framework for effectively generating method-level tests to facilitate debugging of big data applications. This is achieved by running a big data application with the original dataset and by recording the inputs to a small number of method executions, which we refer to as method-level tests, that preserves certain code coverage, e.g., edge coverage. The inputs of these method-level tests are further reduced if needed, while maintaining code coverage. When debugging, a developer could inspect the execution of these method-level tests, instead of the entire program execution with the original dataset, which could be time-consuming. We implemented the framework and applied the framework to seven algorithms in the WEKA tool. The initial results show that a small number of method-level tests are sufficient to preserve code coverage. Furthermore, these tests could kill between 57.58% to 91.43% of the mutants generated using a mutation testing tool. This suggests that the framework could significantly reduce the efforts required for debugging big data applications.