Multiple techniques and tools prove effective for software assurance. One technique that has grown in acceptance since the early 2000s is static analysis, which examines software for weaknesses without executing it. The National Institute of Standards and Technology (NIST) Software Assurance Metrics and Tool Evaluation (SAMATE) project has organized five Static Analysis Tool Expositions (SATEs), designed to advance research in static analysis tools that find security-relevant weaknesses in source code.

This paper discusses our experiences with SATE, which can be useful for the software assurance community. Specifically, the paper focuses on the selection of test cases and on how to analyze the output warnings from static analysis tools. Three selection criteria are used: 1) representativeness of real, existing code, 2) large amounts of test data to yield statistical significance, and 3) knowledge of the weakness locations in the code (ground truth). SATE V used three types of test cases, each satisfying two of the three criteria: 1) production test cases, which offered real code and statistical significance, 2) CVE-related test cases, which offered real code and ground truth, and 3) synthetic test cases, which offered ground truth and statistical significance.

We describe metrics that can be used for evaluating tool effectiveness. Metrics such as precision, recall, discrimination, coverage, and overlap are discussed in the context of the three types of test cases.

Our main goal for future SATEs is to improve the quality of our test suites by producing test cases that satisfy all three criteria. One approach is to insert realistic security-relevant weaknesses into real-world software. We discuss this approach and other plans for SATE VI.
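The precision and recall metrics mentioned above can be made concrete with a small sketch. This is not the SATE scoring procedure; it is an illustrative simplification in which warnings and known flaws are modeled as hypothetical (file, line) pairs and matching is exact, whereas real warning matching must account for line ranges and weakness classes.

```python
def precision(warnings, ground_truth):
    """Fraction of reported warnings that correspond to a known flaw."""
    if not warnings:
        return 0.0
    true_positives = sum(1 for w in warnings if w in ground_truth)
    return true_positives / len(warnings)

def recall(warnings, ground_truth):
    """Fraction of known flaws that the tool reported."""
    if not ground_truth:
        return 0.0
    found = sum(1 for flaw in ground_truth if flaw in warnings)
    return found / len(ground_truth)

# Hypothetical ground truth and tool output, as (file, line) pairs.
flaws = {("parse.c", 120), ("auth.c", 88), ("net.c", 45)}
tool_warnings = {("parse.c", 120), ("auth.c", 88), ("log.c", 10)}

print(precision(tool_warnings, flaws))  # 2 of 3 warnings match known flaws
print(recall(tool_warnings, flaws))     # 2 of 3 known flaws were reported
```

Ground truth is what makes these metrics computable: on production test cases without it, only proxies such as coverage and overlap between tools can be measured.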
Citation: Journal of Cyber Security and Information Systems
Pub Type: Journals
Keywords: software assurance, static analysis, programming language test material, software quality, cybersecurity