A Software Assurance Reference Dataset: Thousands of Programs With Known Bugs

Published: April 16, 2018


Paul E. Black


The Software Assurance Reference Dataset (SARD) is a growing collection of over 170 000 programs with precisely located bugs. The programs are in C, C++, Java, PHP, and C# and cover more than 150 classes of weaknesses, such as SQL injection, cross-site scripting (XSS), buffer overflow, and use of a broken cryptographic algorithm. Most are automatically generated synthetic programs, each a few pages of code long, but there are also over 7000 full-sized applications. In addition, SARD has production code and hundreds of cases written by hand. The code is of typical quality: neither pristine nor abhorrent. Many cases have corresponding "good" cases, in which the weaknesses are fixed, to test for false positives. The SARD web interface allows users to browse test cases and test suites or to search for test cases by programming language, weakness type, file name, size, words in the description, and several other criteria. The user can select and download any or all of the resulting cases.
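To illustrate how precisely located bugs and paired "good" cases can be used, here is a minimal sketch of scoring a static analysis tool's output against known weaknesses. It is not part of SARD or its interface. The record layout (`Finding`, `Case`), the file names, and the exact-match rule on file, line, and CWE number are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """A weakness at a specific location (hypothetical record layout)."""
    file: str
    line: int
    cwe: int  # CWE identifier, e.g. 121 for stack-based buffer overflow


@dataclass
class Case:
    """One test case: a 'bad' case lists its known weaknesses, a 'good' case lists none."""
    name: str
    known: frozenset  # frozenset of Finding


def score(cases, reported):
    """Compare tool output against the known weaknesses of each case.

    `reported` maps a case name to the set of Findings the tool emitted.
    Assumption: a report counts only if file, line, and CWE all match exactly.
    """
    tp = fp = fn = 0
    for case in cases:
        found = reported.get(case.name, set())
        tp += len(found & case.known)   # known weakness, reported
        fp += len(found - case.known)   # reported, but not a known weakness
        fn += len(case.known - found)   # known weakness, missed
    return {"true_pos": tp, "false_pos": fp, "false_neg": fn}


# Hypothetical pair: the same program with and without the weakness.
bug = Finding("copy_bad.c", 12, 121)
cases = [
    Case("copy_bad", frozenset({bug})),
    Case("copy_good", frozenset()),
]

# Hypothetical tool output: it flags line 12 in both versions.
reported = {
    "copy_bad": {bug},
    "copy_good": {Finding("copy_good.c", 12, 121)},
}

result = score(cases, reported)
print(result)
```

In this sketch the tool finds the real weakness in the "bad" case but also flags the fixed "good" case, so the pair yields one true positive and one false positive. A real evaluation would likely need a looser matching rule, for example a tolerance on line numbers or a mapping between related CWE classes.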
Citation: Journal of Research (NIST JRES)
Volume: 123
Pub Type: NIST Pubs


Keywords: cybersecurity, software assurance, software quality, static analysis
Created April 16, 2018, Updated November 10, 2018