A Software Assurance Reference Dataset: Thousands of Programs With Known Bugs
Paul E. Black
The Software Assurance Reference Dataset (SARD) is a growing collection of over 170 000 programs with precisely located bugs. The programs are in C, C++, Java, PHP, and C# and cover more than 150 classes of weaknesses, such as SQL injection, cross-site scripting (XSS), buffer overflow, and use of broken cryptographic algorithm. Most are automatically generated synthetic programs, each a few pages of code long, but there are also over 7000 full-sized applications. In addition, SARD has production code and has hundreds of cases written by hand. The code is typical quality. It is neither pristine nor abhorrent. Many cases have corresponding "good" cases, in which weaknesses are fixed, to test for false positives. The SARD web interface allows users to browse test cases and test suites or search for test cases by programming language, weakness type, file name, size, words in the description, and several other criteria. The user can select and download any or all of the resulting cases.
A Software Assurance Reference Dataset: Thousands of Programs With Known Bugs, Journal of Research (NIST JRES), National Institute of Standards and Technology, Gaithersburg, MD, [online], https://doi.org/10.6028/jres.123.005
(Accessed June 3, 2023)