Skip to main content
U.S. flag

An official website of the United States government

Official websites use .gov
A .gov website belongs to an official government organization in the United States.

Secure .gov websites use HTTPS
A lock ( ) or https:// means you’ve safely connected to the .gov website. Share sensitive information only on official, secure websites.

A Software Assurance Reference Dataset: Thousands of Programs With Known Bugs

Published

Author(s)

Paul E. Black

Abstract

The Software Assurance Reference Dataset (SARD) is a growing collection of over 170 000 programs with precisely located bugs. The programs are in C, C++, Java, PHP, and C# and cover more than 150 classes of weaknesses, such as SQL injection, cross-site scripting (XSS), buffer overflow, and use of broken cryptographic algorithm. Most are automatically generated synthetic programs, each a few pages of code long, but there are also over 7000 full-sized applications. In addition, SARD has production code and has hundreds of cases written by hand. The code is typical quality. It is neither pristine nor abhorrent. Many cases have corresponding "good" cases, in which weaknesses are fixed, to test for false positives. The SARD web interface allows users to browse test cases and test suites or search for test cases by programming language, weakness type, file name, size, words in the description, and several other criteria. The user can select and download any or all of the resulting cases.
Citation
Journal of Research (NIST JRES) -
Volume
123

Keywords

cybersecurity, software assurance, software quality, static analysis

Citation

Black, P. (2018), A Software Assurance Reference Dataset: Thousands of Programs With Known Bugs, Journal of Research (NIST JRES), National Institute of Standards and Technology, Gaithersburg, MD, [online], https://doi.org/10.6028/jres.123.005 (Accessed March 29, 2024)
Created April 16, 2018, Updated May 4, 2021