The purpose of the reference dataset is to provide researchers (particularly in the SAMATE Project), software assurance tool developers, and end users with a set of artifacts containing known software security errors, along with fixes for them. The artifacts will be designs, source code, binaries, etc.; that is, they come from all phases of the software lifecycle. The samples include "synthetic" ones (written to test), ones collected from "the wild" (production), and academic ones (from university researchers). The dataset will also contain real, production software applications with known bugs and vulnerabilities. This will allow tool developers to test their methods and end users to evaluate a tool when considering it. The dataset is intended to encompass a wide variety of vulnerabilities, languages, platforms, and compilers. More information about the ideas behind the reference dataset is on the Software Assurance Reference Dataset philosophy page. To access the dataset itself, visit https://samate.nist.gov/SARD/.
The dataset is a large effort with more than 170 000 test cases. It has benefited from many contributors. The groups of contributions are detailed on the Acknowledgments and Test Case Descriptions page.
Submissions of any software artifact with security vulnerabilities are welcome, as are samples that avoid or mitigate such vulnerabilities. Although we intend to cover security errors from the whole software lifecycle, the dataset concentrates on source code for now.
A test case consists of one or more files that manifest the security error, plus metadata about the file(s), such as the weakness type, language, etc. To submit test cases, contact us at samate [at] nist.gov.
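As an illustration of the "files plus metadata" structure described above, here is a minimal Python sketch. The class name `SardTestCase`, its field names, and the sample values are all hypothetical; they are not the SARD's actual schema or data.

```python
from dataclasses import dataclass, field


@dataclass
class SardTestCase:
    """Hypothetical model of one test case; NOT the real SARD schema."""

    case_id: int
    description: str
    language: str
    weakness_type: str
    files: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The text says a test case consists of one or more files.
        if not self.files:
            raise ValueError("a test case needs at least one file")


# Made-up sample values, for illustration only.
example = SardTestCase(
    case_id=1,
    description="Stack buffer overflow via unchecked copy",
    language="C",
    weakness_type="CWE-121",
    files=["overflow.c"],
)
```

Rejecting an empty file list in `__post_init__` is a design choice of this sketch; it simply mirrors the "one or more files" wording.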
Any user can view or search for test cases and then download them. The view/download screen and its subsequent screens present all the test cases in the SARD except, by default, Deprecated cases. You can download selected test cases on a page, all the test cases on a page, or the entire SARD.
Clicking a Test Case ID displays that test case.
You can search for test cases according to certain search criteria, such as test case ID, test case description, language, weakness type, string in the test file, etc. As above, you can download selected test cases from the set of test cases found, all the test cases found, or all the test cases on a page.
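For readers who download test cases and want to filter them locally, the kind of multi-criteria search described above might be sketched as follows. This is not the SARD's search implementation or API; the function `search`, the dictionary keys, and the sample records are assumptions made up for illustration.

```python
def search(cases, language=None, weakness_type=None, text=None):
    """Return the cases that match every criterion that was supplied."""
    results = []
    for case in cases:
        if language is not None and case["language"].lower() != language.lower():
            continue
        if weakness_type is not None and case["weakness_type"] != weakness_type:
            continue
        if text is not None and text.lower() not in case["description"].lower():
            continue
        results.append(case)
    return results


# Made-up records; the keys are hypothetical, not SARD metadata field names.
cases = [
    {"id": 1, "language": "C", "weakness_type": "CWE-121",
     "description": "Stack buffer overflow"},
    {"id": 2, "language": "Java", "weakness_type": "CWE-89",
     "description": "SQL injection in login form"},
    {"id": 3, "language": "C", "weakness_type": "CWE-416",
     "description": "Use after free"},
]

c_cases = search(cases, language="c")
print([case["id"] for case in c_cases])  # prints [1, 3]
```

Criteria left as `None` are ignored, so the same function covers searching by one field or by several at once.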
A test suite is a pre-defined collection of test cases. Anyone can view and download entire test suites.
When a test case is first added, its status is "Candidate". After review, its status may be changed to "Accepted". If a test case needs to be withdrawn, it is marked "Deprecated". A Deprecated test case is still available for historical purposes but should not be used in any new work. See Test Case Status - What it Means for details of the review process.
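The status lifecycle above can be read as a small state machine. The sketch below is one possible reading, not the SARD's own logic: the three status names come from the text, but the transition table (in particular, that a Candidate may be deprecated directly and that a Deprecated case never changes again) and the helper names are assumptions.

```python
from enum import Enum


class Status(Enum):
    CANDIDATE = "Candidate"
    ACCEPTED = "Accepted"
    DEPRECATED = "Deprecated"


# Assumed transitions; the actual review process may differ.
ALLOWED_TRANSITIONS = {
    Status.CANDIDATE: {Status.ACCEPTED, Status.DEPRECATED},
    Status.ACCEPTED: {Status.DEPRECATED},
    Status.DEPRECATED: set(),
}


def change_status(current, new):
    """Return the new status, or raise if the move is not allowed."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new


def usable_for_new_work(status):
    """Deprecated cases remain available but should not be used in new work."""
    return status is not Status.DEPRECATED


status = Status.CANDIDATE                        # status when first added
status = change_status(status, Status.ACCEPTED)  # after review
print(status.value, usable_for_new_work(status))  # prints: Accepted True
```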
These are major subsystems to be added, which will require many changes.
These are internal to the SARD. They are not visible to users.
The collection was created in 2005 and was originally called the Standard Reference Dataset, abbreviated SRD [Black, "Software Assurance And Tool Evaluation", SERP 2005]. The name was quickly changed to Software Reference Dataset to be more specific, then to SAMATE Reference Dataset to be less presumptuous.
In 2013, we were notified that the abbreviation SRD conflicted with the federally legislated Standard Reference Data (SRD). About the same time, we came to desire an acronym that was less common in web searches. Several rounds of soliciting suggestions, brainstorming, polling, and discussion produced over two dozen possible new names. The new name, Software Assurance Reference Dataset, was suggested by Bertrand Stivalet and was announced at the Static Analysis Tool Exposition (SATE) V Experience Workshop on Friday, 14 March 2014.