Workshop on Software Security Assurance Tools, Techniques, and Metrics USB


Welcome to the workshop on Software Security Assurance Tools, Techniques, and Metrics, organized by the U.S. National Institute of Standards and Technology (NIST). The purpose of the workshop is to convene researchers, developers, and government and industrial users of Software Security Assurance (SSA) tools to

  • discuss and refine the taxonomy of flaws and the taxonomy of functions, which are under development,
  • come to a consensus on which SSA functions should first have specifications and standards tests developed,
  • gather SSA tools vendors for "target practice": see how reference datasets fare against various tools, and
  • identify gaps or requirements for research in SSA functions.

The material and papers for the workshop will be distributed on USB drives to the participants. The contents of the USB drives are:

We thank those who worked to organize this workshop, particularly Elizabeth Fong, who handled much of the correspondence, and Debra A. Brodbeck, who provided conference support. We appreciate the program committee's efforts in reviewing the papers. We are grateful to NIST, especially the Software Diagnostics and Conformance Testing Division, for providing the organizers' time. On behalf of the program committee and the whole SAMATE team, thank you to everyone for taking the time and resources to join us.


Dr. Paul E. Black

Call for Papers
Description of Reference Data Set
Software Flaw Taxonomy

Workshop Call For Papers (SSATTM'05)

National Institute of Standards and Technology (NIST) workshop on
Software Assurance Tools, Techniques, and Metrics
7-8 November 2005
Co-located with ASE 2005
Long Beach, California, USA

   Funded in part by the Department of Homeland Security (DHS), the National Institute of Standards and Technology (NIST) started a long-term, ambitious project to improve software security assurance tools. Security is the ability of a system to maintain the confidentiality, integrity, and availability of information processed and stored by a computer. Software security assurance tools are those that help software be more secure by building security into software or determining how secure software is. Among the project's goals are:

  • develop a taxonomy of software security flaws and vulnerabilities,
  • develop a taxonomy of software security assurance (SSA) tool functions and techniques which detect or prevent flaws, and
  • develop testable specifications of SSA functions and explicit tests to evaluate how closely tools implement the functions. The test materials include reference sets of buggy code.

These goals extend into all phases of the software life cycle from requirements capture through design and implementation to operation and auditing. The goal of the workshop is to convene researchers, developers, and government and industrial users of SSA tools to

  • discuss and refine the taxonomy of flaws and the taxonomy of functions, which are under development,
  • come to a consensus on which SSA functions should first have specifications and standard tests developed,
  • gather SSA tools suppliers for "target practice" on reference datasets of code, and
  • identify gaps or research needs in SSA functions.


Sets of code with known flaws and vulnerabilities, along with corresponding correct versions, can serve as references for tool testing, making research easier and providing a standard of evaluation. Working with others, we will bring reference datasets of many types of code, such as Java, C, binaries, and bytecode. We welcome contributions of code you've used. To help validate the reference datasets, we solicit proposals of no more than 2 pages to participate in SSA tool "target practice" on the datasets. Tools can range from university projects to commercial products. Participation is intended to demonstrate the state of the art in finding flaws; consequently, proposals should not be marketing write-ups but should highlight technical contributions: techniques used, precision achieved, classes of vulnerabilities detected, suggestions for extensions to and improvements of the reference datasets, etc. Participants are expected to provide their own equipment.


SSATTM encourages contributions describing basic research, novel applications, and experience relevant to SSA tools and their evaluation. Topics of particular interest are:

  • Benchmarks or reference datasets for SSA tools
  • Comparisons of tools
  • ROI effectiveness of SSA functions
  • Flaw catching effectiveness of SSA functions
  • Evaluating SSA tools
  • Gaps or research needs in SSA functions
  • SSA tool metrics
  • Software security assurance metrics
  • Surveys of SSA tools
  • Relation between flaws and the techniques that catch them
  • Taxonomy of software security flaws and vulnerabilities
  • Taxonomy of SSA functions or techniques


Papers should not exceed 8 pages in the conference format. Papers exceeding the length restriction will not be reviewed. Papers will be reviewed by at least two program committee members. All papers should clearly identify their novel contributions. All papers should be submitted electronically in PDF format by 26 August 2005 to Elizabeth Fong (efong [at] nist [dot] gov).


Accepted papers will be published in the workshop proceedings. The workshop proceedings, along with a summary of discussions and the output of the reference dataset "target practice", will be published as a NIST Special Publication.


  • Freeland Abbott - Georgia Tech
  • Paul Ammann - George Mason U.
  • Paul E. Black - NIST
  • Elizabeth Fong - NIST
  • Michael Hicks - U. Maryland
  • Michael Kass - NIST
  • Michael Koo - NIST
  • Richard Lippmann - MIT
  • Robert A. Martin - MITRE Corp.
  • W. Bradley Martin - NSA
  • Nachiappan Nagappan - Microsoft Research
  • Samuel Redwine - James Madison U.
  • Ravi Sandhu - George Mason U.
  • Larry D. Wagoner - NSA



  • 19 Aug: Paper and tool proposal submission deadline
  • 19 Sep: Paper and proposal notification
  • 15 Oct: Final camera-ready copy due
  • 7-8 Nov: Workshop

Workshop Program

Monday - November 7, 2005

Time : Session :
8:30 – 9:00 Welcome – Paul Black
9:00 – 10:30 Tools and Metrics - Elizabeth Fong
Where do Software Security Assurance Tools Add Value – David Jackson, David Cooper
Metrics that Matter: Quantifying Software Security Risk – Brian Chess
The Case for Common Flaw Enumeration – Robert Martin, Steven Christey, Joe Jarzombek
10:30 – 11:00 Break
11:00 – 12:30 Flaw Taxonomy and Benchmarks - Robert Martin
Seven Pernicious Kingdoms: A Taxonomy of Software Security Errors – Katrina Tsipenyuk, Brian Chess, Gary McGraw
A Taxonomy of Buffer Overflows for Evaluating Static and Dynamic Software Testing Tools – Kendra Kratkiewicz, Richard Lippmann
ABM – A Prototype for Benchmarking Source Code Analyzers – Tim Newsham, Brian Chess
12:30 – 1:30 Lunch
1:30 – 4:00 New Techniques - Larry Wagoner
A Benchmark Suite for Behavior-Based Security Mechanisms – Dong Ye, Micha Moffie, David Kaeli
Testing and Evaluation of Virus Detectors for Handheld Devices – Jose A. Morales, Peter Clarke, Yi Deng
Eliminating Buffer Overflows, Using the Compiler or a Standalone Tool – Thomas Plum, David Keaton
A Secure Software Architecture Description Language – Jie Ren, Richard Taylor
Prioritization of Threats Using the K/M Algebra – Supreeth Venkataraman, Warren Harrison
End of day one

Tuesday - November 8, 2005

9:00 – 11:30
Reference Dataset Discussion – Michael Kass
11:30 – 1:00 Lunch
1:00 – 2:30 Invited Presentation - Vadim Okun
Correct by Construction: The Case for Constructive Static Verification – Roderick Chapman

End of workshop


The SAMATE Reference Dataset and Target Practice Test Suite

   The SAMATE Reference Dataset (SRD) is a rapidly growing set of contributed test cases for measuring software assurance (SwA) tool capability against a functional specification for that tool.

This initial distribution is a compilation of C source code test cases that will be used for evaluating the functional capability of C source code scanning tools. Contributions from MIT Lincoln Lab and Fortify Software Inc. make up this initial set. Additional contributions from Klocwork Inc. and Ounce Labs Inc. will be added soon.

We expect to expand the SRD to include other languages (e.g. C++, Java) as well as to include test suites for other SwA tools (such as requirements and software design documents).

MIT Contribution

Documentation for each test case is contained in the source files themselves. In the case of the MIT contribution, the first line of each test case contains a classification code describing the test case “signature” (in terms of code complexity). All MIT discrete test cases are “buffer overflow” examples, with permutations of some of the 22 coding variation factors to challenge a tool's ability to discover a buffer overflow or recognize a patched version of the overflow. Also, MIT contributed 14 models (scaled-down versions) of 3 real-world applications (bind, sendmail, and wu-ftpd).

Fortify Software Test Case Contribution

Fortify Software has contributed C code test cases, the majority of which are also buffer overflow vulnerabilities. A number of race condition, command injection, and other vulnerabilities are included as well. Like the MIT test cases, the Fortify test cases are “self-documenting”, with keywords describing the type of software flaw present in the code. Additionally, to provide a uniform way of classifying the complexity of the test cases, the MIT classification code is placed at the top of each test file.

Klocwork Test Case Contribution

Klocwork Inc. has donated an initial contribution of C++ test cases, the majority of which are memory-management related (e.g. memory leaks, bad frees, use-after-frees). They intend to follow up with an additional donation of Java test cases.

Target Practice Test Suite - [Download the files (zip)]

A subset of both the MIT (152 discrete test cases and 3 models) and Fortify (12) test cases makes up the “target practice” test suite. A representative group of well-understood and documented tests is presented as a “starting point” to get initial feedback from tool developers and users on how useful the test suite is. Both a “bad” (flawed) and a “good” (patched) version exist for each test case.

  • Test Suite Execution - It is expected that each tool developer/user will run their tool against the target practice test suite before attending the workshop on Tuesday, so as to provide maximum time for discussion of the merits/deficiencies in the test suite. Tests are provided in two separate directories (MIT and Fortify). How a tool scans the test suite is at the discretion of the tool implementer/user.
  • Test Suite Evaluation - After running their tool on the Target Practice test suite, tool developers/users will be asked to fill out a questionnaire regarding usefulness of the test suite in the following areas:
    • Validity of the tests
    • Do test cases reflect real world examples?
    • Test case coverage (What software flaws should we focus on initially?)
    • Complexity (Were the tests challenging/enlightening for discovering a tool's capability?)
    • Sufficient metadata for describing test case flaws and code complexity (e.g. MIT's metadata scheme - do we need more? If so what?)
  • Confidentiality of Test Results - At no time is a tool developer required to report anything about their tool's performance against the Target Practice test suite. The purpose of the target practice is to solicit feedback on the SRD… NOT the tools that run against it. If a tool developer wishes to provide further insight into the usefulness of the SRD by disclosing how their tool performed against it, they do so at their own discretion.

Agenda for the Target Practice:

9 AM - 11:30 AM - Discussion of Test Results and Reference Dataset by target practice participants and workshop attendees:

  • 9:00 - 10:30 - Usefulness of test cases:
    • Validity
      • Do test cases reflect real world examples?
    • Coverage
      • Where (what flaws) should we focus on initially?
    • Complexity
      • What levels of code complexity are necessary to properly evaluate a tool's capability?
    • Variation
      • Expressed in the taxonomy of flaws, or in the test case itself?
  • 10:30 - 11:00 - Test Case Metadata:
    • Classification of software flaws in test cases
      • What common taxonomy to use for all code scanning tools? (Plover, CLASP, Fortify, Klocwork)
      • How can all the taxonomies be harmonized?
      • Correct metadata for describing test case complexity (e.g. MIT's metadata scheme - do we need more? If so, what?)
  • 11:00 - 11:20 - Requirements for an “easy to use” Reference Dataset:
    • Security
    • Web Accessibility?
    • Ad Hoc Query Capability
    • Validatable Submission
    • Batch Submission (1000s of Test Cases)
    • Dynamic Test Case Generation
    • Access Control
    • Demo NIST Prototype SRD
  • 11:20 - 11:30 - Next Steps:
    • Harmonize ideas for a common taxonomy of SwA flaws and vulnerabilities
    • Test Case Submission by SwA community
      • Fortify
      • Klocwork
      • Ounce Labs
      • MIT
      • Other
  • 11:30 Lunch

A Possible Harmonizing Software Flaw Taxonomy

   One of the conclusions from the August ’05 workshop “Defining the State of the Art in Software Security Tools” was the need for a reference taxonomy of software flaws and vulnerabilities. To further this goal, the NIST SAMATE team developed a harmonization scenario extending ideas in the Tsipenyuk/Chess/McGraw paper Seven Pernicious Kingdoms: A Taxonomy of Software Security Errors.

This scenario was created from the following publicly available taxonomies:

  • The Kingdoms - Katrina Tsipenyuk, Brian Chess, Gary McGraw - November 2005 - IEEE/ACM Conference, Long Beach, CA
  • CLASP - Comprehensive, Lightweight Application Security Process - Pravir Chandra, John Viega et al. - OWASP
  • “19 Deadly Sins of Software Security”, M. Howard, D. LeBlanc, and J. Viega, McGraw-Hill Osborne Media, July 2005.
  • OWASP Top Ten Most Critical Web Application Security Vulnerabilities
  • PLOVER - Preliminary List Of Vulnerability Examples for Researchers - Steve Christey - CVE MITRE

The scenario is unavailable at this time. You can contact us for further information. Construction details may be found below. This working document was developed by the NIST SAMATE team from publicly available sources without consultation with any of the taxonomy authors. The goal is to stimulate discussion.

Please join the samate email group (samate-subscribe [at] yahoogroups [dot] com, subject: subscribe) to comment.

Notes on the Construction of the Harmonization Scenario:

  • At the topmost level are the Kingdoms. The sublevels under that are a collection of categories from each of the five taxonomies. This enables commonalities/differences to be visible. That each of the five has a category for buffer overflow, and that the buffer overflow category for each is located under Input Validation and Representation is an example of commonality. That the CLASP category of Uninitialized variable appears under Errors and Kingdoms’ category of Uninitialized variable appears under Code Quality is an example of difference.
  • Suffixes on the category names indicate from which taxonomy the name comes, and at which step (see next note) in the construction process the category appeared.
    • Suffixes of the form “--f-c” indicate step (a).
    • Suffixes of the form “--<taxonomy name>--f-c” indicate step (b).
    • Suffixes of the form “--plover” indicate step (c).


  • The following describes the process of scenario construction:
    (a) The five topmost levels of the CLASP taxonomy match reasonably well with five of the eight Kingdoms. Thus, Kingdoms' topmost levels were chosen as the scenario's topmost level, and sublevels of the CLASP taxonomy were merged under corresponding topmost levels of the Kingdoms taxonomy as follows:
      • Environmental problems (CLASP) → Environment (Kingdoms)
      • General logic errors (CLASP) → Code Quality (Kingdoms)
      • Protocol errors (CLASP) → Security Features (Kingdoms)
      • Range and type errors (CLASP) → Input Validation and Representation (Kingdoms)
      • Synchronization and timing errors (CLASP) → Time and State (Kingdoms)
    (b) As described in the Tsipenyuk/Chess/McGraw paper, elements of the lists from the “19 Deadly Sins of Software Security” and the OWASP Top Ten were added.
    (c) The PLOVER WIFF (Weaknesses, Idiosyncrasies, Faults, Flaws) categories were added under the topmost levels of Kingdoms.
  • The scenario was constructed using the ontology development tool Protégé. This choice was made because Protégé was convenient and because a taxonomy is an elementary ontology. We recognize that XML tools may be more appropriate and that a schema for representing the reference taxonomy may need to be developed.

Disclaimer: Any commercial product mentioned is for information only; it does not imply recommendation or endorsement by NIST nor does it imply that the products mentioned are necessarily the best available for the purpose.

Created March 30, 2021, Updated March 22, 2023