Workshop on Software Security Assurance Tools, Techniques, and Metrics USB


Welcome to the workshop on Software Security Assurance Tools, Techniques, and Metrics, organized by the U.S. National Institute of Standards and Technology (NIST). The purpose of the workshop is to convene researchers, developers, and government and industrial users of Software Security Assurance (SSA) tools to

  • discuss and refine the taxonomy of flaws and the taxonomy of functions, which are under development,
  • come to a consensus on which SSA functions should first have specifications and standards tests developed,
  • gather SSA tools vendors for "target practice": see how reference datasets fare against various tools, and
  • identify gaps or requirements for research in SSA functions.

The material and papers for the workshop will be distributed on USB drives to the participants. The contents of the USB drives are:

We thank those who worked to organize this workshop, particularly Elizabeth Fong, who handled much of the correspondence, and Debra A. Brodbeck, who provided conference support. We appreciate the program committee's efforts in reviewing the papers. We are grateful to NIST, especially the Software Diagnostics and Conformance Testing Division, for providing the organizers' time. On behalf of the program committee and the whole SAMATE team, thank you to everyone for taking the time and resources to join us.


Dr. Paul E. Black

Call for Papers
Description of Reference Data Set
Software Flaw Taxonomy

Workshop Call For Papers (SSATTM'05)

National Institute of Standards and Technology (NIST) workshop on
Software Assurance Tools, Techniques, and Metrics
7-8 November 2005
Co-located with ASE 2005
Long Beach, California, USA

   Funded in part by the Department of Homeland Security (DHS), the National Institute of Standards and Technology (NIST) started a long-term, ambitious project to improve software security assurance tools. Security is the ability of a system to maintain the confidentiality, integrity, and availability of information processed and stored by a computer. Software security assurance tools are those that help software be more secure by building security into software or determining how secure software is. Among the project's goals are:

  • develop a taxonomy of software security flaws and vulnerabilities,
  • develop a taxonomy of software security assurance (SSA) tool functions and techniques which detect or prevent flaws, and
  • develop testable specifications of SSA functions and explicit tests to evaluate how closely tools implement the functions. The test materials include reference sets of buggy code.

These goals extend into all phases of the software life cycle from requirements capture through design and implementation to operation and auditing. The goal of the workshop is to convene researchers, developers, and government and industrial users of SSA tools to

  • discuss and refine the taxonomy of flaws and the taxonomy of functions, which are under development,
  • come to a consensus on which SSA functions should first have specifications and standard tests developed,
  • gather SSA tools suppliers for "target practice" on reference datasets of code, and
  • identify gaps or research needs in SSA functions.


Sets of code with known flaws and vulnerabilities, along with corresponding correct versions, can serve as references for tool testing, making research easier and providing a standard of evaluation. Working with others, we will bring reference datasets of many types of code, such as Java, C, binaries, and bytecode. We welcome contributions of code you've used. To help validate the reference datasets, we solicit proposals of no more than 2 pages to participate in SSA tool "target practice" on the datasets. Tools can range from university projects to commercial products. Participation is intended to demonstrate the state of the art in finding flaws; consequently, proposals should not be marketing write-ups but should highlight technical contributions: techniques used, precision achieved, classes of vulnerabilities detected, suggestions for extensions to and improvements of the reference datasets, etc. Participants are expected to provide their own equipment.


SSATTM encourages contributions describing basic research, novel applications, and experience relevant to SSA tools and their evaluation. Topics of particular interest are:

  • Benchmarks or reference datasets for SSA tools
  • Comparisons of tools
  • ROI effectiveness of SSA functions
  • Flaw catching effectiveness of SSA functions
  • Evaluating SSA tools
  • Gaps or research needs in SSA functions
  • SSA tool metrics
  • Software security assurance metrics
  • Surveys of SSA tools
  • Relation between flaws and the techniques that catch them
  • Taxonomy of software security flaws and vulnerabilities
  • Taxonomy of SSA functions or techniques


Papers should not exceed 8 pages in the conference format. Papers exceeding the length restriction will not be reviewed. Papers will be reviewed by at least two program committee members. All papers should clearly identify their novel contributions. All papers should be submitted electronically in PDF format by 26 August 2005 to Elizabeth Fong (efong [at] nist [dot] gov).


Accepted papers will be published in the workshop proceedings. The workshop proceedings, along with a summary of discussions and the output of the reference dataset "target practice", will be published as a NIST Special Publication.


  • Freeland Abbott - Georgia Tech
  • Paul Ammann - George Mason U.
  • Paul E. Black - NIST
  • Elizabeth Fong - NIST
  • Michael Hicks - U. Maryland
  • Michael Kass - NIST
  • Michael Koo - NIST
  • Richard Lippmann - MIT
  • Robert A. Martin - MITRE Corp.
  • W. Bradley Martin - NSA
  • Nachiappan Nagappan - Microsoft Research
  • Samuel Redwine - James Madison U.
  • Ravi Sandhu - George Mason U.
  • Larry D. Wagoner - NSA



  • 19 Aug: Paper and tool proposal submission deadline
  • 19 Sep: Paper and proposal notification
  • 15 Oct: Final camera-ready copy due
  • 7-8 Nov: Workshop

Workshop Program

Monday - November 7, 2005

Time : Session :
8:30 – 9:00 Welcome – Paul Black
9:00 – 10:30 Tools and Metrics - Elizabeth Fong
Where do Software Security Assurance Tools Add Value – David Jackson, David Cooper
Metrics that Matter: Quantifying Software Security Risk – Brian Chess
The Case for Common Flaw Enumeration – Robert Martin, Steven Christey, Joe Jarzombek
10:30 – 11:00 Break
11:00 – 12:30 Flaw Taxonomy and Benchmarks - Robert Martin
Seven Pernicious Kingdoms: A Taxonomy of Software Security Errors – Katrina Tsipenyuk, Brian Chess, Gary McGraw
A Taxonomy of Buffer Overflows for Evaluating Static and Dynamic Software Testing Tools – Kendra Kratkiewicz, Richard Lippmann
ABM – A Prototype for Benchmarking Source Code Analyzers – Tim Newsham, Brian Chess
12:30 – 1:30 Lunch
1:30 – 4:00 New Techniques - Larry Wagoner
A Benchmark Suite for Behavior-Based Security Mechanisms – Dong Ye, Micha Moffie, David Kaeli
Testing and Evaluation of Virus Detectors for Handheld Devices – Jose A. Morales, Peter Clarke, Yi Deng
Eliminating Buffer Overflows, Using the Compiler or a Standalone Tool – Thomas Plum, David Keaton
A Secure Software Architecture Description Language – Jie Ren, Richard Taylor
Prioritization of Threats Using the K/M Algebra – Supreeth Venkataraman, Warren Harrison
End of day one

Tuesday - November 8, 2005

9:00 – 11:30
Reference Dataset Discussion – Michael Kass
11:30 – 1:00 Lunch
1:00 – 2:30 Invited Presentation - Vadim Okun
Correct by Construction: The Case for Constructive Static Verification – Roderick Chapman

End of workshop


The SAMATE Reference Dataset and Target Practice Test Suite

   The SAMATE Reference Dataset (SRD) is a rapidly growing set of contributed test cases for measuring software assurance (SwA) tool capability against a functional specification for that tool.

This initial distribution is a compilation of C source code test cases that will be used for evaluating the functional capability of C source code scanning tools. Contributions from MIT Lincoln Lab and Fortify Software Inc. make up this initial set. Additional contributions from Klocwork Inc. and Ounce Labs Inc. will be added soon.

We expect to expand the SRD to include other languages (e.g. C++, Java) as well as to include test suites for other SwA tools (such as requirements and software design documents).

MIT Contribution

Documentation for each test case is contained in the source files themselves. In the case of the MIT contribution, the first line of each test case contains a classification code describing the test case “signature” (in terms of code complexity). All MIT discrete test cases are “buffer overflow” examples, with permutations of some of the 22 coding variation factors to challenge a tool's ability to discover a buffer overflow or recognize a patched version of the overflow. Also, MIT contributed 14 models (scaled-down versions) of 3 real-world applications (bind, sendmail, and wu-ftpd).

Fortify Software Test Case Contribution

Fortify Software has contributed C code test cases, the majority of which are also buffer overflow vulnerabilities. A number of race condition, command injection, and other vulnerabilities are included as well. Like the MIT test cases, the Fortify test cases are “self-documenting”, with keywords describing the type of software flaw present in the code. Additionally, to provide a uniform way of classifying the complexity of the test cases, the MIT classification code is placed at the top of each test file.

Klocwork Test Case Contribution

Klocwork Inc. has donated an initial contribution of C++ test cases, the majority of which are memory-management related (e.g. memory leaks, bad frees, use-after-frees). They intend to follow up with an additional donation of Java test cases.

Target Practice Test Suite - [Download the files (zip)]

A subset of both the MIT (152 discrete test cases and 3 models) and Fortify (12) test cases makes up the “target practice” test suite. A representative group of well-understood and documented tests is presented as a “starting point” to get initial feedback from tool developers and users on how useful the test suite is. Both a “bad” (flawed) and a “good” (patched) version exist for each test case.

  • Test Suite Execution - It is expected that each tool developer/user will run their tool against the target practice test suite before attending the workshop on Tuesday, so as to provide maximum time for discussion of the merits/deficiencies in the test suite. Tests are provided in two separate directories (MIT and Fortify). How a tool scans the test suite is at the discretion of the tool implementer/user.
  • Test Suite Evaluation - After running their tool on the Target Practice test suite, tool developers/users will be asked to fill out a questionnaire regarding usefulness of the test suite in the following areas:
    • Validity of the tests
    • Do test cases reflect real world examples?
    • Test case coverage (What software flaws should we focus on initially?)
    • Complexity (Were the tests challenging/enlightening for discovering a tool's capability?)
    • Sufficient metadata for describing test case flaws and code complexity (e.g. MIT's metadata scheme - do we need more? If so what?)
  • Confidentiality of Test Results - At no time is a tool developer required to report anything about their tool's performance against the Target Practice test suite. The purpose of the target practice is to solicit feedback on the SRD… NOT the tools that run against it. If a tool developer wishes to provide further insight into the usefulness of the SRD by disclosing how their tool performed against it, they do so at their own discretion.

Agenda for the Target Practice:

9 AM - 11:30 AM - Discussion of Test Results and Reference Dataset by target practice participants and workshop attendees:

  • 9:00 - 10:30 - Usefulness of test cases:
    • Validity
      • Do test cases reflect real world examples?
    • Coverage
      • Where (what flaws) should we focus on initially?
    • Complexity
      • What levels of code complexity are necessary to properly evaluate a tool's capability?
    • Variation
      • Expressed in the taxonomy of flaws, or in the test case itself?
  • 10:30 - 11:00 - Test Case Metadata:
    • Classification of software flaws in test cases
      • What common taxonomy to use for all code scanning tools? (Plover, CLASP, Fortify, Klocwork)
      • How can all the taxonomies be harmonized?
      • Correct metadata for describing test case complexity (e.g. MIT's metadata scheme - do we need more? If so, what?)
  • 11:00 - 11:20 - Requirements for an “easy to use” Reference Dataset:
    • Security
    • Web Accessibility?
    • Ad Hoc Query Capability
    • Validatable Submission
    • Batch Submission (1000s of Test Cases)
    • Dynamic Test Case Generation
    • Access Control
    • Demo NIST Prototype SRD
  • 11:20 - 11:30 - Next Steps:
    • Harmonize ideas for a common taxonomy of SwA flaws and vulnerabilities
    • Test Case Submission by SwA community
      • Fortify
      • Klocwork
      • Ounce Labs
      • MIT
      • Other
  • 11:30 Lunch

A Possible Harmonizing Software Flaw Taxonomy

   One of the conclusions from the August ’05 workshop “Defining the State of the Art in Software Security Tools” was the need for a reference taxonomy of software flaws and vulnerabilities. To further this goal, the NIST SAMATE team developed a harmonization scenario extending ideas in the Tsipenyuk/Chess/McGraw paper Seven Pernicious Kingdoms: A Taxonomy of Software Security Errors.

This scenario was created from the following publicly available taxonomies:

  • The Kingdoms - Katrina Tsipenyuk, Brian Chess, Gary McGraw - November 2005 - IEEE/ACM Conference, Long Beach, CA
  • CLASP - Comprehensive, Lightweight Application Security Process - Pravir Chandra, John Viega et al. - OWASP
  • “19 Deadly Sins of Software Security”, M. Howard, D. LeBlanc, and J. Viega, McGraw-Hill Osborne Media, July 2005.
  • OWASP Top Ten Most Critical Web Application Security Vulnerabilities
  • PLOVER - Preliminary List Of Vulnerability Examples for Researchers - Steve Christey - CVE MITRE

The scenario is unavailable at this time. You can contact us for further information. Construction details may be found below. This working document was developed by the NIST SAMATE team from publicly available sources without consultation with any of the taxonomy authors. The goal is to stimulate discussion.

Please join the samate email group (samate-subscribe [at] yahoogroups [dot] com, subject: subscribe) to comment.

Notes on the Construction of the Harmonization Scenario:

  • At the topmost level are the Kingdoms. The sublevels under that are a collection of categories from each of the five taxonomies. This enables commonalities/differences to be visible. That each of the five has a category for buffer overflow, and that the buffer overflow category for each is located under Input Validation and Representation is an example of commonality. That the CLASP category of Uninitialized variable appears under Errors and Kingdoms’ category of Uninitialized variable appears under Code Quality is an example of difference.
  • Suffixes on the category names indicate from which taxonomy the name comes, and at which step (see next note) in the construction process the category appeared.
    • Suffixes of the form “--f-c” indicate step (a).
    • Suffixes of the form “--<taxonomy name>--f-c” indicate step (b).
    • Suffixes of the form “--plover” indicate step (c).


  • The following describes the process of scenario construction:
    (a) The five topmost levels of the CLASP taxonomy match reasonably well with five of the eight Kingdoms. Thus, Kingdoms' topmost levels were chosen as the scenario's topmost level, and sublevels of the CLASP taxonomy were merged under corresponding topmost levels of the Kingdoms taxonomy as follows:
      • Environmental problems (CLASP) → Environment (Kingdoms)
      • General logic errors (CLASP) → Code Quality (Kingdoms)
      • Protocol errors (CLASP) → Security Features (Kingdoms)
      • Range and type errors (CLASP) → Input Validation and Representation (Kingdoms)
      • Synchronization and timing errors (CLASP) → Time and State (Kingdoms)
    (b) As described in the Tsipenyuk/Chess/McGraw paper, elements of the lists from the “19 Deadly Sins of Software Security” and the OWASP Top Ten were added.
    (c) The PLOVER WIFF (Weaknesses, Idiosyncrasies, Faults, Flaws) categories were added under the topmost levels of Kingdoms.
  • The scenario was constructed using the ontology development tool Protégé. This choice was made because Protégé was convenient and because a taxonomy is an elementary ontology. We recognize that XML tools may be more appropriate and that a schema for representing the reference taxonomy may need to be developed.

Disclaimer: Any commercial product mentioned is for information only; it does not imply recommendation or endorsement by NIST nor does it imply that the products mentioned are necessarily the best available for the purpose.

Created March 30, 2021, Updated March 22, 2023