SATE 2010 is the third annual SATE. The experience workshop was on 1 October 2010.
Information about SATE 2009, SATE 2008, and the latest SATE is available online.
The NIST Software Assurance Metrics And Tool Evaluation (SAMATE) project conducted the third Static Analysis Tool Exposition (SATE) in 2010 to advance research in static analysis tools that find security defects in source code. The main goals of SATE were to enable empirical research based on large test sets, encourage improvements to tools, and promote broader and more rapid adoption of tools by objectively demonstrating their use on production software.
Briefly, participating tool makers ran their tools on a set of programs. Researchers led by NIST performed a partial analysis of the tool reports. The results and experiences were reported at the SATE 2010 Workshop in Gaithersburg, MD, in October 2010.
"Report on the Third Static Analysis Tool Exposition (SATE 2010)", Vadim Okun, Aurelien Delaitre, Paul E. Black, editors, U.S. National Institute of Standards and Technology (NIST) Special Publication (SP) 500-283, October, 2011.
This special publication consists of the following three papers.
The data includes tool reports in the SATE output format, analysis of the tool reports (tool warnings selected randomly, based on CVEs, and based on manual findings), and additional information submitted by teams.
SATE 2010, as well as its predecessors, taught us many valuable lessons. Most importantly, our analysis should NOT be used as a basis for rating or choosing tools; this was never the goal of SATE.
There is no single metric or set of metrics that is considered by the research community to indicate or quantify all aspects of tool performance. We caution readers not to apply unjustified metrics based on the SATE data.
Due to the variety and different nature of security weaknesses, defining clear and comprehensive analysis criteria is difficult. While the analysis criteria have been much improved since the previous SATEs, further refinements are necessary.
The test data and analysis procedure employed have limitations and might not indicate how these tools perform in practice. The results may not generalize to other software because the choice of test cases, as well as the size of test cases, can greatly influence tool performance. Also, we analyzed a small subset of tool warnings.
In SATE 2010, we added CVE-selected programs to the test sets for the first time. The procedure that was used for finding CVE locations in code and selecting tool warnings related to the CVEs has limitations, so the results may not indicate tools' ability to find important security weaknesses.
The tools were used in this exposition differently from their use in practice. We analyzed tool warnings for correctness and looked for related warnings from other tools, whereas developers use tools to determine what changes need to be made to software, and auditors look for evidence of assurance. Also in practice, users write special rules, suppress false positives, and write code in certain ways to minimize tool warnings.
We did not consider the user interface, integration with the development environment, and many other aspects of the tools, which are important for a user to efficiently and correctly understand a weakness report.
Teams ran their tools against the test sets in July 2010. The tools continue to progress rapidly, so some observations from the SATE data may already be out of date.
Because of the stated limitations, SATE should not be interpreted as a tool testing exercise. The results should not be used to draw conclusions regarding which tools are best for a particular application or the general benefit of using static analysis tools.
Download: SATE 2010 data
Note. At the request of Coverity and GrammaTech, their tool output is not released as part of the SATE data. Consequently, our detailed analysis of their tool warnings is not released either. However, the observations and summary analysis in our paper are based on the complete data set.
Instructions for downloading and installing the test cases
Paul Anderson wrote a detailed proposal for using CVE-based test cases to provide ground truth for analysis. Romain Gaucher helped with planning SATE. Romain Gaucher and Ramchandra Sugasi of Cigital are the security experts who quickly and accurately performed human analysis of the test cases. We thank Sue Wang, now at MITRE, for great help with all phases of SATE 2010, including planning, selection of CVE-based test cases, and analysis. All members of the NIST SAMATE team contributed to SATE 2010.
SATE is modeled on the Text REtrieval Conference (TREC): https://trec.nist.gov/
Bill Pugh first proposed organizing a TREC-like exposition for static analysis tools: http://www.cs.umd.edu/~pugh/JudgingStaticAnalysis.pdf (slides 48-50)
The Static Analysis Tool Exposition (SATE) is designed to advance research, based on large test sets, in static analysis tools that find security-relevant defects in source code, and to encourage improvement of those tools. Briefly, participating tool makers run their tools on a set of programs. Researchers led by NIST analyze the tool reports. The results and experiences are reported at a workshop. The tool reports and analysis are made publicly available later.
The goals of SATE are:
Our goal is not to evaluate or choose the "best" tools.
SATE is aimed at exploring the following characteristics of tools: relevance of warnings to security, their correctness, and prioritization.
Note. A warning is an issue (usually, a weakness) identified by a tool. A (tool) report is the output from a single run of a tool on a test case. A tool report consists of warnings.
The following summarizes the steps in the SATE procedure. The dates are subject to change.
The exposition consists of two language tracks: a C/C++ track and a Java track.
Teams run their tools and submit reports following specified conditions.
Finding all weaknesses in a reasonably large program is impractical. Also, due to the likely high number of tool warnings, analyzing all warnings may be impractical. Therefore, we select subsets of tool warnings for analysis.
Generally, the analyst first selects issues for analysis and then finds the associated warnings from tools. This yields a subset of tool warnings, which is then analyzed.
Methods 1 and 2 below apply to the general programs only. Method 3 applies to the CVE-selected programs. We will perform separate analysis and reporting for the resulting subsets.
Statistically select the same number of warnings from each tool report, assigning higher weight to categories of warnings with higher severity and avoiding categories of warnings with low severity.
This selection method is useful to the tool users because it considers warnings from each tool.
We selected 30 warnings from each tool report using the following procedure:
If a tool did not assign severity, we assigned severity based on weakness names and our understanding of their relevance to security.
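The selection idea described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the procedure SATE actually used: the `ToolWarning` type, the 1-to-5 severity scale (1 being most severe), the weights in `SEVERITY_WEIGHTS`, and the choice to exclude severities 4 and 5 are all hypothetical. It also weights individual warnings rather than categories of warnings.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolWarning:
    """Hypothetical record for one warning in a tool report."""
    warning_id: int
    severity: int  # assumed scale: 1 = most severe, 5 = least severe


# Hypothetical weights. Severities absent from this map (4 and 5 here)
# are treated as low severity and are never selected.
SEVERITY_WEIGHTS = {1: 4.0, 2: 2.0, 3: 1.0}


def select_warnings(report, sample_size=30, seed=None):
    """Draw up to sample_size distinct warnings, favoring higher severity."""
    rng = random.Random(seed)
    pool = [w for w in report if w.severity in SEVERITY_WEIGHTS]
    selected = []
    # Weighted sampling without replacement: draw one warning at a time
    # and remove it from the pool so it cannot be drawn again.
    while pool and len(selected) < sample_size:
        weights = [SEVERITY_WEIGHTS[w.severity] for w in pool]
        index = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        selected.append(pool.pop(index))
    return selected
```

Applying `select_warnings` to each tool report with the same `sample_size` gives every tool the same number of selected warnings, provided the report has enough warnings above the low-severity cutoff.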
Security experts manually analyze the test cases and identify the most important weaknesses (manual findings). Analyze for both design weaknesses and source code weaknesses, focusing on the latter. Since manual analysis combines multiple weaknesses with the same root cause, we anticipate a small number of manual findings, e.g., 10-25 per test case. Take special care to confirm that the manual findings are indeed weaknesses. Tools may be used to aid human analysis, but static analysis tools cannot be the main source of manual findings.
Check the tool reports to find warnings related to the manual findings. For each manual finding, for each tool: find at least one related warning, or conclude that there are no related warnings.
This method is useful because it is largely independent of tools and thus includes weaknesses that may not be found by any tools. It also focuses analysis on weaknesses found most important by security experts.
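Deciding whether a warning is related to a manual finding is a human judgment, but the bookkeeping for "each manual finding, each tool" can be sketched as follows. This is only an illustrative pre-filter under stated assumptions: the dictionary fields (`id`, `file`, `line`) and the `line_window` heuristic are hypothetical and are not part of the SATE criteria.

```python
def candidate_matches(findings, reports, line_window=10):
    """Build a (finding id, tool name) -> candidate warning table.

    findings: list of dicts with hypothetical keys 'id', 'file', 'line'.
    reports:  dict mapping tool name -> list of warning dicts with
              hypothetical keys 'file' and 'line'.
    A value of None means no candidate was found for that tool, so an
    analyst would conclude there is no related warning unless a manual
    search of the report finds one.
    """
    table = {}
    for finding in findings:
        for tool, warnings in reports.items():
            # Heuristic: same file and a line number close to the finding.
            candidate = next(
                (
                    w
                    for w in warnings
                    if w["file"] == finding["file"]
                    and abs(w["line"] - finding["line"]) <= line_window
                ),
                None,
            )
            table[(finding["id"], tool)] = candidate
    return table
```

Each candidate still needs review by an analyst before it is recorded as related to the manual finding.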
For each CVE-selected pair of test cases, check the tool reports to find warnings that identify the CVEs in the vulnerable version. Check whether the warnings are still reported for the fixed version.
This method is useful because it focuses analysis on exploited weaknesses.
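The comparison between the vulnerable and fixed versions can be sketched as below. This is a minimal sketch, assuming the warnings that identify a CVE in the vulnerable version have already been chosen by an analyst; the field names (`id`, `file`, `function`, `name`) and the matching key are hypothetical.

```python
def compare_versions(cve_warnings, fixed_report):
    """Classify each CVE-related warning from the vulnerable version.

    Line numbers usually shift between versions, so this sketch matches
    warnings on (file, function, weakness name) instead of line numbers.
    """
    def key(warning):
        return (warning["file"], warning["function"], warning["name"])

    fixed_keys = {key(w) for w in fixed_report}
    return {
        w["id"]: "still reported" if key(w) in fixed_keys else "no longer reported"
        for w in cve_warnings
    }
```

A warning that is still reported for the fixed version may point at code the fix did not change, so that outcome calls for a closer look rather than an automatic conclusion.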
Detailed criteria are available for analysis of correctness and significance and for associating warnings.
Assign one of the following categories to each warning analyzed.
For each tool warning in the list of selected warnings, find warnings from other tools that refer to the same (or related) weakness. For each selected warning instance, our goal is to find at least one related warning instance (if it exists) from each of the other tools. While there may be many warnings reported by a tool that are related to a particular warning, we do not attempt to find all of them.
We will use the following degrees of association:
Mark tool warnings related to manual findings with one of the following:
We plan to analyze the data collected and present the following in our report:
The SATE 2010 output format is the same as the SATE 2009 format, except for an additional correctness category in the evaluation section. The SATE 2008 and 2009 formats are subsets of the 2010 format, so output in those formats is also valid for 2010.
In the SATE tool output format, each warning includes:
The SATE 2010 XML schema file can be downloaded.
Teams are encouraged to use the schema file for validation, for example:
Website: https://www.dovecot.org/
Download Dovecot from our server.
SHA256: 3f9b4d0501bf04b4bb940b8bf66e43265b53b0165293c166f4428d182b6e8587
dovecot-2.0.beta6.20100626.tar.gz
./configure
make
NOTE. Dovecot does memory allocation differently from other C programs. Its memory management is described here:
https://wiki.dovecot.org/Design/Memory
Website: https://www.wireshark.org/
On a fresh installation of Ubuntu 10.04 with GCC 4.4.3, install the following packages, configure and compile:
sudo apt-get install bison flex libgtk2.0-dev libgnutls-dev libpcap-dev
./configure
make
Website: http://www.chromium.org/
We used a fresh install of Ubuntu 10.04 with GCC 4.4.3.
NOTE. Compiling Chrome requires a lot of memory. It succeeded on a computer with 2GB of RAM and 20GB of disk space.
Install dependencies:
wget http://src.chromium.org/svn/trunk/src/build/install-build-deps.sh
chmod +x install-build-deps.sh
sudo ./install-build-deps.sh
Install code downloading tools:
wget http://src.chromium.org/svn/trunk/tools/depot_tools.tar.gz
tar zxf depot_tools.tar.gz
export PATH=$PATH:~/depot_tools
Download and compile the sources for the vulnerable version:
gclient config http://src.chromium.org/svn/releases/5.0.375.54
gclient sync
make
Download and compile the sources for the fixed version:
gclient config http://src.chromium.org/svn/releases/5.0.375.70
gclient sync
make
Website: http://pebble.sourceforge.net/
Download Pebble 2.5-M2 from our server.
SHA256: 02885022103cfdbaf984cfe72f84bf4ce0c7841003343b8b8058c27cdd413315
pebble.tar.gz
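Before compiling, the downloaded archive can be checked against the SHA256 value listed above. The following is a minimal Python sketch; the helper names `sha256_of` and `matches_published_hash` are hypothetical, and the archive is assumed to be in the current directory. The same check applies to the Dovecot archive with its own published hash.

```python
import hashlib

# SHA256 value published above for pebble.tar.gz.
PEBBLE_SHA256 = "02885022103cfdbaf984cfe72f84bf4ce0c7841003343b8b8058c27cdd413315"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as archive:
        for chunk in iter(lambda: archive.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's digest with a published hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example use (assumes the archive has been downloaded):
#   matches_published_hash("pebble.tar.gz", PEBBLE_SHA256)
```

If the function returns False, the download is incomplete or altered and should be fetched again before building.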
Compile:
sudo apt-get install subversion maven2 default-jdk
cd pebble
mvn
Pebble requires Java 6.0. To install Java EE 6:
sudo apt-get install default-jdk ant
wget http://cds.sun.com/is-bin/INTERSHOP.enfinity/WFS/CDS-CDS_SMI-Site/en_US…
chmod +x java_ee_sdk-6u1-unix.sh
sudo ./java_ee_sdk-6u1-unix.sh
Website: http://tomcat.apache.org/
NOTE. To compile different versions of Tomcat on the same computer, it may be necessary to remove files left over from a previous compilation in /usr/share/java.
On a fresh installation of Ubuntu 10.04, install the latest version of Sun JDK 5 (1.5), install ant and compile:
sudo apt-get install ant
ant