Cautions on Interpreting and Using the SATE Data

SATE 2010, as well as its predecessors, taught us many valuable lessons.
Most importantly, our analysis should NOT be used as a basis for rating
or choosing tools; this was never the goal of SATE.

There is no single metric or set of metrics that is considered by the
research community to indicate or quantify all aspects of tool performance.
We caution readers not to apply unjustified metrics based on the SATE data.

Because security weaknesses are varied and differ in nature, defining
clear and comprehensive analysis criteria is difficult. While the analysis
criteria have been much improved since the previous SATEs, further
refinements are necessary.

The test data and analysis procedure employed have limitations and might not
indicate how these tools perform in practice. The results may not generalize
to other software because the choice of test cases, as well as the size of
test cases, can greatly influence tool performance. Also, we analyzed only
a small subset of tool warnings.

In SATE 2010, we added CVE-selected programs to the test sets for the first
time. The procedure that was used for finding CVE locations in code and
selecting tool warnings related to the CVEs has limitations, so the results
may not indicate tools' ability to find important security weaknesses.

The tools were used in this exposition differently from their use in
practice. We analyzed tool warnings for correctness and looked for related
warnings from other tools, whereas developers use tools to determine what
changes need to be made to software, and auditors look for evidence of
assurance. Also, in practice, users write special rules, suppress false
positives, and write code in certain ways to minimize tool warnings.

We did not consider the user interface, integration with the development
environment, and many other aspects of the tools, which are important for a
user to efficiently and correctly understand a weakness report.

Teams ran their tools against the test sets in July 2010. The tools continue
to progress rapidly, so some observations from the SATE data may already be
out of date.

Because of the stated limitations, SATE should not be interpreted as a tool
testing exercise. The results should not be used to draw conclusions
about which tools are best for a particular application or about the
general benefit of using static analysis tools.

