For selected tool warnings, we analyzed up to three of the
following characteristics. First, we associated warnings
that refer to the same weakness. Second, we assigned severity to
warnings when we disagreed with the severity assigned by the
tool. Often, we gave a lower severity to indicate that, in our
view, the warning was not relevant to security. Third, we analyzed
correctness of the warnings. During the analysis phase, we
marked the warnings as true or false positives. Later, we decided
not to use the true/false positive markings. Instead, we marked
as "confirmed" the warnings that we determined to be correctly
reporting a weakness. We marked as "unconfirmed" the rest of the
warnings that we analyzed or associated. In particular, this
category includes the warnings that we analyzed but whose
correctness we could not determine.
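
The mapping from an analyst's judgment to the final marking can be
sketched as follows. This is only an illustration of the rule stated
above; the names Mark and final_mark are hypothetical and do not come
from the original analysis.

```python
from enum import Enum
from typing import Optional


class Mark(Enum):
    CONFIRMED = "confirmed"
    UNCONFIRMED = "unconfirmed"


def final_mark(judged_correct: Optional[bool]) -> Mark:
    """Map an analyst's judgment to the final marking.

    True  -> the warning correctly reports a weakness.
    False -> the warning was judged incorrect.
    None  -> the analyst could not decide.
    """
    # Only a positive determination yields "confirmed"; every other
    # analyzed or associated warning is "unconfirmed".
    return Mark.CONFIRMED if judged_correct is True else Mark.UNCONFIRMED
```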

Below are the criteria that we used for associating warnings that
refer to the same weakness and also for marking correctness and
severity of the warnings. We marked severity of a warning whenever
we disagreed with the tool.

Correctness and severity are orthogonal. Confirmed means that we
determined that the warning correctly reports a weakness. Severity
attempts to address security relevance.

I. Criteria for analysis of correctness

In our analysis we assumed that

* A tool has (or should have) perfect knowledge of control/data
flow that is explicitly in the code.

  ** For example, if a tool reports an error caused by unfiltered
input, but in fact the input is filtered correctly, mark it as
false.

  ** If the input is filtered, but the filtering is not complete,
mark it as true. This is often the case for cross-site scripting
weaknesses.

  ** If a warning says that a function can be called with a bad
parameter, but in the test case it is always called with safe
values, mark the warning as false.

* A tool does not know about context or environment and may
assume the worst case.

  ** For example, if a tool reports a weakness that is caused by
unfiltered input from the command line or from local files, mark it
as true. The reason is that the test cases are general purpose
software, and we did not provide any environmental information to
the participants.
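
The correctness criteria above can be read as a small decision
procedure. The sketch below is one possible encoding and not a tool
that was used in the analysis. The Finding record and its field names
are hypothetical; they stand for facts that an analyst establishes by
reading the code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    """Facts an analyst established about one warning (hypothetical fields)."""
    # None means the warning is not about input filtering.
    input_filtered: Optional[bool] = None
    filter_complete: Optional[bool] = None
    # True if the test case only ever passes safe values to the function.
    called_only_with_safe_values: bool = False


def mark_correctness(f: Finding) -> str:
    """Return "true" or "false" following the correctness criteria."""
    # The tool is assumed to know explicit control/data flow, so input
    # that is filtered correctly and completely makes the warning false.
    if f.input_filtered and f.filter_complete:
        return "false"
    # A bad parameter that never occurs in the test case is also false.
    if f.called_only_with_safe_values:
        return "false"
    # Incomplete filtering, or unfiltered input from the command line or
    # local files (worst-case environment), is marked true.
    return "true"
```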

II. Criteria for analysis of severity

We used an ordinal scale of 1 to 5, with 1 being the highest severity.
We assigned severity 4 or 5 to warnings that were not likely to be
security relevant.

We focused our analysis on issues with severity 1 and 2. When we
agreed with the severity assigned by the tool, we left it
unchanged. When we disagreed, we assigned our own severity to the
warning.

Specifically, we downgraded severity in these cases:

* A warning applies to functionality which may or may not be used
securely. If the tool does not analyze the use of the functionality
in the specific case, but provides a generic warning, we
downgrade the severity to 4 or 5. For example, we downgrade
severity of general warnings about use of getenv.

* A weakness is unlikely to be exploitable in the usage context.
Note that the tool does not know about the environment, so it is
correct in reporting such issues.

  ** For example, if input comes from a configuration file during
installation, we downgrade severity.

  ** We assume that regular users cannot be trusted, so we do not
downgrade severity if input comes from a user with regular login
credentials.

* The warning belongs to a class of weaknesses that we believe is
less relevant to security.
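
The downgrade cases above can be sketched as a function on the 1 to 5
scale. This is an illustration under assumptions, not the procedure
used in the analysis: the function and parameter names are
hypothetical, and the text gives a target of 4 or 5 only for generic
warnings, so applying the same target to the other two cases is an
assumption.

```python
def review_severity(tool_severity: int,
                    generic_warning: bool = False,
                    unlikely_exploitable: bool = False,
                    less_relevant_class: bool = False,
                    downgraded_to: int = 4) -> int:
    """Return the severity to record, on the 1 (highest) to 5 scale."""
    if not 1 <= tool_severity <= 5 or downgraded_to not in (4, 5):
        raise ValueError("severity out of range")
    if generic_warning or unlikely_exploitable or less_relevant_class:
        # A larger number means a lower severity, so max() never
        # raises the severity above what the tool assigned.
        return max(tool_severity, downgraded_to)
    # No downgrade case applies: keep the severity assigned by the tool.
    return tool_severity
```

A caller would leave unlikely_exploitable unset when the input comes
from a user with regular login credentials, since such users are not
trusted.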

III. Criteria for associating warnings

Tool warnings may refer to the same weakness. In this case, we
associated them, so that any analysis for one warning applied to
every associated warning.

The following criteria apply to weaknesses that can be described
using source-to-sink paths. A source is where user input can enter
a program. A sink is where the input is used.

* If two warnings have the same sink, but the sources are two
different variables, do not associate these warnings.

* If two warnings have the same source and sink, but the paths are
different, associate these warnings, unless the paths involve
different filters.

* If the tool reports only the sink, and two warnings refer to
the same sink and use the same weakness name, associate these
warnings, since we may have no way of knowing which variable they
refer to.
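The three association criteria can be sketched as a single predicate
over pairs of warnings. The PathWarning record and its fields are
hypothetical. The criteria do not cover warnings with different sinks,
or the case where only one of two warnings names a source; treating
both as "do not associate" is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class PathWarning:
    weakness_name: str
    sink: str
    # None if the tool reports only the sink.
    source: Optional[str] = None
    # Names of the filters that appear on the reported path.
    filters: FrozenSet[str] = frozenset()


def should_associate(a: PathWarning, b: PathWarning) -> bool:
    """Decide whether two warnings refer to the same weakness."""
    if a.sink != b.sink:
        # Assumption: the criteria only address warnings sharing a sink.
        return False
    if a.source is None and b.source is None:
        # Sink-only reports: associate if the weakness name matches.
        return a.weakness_name == b.weakness_name
    if a.source is None or b.source is None:
        # Assumption: mixed reports are not associated.
        return False
    if a.source != b.source:
        # Same sink, different source variables: do not associate.
        return False
    # Same source and sink: associate unless the filters differ.
    return a.filters == b.filters
```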

