DetectionScore.pl User Manual

The Generic Detection Task Scorer


Usage:

DetectionScore.pl -K key_file <Options> DetectionOutput

The 'DetectionScore.pl' program will score the output generated by an automatic detection system. The program assumes the following about the performed detection task:

Since many detection applications fit these general constraints, this program can score the detection performance for a variety of detection tasks.

The program requires two inputs: an answer key file (via the -K option) and a detection system output file. The answer key file defines the compared objects and whether or not the two objects are presumed equivalent. The system output file records the system's decisions and the scores for those decisions.

The program computes a variety of performance statistics, and then generates a scoring report and, optionally, Detection Error Tradeoff (DET) graphs.

Detection Performance Assessment

Detection performance is characterized in terms of the probability of miss and false alarm errors (Pmiss and Pfa). The error probabilities are then combined into a single detection cost Cdet, by assigning costs to miss and false alarm errors:

Cdet = Cmiss * Pmiss * Ptarget + Cfa * Pfa * (1-Ptarget) where
  • Cmiss and Cfa are the cost of a Miss and False Alarm respectively,
  • Pmiss and Pfa are the conditional probabilities of a Miss and False Alarm respectively, and
  • Ptarget is the a priori probability of a target.
Cdet is the bottom-line representation of detection performance that is used to judge systems. Unfortunately, the value of Cdet is also a function of the application parameters, namely the costs of detection errors and the a priori target probability. Because of this, and in order to provide a more intuitively meaningful measure of system performance, Cdet is normalized so that a system that extracts no information from the source data (for example, one that always answers YES or always answers NO) can score no better than one. This is done as follows:

(Cdet)norm = Cdet / MIN( Cmiss * Ptarget, Cfa * (1-Ptarget))

Thus the absolute value of (Cdet)norm is a direct measure of the value of the system.
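As a worked check of the two formulas, the following Python sketch recomputes the story-weighted figures from the example report at the end of this manual (10 misses among 137 targets, 10 false alarms among 1063 non-targets, Ptarget = 0.02, Cmiss = 1, Cfa = 0.1). The function names are illustrative only and are not part of DetectionScore.pl.

```python
def detection_cost(p_miss, p_fa, p_target, c_miss=1.0, c_fa=0.1):
    """Cdet = Cmiss * Pmiss * Ptarget + Cfa * Pfa * (1 - Ptarget)."""
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)


def normalized_cost(c_det, p_target, c_miss=1.0, c_fa=0.1):
    """Divide Cdet by the cost of the cheaper trivial system (always NO or always YES)."""
    return c_det / min(c_miss * p_target, c_fa * (1.0 - p_target))


p_miss = 10 / 137    # misses / (correct detections + misses)
p_fa = 10 / 1063     # false alarms / (correct rejections + false alarms)
c_det = detection_cost(p_miss, p_fa, p_target=0.02)
c_norm = normalized_cost(c_det, p_target=0.02)
print(f"Cdet = {c_det:.4f}, (Cdet)norm = {c_norm:.4f}")
# prints "Cdet = 0.0024, (Cdet)norm = 0.1191"
```

A system that always answers NO has Pmiss = 1 and Pfa = 0, so with these parameters its normalized cost is exactly 1.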

Using these formulas, performance is measured in two ways: decision weighted or block weighted. Decision weighted performance (sometimes called pooled or macro performance) weights each decision equally. These are global performance statistics, but no mean or variance can be associated with the performance variability. Block weighted performance computes decision weighted performance statistics on subsets, or blocks, of the test set, and then reports the mean of those statistics. The advantage of block weighted statistics is that they have reduced variance. The subsets can be, and often are, non-uniform in size.
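The difference between the two weightings can be illustrated with the per-topic counts from the example report at the end of this manual. The Python sketch below pools the counts for the decision weighted rates and averages the per-block rates for the block weighted ones. The variable and function names are illustrative and do not come from DetectionScore.pl.

```python
# (correct detections, misses, correct rejections, false alarms) per block,
# copied from the per-topic rows of the example report in this manual.
blocks = {
    "1": (59, 1, 59, 1),
    "7": (11, 1, 107, 1),
    "13": (9, 1, 109, 1),
    "15": (0, 1, 118, 1),
    "23": (11, 1, 107, 1),
    "32": (0, 1, 118, 1),
    "33": (1, 1, 117, 1),
    "37": (1, 1, 117, 1),
    "44": (0, 1, 118, 1),
    "77": (35, 1, 83, 1),
}


def error_rates(corr, miss, corr_rej, fa):
    """Return (Pmiss, Pfa) for one set of decision counts."""
    return miss / (corr + miss), fa / (corr_rej + fa)


# Decision weighted: pool the counts over all blocks, then compute the rates once.
totals = [sum(column) for column in zip(*blocks.values())]
pooled_p_miss, pooled_p_fa = error_rates(*totals)

# Block weighted: compute the rates per block, then average them.
per_block = [error_rates(*counts) for counts in blocks.values()]
block_p_miss = sum(pm for pm, _ in per_block) / len(per_block)
block_p_fa = sum(pf for _, pf in per_block) / len(per_block)

print(f"decision weighted: P(Miss) = {pooled_p_miss:.4f}, P(Fa) = {pooled_p_fa:.4f}")
print(f"block weighted:    P(Miss) = {block_p_miss:.4f}, P(Fa) = {block_p_fa:.4f}")
```

With these counts the sketch reproduces the report's figures: 0.0730 and 0.0094 for the decision (story) weighted rates, 0.4311 and 0.0098 for the block (topic) weighted rates. The three topics with no correct detections each contribute P(Miss) = 1.0 to the block mean, which is why the two weightings differ so much here.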

The following <Options> are recognized by the program:

-C Cmiss:Cfa -> Set the cost of a missed detection and the cost of a false alarm to 'Cmiss' and 'Cfa' respectively. These numbers are used in the detection cost function. Default values are Cmiss=1.0 and Cfa=0.1.
-D Detail -> Write the internally organized evaluation corpus and pertinent statistics for debugging purposes. This report, though voluminous, is intended to help researchers debug their internal versions of evaluation code.
-N TaskID,BlockID,DecisionID -> Define the names used in the reports. 'TaskID' is the detection task name; 'Link' is the default. 'BlockID' is the name that describes the block divisions; 'Topic' is the default. 'DecisionID' describes what individual decisions are made on; 'Story' is the default.
-P Ptarget -> Use Ptarget for the detection cost functions.
-r Report -> Write the summary report to 'Report' rather than STDOUT, the default.
-S -> If this flag is used, system output entries not present in the key file are ignored during scoring.
-v num -> Set the verbose level to 'num'. Default 1.
==0 None, ==1 Normal, >5 Slight, >10 way too much, >15 not even funny
Options that apply to the DET plots:
-d DETfile -> Create a DET plot in GNUplot format with the file root 'DETfile'. The program makes several files each with additional extensions. The file 'DETfile'.plt is a command file for GNUplot and can be printed using the command "gnuplot 'DETfile'.plt | lpr".
-t title -> Set the title line for the plot to 'title'.
-p -> Produce a single story-weighted DET line trace.
-w -> Produce a single topic-weighted DET line trace. This is the default option.
-n -> Add 90% confidence intervals to the topic-weighted DET graph.
-Z uncompress -> Specify the command for uncompressing the system output files prior to scoring. The decompression applies to ONLY the system decision files, not the file lists. The command is executed by opening a pipe command if the system output file ends with a .Z or .gz suffix. The command is required to read a compressed stream from STDIN, and write the uncompressed stream to STDOUT.

Program Input

Answer key file format

The answer key file defines the presumed correct answers for all of the detection object pairs. The file consists of a header record, followed by a data record for each detection object pair. With the exception of the header record, any text following a '#' is ignored.

The BNF structure of the key file is:

<HEADER_LINE>
<DETECTION_OBJECT>
<DETECTION_OBJECT>
...

Where:

<HEADER_LINE> :== # LINK_DETECTION
'LINK_DETECTION' is the expected value. However, if another symbol is used, an ignorable warning message will be generated.
<DETECTION_OBJECT> :== <OBJECT> <OBJECT> <TRUTH> <BLOCKID>
<OBJECT> :== STRING
A text string identifying the object to be compared. The program does not derive any meaning from this string, except to cross reference the key entries to the system output.
<TRUTH> :== TARGET | NONTARGET
Specify whether the two objects are equivalent ('TARGET') or not ('NONTARGET').
<BLOCKID> :== STRING
Specify which 'block' this detection pair belongs to. Blocks are sorted numerically by this STRING in the reports, so care should be taken to choose appropriate strings. If your detection evaluation does not support the notion of blocks, specify an identical value for all pairs.
The following is an excerpt from a detection answer key file.
# LINK_DETECTION
#
# Record format : ': : TARGET|NONTARGET '
#
APW19980104.0002 NYT19980104.0098 NONTARGET 44
APW19980104.0012 NYT19980105.0840 NONTARGET 33
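
For readers writing their own tools around this format, the following Python sketch reads a key file according to the grammar above. It assumes whitespace-separated fields; the function name and the returned dictionary layout are hypothetical and are not taken from DetectionScore.pl.

```python
def parse_key_file(lines):
    """Parse answer key lines into (task, {(object1, object2): (truth, block_id)}).

    Assumes whitespace-separated fields; the first line is the header record.
    """
    lines = iter(lines)
    header = next(lines).strip()                  # e.g. "# LINK_DETECTION"
    task = header.lstrip("#").strip()
    key = {}
    for line in lines:
        record = line.split("#", 1)[0].strip()    # drop any text after '#'
        if not record:
            continue                              # comment-only or blank line
        obj1, obj2, truth, block_id = record.split()
        if truth not in ("TARGET", "NONTARGET"):
            raise ValueError(f"bad truth value: {truth!r}")
        key[(obj1, obj2)] = (truth, block_id)
    return task, key


sample = """\
# LINK_DETECTION
#
APW19980104.0002 NYT19980104.0098 NONTARGET 44
APW19980104.0012 NYT19980105.0840 NONTARGET 33
""".splitlines()

task, key = parse_key_file(sample)
print(task, len(key))   # prints "LINK_DETECTION 2"
```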

Detection System Output Format

The detection system must output an answer line matching each line in the key file. The first record in this file contains two fields that apply globally to the whole file. Comment lines begin with the '#' character, and any text following a '#' is ignored. The exception to this rule is that the first comment line can optionally contain a long description of the system under test. This description will be included in the scoring report alongside the <SYSTEM> value described below. After the initial comment line, blank lines are treated as comments.

The BNF structure of the detection system output file is:

<SYSTEM> <DEF_PERIOD>
<DECISION_LINE>
<DECISION_LINE>
...

Where:

<SYSTEM> :== System is an alphanumeric character string that uniquely identifies the system being tested. (E.g., CDM_P05-8.v37)
<DEF_PERIOD> :== The deferral period before decisions are made. This field exists in support of the TDT3 evaluation. It must not be omitted; however, the program will issue a warning about non-standard deferral values.
<DECISION_LINE> :== <OBJECT> <OBJECT> <DECISION> <SCORE>
<OBJECT> :== STRING
A text string identifying the object to be compared. The program does not derive any meaning from this string, except to cross reference the key entries to the system output.
<DECISION> :== YES | NO
The decision is YES if the system believes that the two objects are equivalent, and NO otherwise.
<SCORE> :== NUMBER
A real number which indicates how confident the system is that the two objects are equivalent. High scores indicate strong belief, low scores indicate weak belief.
The following is an excerpt from a detection system output file.
# Artificial sld results, Errors,
Errors 10
APW19980104.0002 NYT19980104.0098 YES 0.00609613178059631
APW19980104.0017 VOA19980106.2100.0060 YES 0.999469214386953
APW19980104.0017 NYT19980107.0513 YES 0.7777613437
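
The system output format can be read in the same spirit. The Python sketch below follows the grammar above: an optional description on the first comment line, one <SYSTEM> <DEF_PERIOD> record, then decision lines. The function names, the return layout, and the outcome labels are hypothetical and are not part of DetectionScore.pl.

```python
def parse_system_output(lines):
    """Parse system output into (description, system, def_period, decisions).

    decisions maps (object1, object2) to (decision, score).
    """
    description = None
    system = def_period = None
    decisions = {}
    for number, line in enumerate(lines):
        if number == 0 and line.startswith("#"):
            description = line[1:].strip()    # optional system description
            continue
        record = line.split("#", 1)[0].strip()
        if not record:
            continue                          # comment or blank line
        fields = record.split()
        if system is None:
            system, def_period = fields       # first record: <SYSTEM> <DEF_PERIOD>
            continue
        obj1, obj2, decision, score = fields
        if decision not in ("YES", "NO"):
            raise ValueError(f"bad decision: {decision!r}")
        decisions[(obj1, obj2)] = (decision, float(score))
    return description, system, def_period, decisions


def outcome(truth, decision):
    """Classify one decision against its key entry."""
    if truth == "TARGET":
        return "correct detection" if decision == "YES" else "miss"
    return "false alarm" if decision == "YES" else "correct rejection"


sample = """\
# Artificial sld results, Errors,
Errors 10
APW19980104.0002 NYT19980104.0098 YES 0.00609613178059631
APW19980104.0017 VOA19980106.2100.0060 YES 0.999469214386953
APW19980104.0017 NYT19980107.0513 YES 0.7777613437
""".splitlines()

description, system, def_period, decisions = parse_system_output(sample)
print(system, def_period, len(decisions))   # prints "Errors 10 3"

# The key excerpt earlier in this manual marks this pair as NONTARGET,
# so a YES decision counts as a false alarm.
pair = ("APW19980104.0002", "NYT19980104.0098")
print(outcome("NONTARGET", decisions[pair][0]))   # prints "false alarm"
```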

Example Output Report

-------------------------------------------------------------------------------
------------------  Detection Task Performance Report  ------------------

Command line:   /data/data2/TDT99/Software/.....
Execution Date: Fri Aug  6 11:13:14 EDT 1999

Story Weighted Story Link Detection:     P(Miss)           = 0.0730
                                         P(Fa)             = 0.0094
                                         CLink             = 0.0024
                                         Norm(CLink)       = 0.1191

Topic Weighted Story Link Detection:     P(Miss)           = 0.4311
                                         P(Fa)             = 0.0098
                                         CLink             = 0.0096
                                         Norm(CLink)   *   = 0.4793

  *   Primary Evaluation Metric

DET Graph Minimum Detection Cost Analysis:
     Story Weighted Minimum CLink = 0.0183 at P(Miss) = 0.8102 and P(Fa) = 0.0216
     Topic Weighted Minimum CLink = 0.0190 at P(Miss) = 0.9228 and P(Fa) = 0.0055

                   | # Corr  # Miss  # Corr    # Fa      ||                           | Norm   
   Topic           | Link    Link    ! Link    ! Link    || P(Miss)  P(Fa)   CLink    | CLink  
   -----           | ------  ------  --------  --------  || -------  -----   -------  | -------
   1               |   59       1      59         1      || 0.0167   0.0167  0.0020   | 0.0983 
   7               |   11       1     107         1      || 0.0833   0.0093  0.0026   | 0.1287 
   13              |    9       1     109         1      || 0.1000   0.0091  0.0029   | 0.1445 
   15              |    0       1     118         1      || 1.0000   0.0084  0.0208   | 1.0412 
   23              |   11       1     107         1      || 0.0833   0.0093  0.0026   | 0.1287 
   32              |    0       1     118         1      || 1.0000   0.0084  0.0208   | 1.0412 
   33              |    1       1     117         1      || 0.5000   0.0085  0.0108   | 0.5415 
   37              |    1       1     117         1      || 0.5000   0.0085  0.0108   | 0.5415 
   44              |    0       1     118         1      || 1.0000   0.0084  0.0208   | 1.0412 
   77              |   35       1      83         1      || 0.0278   0.0119  0.0017   | 0.0861 
   -----           | ------  ------  --------  --------  || -------  -----   -------  | -------
   Sums            |  127      10    1053        10      ||                           |        
   Story Weighted  |                                     || 0.0730   0.0094  0.0024   | 0.1191 
   Topic Weighted  |                                     || 0.4311   0.0098  0.0096   | 0.4793 


Key File:                ../indexes_small/sld_SRC=nwt+bnasr_TEST:SL=eng,CL=nat.key
System Output File:      sld_SRC=nwt+bnasr_TEST:SL=eng,CL=nat.sld
Cost Function Parameters:
              Ptarget  = 0.02
              Cmiss    = 1
              Cfa      = 0.1

Detection Performance Calculations:
    System Identifier:   Errors   Description: 'Artificial sld results, Errors,'
    Deferral Period:     10
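
The 'DET Graph Minimum Detection Cost Analysis' lines in the report above give the lowest cost found along the DET curve. The Python sketch below shows one common way to find such a minimum: sweep a threshold over the system's scores, treat every pair scoring at or above the threshold as a detection, and keep the cheapest operating point. This is an assumption about the general technique, not a description of how DetectionScore.pl implements it. The function name and the sample scores are invented for illustration, and the default parameters mirror the cost function parameters shown in the report.

```python
def minimum_detection_cost(scored, p_target=0.02, c_miss=1.0, c_fa=0.1):
    """Return (min_cost, p_miss, p_fa, threshold) over all score thresholds.

    scored is a list of (score, is_target) pairs; a pair counts as detected
    when its score is >= threshold.
    """
    n_target = sum(1 for _, is_target in scored if is_target)
    n_nontarget = len(scored) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need at least one target and one non-target")
    # Start above every score: nothing is detected, so every target is missed.
    misses, false_alarms = n_target, 0
    best = (c_miss * p_target, 1.0, 0.0, float("inf"))
    ordered = sorted(scored, key=lambda pair: pair[0], reverse=True)
    i = 0
    while i < len(ordered):
        threshold = ordered[i][0]
        # Lower the threshold to this score, admitting all pairs tied at it.
        while i < len(ordered) and ordered[i][0] == threshold:
            if ordered[i][1]:
                misses -= 1
            else:
                false_alarms += 1
            i += 1
        p_miss = misses / n_target
        p_fa = false_alarms / n_nontarget
        cost = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
        if cost < best[0]:
            best = (cost, p_miss, p_fa, threshold)
    return best


# Hypothetical scores, not taken from any real system output.
scored = [(0.9, True), (0.8, True), (0.7, False),
          (0.6, True), (0.2, False), (0.1, False)]
cost, p_miss, p_fa, threshold = minimum_detection_cost(scored)
print(f"min cost = {cost:.4f} at threshold {threshold}")
# prints "min cost = 0.0067 at threshold 0.8"
```

Under this approach the minimum depends only on the scores, not on the YES/NO decisions, so it need not match the cost computed from the decisions themselves.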