The 'DetectionScore.pl' program will score the output generated by an automatic detection system. The program assumes the following about the performed detection task:
The program requires two inputs: an answer key file (via the -K option) and a detection system output file. The answer key file defines the compared objects and whether or not the two objects are presumed equivalent. The system output file records the system's decisions and the scores for those decisions.
The program computes a variety of performance statistics, and then generates a scoring report and, optionally, Detection Error Tradeoff (DET) graphs.
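Detection performance is characterized in terms of the probability of miss and false alarm errors (Pmiss and Pfa). The error probabilities are then combined into a single detection cost, Cdet, by assigning costs to miss and false alarm errors:
    Cdet = Cmiss * Pmiss * Ptarget + Cfa * Pfa * (1 - Ptarget)
    Norm(Cdet) = Cdet / min(Cmiss * Ptarget, Cfa * (1 - Ptarget))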
Using these formulas, performance is measured in two ways:
decision weighted or block weighted. Decision weighted
performance (sometimes called pooled or micro-averaged performance)
weights each decision equally. These are global performance
statistics, but because they are computed once over the whole test
set, no mean or variance can be associated with them. Block weighted
performance computes decision weighted performance statistics on
subsets, or blocks, of the test set, and then reports the mean of
those statistics. The advantage of block weighted statistics is that
they have reduced variance. The subsets can be, and often are,
non-uniform in size.
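The difference between the two weightings can be illustrated with a short Python sketch. It is not the program's own code. It assumes the cost takes the form Cdet = Cmiss * Pmiss * Ptarget + Cfa * Pfa * (1 - Ptarget), normalized by min(Cmiss * Ptarget, Cfa * (1 - Ptarget)), and it uses the per-topic counts and the parameter values (Ptarget = 0.02, Cmiss = 1, Cfa = 0.1) printed in the sample report later in this document. All function and variable names are hypothetical.

```python
# Sketch only: the cost formula is an assumption chosen because it reproduces
# the figures in the sample report; it is not taken from DetectionScore.pl.

def detection_cost(p_miss, p_fa, p_target=0.02, c_miss=1.0, c_fa=0.1):
    """Combine miss and false alarm probabilities into a single cost."""
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)

def normalized_cost(cost, p_target=0.02, c_miss=1.0, c_fa=0.1):
    """Divide by the cost of the better trivial system (always YES or always NO)."""
    return cost / min(c_miss * p_target, c_fa * (1.0 - p_target))

# Per-topic counts copied from the sample report:
# (correct links, missed links, correct non-links, false alarms)
BLOCKS = {
    "1": (59, 1, 59, 1),   "7": (11, 1, 107, 1),  "13": (9, 1, 109, 1),
    "15": (0, 1, 118, 1),  "23": (11, 1, 107, 1), "32": (0, 1, 118, 1),
    "33": (1, 1, 117, 1),  "37": (1, 1, 117, 1),  "44": (0, 1, 118, 1),
    "77": (35, 1, 83, 1),
}

def error_rates(corr, miss, corr_non, fa):
    """Return (Pmiss, Pfa); assumes at least one target and one non-target."""
    return miss / (corr + miss), fa / (corr_non + fa)

def decision_weighted(blocks):
    """Pool every decision, then compute the error rates once."""
    totals = [sum(column) for column in zip(*blocks.values())]
    return error_rates(*totals)

def block_weighted(blocks):
    """Compute the error rates per block, then average them."""
    rates = [error_rates(*counts) for counts in blocks.values()]
    n = len(rates)
    return sum(r[0] for r in rates) / n, sum(r[1] for r in rates) / n

for name, (p_miss, p_fa) in (("Decision weighted", decision_weighted(BLOCKS)),
                             ("Block weighted", block_weighted(BLOCKS))):
    cost = detection_cost(p_miss, p_fa)
    print(f"{name}: P(Miss)={p_miss:.4f} P(Fa)={p_fa:.4f} "
          f"Cdet={cost:.4f} Norm(Cdet)={normalized_cost(cost):.4f}")
```

Under these assumptions the two printed lines agree with the "Story Weighted" and "Topic Weighted" rows of the sample report. The topics with a single target and a miss (15, 32, 44) dominate the block weighted P(Miss) even though they contribute only three of the ten pooled misses.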
The following <Options> are recognized by the program:
| -C Cmiss:Cfa | -> | Set the cost of a missed detection and the cost of a false alarm to 'Cmiss' and 'Cfa' respectively. These numbers are used in the detection cost function. Default values are Cmiss=1.0 and Cfa=0.1. |
| -D Detail | -> | Write the internally organized evaluation corpus and pertinent statistics to 'Detail' for debugging purposes. This report, though voluminous, is intended to help researchers debug their internal versions of evaluation code. |
| -N TaskID,BlockID,DecisionID | -> | Define the names used in the reports. 'TaskID' is the detection task name, 'Link' is the default. 'BlockID' is the name that describes the block divisions, 'Topic' is the default. 'DecisionID' describes what individual decisions are made on, 'Story' is the default. |
| -P Ptarget | -> | Use 'Ptarget' as the a priori target probability in the detection cost function. |
| -r Report | -> | Write the summary report to 'Report' rather than STDOUT, the default. |
| -S | -> | If this flag is used, system output entries not present in the key file are ignored during scoring. |
| -v num | -> | Set the verbose level to 'num'. Default 1. ==0 None, ==1 Normal, >5 Slight, >10 way too much, >15 not even funny |
| -d DETfile | -> | Create a DET plot in GNUplot format with the file root 'DETfile'. The program makes several files each with additional extensions. The file 'DETfile'.plt is a command file for GNUplot and can be printed using the command "gnuplot 'DETfile'.plt | lpr". |
| -t title | -> | Set the title line for the plot to 'title'. |
| -p | -> | Produce a single story-weighted DET line trace. |
| -w | -> | Produce a single topic-weighted DET line trace. This is the default option. |
| -n | -> | Add 90% confidence intervals to the topic-weighted DET graph. |
| -Z uncompress | -> | Specify the command for uncompressing the system output files prior to scoring. The decompression applies to ONLY the system decision files, not the file lists. The command is executed by opening a pipe command if the system output file ends with a .Z or .gz suffix. The command is required to read a compressed stream from STDIN, and write the uncompressed stream to STDOUT. |
The BNF structure of the key file is:
Where:
| <HEADER_LINE> | :== | # LINK_DETECTION (where 'LINK_DETECTION' is the expected value; if another symbol is used, a warning message is generated, which can be ignored.) |
| <DETECTION_OBJECT> | :== | <OBJECT> <OBJECT> <TRUTH> <BLOCKID> |
| <OBJECT> | :== | STRING A text string identifying the object to be compared. The program does not derive any meaning from this string, except to cross reference the key entries to the system output. |
| <TRUTH> | :== | TARGET | NONTARGET Specify whether the two objects are equivalent ('TARGET') or not ('NONTARGET'). |
| <BLOCKID> | :== | STRING Specify which 'block' this detection pair belongs to. The STRING will be sorted numerically in the reports, so care should be taken to choose appropriate strings. If your detection evaluation does not support the notion of blocks, specify an identical value for all pairs. |
    # LINK_DETECTION
    #
    # Record format : ': : TARGET|NONTARGET '
    #
    APW19980104.0002 NYT19980104.0098 NONTARGET 44
    APW19980104.0012 NYT19980105.0840 NONTARGET 33
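A reader for this key format could look like the following Python sketch. It is not part of DetectionScore.pl. It assumes that every line beginning with '#' (including the header line) can be skipped and that each remaining line holds exactly four whitespace-separated fields in the order given above. The names parse_key and KeyEntry are hypothetical.

```python
from typing import NamedTuple

class KeyEntry(NamedTuple):
    truth: str   # "TARGET" or "NONTARGET"
    block: str   # block identifier, kept as a string

def parse_key(text):
    """Map each (object, object) pair to its truth value and block.

    Assumption: lines starting with '#' are header or comment lines.
    """
    key = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        obj_a, obj_b, truth, block = line.split()
        if truth not in ("TARGET", "NONTARGET"):
            raise ValueError(f"unexpected truth value: {truth!r}")
        key[(obj_a, obj_b)] = KeyEntry(truth, block)
    return key

# The two records from the example key file above.
sample_key = """\
# LINK_DETECTION
#
APW19980104.0002 NYT19980104.0098 NONTARGET 44
APW19980104.0012 NYT19980105.0840 NONTARGET 33
"""

key = parse_key(sample_key)
for pair, entry in key.items():
    print(pair, entry.truth, entry.block)
```

Keying the dictionary on the object pair mirrors the only use the program makes of the object strings, which is to cross reference key entries with system output.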
The BNF structure of the detection system output file is:
Where:
| <SYSTEM> | :== | System is an alphanumeric character string that uniquely identifies the system being tested. (E.g., CDM_P05-8.v37) |
| <DEF_PERIOD> | :== | The deferral period before decisions are made. This field exists in support of the TDT3 evaluation. It must not be omitted; however, the program will issue a warning about non-standard deferral values. |
| <DECISION_LINE> | :== | <OBJECT> <OBJECT> <DECISION> <SCORE> |
| <OBJECT> | :== | STRING A text string identifying the object to be compared. The program does not derive any meaning from this string, except to cross reference the key entries to the system output. |
| <DECISION> | :== | YES | NO The decision is YES if the system believes that the two objects are equivalent, and NO otherwise. |
| <SCORE> | :== | NUMBER A real number which indicates how confident the system is that the two objects are equivalent. High scores indicate strong belief, low scores indicate weak belief. |
    # Artificial sld results, Errors, Errors 10
    APW19980104.0002 NYT19980104.0098 YES 0.00609613178059631
    APW19980104.0017 VOA19980106.2100.0060 YES 0.999469214386953
    APW19980104.0017 NYT19980107.0513 YES 0.7777613437
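The way decision lines combine with key entries into the four counts shown in the scoring report can be sketched in Python as follows. This is an illustration, not the program's code. It assumes that a decision line is any non-comment line with exactly four fields whose third field is YES or NO, that every other line (including the header) is skipped, and that decisions for pairs missing from the key are ignored, as with the -S flag. The function names and count labels are hypothetical.

```python
from collections import defaultdict

def parse_decisions(text):
    """Collect (object, object) -> (decision, score) from decision lines."""
    decisions = {}
    for line in text.splitlines():
        fields = line.split()
        if line.lstrip().startswith("#") or len(fields) != 4:
            continue  # assumption: header and other lines are skipped
        obj_a, obj_b, decision, score = fields
        if decision not in ("YES", "NO"):
            continue
        decisions[(obj_a, obj_b)] = (decision, float(score))
    return decisions

def tally(key, decisions):
    """Count outcomes per block; key maps pair -> (truth, block)."""
    counts = defaultdict(
        lambda: {"corr_link": 0, "miss": 0, "corr_nonlink": 0, "fa": 0})
    for pair, (decision, _score) in decisions.items():
        if pair not in key:
            continue  # assumption: behave as if -S were given
        truth, block = key[pair]
        if truth == "TARGET":
            outcome = "corr_link" if decision == "YES" else "miss"
        else:
            outcome = "fa" if decision == "YES" else "corr_nonlink"
        counts[block][outcome] += 1
    return dict(counts)

# One key record and the decision lines from the examples in this document.
key = {("APW19980104.0002", "NYT19980104.0098"): ("NONTARGET", "44")}
sample_output = """\
# Artificial sld results, Errors, Errors 10
APW19980104.0002 NYT19980104.0098 YES 0.00609613178059631
APW19980104.0017 VOA19980106.2100.0060 YES 0.999469214386953
APW19980104.0017 NYT19980107.0513 YES 0.7777613437
"""

decisions = parse_decisions(sample_output)
counts = tally(key, decisions)
print(counts)
```

Here the only pair present in both files is a NONTARGET that the system answered YES, so it is counted as one false alarm in block 44.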
-------------------------------------------------------------------------------
------------------ Detection Task Performance Report ------------------
Command line: /data/data2/TDT99/Software/.....
Execution Date: Fri Aug 6 11:13:14 EDT 1999
Story Weighted Story Link Detection:   P(Miss)       = 0.0730
                                       P(Fa)         = 0.0094
                                       CLink         = 0.0024
                                       Norm(CLink)   = 0.1191
Topic Weighted Story Link Detection:   P(Miss)       = 0.4311
                                       P(Fa)         = 0.0098
                                       CLink         = 0.0096
                                       Norm(CLink) * = 0.4793
                                       * Primary Evaluation Metric
DET Graph Minimum Detection Cost Analysis:
Story Weighted Minimum CLink = 0.0183 at P(Miss) = 0.8102 and P(Fa) = 0.0216
Topic Weighted Minimum CLink = 0.0190 at P(Miss) = 0.9228 and P(Fa) = 0.0055
               | # Corr  # Miss    # Corr      # Fa ||                          |    Norm
Topic          |   Link    Link    ! Link    ! Link || P(Miss)   P(Fa)    CLink |   CLink
-------------- | ------  ------  --------  -------- || -------  ------  ------- | -------
1              |     59       1        59         1 ||  0.0167  0.0167   0.0020 |  0.0983
7              |     11       1       107         1 ||  0.0833  0.0093   0.0026 |  0.1287
13             |      9       1       109         1 ||  0.1000  0.0091   0.0029 |  0.1445
15             |      0       1       118         1 ||  1.0000  0.0084   0.0208 |  1.0412
23             |     11       1       107         1 ||  0.0833  0.0093   0.0026 |  0.1287
32             |      0       1       118         1 ||  1.0000  0.0084   0.0208 |  1.0412
33             |      1       1       117         1 ||  0.5000  0.0085   0.0108 |  0.5415
37             |      1       1       117         1 ||  0.5000  0.0085   0.0108 |  0.5415
44             |      0       1       118         1 ||  1.0000  0.0084   0.0208 |  1.0412
77             |     35       1        83         1 ||  0.0278  0.0119   0.0017 |  0.0861
-------------- | ------  ------  --------  -------- || -------  ------  ------- | -------
Sums           |    127      10      1053        10 ||                          |
Story Weighted |                                    ||  0.0730  0.0094   0.0024 |  0.1191
Topic Weighted |                                    ||  0.4311  0.0098   0.0096 |  0.4793
Key File: ../indexes_small/sld_SRC=nwt+bnasr_TEST:SL=eng,CL=nat.key
System Output File: sld_SRC=nwt+bnasr_TEST:SL=eng,CL=nat.sld
Cost Function Parameters:
Ptarget = 0.02
Cmiss = 1
Cfa = 0.1
Detection Performance Calculations:
System Identifier: Errors Description: 'Artificial sld results, Errors,'
Deferral Period: 10