TDT3det.pl User Manual

TDT3 Detection Task Scoring


Usage:

TDT3det.pl -R Rootdir -i TDT3_det_index <Options> Det_system_output

The 'TDT3det.pl' program will score the output generated by a TDT3 detection system. The program requires the directory path, 'Rootdir', to the LDC's TDT3 Test corpus. The corpus must be in the same structure as released by the LDC, with all file formats identical to their original form. The program uses the index file TDT3_det_index, provided with the test corpus and described below, to load the appropriate data from the corpus and to verify the completeness of the Det_system_output file.

Upon completion of the load, the detection decisions are scored and a report is generated. The scoring of an Automatic Topic Detecting System (ATDS) consists of two phases. First, the decisions output by the ATDS are mapped onto the story boundaries annotated in the reference corpus. Then, topic sets are built for both the reference (ref) and hypothesis (hyp) data, and the best-scoring correspondences between the ref and hyp clusters, according to the "Detection Cost Function" (DCF), are scored.

The TDT3det.pl program can use either of two methods to map topic decisions to reference stories: majority vote or impulse vote. The two methods differ in the meaning implied by the decision markers output by the ATDS.

Majority Vote
Each story decision is computed as the majority of decisions over all words (or time) in that story, and each story score is computed as the average score over all words (or time) in that story. Ties in computing the decisions are broken by choosing the decision with the maximum score.

Impulse Vote
Each story decision is made by selecting the decision with the maximum score within the boundaries of the story. If no decisions were made for the story, the decision is 'NO' with a score of -infinity.
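
The two mapping methods can be sketched in Python as follows. This is an illustration only, not the Perl code of TDT3det.pl. The function names and the representation of a story as a list of (decision, score) pairs are assumptions, as is the reading of the tie-break rule as "take the decision attached to the highest individual score". The majority sketch also assumes the story contains at least one word.

```python
import math

def majority_vote(word_decisions):
    """word_decisions: non-empty list of (decision, score), one per word in the story."""
    yes = sum(1 for d, _ in word_decisions if d == "YES")
    no = len(word_decisions) - yes
    # The story score is the average score over all words in the story.
    avg_score = sum(s for _, s in word_decisions) / len(word_decisions)
    if yes != no:
        decision = "YES" if yes > no else "NO"
    else:
        # Tie: take the decision carried by the maximum score (assumed reading).
        decision = max(word_decisions, key=lambda p: p[1])[0]
    return decision, avg_score

def impulse_vote(markers):
    """markers: list of (decision, score) for the decision markers inside the story."""
    if not markers:
        # No decision was made for this story.
        return "NO", -math.inf
    # Keep the single decision with the maximum score.
    return max(markers, key=lambda p: p[1])
```

For example, majority_vote([("YES", 0.75), ("YES", 0.5), ("NO", 0.25)]) yields a 'YES' decision with the averaged score, while impulse_vote on the same list returns the single highest-scoring pair.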

Once the hypothesized topic decisions and scores are assigned to the ref stories, topic clusters are built for both the ref and hyp topic sets. Since no direct correspondences exist between the ref and hyp topic identifiers, a detection cost function is used to find a mapping. The detection cost function is defined in The Topic Detection and Tracking (TDT3) Evaluation Plan (a Microsoft Word document). The options '-P' and '-C' modify the behavior of the cost function.
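
As a worked illustration, the sketch below uses the usual TDT form of the cost, Cdet = Cmiss * P(Miss) * P(topic) + Cfa * P(Fa) * (1 - P(topic)); the Evaluation Plan remains the authoritative definition. The estimates P(Miss) = #Miss / #Ref and P(Fa) = #Fa / (#Test - #Ref) are inferred from the numbers in the example report later in this manual, and the function name is hypothetical.

```python
def detection_cost(n_ref, n_miss, n_fa, n_test, p_topic=0.02, c_miss=1.0, c_fa=0.1):
    """Return (P(Miss), P(Fa), Cdet) for one ref/hyp topic pairing."""
    # Guard against empty denominators, e.g. a topic with no reference stories.
    p_miss = n_miss / n_ref if n_ref else 0.0
    non_targets = n_test - n_ref
    p_fa = n_fa / non_targets if non_targets else 0.0
    c_det = c_miss * p_miss * p_topic + c_fa * p_fa * (1.0 - p_topic)
    return p_miss, p_fa, c_det

# Ref topic 41 from the example report, which was run with Cmiss = 1 and Cfa = 1.
p_miss, p_fa, c_det = detection_cost(n_ref=13, n_miss=1, n_fa=13, n_test=3085, c_fa=1.0)
print(f"{p_miss:.4f} {p_fa:.4f} {c_det:.4f}")  # prints "0.0769 0.0042 0.0057"
```

The defaults of the sketch mirror the documented defaults of '-P' and '-C'; the call overrides Cfa only to match the execution parameters printed in the example report.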

The program concludes its processing by first generating a detection performance report and then, optionally, a Decision Error Tradeoff (DET) graph. As part of the report, the program computes the Cost-Based YDZ measure at several ratios of the cost of a story examination to the cost of missing a story.

The following <Options> are recognized by the program:

-C Cmiss:Cfa -> Set the cost of a missed detection and the cost of a false alarm to 'Cmiss' and 'Cfa' respectively. These numbers are used in the detection cost function. Default values are Cmiss=1.0 and Cfa=0.1.
-D Detail -> Write the internally organized evaluation corpus and pertinent statistics to 'Detail' for debugging purposes. This report, though voluminous, is intended to help researchers debug their internal versions of evaluation code.
-j topicrel[:topicrel]* -> Specify alternative topic relevance files via the command line. More than one can be specified by concatenating the file names using a colon ':' separator.
-L -> After Loading the database, dump it to stdout and exit.
-m func -> Set the system output to story mapping function to either 'majority' or 'impulse'. Default is 'majority'.
-P P(topic) -> Use P(topic) for the detection cost function to map topic clusters. Default is 0.02.
-r Report -> Write the summary report to 'Report' rather than STDOUT, the default.
-s -> Use all available speedups. Currently, the only speedup is to read the TDT3 Corpus files WITHOUT using the 'nsgmls' parser and the 'SGMLS.pm' PERL library.
-S SubsetFile -> Compute detection performance over the subsets defined in 'SubsetFile'. See the documentation below for a description of the file's format.
-E SubsetFile -> Compute performance excluding source files in the subset definition file. The application of this filter is global, in that the source file is ignored prior to establishing subsets defined by the -S option. NOTE: Only the first set defined in the subset definition file is used for the filter. All others are ignored.
-v num -> Set the verbose level to 'num'. Default 1.
==0 None, ==1 Normal, >5 Slight, >10 way too much, >15 not even funny
Options that apply to the DET plots:
-d DETfile -> Create a DET plot in GNUplot format with the file root 'DETfile'. The program makes several files each with additional extensions. The file 'DETfile'.plt is a command file for GNUplot and can be printed using the command "gnuplot 'DETfile'.plt | lpr". The default plot produces a line trace for each line in the 'Trk_system_output_list' list. The options below modify this.
-t title -> Set the title line for the plot to 'title'.
-T Topic_regexp -> Restrict the topics for which the index files are created using the PERL regular expression 'Topic_regexp'. The default is to use all occurring annotated topics. A number of macro names for predefined topic sets may be used in place of regular expressions. They are:
Macro name       Equivalent Expression
TDT98_Train      20+([1-9]|[12][0-9]|3[0-7])
TDT98_DevTest    20+(3[89]|[45][0-9]|6[0-6])
TDT98_EvalTest   20+(6[7-9]|[89][0-9]|100)
TDT99_mul        20+(1|2|5|7|13|15|20|23|39|44|48|57|70|71|76|85|88|89|91|96)

Detection Task Index File Format

The format of the index file for the detection task is as follows. The first line in the index file is a header line, which indicates the TDT3 task, 'DETECTION' in this case, and the type of pointer used to mark segment changes. Each subsequent record in the file identifies a source file to process. These records have one field each and are separated by newlines.

The BNF structure of the detection index file is:

<HEADER_LINE>
<SOURCE>
<SOURCE>
...

Where:

<HEADER_LINE> :== # DETECTION <POINTER_TYPE>
<POINTER_TYPE> :== RECID | TIME
A POINTER_TYPE is the type of boundaries to be output by the system. The possible values are RECID for text stream detection or TIME for audio detection.
<SOURCE> :== TDT3 corpus filename with directory and extension names relative to the TDT3 root directory specified on the command line.
The following is an excerpt from a detection task index file.
# DETECTION RECID
tkntext/19980301_0553_0719_APW_ENG.tkn
tkntext/19980301_1014_1116_APW_ENG.tkn
tkntext/19980301_1403_1529_APW_ENG.tkn
tkntext/19980301_2139_2341_APW_ENG.tkn
tkntext/19980302_0630_0651_APW_ENG.tkn
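
A reader for this format can be sketched in Python as below. It is not part of TDT3det.pl; the function name is hypothetical, and skipping blank lines is an assumption, since the format description does not say how they are handled.

```python
def parse_det_index(text):
    """Parse detection index text into (pointer_type, list of source paths)."""
    # Assumption: blank lines are ignored.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty index file")
    # The header line must read: # DETECTION <POINTER_TYPE>
    header = lines[0].split()
    if len(header) != 3 or header[0] != "#" or header[1] != "DETECTION":
        raise ValueError(f"bad header line: {lines[0]!r}")
    pointer_type = header[2]
    if pointer_type not in ("RECID", "TIME"):
        raise ValueError(f"unknown pointer type: {pointer_type!r}")
    # Every remaining record is one source file path relative to Rootdir.
    return pointer_type, lines[1:]

sample = """# DETECTION RECID
tkntext/19980301_0553_0719_APW_ENG.tkn
tkntext/19980301_1014_1116_APW_ENG.tkn
"""
pointer_type, sources = parse_det_index(sample)
print(pointer_type, len(sources))  # prints "RECID 2"
```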

Detection Task System Output Format

The Topic Identification task is to detect topics and then to hypothesize points in the source stream where they are discussed. Topic Identification systems will perform this task by recording information about these hypothesized points in a file, one record for each putative discussion of a topic, written in ASCII format. The first record in this file will contain four fields which specify information that applies globally to the whole file. Comment lines begin with the '#' character, and any text following a '#' is ignored. The exception to this rule is that the first comment line can optionally contain a long description of the system under test. This description will be included in the scoring report alongside the <SYSTEM> value described below. After the initial comment line, blank lines are treated as comments.

The BNF structure of the detection system output file is:

<SYSTEM> <BOUNDARIES> <DEF_PERIOD> <POINTER_TYPE>
<DECISION_LINE>
<DECISION_LINE>
...

Where:

<SYSTEM> :== System is an alphanumeric character string that uniquely identifies the system being tested. (E.g., CDM_P05-8.v37)
<BOUNDARIES> :== Boundaries is either YES or NO, where YES indicates that story boundaries are supplied to the system being tested and NO indicates that they are not.
<DEF_PERIOD> :== The deferral period before decisions are made. Permissible values are defined by the TDT3 test specification.
<POINTER_TYPE> :== RECID | TIME
POINTER_TYPE is the type of boundaries to be output by the system. The possible values are RECID for text stream detection or TIME for audio detection.
<DECISION_LINE> :== <TOPIC> <SOURCE> <POINTER> <DECISION> <SCORE>
<TOPIC> :== TDT3 detection system defined topic identifier.
<SOURCE> :== TDT3 corpus filename with directory and extension names relative to the TDT3 root directory specified on the command line.
<POINTER> :== POINTER is a hypothesized decision point. For text files, Pointer is the index number of the first word in the hypothesized segment, in the range {1, 2, . . .}. For audio files, Pointer is the time of the beginning of the segment, in the range {0.0, . . .}. (It isn't necessary to output the beginning of the first segment.) The hypothesized points must occur in chronological order.
<DECISION> :== Decision is either YES or NO, where YES indicates that the system believes that the story being processed discusses the target topic, and NO indicates not.
<SCORE> :== Score is a real number which indicates how confident the system is that the story being processed discusses the associated topic. More positive values indicate greater confidence.
The following is an excerpt from a detection system output file.
# Degenerate detection results, Errors, RECID
Errors NO 10 RECID
176 asrtext/19980307_1130_1200_CNN_HDL.asr 1 NO 0.209696400470383
177 asrtext/19980307_1130_1200_CNN_HDL.asr 66 NO 0.0652269882628861
178 asrtext/19980307_1130_1200_CNN_HDL.asr 254 NO 0.00793375334496783
179 asrtext/19980307_1130_1200_CNN_HDL.asr 318 NO 0.000951696625507463
170 asrtext/19980307_1130_1200_CNN_HDL.asr 675 NO 2.05484371633647e-10
144 asrtext/19980307_1130_1200_CNN_HDL.asr 762 YES 0.999999999524702
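
The format above can be read with a short Python sketch such as the following. It is not the parser used by TDT3det.pl; the names are hypothetical, and treating only a comment on the very first line of the file as the system description is an assumption.

```python
from typing import NamedTuple

class Decision(NamedTuple):
    topic: str
    source: str
    pointer: float   # word index (RECID) or time in seconds (TIME)
    decision: str    # "YES" or "NO"
    score: float

def parse_det_output(text):
    """Return (description, header dict, list of Decision) from system output text."""
    description, header, decisions = None, None, []
    for i, raw in enumerate(text.splitlines()):
        if i == 0 and raw.startswith("#"):
            # Optional long description of the system under test.
            description = raw[1:].strip()
            continue
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue                         # blank lines count as comments
        fields = line.split()
        if header is None:
            if len(fields) != 4:
                raise ValueError(f"bad header record: {raw!r}")
            keys = ("system", "boundaries", "def_period", "pointer_type")
            header = dict(zip(keys, fields))
            continue
        if len(fields) != 5 or fields[3] not in ("YES", "NO"):
            raise ValueError(f"bad decision line: {raw!r}")
        topic, source, pointer, decision, score = fields
        decisions.append(Decision(topic, source, float(pointer), decision, float(score)))
    if header is None:
        raise ValueError("missing header record")
    return description, header, decisions

sample = """# Degenerate detection results, Errors, RECID
Errors NO 10 RECID
170 asrtext/19980307_1130_1200_CNN_HDL.asr 675 NO 2.05484371633647e-10
144 asrtext/19980307_1130_1200_CNN_HDL.asr 762 YES 0.999999999524702
"""
description, header, decisions = parse_det_output(sample)
print(header["system"], header["pointer_type"], len(decisions))  # prints "Errors RECID 2"
```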

Example Output Report

-------------------------------------------------------------------------------
-------------------  TDT Detection Task Performance Report  ------------------


Story Weighted Detection: P(Miss)       = 0.1143
                          P(Fa)         = 0.0028
                          Cdet          = 0.0050

Topic Weighted Detection: P(Miss)       = 0.0723
                          P(Fa)         = 0.0028
                          Cdet          = 0.0042

Detection Performance Calculations:

    Ref.   Hyp.    # Ref  # Sys.  # Corr  # Miss  # Fa   | # Test                           
    Topic  Topic   Story  Story   Story   Story   Story  | Story   P(Miss)  P(Fa)   Cdet    
    -----  -----   -----  -----   ------  ------  -----  | ------  -------  -----   --------
    40     140       3      6       3       0       3    | 3085    0.0000   0.0010  0.0010  
    41     141      13     25      12       1      13    | 3085    0.0769   0.0042  0.0057  
    42     142      17     21      14       3       7    | 3085    0.1765   0.0023  0.0058  
    44     144      24     45      21       3      24    | 3085    0.1250   0.0078  0.0102  
    46     146       3      4       3       0       1    | 3085    0.0000   0.0003  0.0003  
    52     152       5      6       4       1       2    | 3085    0.2000   0.0006  0.0046  
    53     153       3      9       3       0       6    | 3085    0.0000   0.0019  0.0019  
    56     156       2     14       2       0      12    | 3085    0.0000   0.0039  0.0038  
    =====  =====   =====  =====   ======  ======  =====  | ======  =======  =====   ========
    Story  Weight                                                  0.1143   0.0028  0.0050  
    Topic  Sums     70    130      62       8      68    | 24680   
    Topic  Means     8.8   16.2     7.8     1.0     8.5  | 3085.0  0.0723   0.0028  0.0042  


Cost Based YDZ Calculations:

    Cexam  Cmiss   Ccluster    Cmin    Cmax        Cnorm   
    -----  -----   --------    ----    ----        -----   
    1      1       384.70      126.00  24749.65    0.010506
    1      10      662.36      126.00  25376.47    0.021242
    1      100     3438.97     126.00  31644.67    0.105111
    1      1000    31205.04    126.00  94326.68    0.329924
    1      10000   308865.76   126.00  721146.77   0.428198
    1      100000  3085472.90  126.00  6989347.75  0.441444


Detection Performance by Test Subset:

    Sub-      || Ref.   Hyp.    # Ref  # Sys.  # Corr  # Miss  # Fa   | # Test                           
    set       || Topic  Topic   Story  Story   Story   Story   Story  | Story   P(Miss)  P(Fa)   Cdet    
    ----      || -----  -----   -----  -----   ------  ------  -----  | ------  -------  -----   --------
    Audio     || 40     140       2      4       2       0       2    | 1446    0.0000   0.0014  0.0014  
    Audio     || 41     141      10     16       9       1       6    | 1446    0.1000   0.0042  0.0061  
    Audio     || 42     142      13     19      10       3       6    | 1446    0.2308   0.0042  0.0087  
    Audio     || 44     144      13     20      11       2       7    | 1446    0.1538   0.0049  0.0079  
    Audio     || 46     146       3      4       3       0       1    | 1446    0.0000   0.0007  0.0007  
    Audio     || 52     152       0      1       0       0       1    | 1446    0.0000   0.0007  0.0007  
    Audio     || 53     153       2      5       2       0       3    | 1446    0.0000   0.0021  0.0020  
    Audio     || 56     156       2      5       2       0       3    | 1446    0.0000   0.0021  0.0020  
    ====      || =====  =====   =====  =====   ======  ======  =====  | ======  =======  =====   ========
    Audio     || Story  Weight                                        |         0.1333   0.0025  0.0051  
    Audio     || Topic  Sums     45     74      39       6      29    | 11568   
    Audio     || Topic  Means     5.6    9.2     4.9     0.8     3.6  | 1446.0  0.0606   0.0025  0.0037  
    
    
    Newswire  || 40     140       1      2       1       0       1    | 1639    0.0000   0.0006  0.0006  
    Newswire  || 41     141       3     10       3       0       7    | 1639    0.0000   0.0043  0.0042  
    Newswire  || 42     142       4      5       4       0       1    | 1639    0.0000   0.0006  0.0006  
    Newswire  || 44     144      11     28      10       1      17    | 1639    0.0909   0.0104  0.0121  
    Newswire  || 46     146       0      0       0       0       0    | 1639    0.0000   0.0000  0.0000  
    Newswire  || 52     152       5      6       4       1       1    | 1639    0.2000   0.0006  0.0046  
    Newswire  || 53     153       1      4       1       0       3    | 1639    0.0000   0.0018  0.0018  
    Newswire  || 56     156       0      9       0       0       9    | 1639    0.0000   0.0055  0.0054  
    ====      || =====  =====   =====  =====   ======  ======  =====  | ======  =======  =====   ========
    Newswire  || Story  Weight                                        |         0.0800   0.0030  0.0045  
    Newswire  || Topic  Sums     25     64      23       2      39    | 13112   
    Newswire  || Topic  Means     3.1    8.0     2.9     0.2     4.9  | 1639.0  0.0364   0.0030  0.0037  
    
    


Execution parameters:

Index File:              ../indexes_devtest/det_nwt+asr.ndx
System Output File:      det_nwt+asr.det
Pointer Type:            RECID
Topic cluster Mapping Function:
              P(topic) = 0.02
              Cmiss    = 1
              Cfa      = 1
              Topic inclusion by: hard_decision

Detection Performance Calculations:
    System Identifier:   Errors 'Degenerate detection results, Errors, RECID'
    Deferral Period:     10

System Output to Story Mapping Function:  'majority'

----------------  End of TDT Detection Task Performance Report  ---------------
-------------------------------------------------------------------------------
Successful Completion
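
The summary figures in the report above can be cross-checked from the per-topic rows. The Python sketch below assumes that the story weighted values pool the counts over all topics before dividing, and that the topic weighted values average the per-topic rates. This reading is inferred from the numbers in the example, not taken from the program source.

```python
# (# Ref, # Miss, # Fa) for the eight topic rows of the example report.
rows = [(3, 0, 3), (13, 1, 13), (17, 3, 7), (24, 3, 24),
        (3, 0, 1), (5, 1, 2), (3, 0, 6), (2, 0, 12)]
N_TEST = 3085  # "# Test Story" is the same for every topic

# Story weighted: pool the counts over all topics, then divide.
story_p_miss = sum(m for _, m, _ in rows) / sum(r for r, _, _ in rows)
story_p_fa = sum(f for _, _, f in rows) / sum(N_TEST - r for r, _, _ in rows)

# Topic weighted: compute each topic's rate, then average the rates.
topic_p_miss = sum(m / r for r, m, _ in rows) / len(rows)
topic_p_fa = sum(f / (N_TEST - r) for r, _, f in rows) / len(rows)

print(f"{story_p_miss:.4f} {story_p_fa:.4f}")  # prints "0.1143 0.0028"
print(f"{topic_p_miss:.4f} {topic_p_fa:.4f}")  # prints "0.0723 0.0028"
```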