The 'TDT3det.pl' program will score the output generated by a TDT3 detection system. The program requires the directory path, 'Rootdir', to the LDC's TDT3 Test corpus. The corpus must be in the same structure as released by the LDC, with all file formats identical to their original form. The program uses the index file TDT3_det_index, provided with the test corpus and described below, to load the appropriate data from the corpus and to verify the completeness of the Det_system_output file.
Upon completion of the load, the detection decisions are scored and a report is generated. Scoring an Automatic Topic Detecting System (ATDS) consists of two phases. First, the decisions output by the ATDS are mapped onto the story boundaries annotated in the reference corpus. Then topic sets are built for both the reference (ref) and hypothesis (hyp) data, and the best-scoring correspondences between the ref and hyp clusters, according to the "Detection Cost Function" (DCF), are scored.
The TDT3det.pl program can use either of two methods to map topic decisions to reference stories: majority vote or impulse vote. The two methods differ in the meaning implied by the decision marker output by the ATDS.
Once the hypothesized topic decisions and scores are assigned to the ref stories, topic clusters are built for both the ref and hyp topic sets. Since no direct correspondences exist between the ref and hyp topic identifiers, a detection cost function is used to find a mapping. The detection cost function is defined in The Topic Detection and Tracking (TDT3) Evaluation Plan (in Microsoft Word format). The options '-P' and '-C' modify the behavior of the cost function.
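As a rough illustration of how '-C' and '-P' enter the cost, the sketch below assumes the usual TDT form of the detection cost, Cdet = Cmiss * P(Miss) * P(topic) + Cfa * P(Fa) * (1 - P(topic)); the evaluation plan remains the authoritative definition. The function name `detection_cost` and its parameters are illustrative and are not part of TDT3det.pl. The counts come from the ref topic 41 row of the sample report further down, which was run with Cmiss=1, Cfa=1 and P(topic)=0.02.

```python
def detection_cost(n_miss, n_fa, n_ref, n_test,
                   c_miss=1.0, c_fa=0.1, p_topic=0.02):
    """Return (P(Miss), P(Fa), Cdet) for one ref/hyp topic pair.

    Assumed form: Cdet = Cmiss*P(Miss)*P(topic) + Cfa*P(Fa)*(1 - P(topic)).
    The defaults mirror the documented defaults of '-C' and '-P'.
    """
    n_nontarget = n_test - n_ref                  # stories not on the ref topic
    p_miss = n_miss / n_ref if n_ref else 0.0     # no ref stories: nothing to miss
    p_fa = n_fa / n_nontarget if n_nontarget else 0.0
    cdet = c_miss * p_miss * p_topic + c_fa * p_fa * (1.0 - p_topic)
    return p_miss, p_fa, cdet


# Ref topic 41 in the sample report: 13 ref stories, 1 miss, 13 false alarms,
# 3085 test stories, scored with Cmiss=1 and Cfa=1.
p_miss, p_fa, cdet = detection_cost(n_miss=1, n_fa=13, n_ref=13, n_test=3085,
                                    c_miss=1.0, c_fa=1.0)
print(f"P(Miss)={p_miss:.4f} P(Fa)={p_fa:.4f} Cdet={cdet:.4f}")
```

With these inputs the sketch reproduces the P(Miss), P(Fa) and Cdet values printed for that row, which suggests (but does not prove) that the program uses the same form.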
The program concludes its processing by first generating a detection performance report and then, optionally, a Decision Error Tradeoff (DET) graph. As part of the report, the program computes the Cost-Based YDZ measure at several ratios of the cost of a story examination to the cost of missing a story.
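This document does not define the YDZ measure itself. In the sample report further down, however, the Cnorm column is numerically consistent with a min-max normalization of Ccluster between Cmin and Cmax. The sketch below only reproduces that arithmetic on a few rows of the sample; `normalized_cost` is an illustrative name, and the relationship is inferred from the printed numbers rather than taken from the program.

```python
def normalized_cost(c_cluster, c_min, c_max):
    """Min-max normalize a cluster cost (relationship inferred from the sample)."""
    return (c_cluster - c_min) / (c_max - c_min)


# (Cexam, Cmiss, Ccluster, Cmin, Cmax, reported Cnorm) copied from the sample report
rows = [
    (1, 1, 384.70, 126.00, 24749.65, 0.010506),
    (1, 10, 662.36, 126.00, 25376.47, 0.021242),
    (1, 1000, 31205.04, 126.00, 94326.68, 0.329924),
]

for c_exam, c_miss, c_cluster, c_min, c_max, reported in rows:
    value = normalized_cost(c_cluster, c_min, c_max)
    print(f"Cmiss/Cexam={c_miss / c_exam:g}: "
          f"Cnorm={value:.6f} (report: {reported:.6f})")
```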
The following <Options> are recognized by the program:
| -C Cmiss:Cfa | -> | Set the cost of a missed detection and the cost of a false alarm to 'Cmiss' and 'Cfa' respectively. These numbers are used in the detection cost function. Default values are Cmiss=1.0 and Cfa=0.1. |
| -D Detail | -> | Write the internally organized evaluation corpus and pertinent statistics for debugging purposes. This report, though voluminous, is intended to help researchers debug their internal versions of evaluation code. |
| -j topicrel[:topicrel]* | -> | Specify alternative topic relevance files via the command line. More than one can be specified by concatenating the file names using a colon ':' separator. |
| -L | -> | After loading the database, dump it to stdout and exit. |
| -m func | -> | Set the system-output-to-story mapping function to either 'majority' or 'impulse'. Default is 'majority'. |
| -P P(topic) | -> | Use P(topic) for the detection cost function to map topic clusters. Default is 0.02. |
| -r Report | -> | Write the summary report to 'Report' rather than STDOUT, the default. |
| -s | -> | Use all available speedups. Currently, the only speedup is to bypass the 'nsgmls' parser and the 'SGMLS.pm' PERL library when reading the TDT3 Corpus files. |
| -S SubsetFile | -> | Compute detection performance over the subsets defined in 'SubsetFile'. See the documentation below for a description of the file's format. |
| -E SubsetFile | -> | Compute performance excluding source files in the subset definition file. The application of this filter is global, in that the source file is ignored prior to establishing subsets defined by the -S option. NOTE: Only the first set defined in the subset definition file is used for the filter. All others are ignored. |
| -v num | -> | Set the verbose level to 'num'. Default is 1. Levels: 0 = none, 1 = normal, >5 = slight, >10 = way too much, >15 = not even funny. |
| | ||||||||||||
| -d DETfile | -> | Create a DET plot in GNUplot format with the file root 'DETfile'. The program makes several files each with additional extensions. The file 'DETfile'.plt is a command file for GNUplot and can be printed using the command "gnuplot 'DETfile'.plt | lpr". The default plot produces a line trace for each line in the 'Trk_system_output_list' list. The options below modify this. |
| -t title | -> | Set the title line for the plot to 'title'. |
| -T Topic_regexp | -> | Restrict the topics for which the index files are created using the PERL regular expression 'Topic_regexp'. The default is to use all occurring annotated topics. There are a number of macro names for defined topic sets that may be used in place of regular expressions, they are: |
The BNF structure of the detection index file is:
Where:
| <HEADER_LINE> | :== | # DETECTION <POINTER_TYPE> |
| <POINTER_TYPE> | :== | RECID | TIME. POINTER_TYPE is the type of boundaries to be output by the system. The possible values are RECID for text stream detection or TIME for audio detection. |
| <SOURCE> | :== | TDT3 corpus filename with directory and extension names relative to the TDT3 root directory specified on the command line. |
# DETECTION RECID
tkntext/19980301_0553_0719_APW_ENG.tkn
tkntext/19980301_1014_1116_APW_ENG.tkn
tkntext/19980301_1403_1529_APW_ENG.tkn
tkntext/19980301_2139_2341_APW_ENG.tkn
tkntext/19980302_0630_0651_APW_ENG.tkn
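A minimal sketch of reading such an index file follows, assuming a single header line followed by one source file name per line. The function name `parse_detection_index` is hypothetical and is not taken from TDT3det.pl.

```python
def parse_detection_index(text):
    """Return (pointer_type, sources) parsed from detection index file text."""
    # Assumption: blank lines carry no meaning and can be skipped.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty index file")

    # HEADER_LINE is '# DETECTION <POINTER_TYPE>'.
    header = lines[0].split()
    if len(header) != 3 or header[:2] != ["#", "DETECTION"]:
        raise ValueError(f"bad header line: {lines[0]!r}")
    pointer_type = header[2]
    if pointer_type not in ("RECID", "TIME"):
        raise ValueError(f"unknown pointer type: {pointer_type!r}")

    # Every remaining line is a SOURCE path relative to the corpus root.
    return pointer_type, lines[1:]


example = """\
# DETECTION RECID
tkntext/19980301_0553_0719_APW_ENG.tkn
tkntext/19980301_1014_1116_APW_ENG.tkn
"""
pointer_type, sources = parse_detection_index(example)
print(pointer_type, len(sources))
```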
The BNF structure of the detection system output file is:
Where:
| <SYSTEM> | :== | System is an alphanumeric character string that uniquely identifies the system being tested. (E.g., CDM_P05-8.v37) |
| <BOUNDARIES> | :== | Boundaries is either YES or NO, where YES indicates that story boundaries are supplied to the system being tested and NO indicates that they are not. |
| <DEF_PERIOD> | :== | The deferral period before decisions are made. Permissible values are defined by the TDT3 test specification. |
| <POINTER_TYPE> | :== | RECID | TIME. POINTER_TYPE is the type of boundaries to be output by the system. The possible values are RECID for text stream detection or TIME for audio detection. |
| <DECISION_LINE> | :== | <TOPIC> <SOURCE> <POINTER> <DECISION> <SCORE> |
| <TOPIC> | :== | TDT3 detection system defined topic identifier. |
| <SOURCE> | :== | TDT3 corpus filename with directory and extension names relative to the TDT3 root directory specified on the command line. |
| <POINTER> | :== | Pointer is a hypothesized decision point. For text files, Pointer is the index number of the first word in the hypothesized segment, in the range {1, 2, . . .}. For audio files, Pointer is the time of the beginning of the segment, in the range {0.0, . . .}. (It isn't necessary to output the beginning of the first segment.) The hypothesized pointers must occur in chronological order. |
| <DECISION> | :== | Decision is either YES or NO, where YES indicates that the system believes that the story being processed discusses the target topic, and NO indicates not. |
| <SCORE> | :== | Score is a real number which indicates how confident the system is that the story being processed discusses the associated topic. More positive values indicate greater confidence. |
# Degenerate detection results, Errors, RECID
Errors NO 10 RECID
176 asrtext/19980307_1130_1200_CNN_HDL.asr 1 NO 0.209696400470383
177 asrtext/19980307_1130_1200_CNN_HDL.asr 66 NO 0.0652269882628861
178 asrtext/19980307_1130_1200_CNN_HDL.asr 254 NO 0.00793375334496783
179 asrtext/19980307_1130_1200_CNN_HDL.asr 318 NO 0.000951696625507463
170 asrtext/19980307_1130_1200_CNN_HDL.asr 675 NO 2.05484371633647e-10
144 asrtext/19980307_1130_1200_CNN_HDL.asr 762 YES 0.999999999524702
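The field definitions above can be turned into a small reader. The sketch below assumes the layout seen in the example: a '#' description line, then one line holding SYSTEM, BOUNDARIES, DEF_PERIOD and POINTER_TYPE, then one decision per line. `Decision` and `parse_system_output` are hypothetical names, and the sketch does not check the chronological ordering of pointers.

```python
from typing import NamedTuple, Union


class Decision(NamedTuple):
    topic: str
    source: str
    pointer: Union[int, float]   # word index for RECID, time for TIME
    decision: bool               # True for YES, False for NO
    score: float


def parse_system_output(text):
    """Parse detection system output text into (header dict, list of Decision)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Assumption: line 1 is a '#' description, line 2 holds the four header fields.
    if len(lines) < 2 or not lines[0].startswith("#"):
        raise ValueError("expected a '#' description line and a header line")
    system, boundaries, def_period, pointer_type = lines[1].split()
    if boundaries not in ("YES", "NO") or pointer_type not in ("RECID", "TIME"):
        raise ValueError(f"bad header line: {lines[1]!r}")
    header = {
        "description": lines[0].lstrip("#").strip(),
        "system": system,
        "boundaries": boundaries == "YES",
        "def_period": def_period,
        "pointer_type": pointer_type,
    }

    to_pointer = int if pointer_type == "RECID" else float
    decisions = []
    for ln in lines[2:]:
        topic, source, pointer, decision, score = ln.split()
        if decision not in ("YES", "NO"):
            raise ValueError(f"bad decision in line: {ln!r}")
        decisions.append(Decision(topic, source, to_pointer(pointer),
                                  decision == "YES", float(score)))
    return header, decisions


example = """\
# Degenerate detection results, Errors, RECID
Errors NO 10 RECID
176 asrtext/19980307_1130_1200_CNN_HDL.asr 1 NO 0.209696400470383
170 asrtext/19980307_1130_1200_CNN_HDL.asr 675 NO 2.05484371633647e-10
144 asrtext/19980307_1130_1200_CNN_HDL.asr 762 YES 0.999999999524702
"""
header, decisions = parse_system_output(example)
print(header["system"], header["pointer_type"], len(decisions))
print([d.topic for d in decisions if d.decision])
```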
-------------------------------------------------------------------------------
------------------- TDT Detection Task Performance Report ------------------
Story Weighted Detection:  P(Miss) = 0.1143
                           P(Fa) = 0.0028
                           Cdet = 0.0050
Topic Weighted Detection:  P(Miss) = 0.0723
                           P(Fa) = 0.0028
                           Cdet = 0.0042
Detection Performance Calculations:
 Ref.   Hyp.  # Ref  # Sys.  # Corr  # Miss   # Fa  |  # Test
Topic  Topic  Story   Story   Story   Story  Story  |   Story  P(Miss)   P(Fa)      Cdet
-----  -----  -----  ------  ------  ------  -----  |  ------  -------  ------  --------
   40    140      3       6       3       0      3  |    3085   0.0000  0.0010    0.0010
   41    141     13      25      12       1     13  |    3085   0.0769  0.0042    0.0057
   42    142     17      21      14       3      7  |    3085   0.1765  0.0023    0.0058
   44    144     24      45      21       3     24  |    3085   0.1250  0.0078    0.0102
   46    146      3       4       3       0      1  |    3085   0.0000  0.0003    0.0003
   52    152      5       6       4       1      2  |    3085   0.2000  0.0006    0.0046
   53    153      3       9       3       0      6  |    3085   0.0000  0.0019    0.0019
   56    156      2      14       2       0     12  |    3085   0.0000  0.0039    0.0038
=====  =====  =====  ======  ======  ======  =====  |  ======  =======  ======  ========
Story Weight                                                    0.1143  0.0028    0.0050
Topic Sums       70     130      62       8     68  |   24680
Topic Means     8.8    16.2     7.8     1.0    8.5  |  3085.0   0.0723  0.0028    0.0042
Cost Based YDZ Calculations:
Cexam   Cmiss    Ccluster    Cmin        Cmax     Cnorm
-----  ------  ----------  ------  ----------  --------
    1       1      384.70  126.00    24749.65  0.010506
    1      10      662.36  126.00    25376.47  0.021242
    1     100     3438.97  126.00    31644.67  0.105111
    1    1000    31205.04  126.00    94326.68  0.329924
    1   10000   308865.76  126.00   721146.77  0.428198
    1  100000  3085472.90  126.00  6989347.75  0.441444
Detection Performance by Test Subset:
Sub-     ||  Ref.   Hyp.  # Ref  # Sys.  # Corr  # Miss   # Fa  |  # Test
set      || Topic  Topic  Story   Story   Story   Story  Story  |   Story  P(Miss)   P(Fa)      Cdet
----     || -----  -----  -----  ------  ------  ------  -----  |  ------  -------  ------  --------
Audio    ||    40    140      2       4       2       0      2  |    1446   0.0000  0.0014    0.0014
Audio    ||    41    141     10      16       9       1      6  |    1446   0.1000  0.0042    0.0061
Audio    ||    42    142     13      19      10       3      6  |    1446   0.2308  0.0042    0.0087
Audio    ||    44    144     13      20      11       2      7  |    1446   0.1538  0.0049    0.0079
Audio    ||    46    146      3       4       3       0      1  |    1446   0.0000  0.0007    0.0007
Audio    ||    52    152      0       1       0       0      1  |    1446   0.0000  0.0007    0.0007
Audio    ||    53    153      2       5       2       0      3  |    1446   0.0000  0.0021    0.0020
Audio    ||    56    156      2       5       2       0      3  |    1446   0.0000  0.0021    0.0020
====     || =====  =====  =====  ======  ======  ======  =====  |  ======  =======  ======  ========
Audio    || Story Weight                                        |           0.1333  0.0025    0.0051
Audio    || Topic Sums       45      74      39       6     29  |   11568
Audio    || Topic Means     5.6     9.2     4.9     0.8    3.6  |  1446.0   0.0606  0.0025    0.0037
Newswire ||    40    140      1       2       1       0      1  |    1639   0.0000  0.0006    0.0006
Newswire ||    41    141      3      10       3       0      7  |    1639   0.0000  0.0043    0.0042
Newswire ||    42    142      4       5       4       0      1  |    1639   0.0000  0.0006    0.0006
Newswire ||    44    144     11      28      10       1     17  |    1639   0.0909  0.0104    0.0121
Newswire ||    46    146      0       0       0       0      0  |    1639   0.0000  0.0000    0.0000
Newswire ||    52    152      5       6       4       1      1  |    1639   0.2000  0.0006    0.0046
Newswire ||    53    153      1       4       1       0      3  |    1639   0.0000  0.0018    0.0018
Newswire ||    56    156      0       9       0       0      9  |    1639   0.0000  0.0055    0.0054
====     || =====  =====  =====  ======  ======  ======  =====  |  ======  =======  ======  ========
Newswire || Story Weight                                        |           0.0800  0.0030    0.0045
Newswire || Topic Sums       25      64      23       2     39  |   13112
Newswire || Topic Means     3.1     8.0     2.9     0.2    4.9  |  1639.0   0.0364  0.0030    0.0037
Execution parameters:
Index File: ../indexes_devtest/det_nwt+asr.ndx
System Output File: det_nwt+asr.det
Pointer Type: RECID
Topic cluster Mapping Function:
P(topic) = 0.02
Cmiss = 1
Cfa = 1
Topic inclusion by: hard_decision
Detection Performance Calculations:
System Identifier: Errors 'Degenerate detection results, Errors, RECID'
Deferral Period: 10
System Output to Story Mapping Function: 'majority'
---------------- End of TDT Detection Task Performance Report ---------------
-------------------------------------------------------------------------------
Successful Completion
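The two summary figures in the report can be re-derived from the per-topic rows. The sketch below assumes that "story weighted" pools the story counts over all topics before computing the error rates, that "topic weighted" averages the per-topic rates, and that the cost takes the usual TDT form Cmiss * P(Miss) * P(topic) + Cfa * P(Fa) * (1 - P(topic)). These assumptions are inferred from the sample numbers, not from the program source, and all names are illustrative.

```python
# (ref topic, # ref stories, # missed, # false alarms) from the sample report
ROWS = [(40, 3, 0, 3), (41, 13, 1, 13), (42, 17, 3, 7), (44, 24, 3, 24),
        (46, 3, 0, 1), (52, 5, 1, 2), (53, 3, 0, 6), (56, 2, 0, 12)]
N_TEST = 3085                             # test stories per topic
C_MISS, C_FA, P_TOPIC = 1.0, 1.0, 0.02    # as listed under "Execution parameters"


def rates(n_ref, n_miss, n_fa, n_test):
    """Miss and false-alarm probabilities from raw story counts."""
    p_miss = n_miss / n_ref if n_ref else 0.0
    p_fa = n_fa / (n_test - n_ref) if n_test > n_ref else 0.0
    return p_miss, p_fa


def cost(p_miss, p_fa):
    """Assumed detection cost form (see the lead-in)."""
    return C_MISS * p_miss * P_TOPIC + C_FA * p_fa * (1.0 - P_TOPIC)


def story_weighted(rows):
    """Pool the counts over all topics, then compute rates and cost once."""
    n_ref = sum(r[1] for r in rows)
    n_miss = sum(r[2] for r in rows)
    n_fa = sum(r[3] for r in rows)
    p_miss, p_fa = rates(n_ref, n_miss, n_fa, N_TEST * len(rows))
    return p_miss, p_fa, cost(p_miss, p_fa)


def topic_weighted(rows):
    """Average the per-topic rates; the cost is linear, so averaging costs is equivalent."""
    per_topic = [rates(ref, miss, fa, N_TEST) for _, ref, miss, fa in rows]
    p_miss = sum(p for p, _ in per_topic) / len(per_topic)
    p_fa = sum(f for _, f in per_topic) / len(per_topic)
    return p_miss, p_fa, cost(p_miss, p_fa)


for label, fn in (("Story", story_weighted), ("Topic", topic_weighted)):
    p_miss, p_fa, cdet = fn(ROWS)
    print(f"{label} weighted: P(Miss)={p_miss:.4f} P(Fa)={p_fa:.4f} Cdet={cdet:.4f}")
```

Under these assumptions the sketch matches the "Story Weighted Detection" and "Topic Weighted Detection" lines at the top of the sample report to four decimal places.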