The 'TDT3fsd.pl' program will score the output generated by a TDT3 First Story Detection (FSD) system. The program requires the directory path, 'Rootdir', to the LDC's TDT3 Test corpus. The corpus must be in the same structure as released by the LDC, with all file formats identical to their original form. The program uses the index file TDT3_fsd_index, provided with the test corpus and described below, to load the appropriate data from the corpus and to verify the completeness of the Fsd_system_output file.
Upon completion of the load, the first story detection decisions are scored and a report is generated. The scoring of an Automatic First Story Detection System (AFSDS) consists of two phases. First, the decisions output by the AFSDS are mapped onto the story boundaries annotated in the reference corpus. Then a topic set is built for each annotated topic. The first on-topic story for each topic is considered the first story for that topic, and the remaining on-topic stories are considered possible false alarms. The program computes performance metrics constrained to only the annotated topics.
The TDT3fsd.pl program can use one of two methods to map topic decisions to reference stories: majority vote or impulse vote. The two methods differ in the meaning implied by the decision markers output by the AFSDS.
The program concludes its processing by first generating a first story detection performance report and then, optionally, a Detection Error Tradeoff (DET) graph.
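The second scoring phase described above, building a topic set for each annotated topic, can be sketched as follows. This is a minimal illustration and not the program's own code: the function name `build_topic_sets`, the input layout (document identifiers paired with topic labels, already in chronological order) and the `doc-00N` identifiers are assumptions made for the example. It also ignores the case where a topic's first story falls outside the evaluated data.

```python
def build_topic_sets(stories):
    """Build one topic set per annotated topic.

    stories: iterable of (docno, topic_ids) pairs in chronological order
             (hypothetical input layout, chosen for this sketch).
    Returns {topic_id: {"target": docno, "nontargets": [docno, ...]}}.
    """
    topic_sets = {}
    for docno, topic_ids in stories:
        for topic in topic_ids:
            entry = topic_sets.get(topic)
            if entry is None:
                # First on-topic story seen: this is the topic's first story.
                topic_sets[topic] = {"target": docno, "nontargets": []}
            else:
                # Every later on-topic story is a possible false alarm.
                entry["nontargets"].append(docno)
    return topic_sets


# Hypothetical stream: doc-003 is off topic for every annotated topic.
stories = [
    ("doc-001", ["1"]),
    ("doc-002", ["1", "15"]),
    ("doc-003", []),
    ("doc-004", ["15"]),
]
topic_sets = build_topic_sets(stories)
for topic, entry in sorted(topic_sets.items()):
    print(topic, entry["target"], entry["nontargets"])
```

Stories that belong to no annotated topic never enter a topic set, which is consistent with metrics being constrained to the annotated topics.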
The following <Options> are recognized by the program:
| -C Cmiss:Cfa | -> | Set the cost of a missed detection and the cost of a false alarm to 'Cmiss' and 'Cfa' respectively. These numbers are used in the detection cost function. Default values are Cmiss=1.0 and Cfa=0.1. |
| -D Detail | -> | Write the internally organized evaluation corpus and pertinent statistics for debugging purposes. This report, though voluminous, is intended to help researchers debug their internal versions of evaluation code. |
| -E SubsetFile | -> | Compute performance excluding source files in the subset definition file. NOTE: Only the first set defined in the subset definition file is used for the filter. All others are ignored. |
| -j topicrel[:topicrel]* | -> | Specify alternative topic relevance files via the command line. More than one can be specified by concatenating the file names using a colon ':' separator. |
| -k FSD_key | -> | Write the loaded FSD answer key into 'FSD_key'. See the key format below. |
| -K FSD_key | -> | Rather than generate the FSD answer key from the TDT corpus, read the 'FSD_key' file and use it. See the key format below. |
| -m func | -> | Set the function that maps system output to stories to either 'majority' or 'impulse'. Default is 'majority'. |
| -P P(topic) | -> | Use P(topic) for the detection cost function. Default is 0.02. |
| -r Report | -> | Write the summary report to 'Report' rather than STDOUT, the default. |
| -s | -> | Use all available speedups. Currently, the only speedups involve NOT using the 'nsgmls' parser and 'SGMLS.pm' PERL library to read the TDT3 Corpus files. |
| -v num | -> | Set the verbose level to 'num'. Default is 1. Levels: 0 = none, 1 = normal, >5 = slight, >10 = way too much, >15 = not even funny. |
| -o LBL | -> | Treat the stories annotated as level 'LBL' as on topic. The default value is 'YES', but the value can also be 'YES+BRIEF' or 'BRIEF'. |
| | ||||||||||||
| -d DETfile | -> | Create a DET plot in GNUplot format with the file root 'DETfile'. The program makes several files, each with an additional extension. The file 'DETfile'.plt is a command file for GNUplot and can be printed using the command "gnuplot 'DETfile'.plt | lpr". |
| -t title | -> | Set the title line for the plot to 'title'. |
| -p | -> | Produce a single story-weighted DET line trace. |
| -w | -> | Produce a single topic-weighted DET line trace. This is the default option. |
| -n | -> | Add 90% confidence intervals to the topic-weighted DET graph. |
| -T Topic_regexp | -> | Restrict the topics for which the index files are created using the PERL regular expression 'Topic_regexp'. The default is to use all occurring annotated topics. A number of macro names for defined topic sets may be used in place of regular expressions. |
The BNF structure of the first story detection index file is:
Where:
| <HEADER_LINE> | :== | # FIRST_STORY <POINTER_TYPE> |
| <POINTER_TYPE> | :== | RECID | TIME. POINTER_TYPE is the type of boundaries to be output by the system. The possible values are RECID for text stream first story detection or TIME for audio first story detection. |
| <SOURCE> | :== | TDT3 corpus filename with directory and extension names relative to the TDT3 root directory specified on the command line. |
    # FIRST_STORY RECID
    tkntext/19980301_0553_0719_APW_ENG.tkn
    tkntext/19980301_1014_1116_APW_ENG.tkn
    tkntext/19980301_1403_1529_APW_ENG.tkn
    tkntext/19980301_2139_2341_APW_ENG.tkn
    tkntext/19980302_0630_0651_APW_ENG.tkn
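As an illustration of the index file layout described above, the following sketch reads a header line and the source file names that follow it. It is not taken from TDT3fsd.pl. The function name `parse_fsd_index` is hypothetical, and the sketch assumes one <SOURCE> entry per line and a header whose three tokens are separated by whitespace, as in the example.

```python
def parse_fsd_index(text):
    """Return (pointer_type, sources) from the text of an FSD index file."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty index file")

    # Header line: "# FIRST_STORY <POINTER_TYPE>"
    header = lines[0].split()
    if len(header) != 3 or header[:2] != ["#", "FIRST_STORY"]:
        raise ValueError(f"bad header line: {lines[0]!r}")

    pointer_type = header[2]
    if pointer_type not in ("RECID", "TIME"):
        raise ValueError(f"unknown pointer type: {pointer_type!r}")

    # Every remaining line is a corpus file name relative to the root dir.
    return pointer_type, lines[1:]


example = """\
# FIRST_STORY RECID
tkntext/19980301_0553_0719_APW_ENG.tkn
tkntext/19980301_1014_1116_APW_ENG.tkn
"""
pointer_type, sources = parse_fsd_index(example)
print(pointer_type, len(sources))
```

The returned source list is what a scorer would compare against the system output file to check that every indexed file has been processed.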
The BNF structure of the first story detection system output file is:
Where:
| <SYSTEM> | :== | System is an alphanumeric character string that uniquely identifies the system being tested. (E.g., CDM_P05-8.v37) |
| <BOUNDARIES> | :== | Boundaries is either YES or NO, where YES indicates that story boundaries are supplied to the system being tested and NO indicates that they are not. |
| <DEF_PERIOD> | :== | The deferral period before decisions are made. Permissible values are defined by the TDT3 test specification. |
| <POINTER_TYPE> | :== | RECID | TIME. POINTER_TYPE is the type of boundaries to be output by the system. The possible values are RECID for text stream first story detection or TIME for audio first story detection. |
| <DECISION_LINE> | :== | <SOURCE> <POINTER> <DECISION> <SCORE> |
| <SOURCE> | :== | TDT3 corpus filename with directory and extension names relative to the TDT3 root directory specified on the command line. |
| <POINTER> | :== | POINTER is a hypothesized decision point. For text files, POINTER is the index number of the first word in the hypothesized segment, in the range {1, 2, . . .}. For audio files, POINTER is the time of the beginning of the segment {0.0, . . .}. (It isn't necessary to output the beginning of the first segment.) The hypothesized decision points must occur in chronological order. |
| <DECISION> | :== | Decision is either YES or NO, where YES indicates that the system believes that the story being processed discusses the target topic, and NO indicates not. |
| <SCORE> | :== | Score is a real number which indicates how confident the system is that the story being processed discusses the associated topic. More positive values indicate greater confidence. |
    # Degenerate first story detection results, Errors, RECID
    Errors NO 10 RECID
    asrtext/19980307_1130_1200_CNN_HDL.asr 1 NO 0.209696400470383
    asrtext/19980307_1130_1200_CNN_HDL.asr 66 NO 0.0652269882628861
    asrtext/19980307_1130_1200_CNN_HDL.asr 254 NO 0.00793375334496783
    asrtext/19980307_1130_1200_CNN_HDL.asr 318 NO 0.000951696625507463
    asrtext/19980307_1130_1200_CNN_HDL.asr 675 NO 2.05484371633647e-10
    asrtext/19980307_1130_1200_CNN_HDL.asr 762 YES 0.999999999524702
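A reader for the system output file might look like the sketch below. Several points are assumptions inferred from the example rather than stated rules: lines starting with '#' are treated as comments, the first remaining line is taken to be a header of the form <SYSTEM> <BOUNDARIES> <DEF_PERIOD> <POINTER_TYPE>, and "chronological order" is checked as strictly increasing pointers within each source file. The names `Decision` and `parse_fsd_output` are hypothetical.

```python
from typing import NamedTuple


class Decision(NamedTuple):
    source: str
    pointer: float   # word index (RECID) or time in seconds (TIME)
    decision: str    # "YES" or "NO"
    score: float


def parse_fsd_output(text):
    """Return (header, decisions) from the text of a system output file."""
    rows = [line.split() for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]
    if not rows or len(rows[0]) != 4:
        raise ValueError("missing or malformed header line")

    system, boundaries, def_period, pointer_type = rows[0]
    if boundaries not in ("YES", "NO"):
        raise ValueError(f"bad BOUNDARIES value: {boundaries!r}")
    if pointer_type not in ("RECID", "TIME"):
        raise ValueError(f"bad POINTER_TYPE value: {pointer_type!r}")
    header = {"system": system, "boundaries": boundaries,
              "def_period": def_period, "pointer_type": pointer_type}

    decisions, last_pointer = [], {}
    for fields in rows[1:]:
        if len(fields) != 4:
            raise ValueError(f"bad decision line: {' '.join(fields)!r}")
        source, pointer, decision, score = fields
        pointer = float(pointer)
        if decision not in ("YES", "NO"):
            raise ValueError(f"bad DECISION value: {decision!r}")
        # Assumed reading of "chronological order": strictly increasing
        # pointers within one source file.
        if source in last_pointer and pointer <= last_pointer[source]:
            raise ValueError(f"pointer out of order in {source}")
        last_pointer[source] = pointer
        decisions.append(Decision(source, pointer, decision, float(score)))
    return header, decisions


example = """\
# Degenerate first story detection results, Errors, RECID
Errors NO 10 RECID
asrtext/19980307_1130_1200_CNN_HDL.asr 1 NO 0.209696400470383
asrtext/19980307_1130_1200_CNN_HDL.asr 66 NO 0.0652269882628861
asrtext/19980307_1130_1200_CNN_HDL.asr 762 YES 0.999999999524702
"""
header, decisions = parse_fsd_output(example)
print(header["system"], header["def_period"], len(decisions))
```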
The FSD key file begins with the tag <FSD_KEY> and ends with its closing tag, </FSD_KEY>. Within the <FSD_KEY> tag, the following hierarchy exists:
| <FSD_KEY> | Contains | <TOPIC>+ |
| <TOPIC> | Contains | ( <TARG_STORY> | <NONTARG_STORY> )+ |
| <FSD_KEY> | None | | |
| <TOPIC> | id | -> | The topic identifier string, e.g. 1,33,45 |
| <TARG_STORY> | docno | -> | The TDT Document number |
| <NONTARG_STORY> | docno | -> | The TDT Document number |
    <FSD_KEY>
      <TOPIC id=1>
        <TARG_STORY docno=APW19980194.0286>
        <NONTARG_STORY docno=NYT19980104.0111>
        <NONTARG_STORY docno=NYT19980104.0206>
      </TOPIC>
      <TOPIC id=15>
        <NONTARG_STORY docno=NYT19980108.0793>
      </TOPIC>
    </FSD_KEY>
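Because the key file uses unquoted attribute values and story tags without closing tags, a strict XML parser would reject it. The sketch below reads the format with regular expressions instead. It is an illustration only: the function name `parse_fsd_key` is hypothetical, and tolerating optional double quotes around attribute values is an added assumption.

```python
import re

# Attribute values are unquoted in the example; optional quotes are
# tolerated here as an assumption.
TOPIC_RE = re.compile(r'<TOPIC\s+id="?([^\s">]+)"?\s*>(.*?)</TOPIC>', re.S)
STORY_RE = re.compile(r'<(TARG_STORY|NONTARG_STORY)\s+docno="?([^\s">]+)"?\s*>')


def parse_fsd_key(text):
    """Return {topic_id: {"TARG_STORY": [...], "NONTARG_STORY": [...]}}."""
    match = re.search(r"<FSD_KEY>(.*)</FSD_KEY>", text, re.S)
    if match is None:
        raise ValueError("no <FSD_KEY> ... </FSD_KEY> element found")

    key = {}
    for topic_id, body in TOPIC_RE.findall(match.group(1)):
        stories = {"TARG_STORY": [], "NONTARG_STORY": []}
        for tag, docno in STORY_RE.findall(body):
            stories[tag].append(docno)
        key[topic_id] = stories
    return key


example = """\
<FSD_KEY>
<TOPIC id=1>
<TARG_STORY docno=APW19980194.0286>
<NONTARG_STORY docno=NYT19980104.0111>
<NONTARG_STORY docno=NYT19980104.0206>
</TOPIC>
<TOPIC id=15>
<NONTARG_STORY docno=NYT19980108.0793>
</TOPIC>
</FSD_KEY>
"""
key = parse_fsd_key(example)
for topic_id, stories in key.items():
    print(topic_id, stories["TARG_STORY"], len(stories["NONTARG_STORY"]))
```

Note that topic 15 in the example has no <TARG_STORY>, so a consumer of the key should not assume every topic carries a first story.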
-------------------------------------------------------------------------------
------------- TDT First Story Detection Task Performance Report ------------
Command line: ../../TDT3eval_v1.1/TDT3fsd.pl -R fsd_root -i ....
Execution Date: Fri May 28 08:05:20 EDT 1999
Story Weighted First Story Detection:   P(Miss) = 0.3333
                                        P(Fa)   = 0.2222
                                        Cfsd    = 0.2244
Topic Weighted First Story Detection:   P(Miss) = 0.3333
                                        P(Fa)   = 0.2500
                                        Cfsd *  = 0.3020
* Primary Evaluation Metric
First Story Detection Performance Calculations:
         Ref.          | # Corr  # Miss  # Corr    # Fa ||
Topic # First # !First |  First   First ! First ! First || P(Miss)  P(Fa)   Cfsd
----- ------- -------- | ------ ------- ------- ------- || ------- ------ ------
   71       1        2 |      1       0       2       0 ||  0.0000 0.0000 0.0000
   74       1        2 |      1       0       1       1 ||  0.0000 0.5000 0.4900
   76       1        2 |      0       1       2       0 ||  1.0000 0.0000 0.0200
   77       1        1 |      0       1       1       0 ||  1.0000 0.0000 0.0200
   78       1        1 |      1       0       0       1 ||  0.0000 1.0000 0.9800
   79       0        1 |     --      --       1       0 ||      -- 0.0000     --
   80       1        0 |      1       0      --      -- ||  0.0000     --     --
----- ------- -------- | ------ ------- ------- ------- || ------- ------ ------
Sums        6        9 |      4       2       7       2 ||
Story Weighted         |                                ||  0.3333 0.2222 0.2244
Topic Weighted         |    0.7     0.3     1.2     0.3 ||  0.3333 0.2500 0.3020
LDC TDT Corpus Root Dir: fsd_root
Index File: sys5.ndx
System Output File: sys5.fsd
Pointer Type: RECID
Cost Function Parameters:
P(topic) = 0.02
Cmiss = 1
Cfa = 1
Detection Performance Calculations:
System Identifier: det_boundary_DEF=10
Deferral Period: 10
System Output to Story Mapping Function: 'majority'
---------------- End of TDT Detection Task Performance Report ---------------
-------------------------------------------------------------------------------
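The figures in the sample report can be reproduced with a short calculation. The cost formula below is not spelled out in this document. It is the usual TDT detection cost, Cmiss * P(Miss) * P(topic) + Cfa * P(Fa) * (1 - P(topic)), and it matches the report when used with the parameters the report lists (P(topic) = 0.02, Cmiss = 1, Cfa = 1). The averaging rules are likewise inferred from the sample numbers: story-weighted rates pool the counts over all topics, topic-weighted rates average the per-topic values, and the topic-weighted Cfsd averages only over topics where both rates are defined. The function names are hypothetical.

```python
def detection_cost(p_miss, p_fa, p_topic=0.02, c_miss=1.0, c_fa=0.1):
    """Detection cost; defaults follow the -P and -C option defaults."""
    return c_miss * p_miss * p_topic + c_fa * p_fa * (1.0 - p_topic)


def mean(values):
    return sum(values) / len(values)


def score(counts, **cost):
    """counts: {topic: (corr_first, miss_first, corr_not_first, fa)}.

    A "--" cell in the report is entered as a zero count.
    Returns (P(Miss), P(Fa), Cfsd) for both weightings.
    """
    corr_f, miss_f, corr_nf, fa = (sum(col) for col in zip(*counts.values()))
    story_p_miss = miss_f / (corr_f + miss_f)
    story_p_fa = fa / (corr_nf + fa)

    p_miss_list, p_fa_list, cost_list = [], [], []
    for cf, mf, cnf, f in counts.values():
        # A rate is undefined when the topic has no stories of that kind.
        p_miss = mf / (cf + mf) if cf + mf else None
        p_fa = f / (cnf + f) if cnf + f else None
        if p_miss is not None:
            p_miss_list.append(p_miss)
        if p_fa is not None:
            p_fa_list.append(p_fa)
        if p_miss is not None and p_fa is not None:
            cost_list.append(detection_cost(p_miss, p_fa, **cost))

    return {
        "story": (story_p_miss, story_p_fa,
                  detection_cost(story_p_miss, story_p_fa, **cost)),
        "topic": (mean(p_miss_list), mean(p_fa_list), mean(cost_list)),
    }


# Per-topic counts copied from the sample report table.
counts = {
    "71": (1, 0, 2, 0), "74": (1, 0, 1, 1), "76": (0, 1, 2, 0),
    "77": (0, 1, 1, 0), "78": (1, 0, 0, 1), "79": (0, 0, 1, 0),
    "80": (1, 0, 0, 0),
}
result = score(counts, p_topic=0.02, c_miss=1.0, c_fa=1.0)
print("Story Weighted: P(Miss)=%.4f P(Fa)=%.4f Cfsd=%.4f" % result["story"])
print("Topic Weighted: P(Miss)=%.4f P(Fa)=%.4f Cfsd=%.4f" % result["topic"])
```

The sketch does not guard against a corpus with no first stories or no non-first stories at all, where the pooled rates would be undefined.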