TDT3fsd.pl User Manual

TDT3 First Story Detection Task Scoring


Usage:

TDT3fsd.pl -R Rootdir -i TDT3_fsd_index <Options> Fsd_system_output

The 'TDT3fsd.pl' program will score the output generated by a TDT3 First Story Detection (FSD) system. The program requires the directory path, 'Rootdir', to the LDC's TDT3 Test corpus. The corpus must be in the same structure as released by the LDC, with all file formats identical to their original form. The program uses the index file TDT3_fsd_index, provided with the test corpus and described below, to load the appropriate data from the corpus and to verify the completeness of the Fsd_system_output file.

Upon completion of the load, the first story detection decisions are scored, and a report is generated. The scoring of an Automatic First Story Detection System (AFSDS) consists of two phases. First, the decisions output by the AFSDS are mapped onto the story boundaries annotated in the reference corpus. Then a topic set is built for each annotated topic. The first on-topic story for each topic is considered the first story for that topic, and the remaining on-topic stories are considered possible false alarms. The program computes performance metrics over the annotated topics only.
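The topic-set construction described above can be illustrated with a minimal Python sketch. The function name 'build_topic_sets' and the input layout (a chronologically ordered list of (docno, topic list) pairs) are assumptions made for this example; they are not taken from TDT3fsd.pl, and the sketch ignores subset filtering and annotation levels.

```python
def build_topic_sets(stories):
    """Split the on-topic stories of each topic into a first story and the rest.

    stories: chronologically ordered list of (docno, [topic_id, ...]) pairs
             (hypothetical layout, chosen for this illustration).
    Returns {topic_id: {"first": docno, "rest": [docno, ...]}}.
    """
    topic_sets = {}
    for docno, topics in stories:
        for topic in topics:
            if topic not in topic_sets:
                # Earliest on-topic story: the target a system should flag.
                topic_sets[topic] = {"first": docno, "rest": []}
            else:
                # Later on-topic stories: a YES here is a false alarm.
                topic_sets[topic]["rest"].append(docno)
    return topic_sets


stories = [("A.1", ["71"]), ("A.2", ["74", "71"]), ("A.3", []), ("A.4", ["74"])]
topic_sets = build_topic_sets(stories)
```

Stories that belong to no annotated topic (such as "A.3" here) do not enter any topic set, which matches the statement that the metrics are constrained to annotated topics.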

The TDT3fsd.pl program can use one of two methods to map topic decisions to reference stories: majority vote or impulse vote. The two methods differ in the meaning implied by the decision markers output by the AFSDS.

Majority Vote: Each story decision is computed as the majority of decisions over all words (or time) in that story, and each story score is computed as the average score over all words (or time) in that story. Ties in computing the decisions are broken by choosing the decision with the maximum score.

Impulse Vote: Each story decision is made by selecting the decision with the maximum score within the boundaries of the story. In the event no decisions were made for the story, the decision is 'NO' with a score of -infinity.
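The two mapping rules can be sketched in Python as follows. This is an illustration under stated assumptions, not the program's code: the function names are hypothetical, the majority-vote input is assumed to hold one (decision, score) pair per word of the story, and the tie-break is read as "take the decision attached to the highest-scoring word".

```python
def majority_vote(word_decisions):
    """word_decisions: non-empty list of (decision, score), one per word."""
    if not word_decisions:
        raise ValueError("majority vote needs at least one word")
    yes = sum(1 for decision, _ in word_decisions if decision == "YES")
    no = len(word_decisions) - yes
    # Story score: average score over all words in the story.
    score = sum(s for _, s in word_decisions) / len(word_decisions)
    if yes != no:
        decision = "YES" if yes > no else "NO"
    else:
        # Tie: use the decision that carries the maximum score (assumed reading).
        decision = max(word_decisions, key=lambda pair: pair[1])[0]
    return decision, score


def impulse_vote(markers_in_story):
    """markers_in_story: (decision, score) markers falling inside the story."""
    if not markers_in_story:
        # No decision was made for this story.
        return "NO", float("-inf")
    return max(markers_in_story, key=lambda pair: pair[1])


maj = majority_vote([("YES", 0.9), ("NO", 0.1), ("NO", 0.2)])
tie = majority_vote([("YES", 0.9), ("NO", 0.1)])
imp = impulse_vote([("NO", 0.2), ("YES", 0.99)])
empty = impulse_vote([])
```

Under majority vote a single confident YES can be outvoted by the rest of the story, whereas under impulse vote the highest-scoring marker alone determines the story decision.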

The program concludes its processing by first generating a first story detection performance report and then, optionally, a Decision Error Tradeoff (DET) graph.

The following <Options> are recognized by the program:

-C Cmiss:Cfa -> Set the cost of a missed detection and the cost of a false alarm to 'Cmiss' and 'Cfa' respectively. These numbers are used in the detection cost function. Default values are Cmiss=1.0 and Cfa=0.1.
-D Detail -> Write the internally organized evaluation corpus and pertinent statistics to 'Detail' for debugging purposes. This report, though voluminous, is intended to help researchers debug their internal versions of evaluation code.
-E SubsetFile -> Compute performance excluding source files in the subset definition file. NOTE: Only the first set defined in the subset definition file is used for the filter. All others are ignored.
-j topicrel[:topicrel]* -> Specify alternative topic relevance files via the command line. More than one can be specified by concatenating the file names using a colon ':' separator.
-k FSD_key -> Write the loaded FSD answer key into 'FSD_key'. See the key format below.
-K FSD_key -> Rather than generate the FSD answer key from the TDT corpus, read the 'FSD_key' file and use it. See the key format below.
-m func -> Set the system output to story mapping function to either 'majority' or 'impulse'. Default is 'majority'.
-P P(topic) -> Use P(topic) for the detection cost function. Default is 0.02.
-r Report -> Write the summary report to 'Report' rather than STDOUT, the default.
-s -> Use all available speedups. Currently, the only speedup is to NOT use the 'nsgmls' parser and the 'SGMLS.pm' PERL library to read the TDT3 Corpus files.
-v num -> Set the verbose level to 'num'. Default 1.
==0 None, ==1 Normal, >5 Slight, >10 way too much, >15 not even funny
-o LBL -> Treat the stories annotated as level 'LBL' as on topic. The default value is 'YES', but the value can also be 'YES+BRIEF', or 'BRIEF'.
Options that apply to the DET plots:
-d DETfile -> Create a DET plot in GNUplot format with the file root 'DETfile'. The program makes several files each with additional extensions. The file 'DETfile'.plt is a command file for GNUplot and can be printed using the command "gnuplot 'DETfile'.plt | lpr".
-t title -> Set the title line for the plot to 'title'.
-p -> Produce a single story-weighted DET line trace.
-w -> Produce a single topic-weighted DET line trace. This is the default option.
-n -> Add 90% confidence intervals to the topic-weighted DET graph.
-T Topic_regexp -> Restrict the topics for which the index files are created using the PERL regular expression 'Topic_regexp'. The default is to use all occurring annotated topics. A number of macro names for predefined topic sets may be used in place of regular expressions. They are:
Macro name        Equivalent Expression
TDT98_Train       20+([1-9]|[12][0-9]|3[0-7])
TDT98_DevTest     20+(3[89]|[45][0-9]|6[0-6])
TDT98_EvalTest    20+(6[7-9]|[89][0-9]|100)
TDT99_mul         20+(1|2|5|7|13|15|20|23|39|44|48|57|70|71|76|85|88|89|91|96)
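To see which topic identifiers a macro selects, the expressions can be tried outside the scorer. The Python sketch below is only an illustration: it assumes topic identifiers of the form '20001' and assumes the pattern must match the whole identifier, neither of which is stated in this manual. The helper name 'select_topics' is hypothetical.

```python
import re

# Macro names and expressions copied from the table above.
TOPIC_MACROS = {
    "TDT98_Train": r"20+([1-9]|[12][0-9]|3[0-7])",
    "TDT98_DevTest": r"20+(3[89]|[45][0-9]|6[0-6])",
    "TDT98_EvalTest": r"20+(6[7-9]|[89][0-9]|100)",
    "TDT99_mul": r"20+(1|2|5|7|13|15|20|23|39|44|48|57|70|71|76|85|88|89|91|96)",
}


def select_topics(topic_ids, spec):
    """Return the topic ids matched by a macro name or a raw expression."""
    pattern = re.compile(TOPIC_MACROS.get(spec, spec))
    # Assumption: the expression has to cover the entire identifier.
    return [t for t in topic_ids if pattern.fullmatch(t)]


ids = ["20001", "20037", "20038", "20100"]
train = select_topics(ids, "TDT98_Train")
dev = select_topics(ids, "TDT98_DevTest")
evaltest = select_topics(ids, "TDT98_EvalTest")
custom = select_topics(ids, r"2003[78]")
```

If the program applies the expression unanchored, more identifiers could match than shown here, so results from this sketch should be checked against an actual scoring run.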

First Story Detection Task Index File Format

The format of the index file for the first story detection task is as follows. The first line in the index file is a header line. The line indicates the TDT3 task, 'FIRST_STORY' in this case, and the type of pointer used to mark segment changes. Each subsequent data record in the file identifies a source file to process. Each record has one field, and records are separated by a newline.

The BNF structure of the first story detection index file is:

<HEADER_LINE>
<SOURCE>
<SOURCE>
...

Where:

<HEADER_LINE> :== # FIRST_STORY <POINTER_TYPE>
<POINTER_TYPE> :== RECID | TIME
A POINTER_TYPE is the type of boundaries to be output by the system. The possible values are RECID for text stream first story detection or TIME for audio first story detection.
<SOURCE> :== TDT3 corpus filename with directory and extension names relative to the TDT3 root directory specified on the command line.
The following is an excerpt from a first story detection task index file.
# FIRST_STORY RECID
tkntext/19980301_0553_0719_APW_ENG.tkn
tkntext/19980301_1014_1116_APW_ENG.tkn
tkntext/19980301_1403_1529_APW_ENG.tkn
tkntext/19980301_2139_2341_APW_ENG.tkn
tkntext/19980302_0630_0651_APW_ENG.tkn
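A reader for this format takes only a few lines. The Python sketch below is an illustration based solely on the description above; the function name 'parse_fsd_index' is hypothetical, and skipping blank lines is an assumption.

```python
def parse_fsd_index(text):
    """Return (pointer_type, [source, ...]) from the text of an index file."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty index file")
    header = lines[0].split()
    # Header line: '# FIRST_STORY <POINTER_TYPE>'
    if len(header) != 3 or header[0] != "#" or header[1] != "FIRST_STORY":
        raise ValueError("bad header line: %r" % lines[0])
    pointer_type = header[2]
    if pointer_type not in ("RECID", "TIME"):
        raise ValueError("unknown pointer type: %r" % pointer_type)
    # Every remaining record is one source path relative to Rootdir.
    return pointer_type, lines[1:]


sample_index = """# FIRST_STORY RECID
tkntext/19980301_0553_0719_APW_ENG.tkn
tkntext/19980301_1014_1116_APW_ENG.tkn
"""
pointer_type, sources = parse_fsd_index(sample_index)
```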

First Story Detection Task System Output Format

The goal of the First Story Detection task is to detect, in a chronologically ordered stream of stories, the first story that discusses an event. FSD systems will perform this task by recording information about these hypothesized points in a file, one record for each putative discussion of a new event, written in ASCII format. The first record in this file will contain four fields which specify information that applies globally to the whole file. Comment lines begin with the '#' character, and any text following a '#' is ignored. The exception to this rule is that the first comment line can optionally contain a long description of the system under test. This description will be included in the scoring report alongside the <SYSTEM> value described below. After the initial comment line, blank lines are treated as comments.

The BNF structure of the first story detection system output file is:

<SYSTEM> <BOUNDARIES> <DEF_PERIOD> <POINTER_TYPE>
<DECISION_LINE>
<DECISION_LINE>
...

Where:

<SYSTEM> :== System is an alphanumeric character string that uniquely identifies the system being tested. (E.g., CDM_P05-8.v37)
<BOUNDARIES> :== Boundaries is either YES or NO, where YES indicates that story boundaries are supplied to the system being tested and NO indicates that they are not.
<DEF_PERIOD> :== The deferral period before decisions are made. Permissible values are defined by the TDT3 test specification.
<POINTER_TYPE> :== RECID | TIME
POINTER_TYPE is the type of boundaries to be output by the system. The possible values are RECID for text stream first story detection or TIME for audio first story detection.
<DECISION_LINE> :== <SOURCE> <POINTER> <DECISION> <SCORE>
<SOURCE> :== TDT3 corpus filename with directory and extension names relative to the TDT3 root directory specified on the command line.
<POINTER> :== Pointer is a hypothesized decision point. For text files, Pointer is the index number of the first word in the hypothesized segment, in the range {1, 2, . . .}. For audio files, Pointer is the time of the beginning of the segment, in the range {0.0, . . .}. (It isn't necessary to output the beginning of the first segment.) The hypothesized points must occur in chronological order.
<DECISION> :== Decision is either YES or NO, where YES indicates that the system believes that the story being processed discusses the target topic, and NO indicates not.
<SCORE> :== Score is a real number which indicates how confident the system is that the story being processed discusses the associated topic. More positive values indicate greater confidence.
The following is an excerpt from a first story detection system output file.
# Degenerate first story detection results, Errors, RECID
Errors NO 10 RECID
asrtext/19980307_1130_1200_CNN_HDL.asr 1 NO 0.209696400470383
asrtext/19980307_1130_1200_CNN_HDL.asr 66 NO 0.0652269882628861
asrtext/19980307_1130_1200_CNN_HDL.asr 254 NO 0.00793375334496783
asrtext/19980307_1130_1200_CNN_HDL.asr 318 NO 0.000951696625507463
asrtext/19980307_1130_1200_CNN_HDL.asr 675 NO 2.05484371633647e-10
asrtext/19980307_1130_1200_CNN_HDL.asr 762 YES 0.999999999524702
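The comment and record rules above can be checked with a small reader. The following Python sketch is an illustration derived only from this format description; the function name 'parse_fsd_output' and the returned layout are assumptions, and the chronological-order requirement is not verified.

```python
def parse_fsd_output(text):
    """Return (description, header, decisions) from system output text."""
    description, header, decisions = None, None, []
    for i, raw in enumerate(text.splitlines()):
        if i == 0 and raw.startswith("#"):
            # Initial comment line: optional long system description.
            description = raw[1:].strip()
            continue
        line = raw.split("#", 1)[0].strip()  # drop text after '#'
        if not line:
            continue  # blank and comment-only lines are ignored
        fields = line.split()
        if len(fields) != 4:
            raise ValueError("expected 4 fields: %r" % raw)
        if header is None:
            # <SYSTEM> <BOUNDARIES> <DEF_PERIOD> <POINTER_TYPE>
            header = dict(zip(
                ("system", "boundaries", "def_period", "pointer_type"), fields))
            continue
        source, pointer, decision, score = fields
        if decision not in ("YES", "NO"):
            raise ValueError("bad decision: %r" % raw)
        # RECID pointers are word indices; TIME pointers are seconds.
        ptr = int(pointer) if header["pointer_type"] == "RECID" else float(pointer)
        decisions.append((source, ptr, decision, float(score)))
    return description, header, decisions


sample_output = """# Degenerate first story detection results, Errors, RECID
Errors NO 10 RECID
asrtext/19980307_1130_1200_CNN_HDL.asr 1 NO 0.209696400470383
asrtext/19980307_1130_1200_CNN_HDL.asr 762 YES 0.999999999524702
"""
description, header, decisions = parse_fsd_output(sample_output)
```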

FSD Key File Format

The key file is an alternative technique for specifying the stories to be evaluated during the scoring process. The file is a simple, SGML-style file that specifies the topics for which topic-weighted performance is computed and, for each topic, the stories that are considered targets (the first story for the topic) or nontargets (subsequent on-topic stories).

The file begins with the tag <FSD_KEY> and ends with its closing tag, </FSD_KEY>. Within the <FSD_KEY> tag, the following hierarchy exists:

<FSD_KEY> Contains <TOPIC> +;
<TOPIC> Contains ( <TARG_STORY> | <NONTARG_STORY> ) +
The attributes for each tag type are:
<FSD_KEY> None
<TOPIC> id -> The topic identifier string, e.g. 1,33,45
<TARG_STORY> docno -> The TDT Document number
<NONTARG_STORY> docno -> The TDT Document number
The following is an example FSD key file:
<FSD_KEY>
<TOPIC id=1>
<TARG_STORY docno=APW19980194.0286>
<NONTARG_STORY docno=NYT19980104.0111>
<NONTARG_STORY docno=NYT19980104.0206>
</TOPIC>
<TOPIC id=15>
<NONTARG_STORY docno=NYT19980108.0793>
</TOPIC>
</FSD_KEY>
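Because the key file uses unquoted attribute values and unclosed story tags, a generic XML parser may reject it. The Python sketch below reads the format with a regular expression instead. It is an illustration based on the example above, not the program's own reader; the function name 'parse_fsd_key' and the returned layout are assumptions.

```python
import re

# One tag per match: optional '/', tag name, optional single attr=value.
TAG = re.compile(
    r"<(/?)(FSD_KEY|TOPIC|TARG_STORY|NONTARG_STORY)(?:\s+(\w+)=([^\s>]+))?\s*>")


def parse_fsd_key(text):
    """Return {topic_id: {"targets": [...], "nontargets": [...]}}."""
    key, current = {}, None
    for closing, name, _attr, value in TAG.findall(text):
        value = value.strip("\"'")  # tolerate quoted values as well
        if name == "TOPIC":
            if closing:
                current = None
            else:
                current = key.setdefault(value, {"targets": [], "nontargets": []})
        elif current is not None and not closing:
            if name == "TARG_STORY":
                current["targets"].append(value)
            elif name == "NONTARG_STORY":
                current["nontargets"].append(value)
    return key


sample_key = """<FSD_KEY>
<TOPIC id=1>
<TARG_STORY docno=APW19980194.0286>
<NONTARG_STORY docno=NYT19980104.0111>
<NONTARG_STORY docno=NYT19980104.0206>
</TOPIC>
<TOPIC id=15>
<NONTARG_STORY docno=NYT19980108.0793>
</TOPIC>
</FSD_KEY>
"""
fsd_key = parse_fsd_key(sample_key)
```

Topic 15 in the example has no target story, so a reader should not assume that every topic carries a <TARG_STORY> entry.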

Example Output Report

-------------------------------------------------------------------------------
-------------  TDT First Story Detection Task Performance Report  ------------

Command line:   ../../TDT3eval_v1.1/TDT3fsd.pl -R fsd_root -i ....
Execution Date: Fri May 28 08:05:20 EDT 1999

Story Weighted First Story Detection: P(Miss)       = 0.3333
                                      P(Fa)         = 0.2222
                                      Cfsd          = 0.2244

Topic Weighted First Story Detection: P(Miss)       = 0.3333
                                      P(Fa)         = 0.2500
                                      Cfsd     *    = 0.3020

  *   Primary Evaluation Metric


First Story Detection Performance Calculations:

    Ref.                               | # Corr  # Miss   # Corr   # Fa     ||                        
    Topic           # First  # !First  | First   First    ! First  ! First  || P(Miss)  P(Fa)   Cfsd  
    -----           -------  --------  | ------  -------  ------   -------  || -------  -----   ------
    71                 1        2      |    1       0        2        0     || 0.0000   0.0000  0.0000
    74                 1        2      |    1       0        1        1     || 0.0000   0.5000  0.4900
    76                 1        2      |    0       1        2        0     || 1.0000   0.0000  0.0200
    77                 1        1      |    0       1        1        0     || 1.0000   0.0000  0.0200
    78                 1        1      |    1       0        0        1     || 0.0000   1.0000  0.9800
    79                 0        1      |   --      --        1        0     ||   --     0.0000    --  
    80                 1        0      |    1       0       --       --     || 0.0000     --      --  
    -----           -------  --------  | ------  -------  ------   -------  || -------  -----   ------
    Sums               6        9      |    4       2        7        2     ||                        
    Story Weighted                     |                                    || 0.3333   0.2222  0.2244
    Topic Weighted                     |    0.7     0.3      1.2      0.3   || 0.3333   0.2500  0.3020


LDC TDT Corpus Root Dir: fsd_root
Index File:              sys5.ndx
System Output File:      sys5.fsd
Pointer Type:            RECID
Cost Function Parameters:
              P(topic) = 0.02
              Cmiss    = 1
              Cfa      = 1

Detection Performance Calculations:
    System Identifier:   det_boundary_DEF=10
    Deferral Period:     10

System Output to Story Mapping Function:  'majority'

----------------  End of TDT Detection Task Performance Report  ---------------
-------------------------------------------------------------------------------
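
The summary figures in the example report can be reproduced from the per-topic counts. The Python sketch below assumes the usual TDT detection cost form, Cfsd = Cmiss*P(Miss)*P(topic) + Cfa*P(Fa)*(1 - P(topic)), and assumes that topic-weighted values are plain averages over the topics for which each quantity is defined. Both assumptions are inferred from the numbers in the report rather than stated in this manual. The parameter values are those listed in the report (Cmiss = 1, Cfa = 1, P(topic) = 0.02).

```python
def cost(p_miss, p_fa, c_miss=1.0, c_fa=1.0, p_topic=0.02):
    """Detection cost under the assumed TDT cost form."""
    return c_miss * p_miss * p_topic + c_fa * p_fa * (1.0 - p_topic)


def mean(values):
    return sum(values) / len(values)


# (topic, # corr first, # miss first, # corr !first, # fa !first)
rows = [
    ("71", 1, 0, 2, 0), ("74", 1, 0, 1, 1), ("76", 0, 1, 2, 0),
    ("77", 0, 1, 1, 0), ("78", 1, 0, 0, 1), ("79", 0, 0, 1, 0),
    ("80", 1, 0, 0, 0),
]


def summarize(rows):
    miss_t, fa_t, cost_t = [], [], []
    n_first = n_miss = n_rest = n_fa = 0
    for _topic, corr_first, miss, corr_rest, fa in rows:
        firsts, rest = corr_first + miss, corr_rest + fa
        n_first, n_miss = n_first + firsts, n_miss + miss
        n_rest, n_fa = n_rest + rest, n_fa + fa
        # A rate is undefined ('--' in the report) when its denominator is 0.
        pm = miss / firsts if firsts else None
        pf = fa / rest if rest else None
        if pm is not None:
            miss_t.append(pm)
        if pf is not None:
            fa_t.append(pf)
        if pm is not None and pf is not None:
            cost_t.append(cost(pm, pf))
    story_pm, story_pf = n_miss / n_first, n_fa / n_rest
    return {
        "story": (story_pm, story_pf, cost(story_pm, story_pf)),
        "topic": (mean(miss_t), mean(fa_t), mean(cost_t)),
    }


summary = summarize(rows)
```

Note that under these assumptions the topic-weighted Cfsd is the average of the per-topic costs, not the cost of the averaged P(Miss) and P(Fa); the latter would give a different value than the one shown in the report.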