TDT3trk.pl User Manual

TDT3 Tracking Task Scoring


Usage:

TDT3trk.pl -R Rootdir -I TDT3_trk_index_list <Options> Trk_system_output_list

The 'TDT3trk.pl' program will score the output generated by a TDT3 tracking system. The program requires the directory path, 'Rootdir', to the LDC's TDT3 Test corpus. The corpus must be in the same structure as released by the LDC, with all file formats identical to their original form.

The tracking task requires independent system runs for each target topic. Since there are a number of topics in each test, file lists are used to specify indexes and system outputs. A file list is an ASCII file of filenames, each name separated with a newline. Comment lines begin with the '#' character and any text after the '#' is ignored.
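As an illustration of the file list format, the following Python sketch reads a list while skipping comments and blank lines. It assumes that a '#' anywhere on a line starts a comment; the function name parse_file_list is hypothetical and is not part of TDT3trk.pl.

```python
def parse_file_list(text):
    """Return the filenames in a file list, ignoring '#' comments and blank lines."""
    names = []
    for line in text.splitlines():
        # Drop everything from the first '#' onward, then trim whitespace.
        name = line.split("#", 1)[0].strip()
        if name:
            names.append(name)
    return names


example = """# tracking index files
../indexes_devtest/trk_nwt_39.ndx
../indexes_devtest/trk_nwt_42.ndx  # topic 42
"""
print(parse_file_list(example))
```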

The program reads 'TDT3_trk_index_list', a file list containing the names of the TDT3 Tracking index files provided with the test corpus. The index files are used to load the appropriate data from the corpus and to verify the completeness of the Tracking system output.

After loading the reference data, the program loads and scores the 'Trk_system_output_list' file list, which contains the filenames of the tracking system output, one file per topic. The order of the topic result files does not matter; the program matches topic-specific results to the topic-specific index. The format of the tracking system output is specified below.

In order to score a tracking system, the decisions output by the Automatic Topic Tracking System (ATTS) must be mapped onto the story boundaries annotated in the reference corpus. The TDT3trk.pl program can use one of two methods to determine this mapping: majority vote or impulse vote. The two methods differ in the meaning implied by the decision markers output by the ATTS.

Majority Vote: Each story decision is computed as the majority of decisions over all words (or time) in that story, and each story score is computed as the average score over all words (or time) in that story. Ties in computing the decisions are broken by choosing the decision with the maximum score.

Impulse Vote: Each story decision is made by selecting the decision with the maximum score within the boundaries of the story. In the event that no decisions were made for the story, the decision is 'NO' with a score of -infinity.
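The two mapping rules can be sketched in Python as follows. This illustrates the descriptions above and is not the code used by TDT3trk.pl. The function names and the input layout are assumptions: for the majority vote, each system decision is given with the number of words (or seconds) of the story it covers; for the impulse vote, only the decisions whose pointers fall inside the story are passed in.

```python
import math


def majority_vote(spans):
    """spans: list of (weight, decision, score) tuples for one story, where
    weight is the number of words (or seconds) covered by that decision."""
    total = sum(w for w, _, _ in spans)
    if total <= 0:
        raise ValueError("majority vote needs at least one weighted decision")
    yes = sum(w for w, d, _ in spans if d == "YES")
    no = total - yes
    # Story score: average of the scores over all words (or time).
    score = sum(w * s for w, _, s in spans) / total
    if yes != no:
        decision = "YES" if yes > no else "NO"
    else:
        # Tie: take the decision that carries the maximum score.
        decision = max(spans, key=lambda t: t[2])[1]
    return decision, score


def impulse_vote(decisions):
    """decisions: list of (decision, score) pairs that fall inside one story."""
    if not decisions:
        # No decision was made for the story.
        return ("NO", -math.inf)
    return max(decisions, key=lambda t: t[1])


print(majority_vote([(10, "YES", 0.5), (30, "NO", 0.25)]))
print(impulse_vote([("NO", 0.1), ("YES", 0.7)]))
```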

The following <Options> are recognized by the program:

-C Cmiss:Cfa -> Set the cost of a missed detection and the cost of a false alarm to 'Cmiss' and 'Cfa' respectively. These numbers are used in the tracking cost function. Default values are Cmiss=1.0 and Cfa=0.1.
-D Detail -> Write the internally organized evaluation corpus and pertinent statistics for debugging purposes. This report, though voluminous, is intended to help researchers debug their internal versions of evaluation code.
-E SubsetFile -> Compute performance excluding source files in the subset definition file. The application of this filter is global, in that the source file is ignored prior to establishing subsets defined by the -U option. NOTE: Only the first set defined in the subset definition file is used for the filter. All others are ignored.
-j topicrel[:topicrel]* -> Specify alternative topic relevance files via the command line. More than one can be specified by concatenating the file names using a colon ':' separator.
-m func -> Set the system output to story mapping function to either 'majority' or 'impulse'. Default is 'majority'.
-P P(topic) -> Use P(topic) for the tracking cost function. Default is 0.02.
-r Report -> Write the summary report to 'Report' rather than STDOUT, the default.
-s -> Use all available speedups. Currently, the only speedup is to bypass the 'nsgmls' parser and the 'SGMLS.pm' Perl library when reading the TDT3 Corpus files.
-S -> Skip the source files in the system output that were NOT loaded via the index files. Using this option in conjunction with modified/reduced index files provides the capability of computing performance statistics of subsets of an evaluation set. See the FAQ entry regarding performance statistics of tracking evaluation subsets.
-U SSDFile -> Use the Source file Subset Definition file 'SSDFile' to generate performance statistics on subsets of the tracked source files. The subsets are independent, and unlimited in number.
-v num -> Set the verbose level to 'num'. Default 1.
==0 None, ==1 Normal, >5 Slight, >10 way too much, >15 not even funny
-Z uncompress -> Specify the command for uncompressing the system output files prior to scoring. The decompression applies to ONLY the system decision files, not the file lists. The command is executed by opening a pipe command if the system output file ends with a .Z or .gz suffix. The command is required to read a compressed stream from STDIN, and write the uncompressed stream to STDOUT.
-o LBL -> Treat the stories annotated as level 'LBL' as on topic. The default value is 'YES', but the value can also be 'YES+BRIEF', or 'BRIEF'.
Options that apply to the DET plots:
-d DETfile -> Create a DET plot in GNUplot format with the file root 'DETfile'. The program makes several files each with additional extensions. The file 'DETfile'.plt is a command file for GNUplot and can be printed using the command "gnuplot 'DETfile'.plt | lpr". The default plot produces a line trace for each line in the 'Trk_system_output_list' list. See the discussion below on DET Plots for additional information. The options below modify this.
-t title -> Set the title line for the plot to 'title'.
-n -> If the topic weighted DET trace is plotted, the 90% confidence interval will also be plotted when this option is used.
-p -> Produce a single story-weighted DET line trace for all the system output files in 'Trk_system_output_list'. This will only be made if Nt is constant for all the system outputs. *
-w -> Produce a single topic-weighted DET line trace for all the system output files in 'Trk_system_output_list'. This will only be made if Nt is constant for all the system outputs. *
-e -> Produce the default output of a line trace for each line in 'Trk_system_output_list'. *
-f -> Force the program to make a pooled plot even if Nt isn't constant.
-u 1|Many -> Also produce a DET plot for the subsets defined via the -U option. Either '1' or 'many' may be used as an argument. The argument '1' produces one DET plot containing a single pooled or topic weighted DET line, (depending on the use of the -p and -w options), for each subset. The root filename for this plot will be 'DETFile'_subsets.

The 'Many' argument builds a separate DET plot file for each subset. The plotted traces are controlled by the -p, -t, -e, and -n options. The root filename for the plots will be 'DETFile'_subset=<SubsetHeading>.
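The -C and -P options supply the parameters of the tracking cost function, which this manual does not spell out. The Python sketch below assumes the usual TDT detection cost, Cmiss * P(Miss) * P(topic) + Cfa * P(Fa) * (1 - P(topic)), with the default values listed above. Whether TDT3trk.pl also reports a normalized cost is not stated here, and the function name tracking_cost is illustrative only.

```python
def tracking_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_topic=0.02):
    """Assumed TDT detection cost; the defaults mirror the -C and -P defaults."""
    return c_miss * p_miss * p_topic + c_fa * p_fa * (1.0 - p_topic)


# Cost for a system that misses nothing but false-alarms on 9.39% of
# the off-topic stories.
print(tracking_cost(p_miss=0.0, p_fa=0.0939))
```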

Tracking Task Index File Format

The index file for the tracking evaluation has four parts: 1) a header line, 2) training story designations, 3) discriminative training story designations, and 4) test source file designations.

The BNF structure of the tracking index file is:

<HEADER_LINE>
<TRAIN_STORY>
<TRAIN_STORY>
...
<DISCRIMINATE_TRAIN_STORY>
<DISCRIMINATE_TRAIN_STORY>
...
<SOURCE>
<SOURCE>
...

Where:

<HEADER_LINE> :== # TRACKING <POINTER_TYPE> TOPIC=N
'N' is the topic number under test.
<POINTER_TYPE> :== RECID | TIME
A POINTER_TYPE is the type of boundaries to be output by the system. The possible values are RECID for text stream tracking or TIME for audio tracking.
<TRAIN_STORY> :== # Training_docno=Nt <DOCNO> <DOCFILE>
Nt is the ordinal number of the training story. For an Nt = X training condition, use the stories numbered 1 through X, inclusive.
<DOCNO> :== Document number from the Corpus
<DOCFILE> :== Corpus filename with directory and filename extensions relative to the corpus root directory of the tokenized file that contains the training story's text.
<SOURCE> :== <DOCFILE> <STARTPOSITION>
<DOCFILE> is the tokenized text file in the same format as above.
<STARTPOSITION> :== A RECID or TIME indicating the starting point of the evaluation. System outputs before this position are ignored.
The following is an excerpt from a tracking task index file.

# TRACKING RECID TOPIC=39
#
# Training stories
# Training_docno=1 APW19980304.0300 tkntext/19980304_0555_0642_APW_ENG.tkn
# Training_docno=2 NYT19980303.0324 tkntext/19980303_2112_2200_NYT_NYT.tkn
# Training_docno=3 PRI19980303.2000.0432 tkntext/19980303_2000_2100_PRI_TWD.tkn
# Training_docno=4 CNN19980303.1130.0639 tkntext/19980303_1130_1200_CNN_HDL.tkn
# Training_docno=5 NYT19980302.0439 tkntext/19980302_2052_2146_NYT_NYT.tkn
# Training_docno=6 PRI19980302.2000.3319 tkntext/19980302_2000_2100_PRI_TWD.tkn
# Training_docno=7 PRI19980302.2000.2038 tkntext/19980302_2000_2100_PRI_TWD.tkn
...
#
# Discriminate_Training_docno=1 APW19980301.0161 tkntext/19980301_0553_0719_APW_ENG.tkn
# Discriminate_Training_docno=2 APW19980301.0171 tkntext/19980301_0553_0719_APW_ENG.tkn
...
asrtext/19980302_1830_1900_ABC_WNT.asr 1
asrtext/19980304_1130_1200_CNN_HDL.asr 1
asrtext/19980304_1600_1630_CNN_HDL.asr 1
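A reader for this format might look like the Python sketch below. It is an illustration based only on the structure described above, not the parser used by TDT3trk.pl. The function name and the returned dictionary layout are assumptions, start positions are kept as strings, and comment lines that match neither the header nor a training story pattern are ignored.

```python
import re

HEADER_RE = re.compile(r"^#\s*TRACKING\s+(RECID|TIME)\s+TOPIC=(\S+)\s*$")
STORY_RE = re.compile(
    r"^#\s*(Discriminate_)?Training_docno=(\d+)\s+(\S+)\s+(\S+)\s*$")


def parse_tracking_index(text):
    """Split a tracking index file into header, training and source parts."""
    index = {"pointer_type": None, "topic": None,
             "train": [], "discriminate_train": [], "sources": []}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        header = HEADER_RE.match(line)
        story = STORY_RE.match(line)
        if header:
            index["pointer_type"], index["topic"] = header.groups()
        elif story:
            kind, nt, docno, docfile = story.groups()
            key = "discriminate_train" if kind else "train"
            index[key].append((int(nt), docno, docfile))
        elif not line.startswith("#"):
            # <SOURCE> line: <DOCFILE> <STARTPOSITION>
            docfile, start = line.split()
            index["sources"].append((docfile, start))
    return index


sample = """# TRACKING RECID TOPIC=39
#
# Training stories
# Training_docno=1 APW19980304.0300 tkntext/19980304_0555_0642_APW_ENG.tkn
#
# Discriminate_Training_docno=1 APW19980301.0161 tkntext/19980301_0553_0719_APW_ENG.tkn
asrtext/19980302_1830_1900_ABC_WNT.asr 1
"""
print(parse_tracking_index(sample))
```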

Tracking Task System Output Format

The Topic Tracking task is to hypothesize points in the source stream where the target topic is discussed. Topic tracking systems will perform this task by outputting information about these hypothesized points to a file, one record for each putative discussion of the target topic, written in ASCII format. The first record in this file will contain five fields which specify information that applies globally to the whole file. Comment lines begin with the '#' character, and any text following a '#' is ignored. The exception to this rule is that the first comment line can optionally contain a long description of the system under test. This description will be included in the scoring report alongside the <SYSTEM> value described below.

The BNF structure of the tracking system output file is:

<HEADER_LINE>
<DECISION_LINE>
<DECISION_LINE>
...

Where:

<HEADER_LINE> :== <SYSTEM> <BOUNDARIES> <Nt> <TOPIC> <POINTER_TYPE>
<SYSTEM> :== System is an alphanumeric character string that uniquely identifies the system being tested. (E.g., CDM_P05-8.v37)
<BOUNDARIES> :== Boundaries is either YES or NO, where YES indicates that story boundaries are supplied to the system being tested and NO indicates that they are not.
<Nt> :== Number of training stories used.
<TOPIC> :== TOPIC is the topic id under test.
<POINTER_TYPE> :== RECID | TIME
POINTER_TYPE is the type of boundaries to be output by the system. The possible values are RECID for text stream tracking or TIME for audio tracking.
<DECISION_LINE> :== <SOURCE> <POINTER> <DECISION> <SCORE>
<SOURCE> :== TDT3 corpus filename with directory and extension names relative to the TDT3 root directory specified on the command line.
<POINTER> :== POINTER is a hypothesized decision point. For text files, POINTER is the index number of the first word in the hypothesized segment, in the range {1, 2, . . .}. For audio files, POINTER is the time of the beginning of the segment, in the range {0.0, . . .}. (It isn't necessary to output the beginning of the first segment.) The hypothesized decision points must occur in chronological order.
<DECISION> :== Decision is either YES or NO, where YES indicates that the system believes that the story being processed discusses the target topic, and NO indicates not.
<SCORE> :== Score is a real number which indicates how confident the system is that the story being processed discusses the associated topic. More positive values indicate greater confidence.
The following is an excerpt from a tracking system output file.
# Degnerate Tracking Results, Errors, Brecid
corrtrack YES 16 39 RECID
asrtext/19980302_1830_1900_ABC_WNT.asr 1 NO 0.0341789461672306
asrtext/19980302_1830_1900_ABC_WNT.asr 72 NO 0.247018221765757
asrtext/19980302_1830_1900_ABC_WNT.asr 581 NO 0.23052775207907
asrtext/19980302_1830_1900_ABC_WNT.asr 606 NO 0.00333382189273834
asrtext/19980302_1830_1900_ABC_WNT.asr 948 NO 0.919592500664294
asrtext/19980302_1830_1900_ABC_WNT.asr 1019 NO 0.93581769708544
asrtext/19980302_1830_1900_ABC_WNT.asr 1092 NO 0.471104943193495
asrtext/19980302_1830_1900_ABC_WNT.asr 1186 NO 0.0925928736105561
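The Python sketch below shows one way to read such a file. It is only an illustration of the format described above: the function name and return layout are hypothetical, and it assumes that the description, if present, is the first comment line and appears before the header record.

```python
def parse_tracking_output(text):
    """Return (description, header, decisions) for one tracking output file."""
    description = None
    header = None
    decisions = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # The first comment line may carry the long system description.
            if description is None and header is None:
                description = stripped.lstrip("#").strip()
            continue
        fields = stripped.split("#", 1)[0].split()
        if not fields:
            continue
        if header is None:
            system, boundaries, nt, topic, pointer_type = fields
            header = {"system": system, "boundaries": boundaries,
                      "nt": int(nt), "topic": topic,
                      "pointer_type": pointer_type}
        else:
            source, pointer, decision, score = fields
            # RECID pointers are word indexes; TIME pointers are seconds.
            to_num = float if header["pointer_type"] == "TIME" else int
            decisions.append((source, to_num(pointer), decision, float(score)))
    return description, header, decisions


sample_output = """# Degnerate Tracking Results, Errors, Brecid
corrtrack YES 16 39 RECID
asrtext/19980302_1830_1900_ABC_WNT.asr 1 NO 0.0341789461672306
asrtext/19980302_1830_1900_ABC_WNT.asr 72 NO 0.247018221765757
"""
print(parse_tracking_output(sample_output))
```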

Example Output Report

-------------------------------------------------------------------------------
--------------------  TDT Tracking Task Performance Report  ------------------


Story Weighted (Pooled) Tracking: P(Miss)       = 0.0000
                                  P(Fa)         = 0.0991

Topic Weighted Tracking:          P(Miss)       = 0.0000
                                  P(Fa)         = 0.0939

Tracking Performance Calculations:

    Filename        Topic  Train   Test    Corr    Corr    Miss    F/A     Pct.    Pct.  
                           Story   Story   Det.    ! Det.  Story   Story   Miss    F/A   
    --------        -----  -----   ------  ------  ------  ------  ------  ------  ------
    trk_nwt_39.trk  39     16        1200      11    1070       0     119  0.0000  0.1001
    trk_nwt_42.trk  42     16          59       0      54       0       5  0.0000  0.0847
    trk_nwt_44.trk  44     16         126       2     112       0      12  0.0000  0.0968
    ========        =====  ======  ======  ======  ======  ======  ======  ======  ======
    Sums                             1385      13    1236       0     136                
    Means                             461       4     412       0      45  0.0000  0.0939

Execution parameters:

LDC TDT Corpus Root Dir: ../../..
Index File list:         trk_nwt_indexes
    Index Files:             ../indexes_devtest/trk_nwt_44.ndx
                             ../indexes_devtest/trk_nwt_39.ndx
                             ../indexes_devtest/trk_nwt_42.ndx
System Output File List: trk_nwt_outputs
    System Output File:      trk_nwt_39.trk  Name: corrtrack  Desc: Degnerate Tracking Results, Errors, Brecid
    System Output File:      trk_nwt_42.trk  Name: corrtrack  Desc: Degnerate Tracking Results, Errors, Brecid
    System Output File:      trk_nwt_44.trk  Name: corrtrack  Desc: Degnerate Tracking Results, Errors, Brecid
Pointer Type:            RECID
System Output to Story Mapping Function:  'majority'

-----------------  End of TDT Tracking Task Performance Report  ---------------
-------------------------------------------------------------------------------
Preparing DET Curve.
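
The relationship between the per-topic rows and the two summary figures in the report above can be checked with a short Python sketch. The counts are copied from the example report; the function names are illustrative. Treating a topic with no on-topic test stories (topic 42) as P(Miss) = 0 is an assumption made here to match the 0.0000 shown in its row.

```python
# (topic, correct detections, correct non-detections, misses, false alarms)
rows = [
    (39, 11, 1070, 0, 119),
    (42, 0, 54, 0, 5),
    (44, 2, 112, 0, 12),
]


def rate(errors, correct):
    """Error probability; assumed to be 0.0 when there are no trials."""
    total = errors + correct
    return errors / total if total else 0.0


def story_weighted(rows):
    """Pool the counts over all topics, then compute the two rates."""
    det = sum(r[1] for r in rows)
    not_det = sum(r[2] for r in rows)
    miss = sum(r[3] for r in rows)
    fa = sum(r[4] for r in rows)
    return rate(miss, det), rate(fa, not_det)


def topic_weighted(rows):
    """Compute the rates per topic, then average them over topics."""
    p_miss = [rate(r[3], r[1]) for r in rows]
    p_fa = [rate(r[4], r[2]) for r in rows]
    return sum(p_miss) / len(rows), sum(p_fa) / len(rows)


print("Story weighted: P(Miss) = %.4f  P(Fa) = %.4f" % story_weighted(rows))
print("Topic weighted: P(Miss) = %.4f  P(Fa) = %.4f" % topic_weighted(rows))
```

With these counts the sketch gives P(Fa) values of 0.0991 (story weighted) and 0.0939 (topic weighted), the same figures that appear in the report.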

DET Plots

In general, DET line traces are computed by sorting a set of decisions, including on-topic and off-topic stories, by score. Using the scores, an artificial decision threshold is swept through the range of scores, and for each story, the probabilities of miss and false alarm are computed. The line trace then is plotted by connecting those points.
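
The threshold sweep can be sketched in Python as follows. This is a simplified illustration, not the routine used by the program: the function name det_points is hypothetical, each story is treated as its own threshold step (stories with tied scores are not grouped), and the input is assumed to contain at least one on-topic and one off-topic story.

```python
def det_points(trials):
    """trials: list of (score, on_topic) pairs, one per story.
    Returns a list of (P(miss), P(fa)) points, one per threshold position."""
    n_on = sum(1 for _, on in trials if on)
    n_off = len(trials) - n_on
    if n_on == 0 or n_off == 0:
        raise ValueError("need both on-topic and off-topic stories")
    # Sweep the threshold from above the highest score downwards.
    ranked = sorted(trials, key=lambda t: t[0], reverse=True)
    accepted_on = accepted_off = 0
    points = [(1.0, 0.0)]  # threshold above every score: everything rejected
    for _, on_topic in ranked:
        if on_topic:
            accepted_on += 1
        else:
            accepted_off += 1
        points.append(((n_on - accepted_on) / n_on, accepted_off / n_off))
    return points


trials = [(0.9, True), (0.8, False), (0.4, True), (0.1, False)]
for p_miss, p_fa in det_points(trials):
    print(p_miss, p_fa)
```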

This program computes three types of DET line traces: by-topic, story-weighted (formerly pooled), and topic-weighted.