TDT3eval Revision History
Version 2.4, released August 23, 2004


The following is the revision history for the TDT3eval package. The TDT3eval package is the successor to the TDT2eval package. The TDT2 revision log contains the history of the TDT2eval package.

Version 1.0, Released May 21, 1999

Programs have been converted to the following TDT3 filenames:
TDT2 Name            TDT3 Name
TDT2BuildIndex.pl    TDT3BuildIndex.pl
TDT2seg.pl           TDT3seg.pl
TDT2trk.pl           TDT3trk.pl
TDT2det.pl           TDT3det.pl
TDT2.pm              TDT3.pm
The following changes have been made since TDT2eval Version 0.6.
TDT3.pm
  • Added "use strict" to improve error checking.
  • Converted all glob aliases to references.
  • Added code to the topic relevance table loader to make sure there is only one entry for a topic/document annotation. Version 0.6 did not do this check and as a consequence, some documents were improperly ignored during scoring. This will slightly change the detection performance.
  • Corrected problems with the automatic boundary scoring procedures. The mapping function to map story decisions onto the reference story segmentation was making mistakes.
    1. For the next-to-last hypothesis story, the mapping was incorrect if that story occurred before the last reference story boundary.
    2. For a reference story containing multiple hypothesis story units (at least three stories), the mapped hypothesis cluster was always the last story within the reference story boundaries.
  • Changed the tracking scoring structure to be more memory efficient.
TDT3seg.pl
  • Added "use strict" to improve error checking.
  • Converted all glob aliases to references.
  • Relaxed the checking of system output deferral times. Warnings are printed rather than fatal errors.
  • Modified the accepted index file format. There are additional fields that specify the source file and language conditions of the test.
  • Modified the default evaluation frame size to be 75 IFF the evaluation source language is Mandarin and the POINTER type is a recid.
  • For the evaluation of Mandarin ASR, the RECIDs are in terms of words, but the evaluation frame size for Mandarin is in terms of characters. Therefore, the program converts the word-based RECIDs into character-based RECIDs by reading the tokenized text file.
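The word-to-character RECID conversion described for Mandarin ASR can be pictured with a minimal sketch. It is written in Python for illustration, not in the package's Perl, and it is not the code in TDT3seg.pl. It assumes that RECIDs are 1-based, that each character of a token counts as one unit, and that a word's character-based RECID is the RECID of its first character. The function name is hypothetical, and the tokens are passed in as a list because the tokenized file format is not described here.

```python
def word_to_char_recids(tokens):
    """Map each 1-based word RECID to the 1-based RECID of its first character.

    Assumption: every character of every token counts as one character unit.
    """
    mapping = {}
    next_char = 1
    for word_recid, token in enumerate(tokens, start=1):
        mapping[word_recid] = next_char
        next_char += len(token)
    return mapping


# Four tokens of lengths 2, 2, 1 and 2 characters.
tokens = ["北京", "大学", "的", "学生"]
char_recids = word_to_char_recids(tokens)
print(char_recids)
```

With such a table, a boundary reported at a word RECID can be looked up and compared against frame sizes expressed in characters.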
TDT3det.pl
  • Added "use strict" to improve error checking.
  • Converted all glob aliases to references.
  • Added a -S option to define independent subsets over which scores are computed.
  • Modified the summary report to include story weighted Pmiss, Pfa and Cdet.
  • Added the computation of Cost Weighted YDZ metrics.
  • Designated primary evaluation metric.
  • Relaxed the checking of system output deferral periods. Warnings are printed rather than fatal errors.
  • Modifications to TDT3.pm changed the scoring of the 1998 TDT2 evaluation. A separate document describes the changes in detection performance between TDT2eval V0.6 and TDT3eval V1.0.
  • Defined MACROs for common topic sets. These macros can be used with the -T option to specify topic sets.
TDT3trk.pl
  • Added "use strict" to improve error checking.
  • Converted all glob aliases to references.
  • Modified the storage structure for the scores of each document to be more memory efficient. The structure is an array, rather than a hash list, but there is a hash table to tell what values the array cells correspond to.
  • Designated primary evaluation metric.
  • Topic weighted DET curves are now an option.
  • The problem with mapping automatic story boundaries onto reference story boundaries does change the results of the '98 TDT2 evaluation. The overall scores for the CMU no boundary test changed as follows:
    Metric                   TDT2eval_v0.6   TDT3eval_v1.0
    Story Weighted Pmiss     0.4050          0.4138
    Story Weighted Pfa       0.0041          0.0040
    Story Weighted Ctrack    0.0122          0.0122
    Topic Weighted Pmiss     0.3820          0.3850
    Topic Weighted Pfa       0.0047          0.0046
    Topic Weighted Ctrack    0.0122          0.0122
TDT3BuildIndex.pl
  • Mostly re-written to support the TDT3 Evaluation.
  • Defined MACROs for common topic sets. These macros can be used with the -T option to specify topic sets.

Version 1.1, Released June 9, 1999

TDT3.pm
  • Corrected the activation of the "speedup" code.
TDT3seg.pl
  • Minor documentation change
  • Corrected default Cost of False Alarm to 0.3
TDT3det.pl
  • Minor documentation changes
TDT3trk.pl
  • Minor documentation changes
TDT3fsd.pl
  • First Introduction.
TDT3BuildIndex.pl
  • Cleaned up the residual problems with the FSD keys being generated.
  • Limited the number of FSD index files to only the English data.
  • Corrected the use of the topic regular expression.
  • Modified the calls to load the story boundaries and topic relevance tables to try a number
  • Added the generation of the evaluation auxiliary information file.

Version 1.2, Released June 17, 1999

TDT3.pm
  • No Changes.
TDT3seg.pl
  • No Changes.
TDT3det.pl
  • No Changes.
TDT3trk.pl
  • Corrected a problem with scoring the Mandarin machine translation data. (.mtr and .mta files weren't handled properly.)
TDT3fsd.pl
  • No Changes.
TDT3BuildIndex.pl
  • No Changes.

Version 1.3, Released July 23, 1999

TDT3.pm
  • Reduced the number of computations for the DET plots. The previous DET plots computed a point for each document. The new plot computes a point when one of two conditions occurs:
    1. the next highest scoring point is "on topic", or
    2. the next highest scoring point is "off topic", but the following point is "on topic".
    Condition 2 preserves the "step" structure of the graphs so that the graphs look identical.

    When version 1.3 was used to score the tracking results of the first 1999 dry run, DET computation runtime dropped by 66%, for an overall runtime reduction of 55%. Disk space for the DET plots fell by 99%, from 128 Mb to 1.8 Mb. RAM use during execution was 300 Mb for version 1.3, versus 718 Mb for version 1.2.
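The point-reduction rule for DET plots can be sketched as follows. This is an illustrative Python reading of the two conditions, not the code in TDT3.pm. It assumes trials are visited in descending score order, that a point is emitted after an "on topic" trial (condition 1) or after an "off topic" trial that is immediately followed by an "on topic" one (condition 2), and that tied scores need no special handling. The function name and the trial representation are hypothetical.

```python
def reduced_det_points(trials):
    """Return (score, p_miss, p_fa) points for a reduced DET curve.

    trials: list of (score, is_on_topic) pairs.
    A point is kept only where the step structure of the curve changes.
    """
    ordered = sorted(trials, key=lambda t: t[0], reverse=True)
    n_on = sum(1 for _, on in ordered if on)
    n_off = len(ordered) - n_on
    points = []
    hits = 0
    false_alarms = 0
    for i, (score, on) in enumerate(ordered):
        if on:
            hits += 1
        else:
            false_alarms += 1
        next_is_on = i + 1 < len(ordered) and ordered[i + 1][1]
        # Condition 1: this trial is on topic.
        # Condition 2: this trial is off topic, but the following one is on topic.
        if on or next_is_on:
            p_miss = (n_on - hits) / n_on if n_on else 0.0
            p_fa = false_alarms / n_off if n_off else 0.0
            points.append((score, p_miss, p_fa))
    return points


trials = [(0.9, True), (0.8, False), (0.7, False),
          (0.6, False), (0.5, True), (0.4, False)]
points = reduced_det_points(trials)
print(len(points), "points kept out of", len(trials), "trials")
```

Runs of consecutive off-topic trials only move the curve along the Pfa axis, so keeping just the last point of each run leaves the corners of every step in place.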

TDT3seg.pl
  • Corrected a problem with the -C option.
  • Changed the documentation to say the default Cost of a false alarm is 0.3.
TDT3det.pl
  • Changed the cost of a false alarm to 0.1.
  • Corrected a problem with the -C option.
TDT3trk.pl
  • Changed the cost of a false alarm to 0.1.
  • Corrected a problem with the -C option.
  • The trial ensemble DET plot generation procedure was modified, see changes to 'TDT3.pm'.
TDT3fsd.pl
  • Changed the cost of a false alarm to 0.1.
  • Corrected a problem with the -C option.
TDT3BuildIndex.pl
  • No Changes.

Version 1.4, Released August 12, 1999

- Added the FAQ to the documentation.
TDT3.pm
  • Modified the 'Add_Topic_Into_TDTref()' function to first try to find a topic relevance file in the TDT root directory (by prepending the directory name), and second to try the name as given. This permits specification of the topic relevance file on the command line.
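The two-step lookup of the topic relevance file can be illustrated with a short Python sketch. This is not the body of 'Add_Topic_Into_TDTref()'; the function name, the file name, and the choice to return None when both attempts fail are assumptions made for the example.

```python
import os
import tempfile


def resolve_relevance_file(name, tdt_root):
    """Try the TDT root directory first, then the name exactly as given."""
    candidate = os.path.join(tdt_root, name)
    if os.path.isfile(candidate):
        return candidate
    if os.path.isfile(name):
        return name
    return None


# Demonstration with a throwaway directory standing in for the TDT root.
with tempfile.TemporaryDirectory() as root:
    in_root = os.path.join(root, "topic_relevance.tbl")
    with open(in_root, "w"):
        pass
    found = resolve_relevance_file("topic_relevance.tbl", root)
    missing = resolve_relevance_file("no_such_relevance_file.tbl", root)
    print(found == in_root, missing)
```

Trying the root directory first keeps existing relative names working, while the fallback lets a path given on the command line be used directly.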
TDT3seg.pl
  • Incorporated changes provided by Paul van Mulbregt that compute precision and recall if the '-p' option is given and the evaluation frame size is set to 1 via the '-f 1' option.
  • Added computations and reports of normalized cost.
  • Fixed a bug to handle .as1 transcripts.
TDT3det.pl
  • Added the '-j' option to specify a topic relevance file from the command line.
  • Sorted the output report numerically by the reference topic.
  • Modified the DET graphs in that topic specific performance points that are off the scale are indicated by squares on the graph border.
  • Added computations and reports of normalized cost.
  • Fixed a bug to handle .as1 transcripts.
TDT3trk.pl
  • Added the '-j' option to specify a topic relevance file from the command line.
  • Added computations and reports of normalized cost.
  • Fixed a bug to handle .as1 transcripts.
TDT3fsd.pl
  • Added the '-j' option to specify a topic relevance file from the command line.
  • Added the -p, -w, -n options to manipulate the DET plots.
  • Added computations and reports of normalized cost.
  • Fixed a bug to handle .as1 transcripts.
  • Added the options -k and -K to write and read FSD key files.
DetectionScore.pl
  • Debut of this script.
TDT3BuildIndex.pl
  • Added support for the story link detection task.
  • Added the -S option to specify the ASR file extensions.

Version 1.5, Released September 20, 1999

TDT3.pm
  • No Changes.
TDT3seg.pl
  • Modest changes to accommodate new corpus release format.
TDT3det.pl
  • Modest changes to accommodate new corpus release format.
TDT3trk.pl
  • Modest changes to accommodate new corpus release format.
TDT3fsd.pl
  • Modest changes to accommodate new corpus release format.
DetectionScore.pl
  • No Changes.
TDT3BuildIndex.pl
  • Modest changes to accommodate new corpus release format.
  • Modified the sampling function for the randomized story link detection files. The previous version used a 14 day window for the off-topic stories. This version uses an exponential function.

Version 1.6, Released September 24, 1999

TDT3.pm
  • No Changes.
TDT3seg.pl
  • Minor changes to parse and check filenames in the system output.
TDT3det.pl
  • Minor changes to parse and check filenames in the system output.
TDT3trk.pl
  • Minor changes to parse and check filenames in the system output.
  • Relaxed the constraint that Nt must be 1, 2, 4, 8, or 16.
  • Modified the reader of system output files. The files can now be compressed beforehand and then decompressed during the evaluation run. The -Z option permits specification of the uncompression utility.
TDT3fsd.pl
  • Minor changes to parse and check filenames in the system output.
DetectionScore.pl
  • Modified the report so that scores are labeled as link detection, not story link detection.
TDT3BuildIndex.pl
  • Included PRI, MNB and NBC in the list of broadcast news sources.

Version 1.7, Released October 27, 1999

TDT3.pm
  • Modified the topic macros defined in $main::TopicSets
TDT3seg.pl
  • Reversed some of the previous checks to the filenames in the system output.
TDT3det.pl
  • Reversed some of the previous checks to the filenames in the system output.
  • Added a check to make sure each source file appears once in the system output.
  • Re-worked the code to read a subset definition file.
TDT3trk.pl
  • Reversed some of the previous checks to the filenames in the system output.
  • Added code to make the program read compressed system output files.
  • Added code to ignore the system output of a partial source file if it is the first source file for a topic, and the first decision begins after a recid or time of 1.
  • Added code to score source file subsets, via the -U argument.
  • Added the option -u to produce subset-conditioned DET graphs.
TDT3fsd.pl
  • Reversed some of the previous checks to the filenames in the system output.
  • Added a check to make sure each source file appears once in the system output.
DetectionScore.pl
  • No Changes.
TDT3BuildIndex.pl
  • Changed the tracking index generation procedures. The code randomly selects the target stories for a particular language IF there are more than Ntmax possible training stories before the division between test and training data.
  • Corrected a previously undiscovered bug in the generation of randomized link detection index files. The previous version selected an entirely different set of documents for the different source conditions (nwt+bnasr and nwt+bnman). The modification results in exactly the same set of test documents regardless of source condition.
  • Changed Story Link Detection to Link Detection.

Version 1.8, Released January 21, 2000

TDT3.pm
  • Slight modification to load the tdt3 topic relevance table.
TDT3seg.pl
  • No Changes.
TDT3det.pl
  • No Changes.
TDT3trk.pl
  • No Changes.
TDT3fsd.pl
  • No Changes.
DetectionScore.pl
  • Corrected the -t option to make it work.
  • Modified the key file reader. Pairs marked as "OTHER" (rather than TARGET or NONTARGET) are excluded from the performance calculations. The pairs are still checked for, and thus still required, in the system output.
TDT3BuildIndex.pl
  • Added the command line options -l and -L to use the link database file to generate the index files.
  • Modified the handling of the ABC data to default to not use either 'ccap' or 'fdch' extensions.
  • The link annotation database now has three possible values, "Y", "N", or "BRIEF" which are translated into "TARGET", "NONTARGET", and "OTHER" respectively in the outputted answer key.
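The handling of the three link annotation values in this release can be summarized with a small Python sketch. It is illustrative only and does not reproduce TDT3BuildIndex.pl or DetectionScore.pl. The function names, the pair representation, and the use of a boolean system decision are assumptions; only the value mapping and the treatment of "OTHER" pairs come from the notes above.

```python
LABEL_MAP = {"Y": "TARGET", "N": "NONTARGET", "BRIEF": "OTHER"}


def build_key(annotations):
    """Translate link annotations (pair -> 'Y'/'N'/'BRIEF') into key labels."""
    return {pair: LABEL_MAP[value] for pair, value in annotations.items()}


def count_errors(key, decisions):
    """Count misses and false alarms, skipping OTHER pairs.

    decisions: pair -> bool (True means the system judged the pair linked).
    Every key pair, including OTHER pairs, must be present in decisions.
    """
    missing = [pair for pair in key if pair not in decisions]
    if missing:
        raise ValueError(f"system output lacks pairs: {missing}")
    misses = targets = false_alarms = nontargets = 0
    for pair, label in key.items():
        if label == "OTHER":
            continue  # required in the output, but not scored
        if label == "TARGET":
            targets += 1
            if not decisions[pair]:
                misses += 1
        else:
            nontargets += 1
            if decisions[pair]:
                false_alarms += 1
    return misses, targets, false_alarms, nontargets


annotations = {("a", "b"): "Y", ("a", "c"): "N",
               ("b", "c"): "BRIEF", ("c", "d"): "Y"}
decisions = {("a", "b"): True, ("a", "c"): True,
             ("b", "c"): True, ("c", "d"): False}
key = build_key(annotations)
result = count_errors(key, decisions)
print(result)
```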

Version 1.9, Released August 14, 2000

TDT3.pm
  • Repaired a bug in 'Find_system_score_for_doc()'. It prevented scoring using automatic segmentations.
  • Added code to produce a threshold plot. The plot's x axis is the system-provided decision scores. Plotted values are Pmiss vs score, Pfa vs score, and normalized cost vs score, along with the minimum cost point located on the graph with its coordinates indicated.
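The quantities shown on a threshold plot can be computed with a sketch like the one below. It is written in Python for illustration and is not the plotting code in TDT3.pm. It assumes the usual TDT detection cost, C = Cmiss·Pmiss·Ptarget + Cfa·Pfa·(1 − Ptarget), normalized by min(Cmiss·Ptarget, Cfa·(1 − Ptarget)). The false alarm cost of 0.1 matches the value noted under Version 1.3; Cmiss = 1.0 and Ptarget = 0.02 are assumed values, as is the rule that a trial is accepted when its score is at or above the threshold.

```python
C_MISS, C_FA, P_TARGET = 1.0, 0.1, 0.02  # C_MISS and P_TARGET are assumed values


def normalized_cost(p_miss, p_fa):
    """Detection cost divided by the cost of the better trivial system."""
    cost = C_MISS * p_miss * P_TARGET + C_FA * p_fa * (1.0 - P_TARGET)
    return cost / min(C_MISS * P_TARGET, C_FA * (1.0 - P_TARGET))


def threshold_curve(trials):
    """Return (threshold, p_miss, p_fa, norm_cost) for each distinct score.

    trials: list of (score, is_on_topic); a trial is accepted when
    score >= threshold.
    """
    n_on = sum(1 for _, on in trials if on)
    n_off = len(trials) - n_on
    curve = []
    for threshold in sorted({score for score, _ in trials}):
        misses = sum(1 for s, on in trials if on and s < threshold)
        fas = sum(1 for s, on in trials if not on and s >= threshold)
        p_miss = misses / n_on if n_on else 0.0
        p_fa = fas / n_off if n_off else 0.0
        curve.append((threshold, p_miss, p_fa, normalized_cost(p_miss, p_fa)))
    return curve


trials = [(0.9, True), (0.8, False), (0.7, True), (0.4, False), (0.2, False)]
curve = threshold_curve(trials)
best = min(curve, key=lambda row: row[3])  # the minimum cost point
print("minimum normalized cost at threshold", best[0])
```

Plotting the second, third, and fourth columns against the first gives the three curves, and `best` is the point that would be marked on the graph.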
TDT3seg.pl
  • No Changes.
TDT3det.pl
  • No Changes.
TDT3trk.pl
  • Added the normalized tracking cost to the minimum DET cost printout.
  • Threshold plots are produced when DET plots are produced.
TDT3fsd.pl
  • Added the normalized tracking cost to the minimum DET cost printout.
DetectionScore.pl
  • Added the normalized tracking cost to the minimum DET cost printout.
  • Threshold plots are produced when DET plots are produced.
TDT3BuildIndex.pl
  • Added the -y option. Tracking index files include the certified off-topic story designation.
  • Link detection index files are generated for English as well as Mandarin and Multilingual test conditions.

Version 2.0, Released August 2

TDT3.pm
  • No Changes.
TDT3seg.pl
  • No Changes.
TDT3det.pl
  • No Changes.
TDT3trk.pl
  • No Changes.
TDT3fsd.pl
  • No Changes.
DetectionScore.pl
  • No Changes.
TDT3BuildIndex.pl
  • No Changes.

Version 2.1, Released Nov. 3, 2000

TDT3.pm
  • No Changes.
TDT3seg.pl
  • No Changes.
TDT3det.pl
  • Corrected the code that interpreted the annotations. It misinterpreted stories annotated with NO as being annotated with YES.
TDT3trk.pl
  • Corrected the code that interpreted the annotations. It misinterpreted stories annotated with NO as being annotated with YES.
  • Corrected a warning generated during the building of a threshold plot.
  • Modified the computation of topic weighted tracking cost. Previously, the formulation used the average of the tracking cost for each topic. This was acceptable unless topics existed with no "on-topic" stories in the test set. The formulation was changed to use the average of the topics' Pmiss statistics and the average of the topics' Pfa statistics. This way, if a topic has no on-topic test stories, the topic weighted Pmiss disregards the topic while the topic weighted Pfa does not.
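The revised topic weighted formulation can be sketched in Python as follows. This is an illustration, not the code in TDT3trk.pl. It assumes the cost C = Cmiss·Pmiss·Ptarget + Cfa·Pfa·(1 − Ptarget) with Cfa = 0.1 (the value noted under Version 1.3) and with Cmiss = 1.0 and Ptarget = 0.02 as assumed values. The per-topic count fields are hypothetical names.

```python
C_MISS, C_FA, P_TARGET = 1.0, 0.1, 0.02  # C_MISS and P_TARGET are assumed values


def tracking_cost(p_miss, p_fa):
    return C_MISS * p_miss * P_TARGET + C_FA * p_fa * (1.0 - P_TARGET)


def topic_weighted_cost(topics):
    """Average Pmiss and Pfa over topics first, then compute one cost.

    topics: list of dicts with 'misses', 'n_on', 'false_alarms', 'n_off'.
    A topic with no on-topic stories is left out of the Pmiss average
    but still contributes to the Pfa average.
    """
    p_miss_values = [t["misses"] / t["n_on"] for t in topics if t["n_on"] > 0]
    p_fa_values = [t["false_alarms"] / t["n_off"] for t in topics if t["n_off"] > 0]
    mean_p_miss = sum(p_miss_values) / len(p_miss_values) if p_miss_values else 0.0
    mean_p_fa = sum(p_fa_values) / len(p_fa_values) if p_fa_values else 0.0
    return tracking_cost(mean_p_miss, mean_p_fa)


topics = [
    {"misses": 1, "n_on": 4, "false_alarms": 2, "n_off": 100},
    {"misses": 0, "n_on": 0, "false_alarms": 5, "n_off": 100},  # no on-topic stories
]
cost = topic_weighted_cost(topics)
print(round(cost, 5))
```

Under the earlier per-topic averaging, the second topic would have no defined Pmiss and therefore no well-defined cost of its own; averaging the two error rates separately sidesteps that.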
TDT3fsd.pl
  • Corrected a warning generated during the building of a threshold plot.
DetectionScore.pl
  • Permitted compressed input files for the key and system input files.
TDT3BuildIndex.pl
  • Modified the code that loads in the certified offtopics to not die if the database is missing.
  • Modified the tracking index files to shorten the file names, and to generate the experiment control files.
  • Modified the link detection index files to better sample the monolingual and crosslingual link space.

Version 2.2, Released May 29, 2001

TDT3.pm
  • No Changes.
TDT3seg.pl
  • No Changes.
TDT3det.pl
  • No Changes.
TDT3trk.pl
  • Fixed an error in the subset scoring table. The problem was in dealing with topics that had no on-topic stories.
  • Modified the program to accept decision coordinates using docno's in the system output file. This implementation assumes the docno's are the reference docno's. Future releases may use alternative boundary files.
  • Modified a significant portion of the script to fix problems illuminated by 'use strict'.
TDT3fsd.pl
  • No Changes.
DetectionScore.pl
  • No Changes.
TDT3BuildIndex.pl
  • No Changes.

Version 2.3, Released October 22, 2001

TDT3.pm
  • Modified the Rescale_Trail_data functions to work.
  • Modified Load_Boundaries_Into_TDTRef() to accept a subset definition file so that a list of source files could be excluded from scoring.
TDT3seg.pl
  • Added the ability to exclude a set of source files from scoring via '-E'.
TDT3det.pl
  • Added the ability to exclude a set of source files from scoring via '-E'.
TDT3trk.pl
  • Added the ability to produce minimum DET plots using the -a option.
  • Added the ability to exclude a set of source files from scoring via '-E'.
TDT3fsd.pl
  • Added the ability to exclude a set of source files from scoring via '-E'.
DetectionScore.pl
  • Added the -S option.
TDT3BuildIndex.pl
  • Modified calls to Load_Boundaries_Into_TDTRef().

Version 2.4, Released August 23, 2004

TDT3.pm
TDT3seg.pl
TDT3det.pl
  • No Change
TDT3trk.pl
  • No Change
TDT3fsd.pl
  • Added the option -o to specify what topic labels are on-topic.
DetectionScore.pl
  • Added a detailed output file that is a merge of the system and key files.
TDT3BuildIndex.pl
  • Added a flag to output either HTD or Topic Detection index files.
  • Segmentation index files are only output IFF there are BN source files.
  • Link Detection index files include mono-lingual topics as long as there are enough on-topic stories in a language.
  • -M filters either NWT or BN files out of the source files.

Version 2.5, Released September 1, 2004

TDT3BuildIndex.pl
  • Added the Story List Index file for the HTD task.

Version 2.6, Released October 22, 2004

DetectionScore.pl
  • Added code to calculate TREC utility for supervised adaptive tracking.
  • Corrected the calculation of PMiss, Cost and Norm(Cost) when there are no on-topic stories for a topic. The topic weighted Cost and Norm(Cost) were also changed.
TDT3trk.pl
  • Added code to calculate TREC utility for supervised adaptive tracking.

Possible Future Changes

Known Bugs

The TDT3BuildIndex.pl program only builds the default index files.

The segmentation scoring procedures based on time pointers have the following problem. When a reference story has a duration in time, but not in lexical tokens, the scoring program ignores that story's boundaries. Consequently, the denominators for calculating P(miss) and P(fa) are slightly off (18 seconds in 260000 seconds). Since the discrepancy is small, and the current TDT evaluation will not be using time pointers, this problem is left for future resolution.