TDT3 Index files

Date: Fri Aug 25 14:29:51 EDT 2000


This directory contains TDT3 index files. The directory and this HTML file were generated automatically by TDT3BuildIndex.pl, Version 2.0. The command executed to generate this file was:

/data/data2/TDT/Software/TDT3eval_v2.0/TDT3BuildIndex.pl -t 4 -n 2 -v 3 -s -S English=as1,Mandarin=as0 -L LnkDB -T TDT99_mul -R /data/data2/TDT/Dryrun2000/TDT2_release3.1.patch1 -r 393826261 -f index.flist -O . -a ccap -y 2000

The index files define the processing sequence of tokenized text files and other required evaluation information. See the TDT3 Evaluation Specification for a description of the index files.

Auxiliary Information File

The evaluation specification declares certain side information to be available to the automatic systems. This information is contained in the auxiliary information file, aux_info.ndx.

Segmentation Index Files

Source Condition | Source Language | Index Filename
BNews ASR Transcripts | English | seg_SR=bnasr_TE=eng,nat.ndx
BNews ASR Transcripts | Mandarin | seg_SR=bnasr_TE=man,nat.ndx
BNews Manual Transcripts | English | seg_SR=bnman_TE=eng,nat.ndx
BNews Manual Transcripts | Mandarin | seg_SR=bnman_TE=man,nat.ndx

Detection Index Files

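Source Condition | Source Language | Content Language | Index Filename
NWT + BNews ASR Trans. | Multilingual | Native | det_SR=nwt+bnasr_TE=mul,nat.ndx
NWT + BNews ASR Trans. | Multilingual | English | det_SR=nwt+bnasr_TE=mul,eng.ndx
NWT + BNews ASR Trans. | Mandarin | Native | det_SR=nwt+bnasr_TE=man,nat.ndx
NWT + BNews ASR Trans. | Mandarin | English | det_SR=nwt+bnasr_TE=man,eng.ndx
NWT + BNews ASR Trans. | English | Native | det_SR=nwt+bnasr_TE=eng,nat.ndx
NWT + BNews ASR Trans. | English | English | Not defined by the Eval. Spec.
NWT + BNews Manual Trans. | Multilingual | Native | det_SR=nwt+bnman_TE=mul,nat.ndx
NWT + BNews Manual Trans. | Multilingual | English | det_SR=nwt+bnman_TE=mul,eng.ndx
NWT + BNews Manual Trans. | Mandarin | Native | det_SR=nwt+bnman_TE=man,nat.ndx
NWT + BNews Manual Trans. | Mandarin | English | det_SR=nwt+bnman_TE=man,eng.ndx
NWT + BNews Manual Trans. | English | Native | det_SR=nwt+bnman_TE=eng,nat.ndx
NWT + BNews Manual Trans. | English | English | Not defined by the Eval. Spec.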

Link Detection Index Files

Source Language | Content Language | Source Condition | Index/Key Filenames
Multilingual | Native | NWT + BNews ASR Trans. | ./lnk_SR=nwt+bnasr_TE=mul,nat.ndx ./lnk_SR=nwt+bnasr_TE=mul,nat.key
Multilingual | Native | NWT + BNews Manual Trans. | ./lnk_SR=nwt+bnman_TE=mul,nat.ndx ./lnk_SR=nwt+bnman_TE=mul,nat.key
Multilingual | English | NWT + BNews ASR Trans. | ./lnk_SR=nwt+bnasr_TE=mul,eng.ndx ./lnk_SR=nwt+bnasr_TE=mul,eng.key
Multilingual | English | NWT + BNews Manual Trans. | ./lnk_SR=nwt+bnman_TE=mul,eng.ndx ./lnk_SR=nwt+bnman_TE=mul,eng.key

First Story Detection Index Files

Source Condition | Source Language | Content Language | Index Filename
NWT + BNews ASR Trans. | English | Native | fsd_SR=nwt+bnasr_TE=eng,nat.ndx
NWT + BNews Manual Trans. | English | Native | fsd_SR=nwt+bnman_TE=eng,nat.ndx
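
The segmentation, detection, link detection, and first story detection filenames listed so far appear to share one pattern: a task prefix, an SR= source condition, and a TE= pair giving the test source language and content language. The following Python sketch splits a filename into those fields. The pattern is inferred only from the names in this listing, and the names IndexName and parse_index_name are illustrative; the TDT3 Evaluation Specification remains the authoritative definition.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the filenames in this listing (an assumption,
# not taken from the evaluation specification).
_NAME_RE = re.compile(
    r"(?:\./)?(?P<task>seg|det|lnk|fsd)"
    r"_SR=(?P<source>[a-z+]+)"
    r"_TE=(?P<test_lang>[a-z]+),(?P<content_lang>[a-z]+)"
    r"\.(?P<ext>ndx|key)"
)


class IndexName(NamedTuple):
    task: str          # seg, det, lnk, or fsd
    source: str        # e.g. "nwt+bnasr" or "bnman"
    test_lang: str     # mul, man, or eng
    content_lang: str  # nat or eng
    ext: str           # ndx or key


def parse_index_name(filename: str) -> Optional[IndexName]:
    """Return the fields of an index filename, or None if it does not match."""
    m = _NAME_RE.fullmatch(filename)
    if m is None:
        return None
    return IndexName(**m.groupdict())


if __name__ == "__main__":
    print(parse_index_name("det_SR=nwt+bnasr_TE=mul,nat.ndx"))
```

Files that do not follow the pattern, such as aux_info.ndx, yield None rather than an error.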

Tracking Index Files

There is only one test language condition for the tracking evaluation: multilingual tracking. The variations are in broadcast source, test content language, and training story source language. For each evaluation test and training condition, there is an individual index file for each test topic. Because of the large number of index files, all tracking index files are stored in a single directory, 'trk_ndx', and experiment control files identify which topic index files constitute an evaluation. (Note: this is a new format as of August 2000.)
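Source Condition | Test Source Language | Test Content Language | Training Source Language | Nt | Experiment Control File
NWT + BNews ASR Trans. | Multilingual | Native | English | Nt=1 | trk_SR=nwt+bnasr_TR=eng_TE=mul,nat_Nt=1.ctl
NWT + BNews ASR Trans. | Multilingual | Native | English | Nt=2 | trk_SR=nwt+bnasr_TR=eng_TE=mul,nat_Nt=2.ctl
NWT + BNews ASR Trans. | Multilingual | Native | English | Nt=4 | trk_SR=nwt+bnasr_TR=eng_TE=mul,nat_Nt=4.ctl
NWT + BNews ASR Trans. | Multilingual | Native | English | Nt=V | trk_SR=nwt+bnasr_TR=eng_TE=mul,nat_Nt=V.ctl
NWT + BNews ASR Trans. | Multilingual | Native | Mandarin | Nt=1 | trk_SR=nwt+bnasr_TR=man_TE=mul,nat_Nt=1.ctl
NWT + BNews ASR Trans. | Multilingual | Native | Mandarin | Nt=2 | trk_SR=nwt+bnasr_TR=man_TE=mul,nat_Nt=2.ctl
NWT + BNews ASR Trans. | Multilingual | Native | Mandarin | Nt=4 | trk_SR=nwt+bnasr_TR=man_TE=mul,nat_Nt=4.ctl
NWT + BNews ASR Trans. | Multilingual | Native | Mandarin | Nt=V | trk_SR=nwt+bnasr_TR=man_TE=mul,nat_Nt=V.ctl
NWT + BNews ASR Trans. | Multilingual | English | English | Nt=1 | trk_SR=nwt+bnasr_TR=eng_TE=mul,eng_Nt=1.ctl
NWT + BNews ASR Trans. | Multilingual | English | English | Nt=2 | trk_SR=nwt+bnasr_TR=eng_TE=mul,eng_Nt=2.ctl
NWT + BNews ASR Trans. | Multilingual | English | English | Nt=4 | trk_SR=nwt+bnasr_TR=eng_TE=mul,eng_Nt=4.ctl
NWT + BNews ASR Trans. | Multilingual | English | English | Nt=V | trk_SR=nwt+bnasr_TR=eng_TE=mul,eng_Nt=V.ctl
NWT + BNews ASR Trans. | Multilingual | English | Mandarin | Nt=1 | trk_SR=nwt+bnasr_TR=man_TE=mul,eng_Nt=1.ctl
NWT + BNews ASR Trans. | Multilingual | English | Mandarin | Nt=2 | trk_SR=nwt+bnasr_TR=man_TE=mul,eng_Nt=2.ctl
NWT + BNews ASR Trans. | Multilingual | English | Mandarin | Nt=4 | trk_SR=nwt+bnasr_TR=man_TE=mul,eng_Nt=4.ctl
NWT + BNews ASR Trans. | Multilingual | English | Mandarin | Nt=V | trk_SR=nwt+bnasr_TR=man_TE=mul,eng_Nt=V.ctl
NWT + BNews Manual Trans. | Multilingual | Native | English | Nt=1 | trk_SR=nwt+bnman_TR=eng_TE=mul,nat_Nt=1.ctl
NWT + BNews Manual Trans. | Multilingual | Native | English | Nt=2 | trk_SR=nwt+bnman_TR=eng_TE=mul,nat_Nt=2.ctl
NWT + BNews Manual Trans. | Multilingual | Native | English | Nt=4 | trk_SR=nwt+bnman_TR=eng_TE=mul,nat_Nt=4.ctl
NWT + BNews Manual Trans. | Multilingual | Native | English | Nt=V | trk_SR=nwt+bnman_TR=eng_TE=mul,nat_Nt=V.ctl
NWT + BNews Manual Trans. | Multilingual | Native | Mandarin | Nt=1 | trk_SR=nwt+bnman_TR=man_TE=mul,nat_Nt=1.ctl
NWT + BNews Manual Trans. | Multilingual | Native | Mandarin | Nt=2 | trk_SR=nwt+bnman_TR=man_TE=mul,nat_Nt=2.ctl
NWT + BNews Manual Trans. | Multilingual | Native | Mandarin | Nt=4 | trk_SR=nwt+bnman_TR=man_TE=mul,nat_Nt=4.ctl
NWT + BNews Manual Trans. | Multilingual | Native | Mandarin | Nt=V | trk_SR=nwt+bnman_TR=man_TE=mul,nat_Nt=V.ctl
NWT + BNews Manual Trans. | Multilingual | English | English | Nt=1 | trk_SR=nwt+bnman_TR=eng_TE=mul,eng_Nt=1.ctl
NWT + BNews Manual Trans. | Multilingual | English | English | Nt=2 | trk_SR=nwt+bnman_TR=eng_TE=mul,eng_Nt=2.ctl
NWT + BNews Manual Trans. | Multilingual | English | English | Nt=4 | trk_SR=nwt+bnman_TR=eng_TE=mul,eng_Nt=4.ctl
NWT + BNews Manual Trans. | Multilingual | English | English | Nt=V | trk_SR=nwt+bnman_TR=eng_TE=mul,eng_Nt=V.ctl
NWT + BNews Manual Trans. | Multilingual | English | Mandarin | Nt=1 | trk_SR=nwt+bnman_TR=man_TE=mul,eng_Nt=1.ctl
NWT + BNews Manual Trans. | Multilingual | English | Mandarin | Nt=2 | trk_SR=nwt+bnman_TR=man_TE=mul,eng_Nt=2.ctl
NWT + BNews Manual Trans. | Multilingual | English | Mandarin | Nt=4 | trk_SR=nwt+bnman_TR=man_TE=mul,eng_Nt=4.ctl
NWT + BNews Manual Trans. | Multilingual | English | Mandarin | Nt=V | trk_SR=nwt+bnman_TR=man_TE=mul,eng_Nt=V.ctl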
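
The tracking control filenames above follow a regular scheme: every combination of source condition, test content language, training source language, and Nt value gets one .ctl file. As a rough sketch, the list can be regenerated in Python to check that a local copy of the directory is complete. The constant and function names are illustrative, and the value lists are simply read off the table above rather than taken from the evaluation specification.

```python
from itertools import product

# Value lists read off the tracking table (assumed complete for this release).
SOURCE_CONDITIONS = ["nwt+bnasr", "nwt+bnman"]
CONTENT_LANGUAGES = ["nat", "eng"]
TRAINING_LANGUAGES = ["eng", "man"]
NT_VALUES = ["1", "2", "4", "V"]


def tracking_control_files() -> list:
    """List the expected control filenames in the same order as the table."""
    return [
        f"trk_SR={sr}_TR={tr}_TE=mul,{te}_Nt={nt}.ctl"
        for sr, te, tr, nt in product(
            SOURCE_CONDITIONS, CONTENT_LANGUAGES, TRAINING_LANGUAGES, NT_VALUES
        )
    ]


if __name__ == "__main__":
    for name in tracking_control_files():
        print(name)
```

Comparing this list against the actual directory contents is one way to spot a missing or misnamed control file; the test source language is fixed at mul because multilingual tracking is the only test language condition.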

Subset Definition Files

For the tracking and detection tasks, the evaluation conditions involve pooling source texts from different languages. The subset definition files below provide a way to compute performance statistics on multiple, independent 'subsets' of an evaluation run. The 'standard' divisions split the data by source type (Newswire or Broadcast News) and by test source language (English or Mandarin).

Currently, only the tracking and detection evaluation scripts support the subset definition file. To use a subset definition file, add the command-line argument '-U SubsetFile' to the tracking evaluation script 'TDT3trk.pl', or add the command-line option '-S SubsetFile' to the detection evaluation script 'TDT3det.pl'.

Source Condition | Test Source Language | Test Content Language | Sourcefile Subset Definition Filename
NWT + BNews ASR Trans. or NWT + BNews Manual Trans. | Multilingual | Native or English | Subsets_TE=mul.ssd
NWT + BNews ASR Trans. or NWT + BNews Manual Trans. | Mandarin | Native or English | Subsets_TE=man.ssd
NWT + BNews ASR Trans. or NWT + BNews Manual Trans. | English | Native or English | Subsets_TE=eng.ssd
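
Choosing the subset file and the matching option can be sketched in a few lines of Python. The helper name subset_option and the two lookup tables are illustrative. Only the '-U' and '-S' options and the three .ssd filenames come from this page; any other arguments the evaluation scripts require are not shown here and must be supplied separately.

```python
# Subset definition file per test source language, as listed above.
SUBSET_FILES = {
    "mul": "Subsets_TE=mul.ssd",
    "man": "Subsets_TE=man.ssd",
    "eng": "Subsets_TE=eng.ssd",
}

# Option that introduces the subset file for each evaluation script.
SUBSET_FLAGS = {
    "TDT3trk.pl": "-U",  # tracking
    "TDT3det.pl": "-S",  # detection
}


def subset_option(script: str, test_lang: str) -> list:
    """Return the two command-line tokens that select a subset file.

    Raises KeyError for a script or language code not listed on this page.
    """
    return [SUBSET_FLAGS[script], SUBSET_FILES[test_lang]]


if __name__ == "__main__":
    # Tokens to append to a TDT3det.pl command line for a Mandarin test set.
    print(subset_option("TDT3det.pl", "man"))
```

Because tracking has only the multilingual test condition, a tracking run would normally pair '-U' with Subsets_TE=mul.ssd.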