| TDT2seg.pl V0.2 |
- Modified the segmentation scoring to be MUCH faster. Removed the old
delta-based code in favor of the new implementation. Runtimes changed as follows:
| Source | v0.1 (sec) | v0.2 (sec) | Rel. Improvement |
| nw | 840 | 67 | 92% |
| bn | 166 | 37 | 78% |
| bsr | 164 | 37 | 77% |
- During the above re-code, a bug was found in the previous delta-based
scorer: the final evaluation frame was never checked. The bug had a
negligible effect on scores (something like 0.000001 for P(miss)
and/or P(fa)).
- Disregard non-story regions during scoring
- The evaluation frame size is now a command-line argument.
- Blank lines after the initial comment lines in the system output file are treated as comments.
- The index file format was changed so that file names now
explicitly indicate the directory and extension names.
- Added computation of Cseg measure.
- Examples updated.
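
The "Rel. Improvement" column in the runtime table can be reproduced from the two runtime columns. The sketch below assumes the column means the fraction of the v0.1 runtime that was eliminated; the function name is hypothetical and is not part of TDT2seg.pl.

```python
# Runtimes in seconds, copied from the runtime table: (v0.1, v0.2).
runtimes = {"nw": (840, 67), "bn": (166, 37), "bsr": (164, 37)}


def relative_improvement(old_sec, new_sec):
    """Fraction of the old runtime that was eliminated (assumed definition)."""
    return (old_sec - new_sec) / old_sec


for source, (old, new) in runtimes.items():
    # Rounded to a whole percent, matching the precision used in the table.
    print(f"{source}: {relative_improvement(old, new):.0%}")
```

Under that assumption the three rows round to the percentages listed in the table.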
|
| TDT2trk.pl V0.2 |
- Added DET plot outputs, options -d, -a, -e, -f
- Modified nomenclature to story rather than document
- Reversed report output so that important numbers are at top
- Eval function that maps system output to reference topics
now uses point markers that mark the beginning of a
score/decision point which is continued to the next marker.
The default mapping function is the average over words.
- Small speed up in the evaluation function. Runtimes on error example
set went from 76 sec. to 67 sec. A small gain, but runtime will
increase linearly with the number of system output points.
- Updated the manual page and reports to refer to story rather than document.
- Extensively modified the tracking index file. Changes include:
a list of the discriminate training stories in the training epoch was added,
a start recid was added to the source file record, and the
full pathname within the TDT2 corpus is now specified for the testing
source file.
- Modified the scorer to exclude partial source files from scoring based
on the start record number in the index file
- Modified the program to handle the new index file format.
- Blank lines after the initial comment lines in the system output file are treated as comments.
- Updated examples.
- Corrected a problem where brief documents were excluded from the scoring of all topics
rather than only the topic for which they were judged brief.
- Added computation of Ctrack measure.
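
The point-marker mapping described in this list can be illustrated with a small sketch. It is not the code in TDT2trk.pl: the function name, the use of word indices as positions, and the half-open story span are all assumptions made for illustration. Each marker carries a score that holds until the next marker, and a story's score is the average of those scores over the story's words.

```python
def story_score(markers, story_start, story_end):
    """Average the system score over the words in [story_start, story_end).

    markers: list of (word_index, score) pairs sorted by word_index. Each
    score applies from its marker up to (not including) the next marker.
    All names and the word-index convention are hypothetical.
    """
    total = 0.0
    covered = 0
    for i, (begin, score) in enumerate(markers):
        # The last marker's score continues indefinitely.
        end = markers[i + 1][0] if i + 1 < len(markers) else float("inf")
        lo = max(begin, story_start)
        hi = min(end, story_end)
        if hi > lo:
            total += score * (hi - lo)
            covered += hi - lo
    if covered == 0:
        raise ValueError("no marker covers this story")
    return total / covered


# Illustrative data only: three markers and a story spanning words 50..199.
markers = [(0, 0.2), (100, 0.8), (250, 0.4)]
print(story_score(markers, 50, 200))
```

In this made-up example, 50 words fall under the score 0.2 and 100 words under 0.8, so the word-weighted average is about 0.6.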
|
| TDT2det.pl V0.2 |
- Blank lines after the initial comment lines in the system output file are treated as comments.
- Stories marked BRIEF, or stories marked YES for multiple topics, are excluded from scoring.
- Corrected percent false alarm to be (#fa)/(Nstory-(#stories on topic given the topic)).
- Eval function that maps system output to reference topics
now uses point markers that mark the beginning of a
score/decision point which is continued to the next marker.
The default mapping function is the average over words.
- Ref to hyp cluster mapping function optimized by a
cost function, command line configurable.
- Modified nomenclature to story rather than document.
- Allow reference topic cluster to map to NULL Hypothesis cluster.
- Added a DET plot capability
- Updated examples.
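
The corrected false-alarm formula in this list can be written out as a short sketch. The function and parameter names are hypothetical, and scaling by 100 to obtain a percentage is an assumption; the point is only that the denominator counts off-topic stories rather than all stories.

```python
def percent_false_alarm(num_fa, n_story, n_on_topic):
    """Return 100 * #fa / (Nstory - #stories on topic) for one topic.

    Names are hypothetical; the x100 scaling is an assumption.
    """
    n_off_topic = n_story - n_on_topic
    if n_off_topic <= 0:
        raise ValueError("no off-topic stories, false-alarm rate is undefined")
    return 100.0 * num_fa / n_off_topic


# Illustrative numbers only: 1000 scored stories, 40 on topic, 12 false alarms.
print(percent_false_alarm(12, 1000, 40))
```

With these made-up counts the denominator is 960 rather than 1000, so the rate is 1.25 percent instead of 1.2 percent.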
|
| TDT2.pl V0.2 |
- Added Min/Max and Find_system_score_for_doc() functions.
- Added DET plotting functions: ppndf(), write_gnuplot_DET_header(),
write_tics(), and Compute_DET_points().
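
DET plots conventionally draw P(miss) and P(fa) on normal-deviate (probit) axes. The sketch below shows that transform in Python, on the assumption that ppndf() serves this purpose; the actual TDT2.pl implementation is not shown here, and the clamping parameter `eps` is an illustrative addition.

```python
from statistics import NormalDist


def ppndf(p, eps=1e-9):
    """Map a probability to its normal deviate (probit) for a DET axis.

    Sketch only: clamping to [eps, 1 - eps] is an assumption that keeps the
    inverse CDF finite for probabilities of exactly 0 or 1.
    """
    p = min(max(p, eps), 1.0 - eps)
    return NormalDist().inv_cdf(p)


# Equal steps in probability are not equal steps on the axis.
for prob in (0.01, 0.1, 0.5, 0.9, 0.99):
    print(prob, round(ppndf(prob), 3))
```

A probability of 0.5 maps to 0, and the transform is symmetric about that point, which is why error rates near 0 and 1 are spread out on a DET plot.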
|
| TDT2BuildIndex.pl V0.1 |
- New script that generates evaluation index files
from a file list.
|