TDT3eval FAQ


Questions

  1. What are the command lines that NIST used for the evaluations?
  2. How can I make the program run faster?
  3. How do I make topic weighted DET graphs?
  4. How do I compute tracking performance on subsets of an evaluation set?
  5. How do I compute topic detection performance on subsets of an evaluation set?
  6. How do I manually control the evaluated documents in the FSD evaluation?
  7. How do I specify which topics the detection and FSD evaluations score?

Answers

  1. ANSWER TO: What are the command lines that NIST used for the evaluations?

  2. ANSWER TO: How can I make the program run faster? Each of the scoring modules has an option '-s' that sets the program to run for speed. The option disables the use of SGML parsers to read in the TDT database, thus sidestepping the corpus validation steps. The risk is that a corrupt TDT corpus would go undetected. The author's suggestion is to run a test set through the programs once without the '-s' option to validate the corpus, and to use the '-s' option thereafter, since the corpus has already been verified.
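    The suggested two-pass usage can be sketched as follows. This is illustrative only: the option placement is taken from this FAQ, but the other required arguments are omitted, so the commands are shown as comments rather than runnable invocations.

    ```shell
    # First run: full SGML parsing validates the corpus (slow but safe).
    # TDT3trk.pl ...required options...
    # Later runs on the same, now-verified corpus: add -s to skip parsing.
    # TDT3trk.pl -s ...required options...
    ```
    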

  3. ANSWER TO: How do I make topic weighted DET graphs? Topic weighted DET graphs are only supported for the Tracking and First Story Detection evaluation tasks. First, use the '-d DETFILE' option to generate a DET plot, then add the '-w' option to generate a topic-weighted DET trace. NIST generates the topic weighted DET graphs for the evaluation; the command lines above provide an example.
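    A minimal sketch of the option combination, assuming the tracking script and with all other required arguments elided (shown as a comment, since the full invocation is not given in this FAQ):

    ```shell
    # -d names the DET output file; -w makes the trace topic-weighted.
    # TDT3trk.pl -d eval.det -w ...other required options...
    ```
    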

  4. ANSWER TO: How do I compute tracking performance on subsets of an evaluation set? There are two steps to accomplish this operation:

    1. Build a set of topic index files including only the source files that you want to evaluate.
    2. Run TDT3trk.pl using the new index files, adding the command line option -S.

    The -S option tells the program to ignore any system outputs for source files that have not been specified in the index file. The other options, such as the DET plots, apply as normal.
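    The two steps can be sketched as below. The index file name and entry format here are invented for illustration (they are not taken from the TDT3 documentation); only the grep-based subsetting actually runs, and the final scoring command is shown as a comment.

    ```shell
    # Hypothetical full index of source files (format is an assumption).
    printf '%s\n' 19980101_1130_1200_CNN_HDL \
                  19980105_1600_1630_ABC_WNT \
                  19990212_1130_1200_CNN_HDL > full.index

    # Step 1: build a new index containing only the sources to evaluate,
    # here the January 1998 files.
    grep '^199801' full.index > subset.index
    cat subset.index

    # Step 2: rescore with the new index, adding -S so that system output
    # for unlisted source files is ignored (command shape assumed):
    # TDT3trk.pl -S ...other required options... subset.index
    ```
    
    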

  5. ANSWER TO: How do I compute topic detection performance on subsets of an evaluation set? A simple modification of the index files will not accomplish a subsetting operation (as it does for the tracking evaluation). The detection scoring procedure first "maps" each reference topic cluster to a least-cost hypothesized topic cluster. If the subsetting were done prior to the mapping, the mapped clusters would change drastically between subsets. Therefore, the detection scoring program accepts a source file subset definition file via the '-S SUBSET_FILE' option. The subset file defines any number of (possibly overlapping) subsets of source files. See the TDT3det manual page for a description of the source file subset file format.

    The net effect of using the -S option is to generate additional tables in the scoring report. The DET plot does not reflect additional performance points when this option is used.

  6. ANSWER TO: How do I manually control the evaluated documents in the FSD evaluation? The '-K FSD_Key' option of the TDT3fsd.pl program specifies an FSD key file which designates the stories to be evaluated. The key file can be written according to the documented specifications, or generated automatically by 'TDT3fsd.pl' itself via the '-k FSD_Key' option. The '-k' option dumps the currently loaded FSD key, so one could generate an answer key for an evaluation set using the '-k' option, modify the output key file, and then rescore the evaluation set with the modified answer key using the '-K FSD_Key' option.
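    The dump/edit/rescore cycle can be sketched as follows. The file name 'eval.key' and the elided options are placeholders, not taken from the TDT3fsd.pl documentation, so the commands are shown as comments.

    ```shell
    # 1. Dump the FSD key currently derived from the evaluation set:
    # TDT3fsd.pl -k eval.key ...other required options...
    # 2. Edit eval.key by hand to add or remove stories.
    # 3. Rescore against the modified key:
    # TDT3fsd.pl -K eval.key ...other required options...
    ```
    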

  7. ANSWER TO: How do I specify which topics the detection and FSD evaluations score? By default, the detection and first story detection evaluation scripts score all topics for which there are on-topic documents in the test collection. Using the '-T regexp' option in the commands for TDT3det.pl and TDT3fsd.pl, one can limit the set of evaluated topics. The 'regexp' argument is a Perl regular expression that is matched against the topic IDs. These expressions can become complicated, so there are 4 pre-programmed macros for the common topic sets. The macros are as follows:

    Macro name      Equivalent Expression
    TDT98_Train     20+([1-9]|[12][0-9]|3[0-7])
    TDT98_DevTest   20+(3[89]|[45][0-9]|6[0-6])
    TDT98_EvalTest  20+(6[7-9]|[89][0-9]|100)
    TDT99_mul       20+(1|2|5|7|13|15|20|23|39|44|48|57|70|71|76|85|88|89|91|96)

    A typical use of the macros would be to select the TDT98 training and devtest topics from the TDT2 corpus. The command line argument for this case would be: '-T TDT98_Train|TDT98_DevTest'.
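    Since the macros expand to ordinary regular expressions, their coverage can be checked directly. The sketch below (in Python rather than Perl, and with a hypothetical helper name 'topic_sets') matches three of the expressions from the table against topic IDs of the form '20012'; the five-digit ID form is an assumption about the TDT topic numbering.

    ```python
    import re

    # The three TDT98 macro expressions from the table above.
    MACROS = {
        "TDT98_Train":    r"20+([1-9]|[12][0-9]|3[0-7])",
        "TDT98_DevTest":  r"20+(3[89]|[45][0-9]|6[0-6])",
        "TDT98_EvalTest": r"20+(6[7-9]|[89][0-9]|100)",
    }

    def topic_sets(topic_id):
        """Return the macro names whose expression matches the whole topic ID."""
        return [name for name, expr in MACROS.items()
                if re.fullmatch(expr, topic_id)]

    print(topic_sets("20012"))  # topic 12 -> training set
    print(topic_sets("20045"))  # topic 45 -> devtest set
    print(topic_sets("20100"))  # topic 100 -> evaluation set
    ```

    The leading '20+' absorbs the zero padding in the topic ID, so the alternation only has to enumerate the topic numbers themselves.
    
    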