File:  aldistsm-1.2/README.txt
Date:  4/16/99

This directory contains the source code, documentation, and
example/test data for my software to do split/merge alignment
of two strings of words (or phones).

The main program, "tald3e_sm" (compiled from "tald3e_sm_export.c"),
reads a file of pairs of strings to be aligned, marked as "REF:"
and "HYP:", optionally preceded by a line giving an i.d. for the
sentence, marked as "ID:", and writes out the aligned strings
(with i.d.) into another file.
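
As a rough illustration of that input layout, here is a minimal
sketch in C of how one line might be classified by its marker.
It is not taken from "tald3e_sm_export.c": the names "line_kind"
and "classify_line" are hypothetical, and it assumes that each
marker starts in column 1, is matched case-sensitively, and may
be followed by white space.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical line kinds; these names are illustrative only and
   do not come from tald3e_sm_export.c. */
typedef enum { LINE_ID, LINE_REF, LINE_HYP, LINE_OTHER } line_kind;

/* Classify one input line by its leading marker.  For a recognized
   marker, *text is set to the first non-blank character after it;
   otherwise *text is set to the start of the line.
   Assumption: markers begin in column 1 and are case-sensitive. */
static line_kind classify_line(const char *line, const char **text)
{
    static const struct { const char *marker; line_kind kind; } table[] = {
        { "ID:", LINE_ID }, { "REF:", LINE_REF }, { "HYP:", LINE_HYP }
    };
    size_t i;

    assert(line != NULL && text != NULL);
    *text = line;
    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        size_t n = strlen(table[i].marker);
        if (strncmp(line, table[i].marker, n) == 0) {
            const char *p = line + n;
            /* Skip white space between the marker and the string. */
            while (*p != '\0' && isspace((unsigned char)*p))
                p++;
            *text = p;
            return table[i].kind;
        }
    }
    return LINE_OTHER;
}
```

For example, calling classify_line("REF: the cat sat", &text)
returns LINE_REF and leaves "text" pointing at "the cat sat".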

All of the functions and data structures are made available here
so that you can package them differently if you want to.  The code
was written in ANSI C on a SUN Workstation and checked for
errors using Purify (TM).  In addition, the latest version was
tested by running a parallel test between it and the previous
version, using 2000 REF/HYP pairs from a recent evaluation. 

The installation procedure is basic: just execute

   gcc -o tald3e_sm tald3e_sm_export.c -lm

To test the program, execute the batch procedure "test1.bat".
(With the phone files supplied, you will get a warning message
about "the specification of a lower pcodeset" being overridden,
which you can ignore.)  When it has run successfully, look at
the output report file "test1.rpt" to check parameter values
and the split/merges found, and compare the output alignment
file "x_sm.aln" with the version produced here at
NIST ("x_sm_NIST.aln") to see if identical results were
produced.  There should also be no substantial differences
between the output file "test1.rpt" produced by your run and
the matching file "test1_NIST.rpt" produced here.

New words found in the input alignment but not in the
word-level pcode file "gpengw1.pcd" will get a pronunciation
produced by a default text-to-phone function.  These new
word prons will be written out in a file named "*.ADD",
and although it's not necessary, you can correct them
and merge them back into your version of "gpengw1.pcd"
if you want a little more accuracy the next time you run.

The 5th command-line parameter is a figure-of-merit
threshold used by the split/merge detecting logic
that you may want to play with.  For each candidate
split or merge, a figure of merit is calculated that
is proportional to the likelihood that the split/merge
is genuine; if this is greater than the threshold,
the split/merge candidate is accepted.  In previous
work (first citation below), a value of 1.7 was found
to be optimal.
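
The acceptance test described above can be sketched in C as
follows.  This is only an illustration under stated assumptions,
not the logic used in the program itself: the struct
"sm_candidate" and the function "apply_threshold" are
hypothetical names, and the way the figure of merit is actually
calculated is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical candidate record; the field names are illustrative
   and are not taken from tald3e_sm_export.c. */
typedef struct {
    double fom;      /* figure of merit calculated for the candidate */
    int    accepted; /* set to 1 if the candidate passes the test    */
} sm_candidate;

/* Mark each candidate whose figure of merit is strictly greater
   than the threshold, and return the number accepted. */
static size_t apply_threshold(sm_candidate *cands, size_t n,
                              double threshold)
{
    size_t i, kept = 0;

    assert(cands != NULL || n == 0);
    for (i = 0; i < n; i++) {
        cands[i].accepted = cands[i].fom > threshold;
        kept += (size_t)cands[i].accepted;
    }
    return kept;
}
```

With a threshold of 1.7, candidates with figures of merit 1.2
and 1.7 would be rejected and one with 2.4 accepted, since the
comparison is strict ("greater than").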

Version 1.2 (aka release 6) has been generalized so that
it can be used to align strings of phones instead of
words.  To see how to do this, execute the batch
procedure "test2.bat" as you did "test1.bat" and examine
it and its outputs.  The program should work on any string
of phones that are defined in the data file "phon1ax.pcd". 
If you want to see what phonological distance the program
calculates between the REF and HYP strings, either run
it with the last command-line parameter (debug msg level)
set to 1, or change the line

   if (db_level > 0) printf("%s Dph=%d, Err rate = %f\n",...

to

   if (db_level > -1) printf("%s Dph=%d, Err rate = %f\n",...

and re-compile.  The splitting and merging logic doesn't
work very well (yet) with phone strings, but the phonological
distance calculated looks pretty good.

One additional change made in release 6 is the addition to
the list of allowable final consonant clusters of the one
heard in the formal pronunciation of "attempts".


For a few more details, see:

"Further Studies in Phonological Scoring", by W. M. Fisher,
 J. Fiscus, A. Martin, D. S. Pallett, and M. A. Przybocki,
 Proceedings of the Spoken Language Systems Technology Workshop,
 January 22-25, 1995, Austin, TX (sponsored by ARPA), Morgan
 Kaufmann Publishers, San Francisco, ISBN 1-55860-374-3, pp. 181-186.
 
"Better Alignment Procedures for Speech Recognition Evaluation",
 by W. M. Fisher and J.G. Fiscus, IEEE International Conference
 on Acoustics, Speech, and Signal Processing 1993, pp. II-59 - II-62.

 Version 1.2, released 4/16/99, includes code changes and additional
 data needed for the program to do alignment of strings composed
 of either words or phones, and a minuscule tweak of the syllabification
 parameters.

 Version 1.1, released 1/21/99, includes code changes to do dynamic
 instead of static memory allocation, to handle "words" up to 256
 characters long and "sentences" up to about 2800 words long (if
 enough memory is available), and a few minor bug fixes.

 Version 1.0, released 6/08/98, incorporates bug fixes, some cosmetic
 changes, command-line parameter changes, and a greatly improved default
 TTP function that's used to get the prons for words not in the dictionary.
 It uses a file of TTP rules and decides to resort to "spell-mode"
 pronunciation (as in "FBI") iff the pronunciation produced by the general
 rules is unpronounceable (because it can't be syllabified). 

 Version 0.1, released 12/05/97, includes fixes for one bug that
 was triggered when two candidate split/merges were close enough
 to conflict, and for some memory allocation and initialization problems.
 In addition, the alignment test case that triggered this bug
 has been added to the proof test set.

 - Bill Fisher / NIST (william.fisher@nist.gov)	
