(algorithm)
Definition: A measure of similarity between two strings. The Jaro measure is a weighted sum of three terms: the percentage of matched characters from each of the two strings, and a term for transposed characters. Winkler increased this measure when the initial characters match, then rescaled it by a piecewise function whose intervals and weights depend on the type of string (first name, last name, street, etc.).
Generalization (I am a kind of ...)
string matching with errors.
See also Levenshtein distance, phonetic coding.
Note: For "piecewise function", see the definition in MathWorld or answers from Dr. Math.
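Note: The definition above can be sketched in code. The following Python is a minimal illustration, not the Census Bureau implementation. It assumes the commonly cited form of the measure, none of which is spelled out in this entry: equal weights of 1/3 on the three Jaro terms, a match window of half the longer string's length minus one, and a Winkler prefix bonus of 0.1 per matching initial character, up to 4 characters. The names jaro, jaro_winkler, prefix_weight, and max_prefix are illustrative. Winkler's later piecewise rescaling by string type is not included.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1], assuming equal weights of 1/3 per term."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Assumption: two characters match only if they are equal and their
    # positions differ by at most this window.
    window = max(max(len1, len2) // 2 - 1, 0)

    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters that line up out of order come in pairs, so
    # half the number of mismatched positions is the transposition count.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) / 2

    return (matches / len1
            + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str,
                 prefix_weight: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for matching initial characters.

    prefix_weight and max_prefix are assumed defaults, not values
    taken from this entry.
    """
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * prefix_weight * (1.0 - j)


if __name__ == "__main__":
    print(round(jaro("MARTHA", "MARHTA"), 4))          # 0.9444
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

For "MARTHA" and "MARHTA", all six characters match and one pair (T, H) is transposed, so the Jaro term is (1 + 1 + 5/6)/3. The three-character common prefix "MAR" then raises the score under the assumed prefix weight.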
Author: PEB
Winkler and Thibaudeau paper abstract (HTML) and full paper (PDF).
William E. Winkler and Yves Thibaudeau, An Application of the Fellegi-Sunter Model of Record Linkage to the 1990 U.S. Decennial Census, Statistical Research Report Series RR91/09, U.S. Bureau of the Census, Washington, D.C., 1991.
Matthew A. Jaro, UNIMATCH: A Record Linkage System: User's Manual, Technical Report, U.S. Bureau of the Census, Washington, D.C., 1976.
Matthew A. Jaro, Advances in Record-linkage Methodology as Applied to Matching the 1985 Census of Tampa, Florida, Journal of the American Statistical Association, 89:414-420.
Entry modified 29 November 2006.
Cite this as:
Paul E. Black, "Jaro-Winkler", in Dictionary of Algorithms and Data Structures [online], Paul E. Black, ed., U.S. National Institute of Standards and Technology. 29 November 2006. (accessed TODAY)
Available from: http://www.nist.gov/dads/HTML/jaroWinkler.html